- DataSynthesizer version:
- Python version:
- Operating System:
Description
Hello,
I am using DataSynthesizer to generate synthetic data for research purposes. I've been using this package for months and it works perfectly with small datasets. However, with bigger datasets, especially ones with a higher number of columns, runtime becomes a problem. A single dataset (71,236 instances and 52 columns) took more than 18 hours to synthesize on a 64-core machine (with degree_of_bayesian_network = 0 in this case).
I also tried limiting the network by setting degree_of_bayesian_network to 2 instead of the default 0 (where 0 lets the algorithm choose the degree automatically). The runtime decreases, although the quality of the synthesized data also decreases, and it still takes too long.
What do you suggest? Is there a better way you recommend to approach bigger datasets?
What I Did
Paste the command(s) you ran and the output.
If there was a crash, please include the traceback here.
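I don't have the exact command saved, but my usage follows the standard correlated attribute mode pattern from the README. The sketch below is illustrative only: the file names, category_threshold, and epsilon values are placeholders, not my actual settings.

```python
# Illustrative sketch -- paths, category_threshold, and epsilon are
# placeholders, not the exact values from my runs.
from DataSynthesizer.DataDescriber import DataDescriber
from DataSynthesizer.DataGenerator import DataGenerator

input_csv = 'my_dataset.csv'            # placeholder: ~71,236 rows x 52 columns
description_file = 'description.json'   # placeholder output path
synthetic_csv = 'synthetic_data.csv'    # placeholder output path

# Build the dataset description (this is the step that takes 18+ hours).
describer = DataDescriber(category_threshold=20)  # placeholder threshold
describer.describe_dataset_in_correlated_attribute_mode(
    dataset_file=input_csv,
    epsilon=0,  # placeholder: 0 turns off differential privacy noise
    k=0)        # degree_of_bayesian_network; 0 = choose degree automatically
describer.save_dataset_description_to_file(description_file)

# Generate the synthetic rows from the saved description.
generator = DataGenerator()
generator.generate_dataset_in_correlated_attribute_mode(71236, description_file)
generator.save_synthetic_data(synthetic_csv)
```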