- DataSynthesizer version: 0.1.2
Description
The function get_noisy_distribution_of_attributes only gets a partial distribution. This bug was introduced in commit 1abe702. Here is the relevant code as it appears in master (currently commit be8b65a):
    full_space = None
    for item in grouper_it(products, 1000000):
        if full_space is None:
            full_space = DataFrame(columns=attributes, data=list(item))
        else:
            data_frame_append = DataFrame(columns=attributes, data=list(item))
            full_space.append(data_frame_append)
In particular, full_space.append does not modify full_space in place; it returns a new DataFrame, and the code above discards that return value. (This seems to be true for all versions of pandas.) As a result, full_space never grows beyond the first chunk, so it holds at most the first 1000000 rows instead of all of the intended rows.
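One possible way to avoid the problem is to collect the per-chunk frames in a list and join them once with pandas.concat. The following is only a sketch, not the project's actual fix: grouper_it here is a hypothetical stand-in for the project's helper (assumed to yield chunks of at most the given size), and build_full_space is a name made up for illustration.

```python
from itertools import islice, product

import pandas as pd


def grouper_it(iterable, chunk_size):
    """Hypothetical stand-in for the project's helper: yield lists of up to chunk_size items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def build_full_space(products, attributes, chunk_size=1000000):
    """Illustrative helper (not part of DataSynthesizer): build the full frame chunk by chunk."""
    # Keep every chunk's DataFrame instead of discarding the result of append.
    chunks = [
        pd.DataFrame(columns=attributes, data=chunk)
        for chunk in grouper_it(products, chunk_size)
    ]
    if not chunks:
        # No rows at all: return an empty frame with the expected columns.
        return pd.DataFrame(columns=attributes)
    # A single concat joins all chunks and renumbers the index from 0.
    return pd.concat(chunks, ignore_index=True)


# Small demonstration: 3 * 4 = 12 combinations, processed in chunks of 5.
attributes = ["a", "b"]
products = product(range(3), range(4))
full_space = build_full_space(products, attributes, chunk_size=5)
print(len(full_space))  # 12
```

With a chunk size of 5 the twelve rows arrive as three chunks, and all of them end up in the result. Concatenating once at the end also avoids copying the accumulated frame on every iteration, which a repeated full_space = full_space.append(...) would do.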