Bug: partial distribution produced by get_noisy_distribution_of_attributes #26

@zjroth

Description

  • DataSynthesizer version: 0.1.2

The function get_noisy_distribution_of_attributes produces only a partial distribution. This bug was introduced in commit 1abe702. Here is the relevant code as it appears in master (currently commit be8b65a):

full_space = None
for item in grouper_it(products, 1000000):
    if full_space is None:
        full_space = DataFrame(columns=attributes, data=list(item))
    else:
        data_frame_append = DataFrame(columns=attributes, data=list(item))
        full_space.append(data_frame_append)

In particular, full_space.append does not modify full_space in place; instead, it returns a new DataFrame, and the loop discards that return value. (This seems to be true for all versions of pandas.) As a result, full_space does not hold all of the intended rows, only the first chunk (at most the first 1000000 rows).
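A minimal sketch of one possible fix, for illustration only. It assumes a chunking helper that behaves like grouper_it does in the snippet above; the grouper_it defined here is a hypothetical stand-in, not DataSynthesizer's own implementation, and build_full_space is likewise a hypothetical name. The idea is to collect the per-chunk DataFrames and concatenate them once, keeping the returned result. It uses pd.concat because DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; on older pandas, the minimal one-line fix in the original loop would be full_space = full_space.append(data_frame_append).

```python
from itertools import islice, product

import pandas as pd


def grouper_it(iterable, n):
    # Hypothetical stand-in for DataSynthesizer's grouper_it helper:
    # yields successive lists of at most n items from iterable.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


def build_full_space(products, attributes, chunk_size=1000000):
    # Hypothetical helper: build one DataFrame per chunk, then concatenate once.
    frames = [
        pd.DataFrame(columns=attributes, data=list(item))
        for item in grouper_it(products, chunk_size)
    ]
    if not frames:
        # pd.concat raises ValueError on an empty list, so handle it explicitly.
        return pd.DataFrame(columns=attributes)
    # pd.concat returns a new DataFrame; returning that result keeps every chunk.
    return pd.concat(frames, ignore_index=True)


attributes = ["a", "b"]

# Reproduce the bug: the concatenated result is discarded, so only the first
# chunk survives (the same pattern as the bare full_space.append(...) call).
buggy = None
for item in grouper_it(product(range(3), range(4)), 5):
    chunk = pd.DataFrame(columns=attributes, data=list(item))
    if buggy is None:
        buggy = chunk
    else:
        pd.concat([buggy, chunk])  # return value ignored: buggy is unchanged

# 3 * 4 = 12 combinations, split into chunks of 5, 5 and 2 rows.
fixed = build_full_space(product(range(3), range(4)), attributes, chunk_size=5)

print(len(buggy), len(fixed))  # 5 12
```

Concatenating once at the end, rather than reassigning inside the loop, also avoids copying the growing accumulated frame on every iteration.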

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
