Validation of Synthetic data #4

@whitehackr

The generated synthetic data is inconsistent with the original data in several ways:

  • order_id format: the two datasets use different ID formats, e.g. 43693 vs 170933760013427.
  • Customer information consistency: customer 34857 is assigned gender "F" in the synthetic data, but the original data has "M". Customer facts/details should not have been generated at all, only additional order information.
  • Timestamps: all orders are created, shipped, and delivered at exactly midnight.
  • Shipping and delivery data: the original dataset does not seem to have these fields, while the synthetic data does. They need to be dropped and must not be used downstream.
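
The checks above could be automated along these lines. This is only a minimal sketch, not the project's actual validation code: the column names (`customer_id`, `gender`, `created_at`, `shipped_at`, `delivered_at`), the helper names, and the sample rows are all assumptions made for illustration. In particular, which order_id format belongs to which dataset is assumed here, and the dates are invented.

```python
from datetime import datetime, time

def id_lengths(rows, key="order_id"):
    """Return the set of digit lengths seen in an ID column."""
    return {len(str(r[key])) for r in rows}

def gender_mismatches(original, synthetic):
    """Customer IDs whose gender in the synthetic rows differs from the original."""
    known = {r["customer_id"]: r["gender"] for r in original}
    return sorted({
        r["customer_id"]
        for r in synthetic
        if r["customer_id"] in known and r["gender"] != known[r["customer_id"]]
    })

def all_midnight(rows, columns):
    """True if at least one timestamp exists and every one falls at 00:00:00."""
    values = [r[c] for r in rows for c in columns if r.get(c) is not None]
    return bool(values) and all(v.time() == time(0, 0) for v in values)

def extra_columns(original, synthetic):
    """Columns present in the synthetic rows but absent from the original."""
    orig_cols = set().union(*(r.keys() for r in original))
    synth_cols = set().union(*(r.keys() for r in synthetic))
    return sorted(synth_cols - orig_cols)

# Toy rows (hypothetical): IDs and genders echo the issue, dates are invented.
original = [
    {"order_id": 43693, "customer_id": 34857, "gender": "M",
     "created_at": datetime(2024, 1, 5, 14, 32)},
]
synthetic = [
    {"order_id": 170933760013427, "customer_id": 34857, "gender": "F",
     "created_at": datetime(2024, 3, 1),
     "shipped_at": datetime(2024, 3, 2),
     "delivered_at": datetime(2024, 3, 5)},
]

print(id_lengths(original), id_lengths(synthetic))
print(gender_mismatches(original, synthetic))
print(all_midnight(synthetic, ["created_at", "shipped_at", "delivered_at"]))
print(extra_columns(original, synthetic))
```

Each helper maps to one bullet, so a failing check points directly at the inconsistency it covers. Running them as assertions before any downstream step would catch a regenerated dataset that reintroduces the same problems.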
