Numpy Datatypes in Conditional Distributions of Description File #44

* DataSynthesizer version: 0.1.13
* Python version: 3.9
* Operating System: Windows 11

### Description

Trying to create a synthetic dataset from the [Kaggle adult census dataset](https://www.kaggle.com/datasets/uciml/adult-census-income/data?select=adult.csv) (with the `fnlwgt` column removed) in the correlated attribute mode results in the generator failing to parse the description file.

The reason for this seems to be in [L281 of PrivBayes.py](https://github.com/DataResponsibly/DataSynthesizer/blob/90722857e7f6ed736aaa25068ecf9e77f34f896a/DataSynthesizer/lib/PrivBayes.py#L281):
```python
parents_key = str([parents_instance]) if len(parents) == 1 else str(list(parents_instance))
```
This renders integer parent values as `np.int64(0)` instead of plain `0` when `len(parents) > 1`. That in turn causes [L99 of the DataGenerator](https://github.com/DataResponsibly/DataSynthesizer/blob/90722857e7f6ed736aaa25068ecf9e77f34f896a/DataSynthesizer/DataGenerator.py#L99) to fail: the module does not import numpy, so `np` is undefined when the key string is passed to `eval`:
```python
parents_instance = list(eval(parents_instance))
```
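
The failure can be reproduced in isolation, without DataSynthesizer or numpy installed. This is a minimal sketch, assuming the key string is evaluated in a namespace where `np` is not bound; the helper name `try_eval` is made up for illustration:

```python
def try_eval(text):
    """Evaluate a key string in an empty namespace; return the result or the NameError."""
    try:
        return eval(text, {})  # empty globals: `np` is not bound here
    except NameError as exc:
        return exc


single_parent = try_eval("[0]")
multi_parent = try_eval("[np.int64(0), np.int64(0)]")

print(single_parent)       # the plain-int key parses fine
print(repr(multi_parent))  # the np.int64 key produces a NameError
```

The single-parent key evaluates to a list, while the multi-parent key hits the same `NameError` as in the traceback.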

I was able to work around it locally by adding `import numpy as np` to `DataGenerator.py`, but it would probably be cleaner to write plain Python ints into the description file in the first place.
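
One way to keep numpy types out of the description file might be to convert each parent value to a built-in Python scalar before building the key. The sketch below is only an assumption about how that could look, not the project's code: `to_builtin` and `make_parents_key` are hypothetical helpers, and the conversion relies on numpy scalars exposing `.item()`, which returns the equivalent Python object.

```python
def to_builtin(value):
    """Return a built-in Python scalar for numpy scalars; pass other values through."""
    item = getattr(value, "item", None)  # numpy scalars have .item(); plain ints do not
    return item() if callable(item) else value


def make_parents_key(parents_instance, num_parents):
    """Build the dictionary key from one combination of parent values.

    Mirrors the single-parent / multi-parent split of the original line,
    but converts every value before calling str().
    """
    values = [parents_instance] if num_parents == 1 else list(parents_instance)
    return str([to_builtin(v) for v in values])


print(make_parents_key(0, 1))       # [0]
print(make_parents_key((0, 1), 2))  # [0, 1]
```

With keys written this way, the existing `eval` call in `DataGenerator.py` would only ever see plain literals.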

The relevant section of the description file:
```json
"conditional_probabilities": {
        "income": [
            0.6269945618560558,
            0.37300543814394416
        ],
        "relationship": {
            "[0]": [
                0.31958572087575393,
                0.26864155111683646,
                0.062246949021475276,
                0.17143605132431283,
                0.1260161383716099,
                0.05207358929001161
            ],
            "[1]": [
                0.4276133198945046,
                0.16299606959384128,
                0.027753228447322167,
                0.17927266942607956,
                0.12404103847621967,
                0.07832367416203281
            ]
        },
        "sex": {
            "[np.int64(0), np.int64(0)]": [
                0.11899038829847323,
                0.8810096117015268
            ],
            "[np.int64(0), np.int64(1)]": [
                0.1370384306577154,
                0.8629615693422846
            ],
```
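
For a description file that has already been written with these keys, the keys could also be rewritten after the fact. The following is a rough sketch under a few assumptions: the keys only wrap simple literals as `np.<type>(...)`, the mapping passed in is the `conditional_probabilities` object shown above, and the function names are invented for this example.

```python
import ast
import re

# Matches wrappers such as np.int64(0) and captures the inner literal.
NUMPY_SCALAR = re.compile(r"np\.\w+\(([^()]*)\)")


def normalize_key(key):
    """Turn '[np.int64(0), np.int64(1)]' into '[0, 1]'; plain keys are unchanged."""
    plain = NUMPY_SCALAR.sub(r"\1", key)
    # literal_eval only accepts Python literals, so it cannot run arbitrary code.
    return str(ast.literal_eval(plain))


def normalize_conditional_probabilities(cond_probs):
    """Return a copy of the mapping with every parent key normalized."""
    fixed = {}
    for attribute, table in cond_probs.items():
        if isinstance(table, dict):
            fixed[attribute] = {normalize_key(k): v for k, v in table.items()}
        else:
            fixed[attribute] = table  # attribute without parents: plain list
    return fixed


example = {
    "income": [0.6269945618560558, 0.37300543814394416],
    "sex": {"[np.int64(0), np.int64(1)]": [0.1370384306577154, 0.8629615693422846]},
}
print(normalize_conditional_probabilities(example))
```

Parsing with `ast.literal_eval` instead of `eval` is a design choice in this sketch, since the keys come from a file on disk.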

### What I Did

Python script:
```python
import os.path

import pandas as pd
from DataSynthesizer.DataDescriber import DataDescriber
from DataSynthesizer.DataGenerator import DataGenerator

from generators.generator import Generator

class PrivBayesGenerator(Generator):
    def generate(self, rows: int=None):
        input_data = str(self.real_data_path)
        description_file = str(self.real_data_path.parent / 'description.json')
        synthetic_data = self.synthetic_data_path

        epsilon = 0.1
        if rows is None:
            rows = pd.read_csv(input_data).shape[0]
        threshold_value = 50
        num_tuples_to_generate = rows

        # Describe Dataset
        if not os.path.exists(description_file):
            describer = DataDescriber(category_threshold=threshold_value)
            describer.describe_dataset_in_correlated_attribute_mode(input_data, epsilon=epsilon)
            describer.save_dataset_description_to_file(description_file)

        # Generate Synthetic Data
        generator = DataGenerator()
        generator.generate_dataset_in_correlated_attribute_mode(num_tuples_to_generate, description_file)
        generator.save_synthetic_data(synthetic_data)
```

Traceback:
```
Traceback (most recent call last):
  File "D:\...\helpers\generate_main.py", line 27, in <module>
    main()
  File "D:\...\helpers\generate_main.py", line 21, in main
    generator.generate(rows)
  File "D:\...\generators\priv_bayes_generator.py", line 35, in generate
    generator.generate_dataset_in_correlated_attribute_mode(num_tuples_to_generate, description_file)
  File "D:\...\venv3.9\lib\site-packages\DataSynthesizer\DataGenerator.py", line 66, in generate_dataset_in_correlated_attribute_mode
    self.encoded_dataset = DataGenerator.generate_encoded_dataset(self.n, self.description)
  File "D:\...\venv3.9\lib\site-packages\DataSynthesizer\DataGenerator.py", line 100, in generate_encoded_dataset
    parents_instance = list(eval(parents_instance))
  File "<string>", line 1, in <module>
NameError: name 'np' is not defined
```

