
describe_dataset_in_correlated_attribute_mode doesn't work in Python 3.11 #40

@artemgur

  • DataSynthesizer version: 0.1.11 (latest)
  • Python version: 3.11
  • Operating System: Windows
  • Pandas version: 1.5.3

Description

In Python 3.11, describe_dataset_in_correlated_attribute_mode raises a ValueError, whereas in Python 3.10 the same code with the same dependency versions works correctly.

At the same time, describe_dataset_in_independent_attribute_mode and describe_dataset_in_random_mode work correctly in Python 3.11.

I am using Pandas 1.5.3 rather than the latest 2.0.3 because describe_dataset_in_correlated_attribute_mode also fails with Pandas 2.0.3 (I will file a separate issue about that later).

What I Did

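from DataSynthesizer.DataDescriber import DataDescriber

input_data = 'input.csv'  # placeholder: path to the input CSV (hypothetical name)
description_file = 'description.json'  # placeholder: output path for the description (hypothetical name)

describer = DataDescriber()
describer.describe_dataset_in_correlated_attribute_mode(dataset_file=input_data, k=2, epsilon=0)
describer.save_dataset_description_to_file(description_file)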

When the code is run, the following happens:

  1. "================ Constructing Bayesian Network (BN) ================" is printed (at least in Jupyter Notebook).
  2. The following exception is raised: "ValueError: The truth value of a Index is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

Traceback:

ValueError                                Traceback (most recent call last)
Cell In[22], line 8
      6 describer = DataDescriber()
      7 #TODO k parameter
----> 8 describer.describe_dataset_in_correlated_attribute_mode(dataset_file=input_data,
      9                                                         k=2,
     10                                                         epsilon=0)
     11                                                         #seed=random_state,
     12                                                         #attribute_to_is_categorical=categorical_attributes)
     13 describer.save_dataset_description_to_file(description_file)

File ~\.virtualenvs\DataSynthesizerTest311\Lib\site-packages\DataSynthesizer\DataDescriber.py:177, in DataDescriber.describe_dataset_in_correlated_attribute_mode(self, dataset_file, k, epsilon, attribute_to_datatype, attribute_to_is_categorical, attribute_to_is_candidate_key, categorical_attribute_domain_file, numerical_attribute_ranges, seed)
    174 if self.df_encoded.shape[1] < 2:
    175     raise Exception("Correlated Attribute Mode requires at least 2 attributes(i.e., columns) in dataset.")
--> 177 self.bayesian_network = greedy_bayes(self.df_encoded, k, epsilon / 2, seed=seed)
    178 self.data_description['bayesian_network'] = self.bayesian_network
    179 self.data_description['conditional_probabilities'] = construct_noisy_conditional_distributions(
    180     self.bayesian_network, self.df_encoded, epsilon / 2)

File ~\.virtualenvs\DataSynthesizerTest311\Lib\site-packages\DataSynthesizer\lib\PrivBayes.py:145, in greedy_bayes(dataset, k, epsilon, seed)
    142 attr_to_is_binary = {attr: dataset[attr].unique().size <= 2 for attr in dataset}
    144 print('================ Constructing Bayesian Network (BN) ================')
--> 145 root_attribute = random.choice(dataset.columns)
    146 V = [root_attribute]
    147 rest_attributes = list(dataset.columns)

File C:\Python311\Lib\random.py:369, in Random.choice(self, seq)
    367 def choice(self, seq):
    368     """Choose a random element from a non-empty sequence."""
--> 369     if not seq:
    370         raise IndexError('Cannot choose from an empty sequence')
    371     return seq[self._randbelow(len(seq))]

File ~\.virtualenvs\DataSynthesizerTest311\Lib\site-packages\pandas\core\indexes\base.py:3188, in Index.__nonzero__(self)
   3186 @final
   3187 def __nonzero__(self) -> NoReturn:
-> 3188     raise ValueError(
   3189         f"The truth value of a {type(self).__name__} is ambiguous. "
   3190         "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   3191     )

ValueError: The truth value of a Index is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
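The traceback points at the root cause: greedy_bayes passes dataset.columns (a pandas Index) straight to random.choice, and the Python 3.11 random.choice shown above begins with `if not seq:`, which asks the Index for its truth value; pandas refuses and raises the ValueError. The sketch below reproduces that mechanism without pandas or DataSynthesizer and shows one possible workaround: convert the columns to a plain list before choosing. This is an assumption-labeled sketch, not the maintainers' fix. AmbiguousTruthSequence (a stand-in for pandas.Index), choice_with_truthiness_check (a copy of the `if not seq` pattern from the traceback) and choose_root_attribute are hypothetical names, and passing an explicit random.Random is my own choice. In PrivBayes.py the analogous one-line change would presumably be `random.choice(list(dataset.columns))`.

```python
import random


class AmbiguousTruthSequence:
    """Hypothetical stand-in for pandas.Index: supports len(), indexing and
    iteration, but raises ValueError in a boolean context, like the
    Index.__nonzero__ shown in the traceback."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, i):
        return self._items[i]

    def __iter__(self):
        return iter(self._items)

    def __bool__(self):
        raise ValueError(f"The truth value of a {type(self).__name__} is ambiguous.")


def choice_with_truthiness_check(seq, rng):
    """Mirrors the `if not seq:` check from the Python 3.11 traceback."""
    if not seq:  # calls seq.__bool__(), which the stand-in refuses
        raise IndexError("Cannot choose from an empty sequence")
    return seq[rng.randrange(len(seq))]


def choose_root_attribute(columns, rng):
    """Hypothetical workaround: copy the columns into a plain list first,
    so the chooser only ever sees a list, whose truth value is well defined."""
    candidates = list(columns)  # iterates; never evaluates bool(columns)
    if len(candidates) == 0:
        raise IndexError("Cannot choose from an empty sequence")
    return rng.choice(candidates)


if __name__ == "__main__":
    rng = random.Random(0)
    cols = AmbiguousTruthSequence(["age", "sex", "income"])
    try:
        choice_with_truthiness_check(cols, rng)
    except ValueError as exc:
        print("fails:", exc)
    print(choose_root_attribute(cols, rng) in {"age", "sex", "income"})  # True
```

Converting to a list costs one copy of the column names, which is negligible next to building the Bayesian network, and it keeps the code independent of how a given Python version implements the emptiness check inside random.choice.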
