- DataSynthesizer version: 0.1.11 (latest)
- Python version: 3.11
- Operating System: Windows
- Pandas version: 1.5.3
Description
In Python 3.11, describe_dataset_in_correlated_attribute_mode raises a ValueError. In Python 3.10, the same code with the same dependency versions works correctly.
At the same time, describe_dataset_in_independent_attribute_mode and describe_dataset_in_random_mode work correctly in Python 3.11.
The Pandas version is 1.5.3 rather than the latest 2.0.3, because describe_dataset_in_correlated_attribute_mode also fails with Pandas 2.0.3 (I will open a separate issue for that later).
What I Did
from DataSynthesizer.DataDescriber import DataDescriber
describer = DataDescriber()
describer.describe_dataset_in_correlated_attribute_mode(dataset_file=input_data, k=2, epsilon=0)
describer.save_dataset_description_to_file(description_file)
When the code is run, the following happens:
- "================ Constructing Bayesian Network (BN) ================" is printed (at least in Jupyter Notebook)
- The following exception is raised: "ValueError: The truth value of a Index is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
Traceback:
ValueError Traceback (most recent call last)
Cell In[22], line 8
6 describer = DataDescriber()
7 #TODO k parameter
----> 8 describer.describe_dataset_in_correlated_attribute_mode(dataset_file=input_data,
9 k=2,
10 epsilon=0)
11 #seed=random_state,
12 #attribute_to_is_categorical=categorical_attributes)
13 describer.save_dataset_description_to_file(description_file)
File ~\.virtualenvs\DataSynthesizerTest311\Lib\site-packages\DataSynthesizer\DataDescriber.py:177, in DataDescriber.describe_dataset_in_correlated_attribute_mode(self, dataset_file, k, epsilon, attribute_to_datatype, attribute_to_is_categorical, attribute_to_is_candidate_key, categorical_attribute_domain_file, numerical_attribute_ranges, seed)
174 if self.df_encoded.shape[1] < 2:
175 raise Exception("Correlated Attribute Mode requires at least 2 attributes(i.e., columns) in dataset.")
--> 177 self.bayesian_network = greedy_bayes(self.df_encoded, k, epsilon / 2, seed=seed)
178 self.data_description['bayesian_network'] = self.bayesian_network
179 self.data_description['conditional_probabilities'] = construct_noisy_conditional_distributions(
180 self.bayesian_network, self.df_encoded, epsilon / 2)
File ~\.virtualenvs\DataSynthesizerTest311\Lib\site-packages\DataSynthesizer\lib\PrivBayes.py:145, in greedy_bayes(dataset, k, epsilon, seed)
142 attr_to_is_binary = {attr: dataset[attr].unique().size <= 2 for attr in dataset}
144 print('================ Constructing Bayesian Network (BN) ================')
--> 145 root_attribute = random.choice(dataset.columns)
146 V = [root_attribute]
147 rest_attributes = list(dataset.columns)
File C:\Python311\Lib\random.py:369, in Random.choice(self, seq)
367 def choice(self, seq):
368 """Choose a random element from a non-empty sequence."""
--> 369 if not seq:
370 raise IndexError('Cannot choose from an empty sequence')
371 return seq[self._randbelow(len(seq))]
File ~\.virtualenvs\DataSynthesizerTest311\Lib\site-packages\pandas\core\indexes\base.py:3188, in Index.__nonzero__(self)
3186 @final
3187 def __nonzero__(self) -> NoReturn:
-> 3188 raise ValueError(
3189 f"The truth value of a {type(self).__name__} is ambiguous. "
3190 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
3191 )
ValueError: The truth value of a Index is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
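The traceback suggests that the failure comes from random.choice evaluating `not seq` on dataset.columns, which a pandas Index refuses to do. The sketch below illustrates that reading and one possible workaround: converting the columns to a plain list before choosing. To stay self-contained it uses a hypothetical stand-in class, AmbiguousIndex, that only mimics the truthiness behaviour of pandas.Index. The helper name pick_root_attribute and the column labels are also assumptions made for illustration, not part of DataSynthesizer.

```python
import random


class AmbiguousIndex:
    """Hypothetical stand-in for pandas.Index: sized and indexable, but bool() raises."""

    def __init__(self, labels):
        self._labels = list(labels)

    def __len__(self):
        return len(self._labels)

    def __getitem__(self, position):
        return self._labels[position]

    def __iter__(self):
        return iter(self._labels)

    def __bool__(self):
        # Mirrors the error message style seen in the traceback.
        raise ValueError("The truth value of a AmbiguousIndex is ambiguous.")


def pick_root_attribute(columns):
    """Choose one column label without ever evaluating bool(columns)."""
    # list() only iterates over the labels, so random.choice receives a
    # plain list whose truthiness is well defined.
    return random.choice(list(columns))


# Made-up column labels, for illustration only.
columns = AmbiguousIndex(["age", "education", "income"])

try:
    # Same call shape as PrivBayes.py line 145 in the traceback.
    direct = random.choice(columns)
    print("direct call returned:", direct)
except ValueError as exc:
    # Reached when random.choice evaluates `not seq`, as in the traceback.
    print("direct call failed:", exc)

root_attribute = pick_root_attribute(columns)
print("root attribute:", root_attribute)
```

Whether the direct call fails depends on how the interpreter's random.choice checks for an empty sequence, so the sketch catches the error rather than assuming it. The list conversion avoids the truthiness check in either case.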