-
Notifications
You must be signed in to change notification settings - Fork 27
1017 ddf00102 #1469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1017 ddf00102 #1469
Conversation
…ed distinct operations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two updates. i will leave it to richard to confirm things are working for him
| non_null_values = [] | ||
| for val in column: | ||
| if val is None: | ||
| continue | ||
| try: | ||
| if pd.isna(val): | ||
| continue | ||
| except (ValueError, TypeError): | ||
| pass | ||
| non_null_values.append(val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i believe you can replace this block with:
non_null_values = column[column.notnull()]
it will be useful to take a tutorial to learn some pandas fundamentals
| @pytest.mark.parametrize( | ||
| "column_data,expected", | ||
| [ | ||
| ([["A", "B"], ["C", "D"], ["E", "F"]], True), | ||
| ([{"A", "B"}, {"C", "D"}], True), | ||
| ([None, ["A", "B"], ["C", "D"]], True), | ||
| ([["A", "B"], None, ["C", "D"]], True), | ||
| ([["A", "B"], ["C", "D"], None], True), | ||
| ([None, None, ["A", "B"]], True), | ||
| ([None, {"A", "B"}, {"C", "D"}], True), | ||
| ([[]], True), | ||
| ([set()], True), | ||
| ([None, []], True), | ||
| ([None, set()], True), | ||
| ([["A"], []], True), | ||
| ([["A"], set()], True), | ||
| ([{"A"}, []], True), | ||
| ([["A"]], True), | ||
| ([{"A"}], True), | ||
| ([None, ["A"]], True), | ||
| ([["A"], {"B"}], True), | ||
| ([None, ["A"], {"B"}], True), | ||
| ([float("nan")], False), | ||
| ([None, float("nan")], False), | ||
| ([None, float("nan"), ["A"]], True), | ||
| ([float("nan"), ["A"]], True), | ||
| # Negative cases - not iterables | ||
| (["A", "B", "C"], False), | ||
| ([None, "A", "B"], False), | ||
| ([["A", "B"], "C", ["D", "E"]], False), | ||
| ([None, None, None], False), | ||
| ([], False), | ||
| ([("A", "B")], False), | ||
| ([None, ("A", "B")], False), | ||
| ([["A"], ("B", "C")], False), | ||
| ], | ||
| ) | ||
| def test_is_column_of_iterables(column_data, expected): | ||
| df = PandasDataset.from_dict({"col": column_data}) | ||
| dataframe_operator = DataframeType({"value": df}) | ||
| result = dataframe_operator.is_column_of_iterables(df["col"]) | ||
| assert result == expected | ||
|
|
||
|
|
||
| def test_is_column_of_iterables_with_pd_na(): | ||
| """Test pd.NA handling (pandas NA value)""" | ||
| import pandas as pd | ||
|
|
||
| test_cases = [ | ||
| ([pd.NA], False), | ||
| ([None, pd.NA], False), | ||
| ([None, pd.NA, ["A"]], True), | ||
| ([pd.NA, ["A"]], True), | ||
| ([None, pd.NA, ["A"], ["B"]], True), | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, one more thing. I think these test cases should be combined to a single function
ASL-rmarshall
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General code review looks good. Validation produced expected results.
…test cases to a single function
Fixes Issue #1017 (CORERULES-9645): Updated
is_column_of_iterablesto check all non-null values instead of only the first row. When the first row isNone(common after grouped distinct operations with LEFT joins), the method now correctly identifies columns containing iterables by examining all non-null values. Also fixed boolean column reading in Excel files by using nullable boolean dtype ("boolean") and addingtrue_values/false_valuesparameters to handleNaNvalues. This fix ensures containment operators work correctly with columns containingNonevalues and affects USDM rules using grouped distinct operations with containment operators.