Skip to content

Conversation

@RakeshBobba03
Copy link
Collaborator

@RakeshBobba03 RakeshBobba03 commented Dec 4, 2025

Fixes Issue #1017 (CORERULES-9645): Updated is_column_of_iterables to check all non-null values instead of only the first row. When the first row is None (common after grouped distinct operations with LEFT joins), the method now correctly identifies columns containing iterables by examining all non-null values. Also fixed boolean column reading in Excel files by using nullable boolean dtype ("boolean") and adding true_values/false_values parameters to handle NaN values. This fix ensures containment operators work correctly with columns containing None values and affects USDM rules using grouped distinct operations with containment operators.

Copy link
Collaborator

@gerrycampion gerrycampion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two updates. i will leave it to richard to confirm things are working for him

Comment on lines 158 to 167
non_null_values = []
for val in column:
if val is None:
continue
try:
if pd.isna(val):
continue
except (ValueError, TypeError):
pass
non_null_values.append(val)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe you can replace this block with:
non_null_values = column[column.notnull()]

it will be useful to take a tutorial to learn some pandas fundamentals

Comment on lines 516 to 570
@pytest.mark.parametrize(
"column_data,expected",
[
([["A", "B"], ["C", "D"], ["E", "F"]], True),
([{"A", "B"}, {"C", "D"}], True),
([None, ["A", "B"], ["C", "D"]], True),
([["A", "B"], None, ["C", "D"]], True),
([["A", "B"], ["C", "D"], None], True),
([None, None, ["A", "B"]], True),
([None, {"A", "B"}, {"C", "D"}], True),
([[]], True),
([set()], True),
([None, []], True),
([None, set()], True),
([["A"], []], True),
([["A"], set()], True),
([{"A"}, []], True),
([["A"]], True),
([{"A"}], True),
([None, ["A"]], True),
([["A"], {"B"}], True),
([None, ["A"], {"B"}], True),
([float("nan")], False),
([None, float("nan")], False),
([None, float("nan"), ["A"]], True),
([float("nan"), ["A"]], True),
# Negative cases - not iterables
(["A", "B", "C"], False),
([None, "A", "B"], False),
([["A", "B"], "C", ["D", "E"]], False),
([None, None, None], False),
([], False),
([("A", "B")], False),
([None, ("A", "B")], False),
([["A"], ("B", "C")], False),
],
)
def test_is_column_of_iterables(column_data, expected):
df = PandasDataset.from_dict({"col": column_data})
dataframe_operator = DataframeType({"value": df})
result = dataframe_operator.is_column_of_iterables(df["col"])
assert result == expected


def test_is_column_of_iterables_with_pd_na():
"""Test pd.NA handling (pandas NA value)"""
import pandas as pd

test_cases = [
([pd.NA], False),
([None, pd.NA], False),
([None, pd.NA, ["A"]], True),
([pd.NA, ["A"]], True),
([None, pd.NA, ["A"], ["B"]], True),
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, one more thing. I think these test cases should be combined to a single function

Copy link
Collaborator

@ASL-rmarshall ASL-rmarshall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General code review looks good. Validation produced expected results.

@gerrycampion gerrycampion merged commit 29bc10b into main Dec 5, 2025
11 checks passed
@gerrycampion gerrycampion deleted the 1017-DDF00102 branch December 5, 2025 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CORERULES-9645 - is_column_of_iterables method only checks first row

5 participants