Skip to content

#1660 added numeric fields normalization for supp datasets#1676

Merged
SFJohnson24 merged 4 commits into
mainfrom
1660-merge-by-numeric
Apr 8, 2026
Merged

#1660 added numeric fields normalization for supp datasets#1676
SFJohnson24 merged 4 commits into
mainfrom
1660-merge-by-numeric

Conversation

@alexfurmenkov
Copy link
Copy Markdown
Collaborator

No description provided.

@alexfurmenkov alexfurmenkov linked an issue Mar 27, 2026 that may be closed by this pull request
@alexfurmenkov alexfurmenkov marked this pull request as ready for review March 27, 2026 14:46
@alexfurmenkov alexfurmenkov marked this pull request as draft March 27, 2026 14:46
return series.astype(float, errors="ignore").astype(str)

@staticmethod
def normalize_numeric_key(series: pd.Series) -> pd.Series:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In connected ticket @SFJohnson24 recommended moving this function in utils so both check operators and preprocessing logic can access the same function. Could you please update the code accordingly?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we keep both functions? since normalize_numeric_key applied to series and custom_str_conversion used per value? and use cases are a bit different...

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SFJohnson24 what is your recommendation for this please?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still thinking moving custom_str_conversion to utils and utilizing that makes the most sense. Custom_str_conversion already handles the exact problem this feature describes--normalizing numeric string representations before comparison. I think we should harmonize the 2 as a single, reusable utils function instead of introducing a duplicate implementation.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SFJohnson24 The PR looks good to me now. Could you please confirm if your requested change is fully addressed. I will do a final validaiton and approve after that.

Copy link
Copy Markdown
Collaborator

@SFJohnson24 SFJohnson24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR correctly meets AC--moves string conversion to general util; does numeric check--merges on string key and restores original datatype to key column afterwards.

@SFJohnson24 SFJohnson24 merged commit 631e708 into main Apr 8, 2026
12 checks passed
@SFJohnson24 SFJohnson24 deleted the 1660-merge-by-numeric branch April 8, 2026 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SUPP dataset merge by numeric variable

3 participants