@SFJohnson24
Collaborator

@SFJohnson24 SFJohnson24 commented Oct 27, 2025

This pull request introduces support for using reference values in the Distinct operation, allowing the operation to treat the value in the target column as a reference to another column in the same row. This is controlled via a new value_is_reference boolean parameter, which is now supported throughout the operation pipeline. The changes also include updates to the schema, new tests for this functionality, and a minor improvement to domain matching logic.
Attachments: Datasets.json, Rule_underscores.json
These are one of the CG0370 sub-rules that uses this logic, together with negative data for it.

Key changes:

Distinct Operation Reference Value Support:

  • Added a new value_is_reference boolean parameter to OperationParams, and updated the Distinct operation logic to use the value in the target column as a reference to another column when this flag is set. This includes support for both grouped and ungrouped distinct operations.
  • Updated the JSON schema (Operations.json) to include the new value_is_reference parameter.
  • Updated the rule processor to pass the value_is_reference parameter when constructing operation parameters.
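
To illustrate the reference behavior described above, here is a minimal pandas sketch. It is not the engine's implementation: the function name `distinct_with_reference`, the column names, and the handling of null or unknown references (they are skipped here) are all assumptions made for the example.

```python
import pandas as pd


def distinct_with_reference(df, target, value_is_reference=False, group=None):
    """Hypothetical illustration of a distinct operation with reference values."""
    if value_is_reference:
        # Each cell in `target` names another column; read that column in the same row.
        def resolve(row):
            ref = row[target]
            # Assumption: null references or unknown column names resolve to None.
            if pd.isna(ref) or ref not in row.index:
                return None
            return row[ref]

        values = df.apply(resolve, axis=1)
    else:
        values = df[target]

    if group is None:
        # Ungrouped: one set of distinct values for the whole dataset.
        return set(values.dropna())
    # Grouped: one set of distinct values per group key.
    return {key: set(vals.dropna()) for key, vals in values.groupby(df[group])}


# Illustrative data only: REF holds the name of the column to read in each row.
df = pd.DataFrame(
    {
        "GRP": ["g1", "g1", "g2"],
        "REF": ["A", "B", "A"],
        "A": [1, 2, 3],
        "B": ["x", "y", "z"],
    }
)

plain = distinct_with_reference(df, "REF")
ungrouped = distinct_with_reference(df, "REF", value_is_reference=True)
grouped = distinct_with_reference(df, "REF", value_is_reference=True, group="GRP")
```

Without the flag, the result is the set of values stored in REF itself. With the flag, each row contributes the value of the column that REF points to.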

Testing:

  • Added unit tests to verify the new reference value behavior in both grouped and ungrouped contexts for the Distinct operation, covering both Pandas and Dask datasets.

Domain Matching Logic:

  • Improved domain lookup logic in the rule processor to support matching domains with a trailing double dash (--) to all domains with a common prefix.
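
The trailing double dash matching can be sketched roughly as follows. The helper name `match_domains` and the plain prefix comparison are assumptions for illustration; the actual lookup in the rule processor may differ.

```python
def match_domains(pattern, available_domains):
    """Hypothetical helper: resolve a domain pattern against known domain names."""
    if pattern.endswith("--"):
        # A trailing "--" is treated as a wildcard for everything after the prefix.
        prefix = pattern[:-2]
        return [name for name in available_domains if name.startswith(prefix)]
    # Otherwise only an exact match counts.
    return [name for name in available_domains if name == pattern]


# Illustrative domain names only.
domains = ["AE", "DM", "SUPPAE", "SUPPDM"]
wildcard_matches = match_domains("SUPP--", domains)
exact_matches = match_domains("DM", domains)
```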

@SFJohnson24 SFJohnson24 self-assigned this Oct 27, 2025
@SFJohnson24 SFJohnson24 marked this pull request as ready for review October 28, 2025 17:44
@SFJohnson24 SFJohnson24 changed the title initial cg0370 Oct 28, 2025
Collaborator

@RamilCDISC RamilCDISC left a comment

Could you please also update the Operations.md file with an example that uses value_is_reference?

@SFJohnson24
Collaborator Author

updated docs @RamilCDISC

Collaborator

@RamilCDISC RamilCDISC left a comment

The PR adds support for using reference values in the Distinct operation. The PR was validated by:

  1. Reviewing the updated code for any unwanted code or comments.
  2. Reviewing the updated logic in accordance with the Issue description.
  3. Ensuring all relevant tests are updated and new tests are added.
  4. Ensuring all related documentation and schema files are updated.
  5. Ensuring new tests are added if necessary.
  6. Validating the implementation using dev editor running cg0370 against negative datasets.
  7. Validating the implementation using dev editor running cg0370 against positive datasets.
  8. Testing the edge cases of missing column and null values.

@RamilCDISC RamilCDISC merged commit 0e28e3d into main Oct 31, 2025
11 checks passed
@RamilCDISC RamilCDISC deleted the distinct branch October 31, 2025 21:20