CG0562 #1454
Merged
RamilCDISC
requested changes
Dec 2, 2025
    "data, expected, regex",
    [
        (
            PandasDataset.from_dict(
Collaborator
All the test cases here use PandasDataset only. Could you please add some cases using Dask?
RamilCDISC
approved these changes
Dec 5, 2025
Collaborator
RamilCDISC
left a comment
The PR adds a regex option to the record count operation, so that only a specific part of a value is selected for comparison. The PR was validated by:
- Validating the PR for any unwanted code or comments.
- Validating the PR logic in context with the AC.
- Ensuring all the unit and regression testing pass.
- Ensuring all related testing is updated.
- Ensuring the updated testing covers cases for both pandas and DASK implementations.
- Running manual testing using dev editor for positive dataset.
- Running manual testing using dev editor for negative dataset.
- Ensuring test cases cover the regex matching.
This PR resolves the issues currently reported with CG0562. The main one: the authors need to group on only the date portion of a value, not the full datetime, for the record count. I added regex handling to this operator to support this; the attached rule below implements it. I also added logic for grouping on a wildcard column so that the -- in the column name is processed correctly.
Datasets.json
Datasets.xlsx
Rule_underscores.json
CORE-Report-2025-12-01T15-16-44.xlsx
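As a rough illustration of the date-portion requirement described above, the following sketch counts records by the date part of a datetime string. It uses plain Python rather than the engine's own code, and the sample values, the pattern, and the `date_portion` helper are assumptions made for the example, not taken from the attached rule or datasets.

```python
import re
from collections import Counter

# Hypothetical sample values; not taken from the PR's attached datasets.
values = [
    "2025-12-01T15:16:44",
    "2025-12-01T09:00:00",
    "2025-12-02T08:30:00",
    "2025-12-02",
]

# Assumed pattern: capture the leading YYYY-MM-DD portion of the value.
DATE_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2})")


def date_portion(value: str) -> str:
    """Return the captured date portion, or the value unchanged if nothing matches."""
    match = DATE_PATTERN.match(value)
    return match.group(1) if match else value


# Count records per date instead of per full datetime.
counts = Counter(date_portion(v) for v in values)
print(dict(counts))
```

Without the regex step every datetime above would form its own group; with it, records that share a date are counted together.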
This pull request adds support for applying regex transformations to grouping columns in the record_count operation, allowing users to group records based on extracted patterns (such as dates from datetime strings). The changes include updates to the operation logic, schema, documentation, and unit tests that cover regex-based grouping, including grouping aliases and filters.

Record Count Operation Enhancements
- Added support for a new regex parameter in the record_count operation, enabling transformation of grouping column values using a regex pattern before grouping. This allows, for example, grouping by just the date portion of a datetime string. (cdisc_rules_engine/operations/record_count.py, cdisc_rules_engine/models/operation_params.py, cdisc_rules_engine/utilities/rule_processor.py, resources/schema/Operations.json)
- Implemented helper methods _get_grouping_for_operations, _get_regex_grouped_counts, and _apply_regex_to_grouping_columns in record_count.py to handle regex transformation and grouping logic, including handling of grouping aliases and filters. (cdisc_rules_engine/operations/record_count.py)

Schema and Documentation Updates
- Updated the schema (Operations.json) and documentation (Operations.md) to describe the new regex parameter and provide examples of how to use regex-based grouping in YAML operation definitions. (resources/schema/Operations.json, resources/schema/Operations.md)

Unit Test Coverage
- Unit tests for regex-based grouping. (tests/unit/test_operations/test_record_count.py)

Codebase Consistency
- Related changes in cdisc_rules_engine/operations/base_operation.py.

These changes collectively make the record_count operation more flexible for data analysis involving grouped record counts with transformed grouping keys.
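The grouping flow summarized above might look roughly like the sketch below: resolve a -- wildcard in a column name, apply the regex to each grouping value, then return a count for every record. This is not the engine's implementation; `resolve_wildcard`, `regex_record_counts`, the sample rows, and the choice to leave non-matching values unchanged are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Dict, List


def resolve_wildcard(column: str, domain: str) -> str:
    """Hypothetical helper: replace a leading "--" with the domain code."""
    return domain + column[2:] if column.startswith("--") else column


def regex_record_counts(
    rows: List[Dict[str, str]], group_by: List[str], pattern: str
) -> List[int]:
    """Return, for each row, how many rows share its regex-transformed group key."""
    compiled = re.compile(pattern)

    def transform(value: str) -> str:
        match = compiled.match(value)
        if not match:
            return value  # assumption: values that do not match are kept as-is
        # Use the first capture group when the pattern defines one.
        return match.group(1) if compiled.groups else match.group(0)

    keys = [tuple(transform(str(row[col])) for col in group_by) for row in rows]
    totals = Counter(keys)
    return [totals[key] for key in keys]


# Hypothetical AE records; the column names follow common SDTM conventions.
rows = [
    {"USUBJID": "S1", "AEDTC": "2025-12-01T15:16:44"},
    {"USUBJID": "S1", "AEDTC": "2025-12-01T09:00:00"},
    {"USUBJID": "S1", "AEDTC": "2025-12-02T08:30:00"},
    {"USUBJID": "S2", "AEDTC": "2025-12-01T10:00:00"},
]
group_by = ["USUBJID", resolve_wildcard("--DTC", "AE")]
counts = regex_record_counts(rows, group_by, r"^(\d{4}-\d{2}-\d{2})")
print(counts)
```

Returning one count per record, rather than one per group, keeps the result aligned with the input rows, which is convenient when a rule compares each record against its group size.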