702: Add regex support for target sorting in target_is_sorted_by operator #1705
alexfurmenkov wants to merge 14 commits into main
…isc-org/cdisc-rules-engine into 702-target-is-sorted-by-regex
| target_for_sorting = f"{target}_extracted" | ||
| # Sort by within columns only, preserve original order within groups | ||
| sorted_df = working_df.sort_values( | ||
| by=within_columns, |
For the regex branch, we need to sort by within_columns and the extracted target here, rather than preserving the original order.
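The sort this comment asks for can be sketched with pandas as below. This is a minimal illustration, not the PR's implementation: the function name `sort_by_within_and_extracted` and the sample data are hypothetical, and coercing non-numeric captures to NaN is an assumption made only to keep the example short.

```python
import pandas as pd


def sort_by_within_and_extracted(df, target, within_columns, regex):
    """Sort by the within columns, then by the regex-extracted target."""
    working_df = df.copy()
    target_for_sorting = f"{target}_extracted"
    # Keep only the first capturing group of the regex.
    extracted = (
        working_df[target].astype(str).str.extract(regex, expand=True).iloc[:, 0]
    )
    # Assumption: captures that are not numeric become NaN in this sketch.
    working_df[target_for_sorting] = pd.to_numeric(extracted, errors="coerce")
    # Sort by the grouping columns AND the extracted value, not the groups alone.
    return working_df.sort_values(by=[*within_columns, target_for_sorting])


example = pd.DataFrame(
    {
        "USUBJID": ["S1", "S1", "S1", "S2"],
        "MIDS": ["MIDS10", "MIDS2", "MIDS1", "MIDS3"],
    }
)
sorted_example = sort_by_within_and_extracted(
    example, "MIDS", ["USUBJID"], r".*?(\d+)$"
)
```

With the extracted numeric column in the sort keys, MIDS10 lands after MIDS2 within subject S1, which a plain string sort would not guarantee.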
| "markdownDescription": "\nTrue if the values in name are ordered according to the values specified by value\nin ascending/descending order, grouped by the values in within. Each value entry\nrequires a variable name, a sort_order of asc or desc, and an optional\nnull_position of first or last (defaults to last) which controls where null/empty\ncomparator values are placed in the expected ordering. Within accepts either a\nsingle column or an ordered list of columns. Columns can be either number or Char\nDates in ISO8601 YYYY-MM-DD format. Date value(s) with different precisions that\noverlap (e.g. 2005-10, 2005-10-3 and 2005-10-08) are all flagged as not sorted as\ntheir order cannot be inferred.\n\nOptionally supports a `regex` parameter that extracts a portion of the target\nvalue for sorting. The regex must contain at least one capturing group. The first\ncaptured group is extracted and converted to numeric if possible, allowing proper\nsorting of sequence numbers (e.g., \"MIDS1\", \"MIDS2\", ..., \"MIDS10\" with regex\n`.*?(\\\\d+)$`). This is particularly useful for variables that end with sequence\nnumbers that may or may not be zero-padded.\n\n```yaml\nCheck:\n all:\n - name: --SEQ\n within:\n - USUBJID\n - MIDSTYPE\n operator: target_is_sorted_by\n value:\n - name: --STDTC\n sort_order: asc\n null_position: last\n```\n\nExample with regex for extracting sequence numbers:\n\n```yaml\nCheck:\n all:\n - name: MIDS\n operator: target_is_sorted_by\n regex: \".*?(\\\\d+)$\" # Extract trailing digits, convert to numeric\n value:\n - name: SMSTDTC\n sort_order: asc\n within:\n - USUBJID\n - MIDSTYPE\n```\n" | ||
| } | ||
| }, | ||
| "required": ["operator", "value", "within"], |
I think we should add the new regex property here.
It’s not very visible here, but I did add information about the regex to the description. It’s easier to view in the editor.
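The description's requirement that the regex contain at least one capturing group could be checked up front. The sketch below is only an illustration: `validate_sort_regex` is a hypothetical helper, not something in this PR, and it relies on the standard-library `re.Pattern.groups` attribute.

```python
import re


def validate_sort_regex(pattern):
    """Compile the pattern and reject it if it has no capturing group."""
    compiled = re.compile(pattern)
    # Pattern.groups is the number of capturing groups in the pattern.
    if compiled.groups < 1:
        raise ValueError(
            f"regex {pattern!r} must contain at least one capturing group"
        )
    return compiled


valid = validate_sort_regex(r".*?(\d+)$")

try:
    validate_sort_regex(r"\d+$")
    rejected = False
except ValueError:
    rejected = True
```

Failing early like this would surface a misconfigured rule at load time instead of during sorting.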
…racted target values
…isc-org/cdisc-rules-engine into 702-target-is-sorted-by-regex
RamilCDISC left a comment
I executed rule CG0546 in the dev editor. I used the dataset from the CG0545 folder in SharePoint, as there was no dataset for CG0546. I changed it by adding the suffix to the MIDSTYPE column records in the SM dataset. The updated dataset is attached.
I get the following error:
{
"SM": [
{
"executionStatus": "execution error",
"dataset": "SM",
"domain": "SM",
"variables": [],
"message": "rule evaluation error - operation failed",
"errors": [
{
"dataset": "SM",
"error": "Error occurred during operation execution",
"message": "Failed to execute rule operation. Operation: record_count, Target: None, Domain: SM, Error: single positional indexer is out-of-bounds"
}
]
}
],
"TM": [
{
"executionStatus": "skipped",
"dataset": "TM",
"domain": "TM",
"variables": [],
"message": "Rule skipped - doesn't apply to domain for rule id=CDISC.SDTMIG.CG0546, dataset=TM",
"errors": [
{
"dataset": "TM",
"error": "Outside scope",
"message": "Rule skipped - doesn't apply to domain for rule id=CDISC.SDTMIG.CG0546, dataset=TM"
}
]
}
]
}
unit-test-coreid-CG0545-negative 1.xlsx
Please let me know if I updated the dataset incorrectly.
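For context on the message in the log: "single positional indexer is out-of-bounds" is the text pandas uses when `.iloc` is given a position past the end of the data, for example the first row of an empty frame. Whether that is what happens inside `record_count` here is not confirmed; the sketch below only reproduces the message and shows one possible guard, with the hypothetical names `empty_sm` and `first_row_or_none`.

```python
import pandas as pd

# Hypothetical stand-in for a frame that ends up with no rows.
empty_sm = pd.DataFrame({"MIDSTYPE": []})

try:
    empty_sm.iloc[0]  # position 0 does not exist in an empty frame
except IndexError as exc:
    message = str(exc)


def first_row_or_none(df):
    """Return the first row, or None when the frame has no rows."""
    return df.iloc[0] if len(df) > 0 else None
```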
No description provided.