Skip to content

702: Add regex support for target sorting in target_is_sorted_by operator#1705

Open
alexfurmenkov wants to merge 14 commits intomainfrom
702-target-is-sorted-by-regex
Open

702: Add regex support for target sorting in target_is_sorted_by operator#1705
alexfurmenkov wants to merge 14 commits intomainfrom
702-target-is-sorted-by-regex

Conversation

@alexfurmenkov
Copy link
Copy Markdown
Collaborator

No description provided.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated schema has not been merged with markdown descriptions. Please run the "Merge Schema with Markdown Descriptions" workflow to update the merged schema files.

@alexfurmenkov alexfurmenkov changed the title Add regex support for target sorting in target_is_sorted_by operator 702: Add regex support for target sorting in target_is_sorted_by operator Apr 23, 2026
@alexfurmenkov alexfurmenkov linked an issue Apr 23, 2026 that may be closed by this pull request
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated schema has not been merged with markdown descriptions. Please run the "Merge Schema with Markdown Descriptions" workflow to update the merged schema files.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated schema has not been merged with markdown descriptions. Please run the "Merge Schema with Markdown Descriptions" workflow to update the merged schema files.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated schema has not been merged with markdown descriptions. Please run the "Merge Schema with Markdown Descriptions" workflow to update the merged schema files.

target_for_sorting = f"{target}_extracted"
# Sort by within columns only, preserve original order within groups
sorted_df = working_df.sort_values(
by=within_columns,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we would need to sort by within_columns and extracted target not preserve the order for regex branch.

"markdownDescription": "\nTrue if the values in name are ordered according to the values specified by value\nin ascending/descending order, grouped by the values in within. Each value entry\nrequires a variable name, a sort_order of asc or desc, and an optional\nnull_position of first or last (defaults to last) which controls where null/empty\ncomparator values are placed in the expected ordering. Within accepts either a\nsingle column or an ordered list of columns. Columns can be either number or Char\nDates in ISO8601 YYYY-MM-DD format. Date value(s) with different precisions that\noverlap (e.g. 2005-10, 2005-10-3 and 2005-10-08) are all flagged as not sorted as\ntheir order cannot be inferred.\n\nOptionally supports a `regex` parameter that extracts a portion of the target\nvalue for sorting. The regex must contain at least one capturing group. The first\ncaptured group is extracted and converted to numeric if possible, allowing proper\nsorting of sequence numbers (e.g., \"MIDS1\", \"MIDS2\", ..., \"MIDS10\" with regex\n`.*?(\\\\d+)$`). This is particularly useful for variables that end with sequence\nnumbers that may or may not be zero-padded.\n\n```yaml\nCheck:\n all:\n - name: --SEQ\n within:\n - USUBJID\n - MIDSTYPE\n operator: target_is_sorted_by\n value:\n - name: --STDTC\n sort_order: asc\n null_position: last\n```\n\nExample with regex for extracting sequence numbers:\n\n```yaml\nCheck:\n all:\n - name: MIDS\n operator: target_is_sorted_by\n regex: \".*?(\\\\d+)$\" # Extract trailing digits, convert to numeric\n value:\n - name: SMSTDTC\n sort_order: asc\n within:\n - USUBJID\n - MIDSTYPE\n```\n"
}
},
"required": ["operator", "value", "within"],
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add the new regex property here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s not very visible here, but I actually added information about the regex. It’s easier to view it in the editor.

Copy link
Copy Markdown
Collaborator

@RamilCDISC RamilCDISC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I executed the rule CG0546 in dev editor. I used the dataset from folder CG0545 in sharepiont as there was no dataset for CG0546. I made a change and added the suffix to the MIDSTYPE column records in the SM dataset. The updated dataset.is attached.

I get the following error

{
  "SM": [
    {
      "executionStatus": "execution error",
      "dataset": "SM",
      "domain": "SM",
      "variables": [],
      "message": "rule evaluation error - operation failed",
      "errors": [
        {
          "dataset": "SM",
          "error": "Error occurred during operation execution",
          "message": "Failed to execute rule operation. Operation: record_count, Target: None, Domain: SM, Error: single positional indexer is out-of-bounds"
        }
      ]
    }
  ],
  "TM": [
    {
      "executionStatus": "skipped",
      "dataset": "TM",
      "domain": "TM",
      "variables": [],
      "message": "Rule skipped - doesn't apply to domain for rule id=CDISC.SDTMIG.CG0546, dataset=TM",
      "errors": [
        {
          "dataset": "TM",
          "error": "Outside scope",
          "message": "Rule skipped - doesn't apply to domain for rule id=CDISC.SDTMIG.CG0546, dataset=TM"
        }
      ]
    }
  ]
}

unit-test-coreid-CG0545-negative 1.xlsx

Please let me know if I updated the dataset incorrectly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rule blocked: CORERULES-246

3 participants