@RakeshBobba03 RakeshBobba03 commented Dec 16, 2025

#1274: Implements explicit inline comparison syntax for output variables in rules via a new compared: dict block within Output Variables. The block enables N-way set-based comparisons (two or more variables): the first variable serves as the baseline and each subsequent variable is compared against it. The implementation flattens all variables (siblings plus compared children) for UI display, restricts comparison logic to the variables inside compared blocks, and always uses set-based (order-independent) comparison. Reporting now shows formatted comparison summaries (missing/extra items) followed by the raw variable lists, with multi-line rendering support in Excel.
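As a rough illustration of the behaviour described above (not the code in this PR), the flattening and baseline comparison could look like the sketch below. The function names `flatten_output_variables` and `compare_group` are hypothetical, and the reading of "missing" as items absent from the baseline and "extra" as items only in the baseline is an assumption.

```python
from typing import Dict, List, Sequence, Tuple, Union

# An Output Variables entry is either a plain name or a {"compared": [...]} block.
OutputEntry = Union[str, Dict[str, List[str]]]


def flatten_output_variables(
    entries: Sequence[OutputEntry],
) -> Tuple[List[str], List[List[str]]]:
    """Return (all variables for display, groups that take part in comparison)."""
    flat: List[str] = []
    groups: List[List[str]] = []
    for entry in entries:
        if isinstance(entry, dict) and "compared" in entry:
            group = list(entry["compared"])
            if len(group) < 2:
                raise ValueError("a compared block needs at least two variables")
            groups.append(group)
            flat.extend(group)  # compared children are displayed too
        else:
            flat.append(entry)  # sibling: displayed, never compared
    return flat, groups


def compare_group(group: Sequence[str], values: Dict[str, Sequence[str]]) -> List[dict]:
    """Compare every later variable against the first one, ignoring order."""
    baseline_name = group[0]
    baseline = set(values[baseline_name])
    results = []
    for name in group[1:]:
        compared = set(values[name])
        results.append(
            {
                "baseline": baseline_name,
                "compared": name,
                # Assumed direction: "missing" means absent from the baseline.
                "missing": sorted(compared - baseline),
                "extra": sorted(baseline - compared),
            }
        )
    return results


entries = ["USUBJID", {"compared": ["$dataset_variables", "$expected_variables"]}]
flat, groups = flatten_output_variables(entries)

# Illustrative values only, not taken from the test dataset.
values = {
    "$dataset_variables": ["AETERM", "AESOC", "AEACN"],
    "$expected_variables": ["AEACN", "AESER", "AESOC"],
}
summary = compare_group(groups[0], values)
```

Because both sides are converted to sets, reordering either list leaves the result unchanged, and sibling variables such as USUBJID never enter the comparison.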

Attached are the Rule and dataset used for testing:

CORE-000334.yaml
unit-test-coreid-CG0016-negative.xlsx

@RakeshBobba03 RakeshBobba03 linked an issue Dec 16, 2025 that may be closed by this pull request
@RakeshBobba03 RakeshBobba03 marked this pull request as ready for review December 16, 2025 23:20

@RamilCDISC RamilCDISC left a comment

I ran a validation using the dev editor and updated the rule to also include USUBJID in the output variables, as follows:

Outcome:
  Message: At least one expected variable is missing from dataset
  Output Variables:
    - USUBJID
    - compared:
        - $dataset_variables
        - $expected_variables
    

The AE dataset in the attached Excel file has a USUBJID column, but the engine reports USUBJID as "Not in dataset":


[
  {
    "executionStatus": "success",
    "dataset": "ae.xpt",
    "domain": "AE",
    "variables": [
      "$dataset_variables",
      "$expected_variables",
      "USUBJID"
    ],
    "message": "At least one expected variable is missing from dataset",
    "errors": [
      {
        "value": {
          "$dataset_variables": [
            "STUDYID",
            "DOMAIN",
            "USUBJID",
            "AESEQ",
            "AELNKID",
            "AETERM",
            "AELLT",
            "AELLTCD",
            "AEDECOD",
            "AEPTCD",
            "AEHLT",
            "AEHLTCD",
            "AEHLGT",
            "AEHLGTCD",
            "AEBDSYCD",
            "AESOC",
            "AESOCCD",
            "AESEV",
            "AEACN",
            "AEREL",
            "AEOUT",
            "AESCAN",
            "AESCONG",
            "AESDISAB",
            "AESDTH",
            "AESHOSP",
            "AESLIFE",
            "AESOD",
            "EPOCH",
            "AESTDTC",
            "AEENDTC",
            "AESTDY",
            "AEENDY",
            "AEENRTPT",
            "AEENTPT"
          ],
          "$expected_variables": [
            "AELLT",
            "AELLTCD",
            "AEPTCD",
            "AEHLT",
            "AEHLTCD",
            "AEHLGT",
            "AEHLGTCD",
            "AEBODSYS",
            "AEBDSYCD",
            "AESOC",
            "AESOCCD",
            "AESER",
            "AEACN",
            "AEREL",
            "AESTDTC",
            "AEENDTC"
          ],
          "USUBJID": "Not in dataset"
        },
        "dataset": "ae.xpt"
      }
    ],
    "compare_groups": [
      [
        "$dataset_variables",
        "$expected_variables"
      ]
    ]
  }
]
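For reference, the difference between the two compared lists can be recomputed from a result entry of this shape. The sketch below is only an assumption about how such a summary might be derived from `errors[].value` and `compare_groups`. The helper `summarize` is hypothetical, and the lists are abbreviated from the output above.

```python
import json

# Abbreviated copy of one result entry; only a few variables are kept.
report_entry = json.loads(
    """
    {
      "errors": [
        {
          "value": {
            "$dataset_variables": ["USUBJID", "AELLT", "AEBDSYCD", "AESOC", "AEACN"],
            "$expected_variables": ["AELLT", "AEBODSYS", "AEBDSYCD", "AESOC", "AESER", "AEACN"],
            "USUBJID": "Not in dataset"
          }
        }
      ],
      "compare_groups": [["$dataset_variables", "$expected_variables"]]
    }
    """
)


def summarize(entry: dict) -> list:
    """Build one summary line per compared variable in each compare group."""
    lines = []
    for error in entry["errors"]:
        value = error["value"]
        for group in entry.get("compare_groups", []):
            baseline_name, *others = group
            baseline = set(value[baseline_name])
            for name in others:
                compared = set(value[name])
                lines.append(
                    f"{name} vs {baseline_name}: "
                    f"not in baseline={sorted(compared - baseline)}; "
                    f"only in baseline={sorted(baseline - compared)}"
                )
    return lines


summary_lines = summarize(report_entry)
```

Applied to the full lists in the output above, the same set difference gives AEBODSYS and AESER as the expected variables that are absent from the dataset.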

@RakeshBobba03

I ran a validation using dev editor and updated the rule to have USUBJID also in output variables

Thanks for bringing this to my attention, @RamilCDISC.

USUBJID is record-level data, not metadata. A Variable Metadata Check rule operates on a metadata dataset, not on the original data records, so a data column such as USUBJID that is listed in Output Variables does not exist in the dataset the rule sees. That's why it was showing "Not in dataset".

Sam and I agreed that the message "Not in dataset" is misleading in this context: USUBJID does exist in the original dataset; it is just not available in the metadata context for Variable Metadata Check rules. We decided to change the message to "not available in metadata context" for Variable Metadata Check and Dataset Metadata Check rules (and their variants) to better reflect what is happening.

I updated the error message logic in actions.py to return "not available in metadata context" instead of "Not in dataset" for metadata check rule types.

I also updated the documentation in Rule_Type.md to clarify that output variables must match their respective rule types, and that for Variable Metadata Check and Dataset Metadata Check rules, variables not available in the metadata context display "not available in metadata context" instead of the variable value.
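A minimal sketch of the message selection described above, assuming the rule type is available as a plain string. The names `METADATA_RULE_TYPES` and `missing_variable_message` are hypothetical and are not the actual contents of actions.py. The "variants" of these rule types are not enumerated here.

```python
# Assumption: only the two rule types named in this thread are listed;
# their variants would have to be added to this set as well.
METADATA_RULE_TYPES = {
    "Variable Metadata Check",
    "Dataset Metadata Check",
}


def missing_variable_message(rule_type: str) -> str:
    """Pick the placeholder shown for an output variable the rule cannot see."""
    if rule_type in METADATA_RULE_TYPES:
        # The column may exist in the source data but not in the metadata view.
        return "not available in metadata context"
    return "Not in dataset"


metadata_message = missing_variable_message("Variable Metadata Check")
other_message = missing_variable_message("Some Other Rule Type")  # hypothetical name
```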

Development

Successfully merging this pull request may close these issues.

Comparison in Reporting
