Skip to content

fix codelist operations type inference for empty dataframes#1709

Merged
gerrycampion merged 5 commits into
mainfrom
1651-new-version-of-core-engine-causes-errors-in-usdm-rule-execution
Apr 28, 2026
Merged

fix codelist operations type inference for empty dataframes#1709
gerrycampion merged 5 commits into
mainfrom
1651-new-version-of-core-engine-causes-errors-in-usdm-rule-execution

Conversation

@gerrycampion
Copy link
Copy Markdown
Collaborator

Fixes the issue for CORE-000857 with #1651. I cannot reproduce the issue for CORE-000873, CORE-000874 without the proper data.

Previously, we would get an Execution Error because the operations input dataframe is empty (after match datasets) and the column datatypes cannot be inferred.
Broken.xlsx
Now we get a skip.
Fixed.xlsx

Ran with following data and launch config:
pilot_LZZT_narrative_2026MAR10.json

{
  "name": "USDM Rule",
  "type": "debugpy",
  "request": "launch",
  "program": "${workspaceFolder}/core.py",
  "console": "integratedTerminal",
  "args": [
    "validate",
    "-s",
    "usdm",
    "-v",
    "4-0",
    "-dp",
    "pilot_LZZT_narrative_2026MAR10.json",
    "-r",
    "CORE-000857",
    "-of",
    "XLSX",
    "-of",
    "json",
    "-l",
    "debug"
  ]
}

Copy link
Copy Markdown
Collaborator

@RamilCDISC RamilCDISC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to add a unit or regression test for this case?

@gerrycampion
Copy link
Copy Markdown
Collaborator Author

Would it be better to add a unit or regression test for this case?

Yes, I added unit tests that fail on main but pass on this branch

self.params.ct_package_type, unique_ct_versions
)
ct_df = self.evaluation_dataset.__class__.from_dict(ct_data)
ct_df = ct_df.astype(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR only converts the ct_df to string but evaluation dataset is still not made safe for merge. if it has empty columns the merge can fail. I have noticed that in the test the evaluation dataset was made safe manually by converting it to string. That let me to think that we need to make the evaluation dataset safe here too to merge without errors.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe that the fix for the types allowed us to have empty evaluation datasets as well. I've removed the conversions from the tests.

ct_df = self.evaluation_dataset.__class__.from_dict(ct_data)
ct_df = ct_df.astype(
{
"version": str,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using str to covnert to string here can convert missing values like nan to "nan" which could create issue in downstream flow. One approach can be to convert like:
`{

   "version": "string",
   "codelist_code": "string"

}
`

ct_df = self.evaluation_dataset.__class__.from_dict(ct_data)
ct_df = ct_df.astype(
{
"version": str,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here too

Copy link
Copy Markdown
Collaborator

@RamilCDISC RamilCDISC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR fixes the bug for empty dataframe in codelist operations. The PR was validated by:

  1. Reviewing the PR for any unwanted code or comments.
  2. Reviewing the PR in accordance with the AC.
  3. Reviewing the updated tests for logic and coverage.
  4. Reviewing the code for maintaining quality and downstream safe checks.
  5. Running validation using positive dataset in dev editor.
  6. Running validation using negative dataset in dev editor.
  7. Running validation using edge case of empty dataframe.
  8. Ensuring all unit and integration testing pass.

@gerrycampion gerrycampion merged commit 4d89e6d into main Apr 28, 2026
13 of 14 checks passed
@gerrycampion gerrycampion deleted the 1651-new-version-of-core-engine-causes-errors-in-usdm-rule-execution branch April 28, 2026 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

New version of Core engine causes errors in USDM rule execution

2 participants