fix codelist operations type inference for empty dataframes#1709
Conversation
RamilCDISC
left a comment
There was a problem hiding this comment.
Would it be better to add a unit or regression test for this case?
Yes, I added unit tests that fail on main but pass on this branch |
| self.params.ct_package_type, unique_ct_versions | ||
| ) | ||
| ct_df = self.evaluation_dataset.__class__.from_dict(ct_data) | ||
| ct_df = ct_df.astype( |
There was a problem hiding this comment.
The PR only converts the ct_df to string but evaluation dataset is still not made safe for merge. if it has empty columns the merge can fail. I have noticed that in the test the evaluation dataset was made safe manually by converting it to string. That let me to think that we need to make the evaluation dataset safe here too to merge without errors.
There was a problem hiding this comment.
i believe that the fix for the types allowed us to have empty evaluation datasets as well. I've removed the conversions from the tests.
| ct_df = self.evaluation_dataset.__class__.from_dict(ct_data) | ||
| ct_df = ct_df.astype( | ||
| { | ||
| "version": str, |
There was a problem hiding this comment.
using str to covnert to string here can convert missing values like nan to "nan" which could create issue in downstream flow. One approach can be to convert like:
`{
"version": "string",
"codelist_code": "string"
}
`
| ct_df = self.evaluation_dataset.__class__.from_dict(ct_data) | ||
| ct_df = ct_df.astype( | ||
| { | ||
| "version": str, |
…s-in-usdm-rule-execution
RamilCDISC
left a comment
There was a problem hiding this comment.
The PR fixes the bug for empty dataframe in codelist operations. The PR was validated by:
- Reviewing the PR for any unwanted code or comments.
- Reviewing the PR in accordance with the AC.
- Reviewing the updated tests for logic and coverage.
- Reviewing the code for maintaining quality and downstream safe checks.
- Running validation using positive dataset in dev editor.
- Running validation using negative dataset in dev editor.
- Running validation using edge case of empty dataframe.
- Ensuring all unit and integration testing pass.
Fixes the issue for CORE-000857 with #1651. I cannot reproduce the issue for CORE-000873, CORE-000874 without the proper data.
Previously, we would get an Execution Error because the operations input dataframe is empty (after match datasets) and the column datatypes cannot be inferred.
Broken.xlsx
Now we get a skip.
Fixed.xlsx
Ran with following data and launch config:
pilot_LZZT_narrative_2026MAR10.json
{ "name": "USDM Rule", "type": "debugpy", "request": "launch", "program": "${workspaceFolder}/core.py", "console": "integratedTerminal", "args": [ "validate", "-s", "usdm", "-v", "4-0", "-dp", "pilot_LZZT_narrative_2026MAR10.json", "-r", "CORE-000857", "-of", "XLSX", "-of", "json", "-l", "debug" ] }