Multiidvar #1437
Conversation
    ]
    current_supp = right_dataset.drop(columns=columns_to_drop)
    if isinstance(left_dataset, DaskDataset):
        left_pandas = PandasDataset(left_dataset.data.compute())
Instead of computing the whole dataset, can we compute a subset of the dataset with only the required columns? This would help keep memory use low for large datasets.
I was able to port the single-IDVAR logic for Dask to just converting the supp dataset to pandas and then back to Dask for the merge. The supp should be smaller than the parent and therefore manageable. Unfortunately, the way multi-IDVAR pivot merges need to take place (multiple sequential merges for each IDVAR while dropping extra columns) doesn't play well with Dask's lazy evaluation, so I kept that scenario as a pandas conversion (especially with the potential refactor on the horizon).
RamilCDISC
left a comment
The PR updates the merge logic for SUPP with multiple IDVAR. The PR was validated by:
- Reviewing the PR for any unwanted changes, code or comments.
- Reviewing the PR logic in accordance with the AC.
- Validating the logic for both pandas and Dask use cases.
- Validating all unit tests and regression tests pass.
- Validating all related testing has been updated.
- Running manual validation using dev rule editor with positive dataset.
- Running the manual validations using the dev editor with negative datasets.
- Covering edge cases of missing, duplicate, single and multiple values.
Datasets_single.json
Rule_underscores.json
CG0019_multi.xlsx
CG0019_split_and_supp.xlsx
Datasets.json
This PR adds merge logic for SUPP datasets where multiple IDVARs are present. It takes the parent, iterates through the unique IDVAR values, grabs the data for each IDVAR, pivots QNAM/QVAL into columns, cleans up the supp metadata columns, aggregates multiple rows that share an IDVAR value, and drops duplicate columns. It then reuses the left-merge logic from single-IDVAR supps, including the QNAM validation loop. Unfortunately, much of the row-by-row logic needed for the pivot merge is something Dask is not particularly good at, so a conversion to pandas is needed during this step.
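The steps above can be sketched in pandas (the datasets, IDVAR values, and QNAM names here are invented examples, not data from this PR): iterate over the unique IDVARs, pivot each slice's QNAM/QVAL wide, and sequentially left-merge onto the parent.

```python
import pandas as pd

# hypothetical parent and supp datasets following SDTM conventions
parent = pd.DataFrame({
    "USUBJID": ["01", "01", "02"],
    "DMSEQ":   [1, 2, 1],
    "DMGRPID": ["A", "B", "A"],
})
supp = pd.DataFrame({
    "USUBJID":  ["01", "01", "02"],
    "IDVAR":    ["DMSEQ", "DMSEQ", "DMGRPID"],
    "IDVARVAL": ["1", "2", "A"],
    "QNAM":     ["RACEOTH", "RACEOTH", "VISITDSC"],
    "QVAL":     ["OTHER1", "OTHER2", "UNSCHED"],
})

merged = parent.copy()
for idvar in supp["IDVAR"].unique():
    chunk = supp[supp["IDVAR"] == idvar]
    # pivot QNAM/QVAL into columns, aggregating duplicate keys
    wide = (chunk.pivot_table(index=["USUBJID", "IDVARVAL"],
                              columns="QNAM", values="QVAL",
                              aggfunc="first")
                 .reset_index())
    # align IDVARVAL's dtype with the parent column before merging
    wide["IDVARVAL"] = wide["IDVARVAL"].astype(parent[idvar].dtype)
    # one sequential left merge per IDVAR
    merged = merged.merge(wide.rename(columns={"IDVARVAL": idvar}),
                          on=["USUBJID", idvar], how="left")
```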
This also removes the old is_relationship logic, an artifact from before the SDTM Metadata container was created, which has since been removed. I also updated the tests: they were looking for QNAM/QVAL, which was not being properly dropped, and the pivot was not occurring correctly with some SUPP datasets. This resolves that.