Fix get_dataset by gerrycampion · Pull Request #1691 · cdisc-org/cdisc-rules-engine

gerrycampion · 2026-04-10T17:41:36Z

This pull request refactors how dataset references are handled throughout the codebase, replacing the use of file paths (e.g., dataset_path and .xpt filenames) with dataset names and metadata objects. Additionally, the pull request cleans up unused parameters and updates test data to match the new conventions.

Reports have also been updated to use dataset names instead of filenames where it makes sense. This requires a corresponding test suite update: https://github.com/cdisc-org/CORE_Test_Suite/pull/91

RamilCDISC · 2026-04-19T19:57:02Z

    def _execute_operation(self):
        # get metadata
-        dataframe = self.data_service.get_dataset(dataset_name=self.params.dataset_path)
+        dataframe = self.data_service.get_dataset(dataset_name=self.params.domain)


get_dataset now expects the name of the dataset, but self.params.domain matches it only when domain == metadata.name. For split datasets that is not true, so this call will not work for them. Please let me know if I am misunderstanding something.

gerrycampion · 2026-04-21

Great catch. I removed the use of self.params.domain where it is not necessary, because we are already creating self.params.dataframe_metadata in the operation params.
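The split-dataset concern raised above can be illustrated with a small sketch. This is not the engine's code: `DatasetMetadata`, `InMemoryDataService`, and the dataset names `QSAA`/`QSBB` are hypothetical stand-ins, and the only assumption carried over from the thread is that `get_dataset` resolves a dataset by its name rather than by its domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical stand-in for a dataset metadata object."""
    name: str    # dataset name, unique per dataset (e.g. "QSAA")
    domain: str  # domain the dataset belongs to (e.g. "QS")


class InMemoryDataService:
    """Hypothetical data service that stores datasets keyed by name."""

    def __init__(self, datasets):
        # datasets: iterable of (DatasetMetadata, rows) pairs
        self._by_name = {meta.name: rows for meta, rows in datasets}

    def get_dataset(self, dataset_name):
        # Raises KeyError when no dataset has this exact name.
        return self._by_name[dataset_name]


# A split QS domain (two datasets sharing one domain) and an unsplit DM.
qs_a = DatasetMetadata(name="QSAA", domain="QS")
qs_b = DatasetMetadata(name="QSBB", domain="QS")
dm = DatasetMetadata(name="DM", domain="DM")

service = InMemoryDataService([
    (qs_a, [{"QSSEQ": 1}]),
    (qs_b, [{"QSSEQ": 2}]),
    (dm, [{"USUBJID": "001"}]),
])


def lookup_by_domain(meta):
    # Works only when domain == name, i.e. for unsplit datasets.
    return service.get_dataset(dataset_name=meta.domain)


def lookup_by_name(meta):
    # Works for split and unsplit datasets alike.
    return service.get_dataset(dataset_name=meta.name)
```

In this sketch `lookup_by_domain(dm)` succeeds because the domain and name coincide, while `lookup_by_domain(qs_a)` raises `KeyError` because no dataset is named `QS`. Taking the name from the metadata object avoids that mismatch.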

RamilCDISC · 2026-04-21T19:08:30Z

            dataset_name=dataset_name, **params
        )
        metadata_to_return: dict = {
            "dataset_size": [dataset_metadata.file_size],


I see a possible regression here. LocalDataService.get_raw_dataset_metadata() rebuilds the metadata on every call and can apply the size unit. In this PR that override is removed and the metadata is initialized upfront.

As a result, dataset_metadata.file_size now comes from a stored object, so calls like get_dataset_metadata(...size_unit=size_unit) in the metadata builders will not apply the size unit and will return only the stored value. Please let me know if I have this wrong.

gerrycampion · 2026-04-21

It's a good point, except that size_unit is a dead variable that wasn't, and still isn't, being used in get_dataset_metadata. Can you create a new issue to either fix or remove the use of size_unit?
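To make the "dead variable" point concrete, here is a minimal sketch of the two situations being discussed. None of this is the engine's implementation: `StoredMetadata`, both functions, and the `BYTES_PER_UNIT` table (decimal units) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass

# Assumed decimal unit table; the engine's actual units may differ.
BYTES_PER_UNIT = {"B": 1, "KB": 1000, "MB": 1000**2, "GB": 1000**3}


@dataclass
class StoredMetadata:
    """Hypothetical metadata object created once and reused."""
    name: str
    file_size: int  # size in bytes, captured when the object is built


def get_dataset_metadata_dead_param(meta, size_unit=None):
    # size_unit is accepted but never read, so the caller always
    # receives the stored value regardless of the unit requested.
    return {"dataset_size": [meta.file_size]}


def get_dataset_metadata_with_unit(meta, size_unit="B"):
    # One possible fix: convert the stored byte count on the way out.
    return {"dataset_size": [meta.file_size / BYTES_PER_UNIT[size_unit]]}


meta = StoredMetadata(name="DM", file_size=2_000_000)
ignored = get_dataset_metadata_dead_param(meta, size_unit="MB")
converted = get_dataset_metadata_with_unit(meta, size_unit="MB")
```

With the dead parameter, asking for `"MB"` still returns the byte count. The alternative to converting is to drop the parameter entirely, which is the direction the follow-up issue title in this thread suggests.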

RamilCDISC

The PR fixes the get_dataset() calls across the dataset builders to resolve the bug reported in the connected ticket. Validation was done by:

Reviewing the PR for any unwanted code or comments.
Reviewing the PR against the acceptance criteria (AC).
Reviewing the updated tests for logic and coverage.
Reviewing the updated builders for consistency.
Reviewing the updated builders to ensure they preserve the intended functionality.
Ensuring all relevant builders are updated.
Ensuring all unit and integration tests pass.
Ensuring proper execution of negative and positive datasets using the dev editor.
Executing all dataset builders to confirm functionality.

SFJohnson24

This looks great. The PR fixes get_dataset() calls across the dataset builders to resolve the bug in the connected ticket, including refactoring build_split_datasets to swap self.dataset_metadata via get_raw_dataset_metadata() and simplifying the get_dataset_class signature. All relevant builders were reviewed for consistency, correctness, and preserved functionality. The tests were updated, and the cert data was run to confirm that behavior is preserved and consistent.

before test fixes

c516cdf

gerrycampion linked an issue Apr 10, 2026 that may be closed by this pull request

get_dataset() incorrect calls #1672

Closed

Merge branch 'main' into 1672-get_dataset-incorrect-calls

5cabd75

gerrycampion temporarily deployed to DEV April 11, 2026 02:01 — with GitHub Actions Inactive

Fixed unit tests

ead3ac8

gerrycampion temporarily deployed to DEV April 14, 2026 04:53 — with GitHub Actions Inactive

more test fixes

4b5a081

gerrycampion temporarily deployed to DEV April 14, 2026 17:54 — with GitHub Actions Inactive

test suite fixes

16bee96

gerrycampion temporarily deployed to DEV April 14, 2026 18:38 — with GitHub Actions Inactive

simplify dummy data service

c02661f

gerrycampion temporarily deployed to DEV April 15, 2026 16:31 — with GitHub Actions Inactive

fixed dataset name in reports

a4f337c

gerrycampion temporarily deployed to DEV April 15, 2026 20:41 — with GitHub Actions Inactive

regression report fixes

bc78160

gerrycampion temporarily deployed to DEV April 15, 2026 21:36 — with GitHub Actions Inactive

fix rule editor test

b6ae02c

gerrycampion temporarily deployed to DEV April 15, 2026 21:51 — with GitHub Actions Inactive

removed more dataset path

039f678

gerrycampion temporarily deployed to DEV April 15, 2026 22:09 — with GitHub Actions Inactive

remove unused method

85eef27

gerrycampion temporarily deployed to DEV April 15, 2026 22:54 — with GitHub Actions Inactive

remove unnecessary dataset path params

9b836b1

gerrycampion temporarily deployed to DEV April 16, 2026 00:29 — with GitHub Actions Inactive

missed a dataset_path

6da3024

gerrycampion temporarily deployed to DEV April 16, 2026 13:42 — with GitHub Actions Inactive

gerrycampion added 2 commits April 16, 2026 22:13

remove extra datasets references

5429014

Merge branch 'main' into 1672-get_dataset-incorrect-calls

4de4e0c

gerrycampion temporarily deployed to DEV April 17, 2026 02:23 — with GitHub Actions Inactive

fix merged test code

befd6b9

gerrycampion marked this pull request as ready for review April 17, 2026 02:55

gerrycampion marked this pull request as draft April 17, 2026 04:36

refactor operation params

d2b13d9

gerrycampion temporarily deployed to DEV April 17, 2026 17:30 — with GitHub Actions Inactive

gerrycampion marked this pull request as ready for review April 17, 2026 17:48

gerrycampion mentioned this pull request Apr 17, 2026

Rules 2 cdisc-org/cdisc-open-rules#15

Merged

RamilCDISC reviewed Apr 19, 2026


removed unneeded self.params.domain from operations

a494794

gerrycampion temporarily deployed to DEV April 21, 2026 02:32 — with GitHub Actions Inactive

gerrycampion requested a review from RamilCDISC April 21, 2026 02:35

Merge branch 'main' into 1672-get_dataset-incorrect-calls

02a2457

gerrycampion temporarily deployed to DEV April 21, 2026 16:33 — with GitHub Actions Inactive

RamilCDISC reviewed Apr 21, 2026


more fixes for operations dataset metadata source

a61a74d

gerrycampion temporarily deployed to DEV April 21, 2026 22:53 — with GitHub Actions Inactive

Merge branch 'main' into 1672-get_dataset-incorrect-calls

67188e6

gerrycampion temporarily deployed to DEV April 21, 2026 23:01 — with GitHub Actions Inactive

fixed test_contents_library_variables_dataset_builder

6285fd3

gerrycampion temporarily deployed to DEV April 21, 2026 23:13 — with GitHub Actions Inactive

RamilCDISC mentioned this pull request Apr 22, 2026

Remove size_unit as it is a dead variable that isn't being used in get_dataset_metadata for LocalDataService. #1703

Closed

Merge branch 'main' into 1672-get_dataset-incorrect-calls

cb83194

gerrycampion temporarily deployed to DEV April 27, 2026 04:25 — with GitHub Actions Inactive

Merge branch 'main' into 1672-get_dataset-incorrect-calls

3bf3810

RamilCDISC temporarily deployed to DEV April 27, 2026 22:26 — with GitHub Actions Inactive

RamilCDISC approved these changes Apr 27, 2026


SFJohnson24 approved these changes Apr 28, 2026


Merge branch 'main' into 1672-get_dataset-incorrect-calls

6bfae7a

SFJohnson24 temporarily deployed to DEV April 28, 2026 18:56 — with GitHub Actions Inactive

SFJohnson24 merged commit 2159870 into main Apr 28, 2026
5 of 8 checks passed

SFJohnson24 deleted the 1672-get_dataset-incorrect-calls branch April 28, 2026 18:56

SFJohnson24 merged 25 commits into main from 1672-get_dataset-incorrect-calls
