-
Notifications
You must be signed in to change notification settings - Fork 27
1327: Fix dataset metadata handling in DatasetMetadataDefineDatasetBuilder #1440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1327: Fix dataset metadata handling in DatasetMetadataDefineDatasetBuilder #1440
Conversation
| if datasets.data.empty: | ||
| datasets.data = ds_metadata.data.copy() | ||
| else: | ||
| datasets.data = pd.concat( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using pd.concat will return a pandas dataframe. Here dataframe could be either pandas or dask.
…enhance to_parquet method in DummyDataService
RamilCDISC
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR seems fine to me. @gerrycampion if you are satisfied with the updates too then I will put my closing comment with final validation.
| metadata_to_return: dict = dataset.get_metadata() | ||
| return metadata_to_return | ||
|
|
||
| def to_parquet(self, file_path: str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexfurmenkov why are you updating the to_parquet method for dummy_data_service? Dummy is for the TestRule Azure Function hosted endpoint for CORE and will never be in parquet format. I was able to run CG0320 just fine without this codeblock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise it just doesn't work if ran with DASK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good except the change to dummy for to_parquet. If this was done for a reason--please let me know. Otherwise can we please revert it? (We can add a comment there stating something about dummy being for the API endpoint for CORE if you want)
CORE-Report-2025-11-24T15-14-45.xlsx
cdisc_rules_engine/dataset_builders/dataset_metadata_define_dataset_builder.py
Show resolved
Hide resolved
SFJohnson24
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see Gerry's comment
…into 1327-fix-dataset-metadata-define-dataset-builder # Conflicts: # tests/unit/test_rules_engine.py
cdisc_rules_engine/dataset_builders/dataset_metadata_define_dataset_builder.py
Outdated
Show resolved
Hide resolved
…into 1327-fix-dataset-metadata-define-dataset-builder
RamilCDISC
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR fixes the dataset metadata handling. The updates were validated by:
- Reviewing the PR for any unwanted code or comments.
- Reviewing the PR logic in accordance with the AC.
- Ensuring all unit and regression testing pass
- Ensuring all relevant testing is updated.
- Ensuring compatibility for Pandas and DASK both.
- Ensuring previous coding practices are followed.
- Running manual testing with positive dataset
- Running manual testing with negative dataset.
No description provided.