Skip to content

Conversation

@alexfurmenkov
Copy link
Collaborator

No description provided.

if datasets.data.empty:
datasets.data = ds_metadata.data.copy()
else:
datasets.data = pd.concat(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using pd.concat will return a pandas dataframe. Here dataframe could be either pandas or dask.

…enhance to_parquet method in DummyDataService
@alexfurmenkov alexfurmenkov changed the title fix dataset metadata handling in DatasetMetadataDefineDatasetBuilder 1327: Fix dataset metadata handling in DatasetMetadataDefineDatasetBuilder Nov 19, 2025
@alexfurmenkov alexfurmenkov linked an issue Nov 20, 2025 that may be closed by this pull request
Copy link
Collaborator

@RamilCDISC RamilCDISC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR seems fine to me. @gerrycampion if you are satisfied with the updates too then I will put my closing comment with final validation.

metadata_to_return: dict = dataset.get_metadata()
return metadata_to_return

def to_parquet(self, file_path: str) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexfurmenkov why are you updating the to_parquet method for dummy_data_service? Dummy is for the TestRule Azure Function hosted endpoint for CORE and will never be in parquet format. I was able to run CG0320 just fine without this codeblock?

Copy link
Collaborator Author

@alexfurmenkov alexfurmenkov Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise it just doesn't work if ran with DASK

Copy link
Collaborator

@SFJohnson24 SFJohnson24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good except the change to dummy for to_parquet. If this was done for a reason--please let me know. Otherwise can we please revert it? (We can add a comment there stating something about dummy being for the API endpoint for CORE if you want)
CORE-Report-2025-11-24T15-14-45.xlsx

SFJohnson24
SFJohnson24 approved these changes Nov 25, 2025
Copy link
Collaborator

@SFJohnson24 SFJohnson24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see Gerry's comment

…into 1327-fix-dataset-metadata-define-dataset-builder

# Conflicts:
#	tests/unit/test_rules_engine.py
…into 1327-fix-dataset-metadata-define-dataset-builder
Copy link
Collaborator

@RamilCDISC RamilCDISC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR fixes the dataset metadata handling. The updates were validated by:

  1. Reviewing the PR for any unwanted code or comments.
  2. Reviewing the PR logic in accordance with the AC.
  3. Ensuring all unit and regression testing pass
  4. Ensuring all relevant testing is updated.
  5. Ensuring compatibility for Pandas and DASK both.
  6. Ensuring previous coding practices are followed.
  7. Running manual testing with positive dataset
  8. Running manual testing with negative dataset.

@SFJohnson24 SFJohnson24 merged commit ba9d0a2 into main Nov 28, 2025
11 checks passed
@SFJohnson24 SFJohnson24 deleted the 1327-fix-dataset-metadata-define-dataset-builder branch November 28, 2025 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GC0320 execution error

5 participants