1681: added @cached decorator to get_dataset#1687
Conversation
| self.dataset_path = original_path | ||
|
|
||
| @cached("get_dataset") | ||
| def get_dataset(self, **kwargs): |
There was a problem hiding this comment.
The get dataset still accepts kwargs which are then passed to concat_split_dataset function. Either pass kwargs to cached instance or could we please have a test to confirm kwargs cannot affect returned dataset
There was a problem hiding this comment.
Just to clarify - should kwargs affect the resulting dataset in this case?
Currently, kwargs are passed to concat_split_datasets and used in DatasetBuilder.build function, but they are not included in the cache key (the decorator only considers args). I've added test to check that kwargs are ignored.
I've noticed, that we can't pass any kwargs to this function -- it will cause issues later in JsonSchemaCheckDatasetBuilder -- it uses same kwargs for dataset_implementation.from_records(filtered, **kwargs)
There was a problem hiding this comment.
drop_duplicates is an artifact and It is no longer used at all-- @alexfurmenkov can you delete drop_duplicates from the concat_split_datasets logic (as well as all the calls to get get_variables_metadata() as an aside)
we can remove kwargs from get_dataset/concat_split_dataset.
Caching does appear that is will work fine as kwargs is not used--we can eliminate the test showing kwargs does not impact get_dataset as kwargs is not used at all
SFJohnson24
left a comment
There was a problem hiding this comment.
see comment; otherwise looks good
# Conflicts: # cdisc_rules_engine/services/data_services/base_data_service.py
SFJohnson24
left a comment
There was a problem hiding this comment.
this PR correctly resolves the caching issue with get_dataset()'
it also removes artifacts from the test/validate separate command which has been resolved recently through other PRs.
added caching to BaseDatasetBuilder.get_dataset function. added new fields to BaseDatasetBuilder to support caching without
added corresponging unit tests and a fixture to clean cache between tests