691: Tests for CoW behavior in pandas by alexfurmenkov · Pull Request #1690 · cdisc-org/cdisc-rules-engine

alexfurmenkov · 2026-04-10T10:55:53Z

No description provided.

RamilCDISC · 2026-04-24T20:34:14Z

+        # Adding DaskDataset will cause downstream issues since Dask does not support copy-on-write.
+        if type(cached) is PandasDataset:
+            cached.data = cached.data.copy(deep=False)
+        return cached


The same cached wrapper is still returned here. Pandas CoW protects separate pandas objects that share the same underlying data. This change rebinds cached.data on the cached wrapper, but the method still returns that same wrapper object, so callers continue to share it with the cache.
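A minimal sketch of this concern, assuming a hypothetical Wrapper class standing in for PandasDataset and a plain dict standing in for the cache (neither is the engine's code). It contrasts returning the cached wrapper with returning a fresh wrapper around a shallow copy:

```python
import pandas as pd

# Same switch the PR sets at import time (needed on pandas 2.x).
pd.options.mode.copy_on_write = True


class Wrapper:
    """Hypothetical stand-in for PandasDataset: holds a DataFrame in .data."""

    def __init__(self, data: pd.DataFrame):
        self.data = data


cache = {"dm": Wrapper(pd.DataFrame({"AGE": [30, 40]}))}


def get_same_wrapper(key: str) -> Wrapper:
    # Pattern from the diff: rebind .data, but hand back the cached object.
    cached = cache[key]
    cached.data = cached.data.copy(deep=False)
    return cached


def get_new_wrapper(key: str) -> Wrapper:
    # Alternative: a fresh wrapper around a shallow (lazy) copy.
    return Wrapper(cache[key].data.copy(deep=False))


leaky = get_same_wrapper("dm")
leaky.data = leaky.data.drop(index=0)    # rebinds .data on the shared wrapper
rows_after_leak = len(cache["dm"].data)  # the cache entry lost a row too

cache["dm"] = Wrapper(pd.DataFrame({"AGE": [30, 40]}))  # reset the entry
safe = get_new_wrapper("dm")
safe.data.loc[0, "AGE"] = 99         # CoW copies on write; cache untouched
safe.data = safe.data.drop(index=0)  # only the caller's wrapper is rebound
```

In the first case the caller and the cache hold the same Python object, so CoW never gets a chance to help; in the second, both the wrapper and the frame are distinct objects.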

RamilCDISC · 2026-04-24T20:37:03Z

This test only exercises the local CacheService defined inside the test, not the production InMemoryCacheService or PandasDataset, so the changes are not properly tested.

RamilCDISC · 2026-04-27T21:59:12Z

+            return PandasDataset(cached.data.copy(deep=False))
+        return cached

    def get_all(self, cache_keys: List[str]):


The other getters, such as get_all and get_all_by_prefix, still return the cached objects directly, so callers can mutate the cached dataset. These need the same protection as get().
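One possible shape for this, sketched with a hypothetical SketchCache that stores plain DataFrames (it is not InMemoryCacheService, and the key names are made up): the bulk getters delegate to get() so every value receives the same shallow-copy protection.

```python
from typing import Dict, List

import pandas as pd

pd.options.mode.copy_on_write = True


class SketchCache:
    """Hypothetical cache; not the engine's InMemoryCacheService."""

    def __init__(self) -> None:
        self._store: Dict[str, object] = {}

    def add(self, key: str, value: object) -> None:
        self._store[key] = value

    def get(self, key: str):
        cached = self._store.get(key)
        if isinstance(cached, pd.DataFrame):
            # Shallow copy: no data is duplicated until the caller writes.
            return cached.copy(deep=False)
        return cached

    def get_all(self, keys: List[str]) -> list:
        # Reuse get() so every value gets the same protection.
        return [self.get(k) for k in keys]

    def get_all_by_prefix(self, prefix: str) -> list:
        return [self.get(k) for k in self._store if k.startswith(prefix)]


cache = SketchCache()
cache.add("study/dm", pd.DataFrame({"USUBJID": ["001", "002"]}))
cache.add("study/ae", pd.DataFrame({"AETERM": ["HEADACHE"]}))

for frame in cache.get_all_by_prefix("study/"):
    frame.iloc[0, 0] = "CHANGED"  # writes land in the caller's copy only
```

Delegating to a single getter keeps the protection logic in one place, so a new bulk accessor cannot silently bypass it.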

RamilCDISC · 2026-04-27T22:02:24Z

 from cdisc_rules_engine.models.sdtm_dataset_metadata import SDTMDatasetMetadata
 from cdisc_rules_engine.enums.sensitivity import Sensitivity

+pd.options.mode.copy_on_write = True


This turns on pandas CoW process-wide, so anyone integrating the rules engine into their workflow will have it enabled as well. We should document this in the repository README.
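For the README note, a short illustration of what an integrator might notice once the option is on. The snippet sets the option itself as a stand-in for importing the engine; the DataFrame and column names are made up.

```python
import pandas as pd

# Stand-in for the side effect of importing the engine module.
pd.options.mode.copy_on_write = True

df = pd.DataFrame({"A": [1, 2, 3]})

col = df["A"]      # under CoW this behaves like an independent object
col.iloc[0] = 100  # the write triggers a copy instead of reaching df

print(df["A"].tolist())
print(col.tolist())
```

Host code that relied on a selected column writing through to its parent frame would see different results after the import, which is why the setting is worth calling out.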

RamilCDISC · 2026-04-27T22:08:22Z

Could you please add a few more tests covering operations that change the length of the dataset, for example adding and dropping rows? The engine performs these operations on datasets, so confirming that caching behaves correctly there will be helpful.
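A rough sketch of what such a test could look like, assuming a hypothetical fetch() helper that returns a shallow copy of a cached DataFrame. A real test would go through the production cache service and PandasDataset instead.

```python
import pandas as pd

pd.options.mode.copy_on_write = True


def fetch(cache: dict, key: str) -> pd.DataFrame:
    """Hypothetical getter: returns a shallow copy of the cached frame."""
    return cache[key].copy(deep=False)


def test_row_changes_do_not_reach_cache() -> None:
    cache = {"ae": pd.DataFrame({"AESEQ": [1, 2, 3]})}

    # Dropping rows in place on the retrieved frame.
    dropped = fetch(cache, "ae")
    dropped.drop(index=[0, 1], inplace=True)
    assert len(dropped) == 1
    assert len(cache["ae"]) == 3

    # Appending a row by label on the retrieved frame.
    grown = fetch(cache, "ae")
    grown.loc[len(grown)] = [4]
    assert len(grown) == 4
    assert cache["ae"]["AESEQ"].tolist() == [1, 2, 3]


test_row_changes_do_not_reach_cache()
```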

RamilCDISC · 2026-04-28T18:46:23Z


    def filter_cache(self, prefix: str) -> dict:
-        return {k: self.cache[k] for k in self.cache.keys() if k.startswith(prefix)}
+        return {k: self.cache.get(k) for k in self.cache.keys() if k.startswith(prefix)}


This will still return the raw cached object for PandasDataset. Please update this too, and I think we will be good to merge.
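To illustrate the point, a small sketch assuming self.cache is a plain dict, as the comment implies. The SketchCache class and both method names are hypothetical. dict.get hands back the stored object exactly as self.cache[k] does, whereas the service-level get() applies the shallow copy.

```python
import pandas as pd

pd.options.mode.copy_on_write = True


class SketchCache:
    """Hypothetical cache with a dict in .cache, mirroring the diff's names."""

    def __init__(self) -> None:
        self.cache: dict = {}

    def get(self, key: str):
        cached = self.cache.get(key)
        if isinstance(cached, pd.DataFrame):
            return cached.copy(deep=False)
        return cached

    def filter_cache_raw(self, prefix: str) -> dict:
        # dict.get returns the stored object itself, same as self.cache[k].
        return {k: self.cache.get(k) for k in self.cache if k.startswith(prefix)}

    def filter_cache_protected(self, prefix: str) -> dict:
        # self.get applies the shallow-copy protection.
        return {k: self.get(k) for k in self.cache if k.startswith(prefix)}


svc = SketchCache()
svc.cache["ds/dm"] = pd.DataFrame({"SEX": ["M", "F"]})

raw = svc.filter_cache_raw("ds/")["ds/dm"]
protected = svc.filter_cache_protected("ds/")["ds/dm"]

print(raw is svc.cache["ds/dm"])        # True: the cached frame itself
print(protected is svc.cache["ds/dm"])  # False: a separate shallow copy
```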

RamilCDISC

The PR removes the need for expensive deepcopy() calls on PandasDataset by relying on pandas CoW. Validation was done by:

Reviewing the PR for any unwanted code or comments.
Reviewing the PR in accordance with the AC.
Reviewing the updated tests to confirm coverage and cases.
Ensuring all unit, regression, and integration tests pass.
Ensuring pandas CoW is turned on before the cache service is used in the normal validation path.
Validating that every access path to a cached dataset prevents mutation of the cache.
Reviewing the added tests and requesting changes so that they cover operations that may change the length of the dataset.
Ensuring the CoW behavior is documented, because it is turned on process-wide.
Validating that the Dask path is unaffected.
Validating against any chance of regression.

tests for CoW behavior in pandas

76b1dd1

alexfurmenkov temporarily deployed to DEV April 10, 2026 10:55 — with GitHub Actions Inactive

tested true CoW via shallow copy

12b2dc8

alexfurmenkov temporarily deployed to DEV April 13, 2026 16:36 — with GitHub Actions Inactive

added shallow copying for cached datasets

7a4479d

alexfurmenkov temporarily deployed to DEV April 14, 2026 10:37 — with GitHub Actions Inactive

dask copy workaround

3fe3bb6

alexfurmenkov temporarily deployed to DEV April 15, 2026 12:32 — with GitHub Actions Inactive

alexfurmenkov requested review from RamilCDISC, SFJohnson24 and gerrycampion April 15, 2026 14:44

alexfurmenkov marked this pull request as ready for review April 15, 2026 14:44

alexfurmenkov linked an issue Apr 22, 2026 that may be closed by this pull request

rules_engine.execute_rules does a deepcopy of the dataset #691

Open

alexfurmenkov changed the title ~~tests for CoW behavior in pandas~~ 691: Tests for CoW behavior in pandas Apr 22, 2026

RamilCDISC requested changes Apr 24, 2026

fix CoW tests and wrapper

6676707

alexfurmenkov temporarily deployed to DEV April 27, 2026 18:06 — with GitHub Actions Inactive

RamilCDISC requested changes Apr 27, 2026

alexfurmenkov requested a review from RamilCDISC April 28, 2026 10:24

Merge branch 'main' into 691-remove-deepcopy

ebaa027

alexfurmenkov temporarily deployed to DEV April 28, 2026 10:24 — with GitHub Actions Inactive

alexfurmenkov added 2 commits April 28, 2026 12:40

added tests for cache methods. changed cache access to get() and get_…

6940e4b

…dataset() methods

readme notice about CoW usage

d06d875

alexfurmenkov temporarily deployed to DEV April 28, 2026 10:46 — with GitHub Actions Inactive

Merge branch 'main' into 691-remove-deepcopy

c33ef02

RamilCDISC temporarily deployed to DEV April 28, 2026 18:39 — with GitHub Actions Inactive

RamilCDISC requested changes Apr 28, 2026

alexfurmenkov added 2 commits April 29, 2026 12:55

fix filter_cache access to cache

d14e859

Merge remote-tracking branch 'origin/691-remove-deepcopy' into 691-re…

5d4072e

…move-deepcopy

alexfurmenkov temporarily deployed to DEV April 29, 2026 10:56 — with GitHub Actions Inactive

alexfurmenkov requested a review from RamilCDISC April 29, 2026 13:05

Merge branch 'main' into 691-remove-deepcopy

6ff9399

alexfurmenkov temporarily deployed to DEV April 29, 2026 14:02 — with GitHub Actions Inactive

RamilCDISC approved these changes Apr 29, 2026

alexfurmenkov wants to merge 12 commits into main from 691-remove-deepcopy

2 participants
