Tighten `data_batch` by dhruv9vats · Pull Request #120 · NVIDIA/cuCascade

dhruv9vats · 2026-05-08T13:34:54Z

Proposes tightening the API on and around the data_batch.

These proposals are motivated by efforts to:

Ensure correctness
Highlight conflicting requirements
Reduce duplication of code
Separate concerns where possible
Reduce escape hatches

listed approximately in the order of importance given while designing this.

This is an explicit "show me the code" proposal implemented to gather concrete criticism to allow settling on a quality API for the central data_batch construct. There are comments for the reviewer that details why a specific change was made.

…ata_batch

Remove unique repository coverage and update docs for the data_batch accessor model. Co-authored-by: OpenAI Codex <codex@openai.com>

copy-pr-bot · 2026-05-08T13:34:57Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

dhruv9vats · 2026-05-08T13:42:56Z

-class data_batch : public std::enable_shared_from_this<data_batch> {
- public:
-  // -- Construction --
-
-  /**
-   * @brief Construct a new data_batch.
-   *
-   * @param batch_id Unique identifier for this batch (immutable after construction).
-   * @param data     Owned data representation; must not be null.
-   */
-  data_batch(uint64_t batch_id, std::unique_ptr<idata_representation> data);


It is incorrect to

data_batch : public std::enable_shared_from_this<data_batch>

and have a public constructor. To be able to use shared_from_this(), the object must already be owned by a shared_ptr, otherwise it throws:
See https://en.cppreference.com/cpp/memory/enable_shared_from_this (specially example misuseGood()).

We rely heavily and primarily on shared_from_this() to extend lifetimes. Thus we must use a factory fn to create data batches and make the constructor private.

dhruv9vats · 2026-05-08T13:44:02Z


-  // -- Static transition methods (D-13/D-14/D-15/D-17) --
+ private:
+  data_batch_core(uint64_t batch_id, std::unique_ptr<idata_representation> data);


This is private so only data_batch can create this core, and thus access is only through the RAII handles.

dhruv9vats · 2026-05-08T13:56:28Z

+class data_batch : public std::enable_shared_from_this<data_batch> {
+  friend read_only_data_batch;
+  friend mutable_data_batch;
+


The primary motivations to separate data_batch and data_batch_core are:

Reduce code duplication, so no more:

Tier read_only_data_batch::get_current_tier() { return _batch->get_current_tier(); } Tier mutable_data_batch::get_current_tier() { return _batch->get_current_tier(); } memory_space* read_only_data_batch::get_memory_space() { return _batch->memory_space(); } memory_space* mutable_data_batch::get_memory_space() { return _batch->memory_space(); }

And the convert_to / clone_to template declarations.

The RAII handles instead simply act as "stencils", conditionally providing the access/"view" to data_batch_core APIs via operator overloading, instead of a middlemen role.

Allows separating concerns

This I don't have a strong opinion on. But this structure allows keeping the data_batch_core basically as a POD, and move the thread-safety concerns to the parent data_batch

This is the class where future data_batch_probe like constructs may be added.

dhruv9vats · 2026-05-08T13:57:40Z

+  // will allow only read access
+  const data_batch_core* operator->() const { return &(_owner->_batch); }
+  const data_batch_core& operator*() const { return _owner->_batch; }
+


Simply forwards to data_batch_core

dhruv9vats · 2026-05-08T13:59:14Z

+  read_only_data_batch clone_read_only_access() const
+  {
+    std::optional<read_only_data_batch> maybe_clone = _owner->try_get_read_only();


Explicitness here felt better instead of a copy constructor. So the caller knows exactly what is being copied. though no strong opinion here either.

i am not in disagreement, but that will add a whole bunch more changes in Sirius. Not sure if its worth it

dhruv9vats · 2026-05-08T14:15:21Z

-  /** @brief Get a raw pointer to the memory space. */
-  memory::memory_space* get_memory_space() const { return _batch->get_memory_space(); }
-
-  // -- Write methods (D-10/ACC-02) --


(D-10/ACC-02)

not sure what these are, indicators of some sort? LLM bread crumbs?

dhruv9vats · 2026-05-08T14:43:44Z

-   * For shared_ptr: The batch is copied to each repository.
-   * For unique_ptr: This method requires only one operator (single owner semantics).


Under the assumption that we want data_batch to be thread-safe, these I believe end up being conflicting requirements, or atleast can't be supported by the same data_batch.

I dont think we can have both

lifetime extending RAII handles that allow parallel reads and exclusive writes,

and have a data_batch under unique_ptr becauseshared_from_this then throws.

To support this we would at least need a wrapper over the data_batch, or another wrapper around data_batch_core.

dhruv9vats · 2026-05-08T14:49:41Z

/ok to test bffa10d

dhruv9vats · 2026-05-08T15:22:13Z

-   * @return A mutable_data_batch holding the exclusive lock.
+   * @throws std::runtime_error if subscriber count is already zero.
   */
-  [[nodiscard]] static mutable_data_batch readonly_to_mutable(read_only_data_batch&& accessor);


since this might block, just having a single path read_only -> idle -> get_mutable / try_get_mutable that retains the choice of whether to wait or just try might be simpler?

dhruv9vats · 2026-05-08T15:59:05Z

+   * @return Non-owning pointer to the data, or nullptr if empty.
   */
-  void unsubscribe();
+  idata_representation* get_data() const;


Maybe this and get_memory_space are made to return const idata_representation* ? if mutability of the underlying pointer content is not needed anywhere?

dhruv9vats · 2026-05-08T16:03:05Z

+    for (auto& op : ops) {
+      _repositories[{op.first, std::string(op.second)}]->add_data_batch(batch);


If the primary use case of the unique_ptr<data_batch> was to enforce the "can be added on only one repo", that might be instead handled at the repo/repo-manager level. If it was a consequence, we should think of use cases where we need to guarantee unique ownership, and the shared ownership model does not suffice. (Sirius current, AFAIK, does not make use of unique_(data_batch / repo / repo_manager))

The refactor that introduced read_only_data_batch and mutable_data_batch basically made it so that we cant have unique data_batch, and I think we decided that was ok. Any code or comments that are still here that indicate that we could have unique data_batch was a mistake

wmalpica

In general i like the data_batch_core concept, and how much comment and documentation cleanup this PR brings.
But requiring a make function, using get_read_only vs to_read_only and no copy constructors for read_only_data_batch, plus other changes means that this PR would require a very large number of changes in Sirius. I would recommend postponing that Sirius refactor until after some of the large initiatives currently underway are complete

wmalpica · 2026-05-08T17:10:27Z

-    |   mut.convert_to<TargetRepresentation>(registry, target_space, stream)
+    |   read_only_data_batch::to_idle(std::move(ro))      [release shared lock]
+    |   mutable_data_batch mut = batch->get_mutable()     [exclusive lock]
+    |   mut->convert_to<TargetRepresentation>(registry, target_space, stream)


Why do you have mut->convert_to instead of mut.convert_to? its not a pointer or an optional.

Similarly you have several places where you also have ro->

wmalpica · 2026-05-08T17:17:58Z

   */
-  memory::memory_space* get_memory_space() const;
+  template <typename TargetRepresentation>
+  std::shared_ptr<data_batch> clone_to(representation_converter_registry& registry,


the data_batch class should not be able to make clones. That can only be done by the mutable_data_batch and read_only_data_batch classes

wmalpica · 2026-05-08T17:19:32Z

+  read_only_data_batch clone_read_only_access() const
+  {
+    std::optional<read_only_data_batch> maybe_clone = _owner->try_get_read_only();


i am not in disagreement, but that will add a whole bunch more changes in Sirius. Not sure if its worth it

wmalpica · 2026-05-08T17:22:05Z

+    for (auto& op : ops) {
+      _repositories[{op.first, std::string(op.second)}]->add_data_batch(batch);


The refactor that introduced read_only_data_batch and mutable_data_batch basically made it so that we cant have unique data_batch, and I think we decided that was ok. Any code or comments that are still here that indicate that we could have unique data_batch was a mistake

dhruv9vats and others added 5 commits May 8, 2026 17:31

introduce synchronized_data_batch

eccef60

Add state access methods to synchronized_data_batch

0e9ed73

rename data_batch to data_batch_core and synchronized_data_batch to d…

5eadd0c

…ata_batch

modify add_data_batch_impl to not have to handle unique_ptr data_batch

3524f5d

Migrate data batch tests to synchronized API

bffa10d

Remove unique repository coverage and update docs for the data_batch accessor model. Co-authored-by: OpenAI Codex <codex@openai.com>

dhruv9vats added breaking improvement labels May 8, 2026

dhruv9vats commented May 8, 2026

View reviewed changes

wmalpica reviewed May 8, 2026

View reviewed changes

		* For shared_ptr: The batch is copied to each repository.
		* For unique_ptr: This method requires only one operator (single owner semantics).

		for (auto& op : ops) {
		_repositories[{op.first, std::string(op.second)}]->add_data_batch(batch);

Conversation

dhruv9vats commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

copy-pr-bot Bot commented May 8, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dhruv9vats May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dhruv9vats May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dhruv9vats May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dhruv9vats commented May 8, 2026

Uh oh!

dhruv9vats May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dhruv9vats May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wmalpica left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dhruv9vats commented May 8, 2026 •

edited

Loading

dhruv9vats May 8, 2026 •

edited

Loading

dhruv9vats May 8, 2026 •

edited

Loading

dhruv9vats May 8, 2026 •

edited

Loading

dhruv9vats May 8, 2026 •

edited

Loading

dhruv9vats May 8, 2026 •

edited

Loading