Skip to content

Tighten data_batch#120

Draft
dhruv9vats wants to merge 5 commits intoNVIDIA:mainfrom
dhruv9vats:synchronized-data-batch
Draft

Tighten data_batch#120
dhruv9vats wants to merge 5 commits intoNVIDIA:mainfrom
dhruv9vats:synchronized-data-batch

Conversation

@dhruv9vats
Copy link
Copy Markdown
Member

@dhruv9vats dhruv9vats commented May 8, 2026

Proposes tightening the API on and around the data_batch.

These proposals are motivated by efforts to:

  • Ensure correctness
  • Highlight conflicting requirements
  • Reduce duplication of code
  • Separate concerns where possible
  • Reduce escape hatches

listed approximately in the order of importance given while designing this.

This is an explicit "show me the code" proposal implemented to gather concrete criticism to allow settling on a quality API for the central data_batch construct. There are comments for the reviewer that details why a specific change was made.

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 8, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Comment on lines -68 to -78
class data_batch : public std::enable_shared_from_this<data_batch> {
public:
// -- Construction --

/**
* @brief Construct a new data_batch.
*
* @param batch_id Unique identifier for this batch (immutable after construction).
* @param data Owned data representation; must not be null.
*/
data_batch(uint64_t batch_id, std::unique_ptr<idata_representation> data);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is incorrect to

data_batch : public std::enable_shared_from_this<data_batch>

and have a public constructor. To be able to use shared_from_this(), the object must already be owned by a shared_ptr, otherwise it throws:
See https://en.cppreference.com/cpp/memory/enable_shared_from_this (specially example misuseGood()).

We rely heavily and primarily on shared_from_this() to extend lifetimes. Thus we must use a factory fn to create data batches and make the constructor private.


// -- Static transition methods (D-13/D-14/D-15/D-17) --
private:
data_batch_core(uint64_t batch_id, std::unique_ptr<idata_representation> data);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is private so only data_batch can create this core, and thus access is only through the RAII handles.

Comment on lines +162 to +165
class data_batch : public std::enable_shared_from_this<data_batch> {
friend read_only_data_batch;
friend mutable_data_batch;

Copy link
Copy Markdown
Member Author

@dhruv9vats dhruv9vats May 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary motivations to separate data_batch and data_batch_core are:

  1. Reduce code duplication, so no more:
Tier read_only_data_batch::get_current_tier() { return _batch->get_current_tier(); }
Tier  mutable_data_batch::get_current_tier() { return _batch->get_current_tier(); }

memory_space* read_only_data_batch::get_memory_space() { return _batch->memory_space(); }
memory_space*  mutable_data_batch::get_memory_space() { return _batch->memory_space(); }

And the convert_to / clone_to template declarations.

The RAII handles instead simply act as "stencils", conditionally providing the access/"view" to data_batch_core APIs via operator overloading, instead of a middlemen role.

  1. Allows separating concerns
  • This I don't have a strong opinion on. But this structure allows keeping the data_batch_core basically as a POD, and move the thread-safety concerns to the parent data_batch
  1. This is the class where future data_batch_probe like constructs may be added.

Comment on lines +337 to +340
// will allow only read access
const data_batch_core* operator->() const { return &(_owner->_batch); }
const data_batch_core& operator*() const { return _owner->_batch; }

Copy link
Copy Markdown
Member Author

@dhruv9vats dhruv9vats May 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply forwards to data_batch_core

Comment on lines +350 to +352
read_only_data_batch clone_read_only_access() const
{
std::optional<read_only_data_batch> maybe_clone = _owner->try_get_read_only();
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitness here felt better instead of a copy constructor. So the caller knows exactly what is being copied. though no strong opinion here either.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not in disagreement, but that will add a whole bunch more changes in Sirius. Not sure if its worth it

/** @brief Get a raw pointer to the memory space. */
memory::memory_space* get_memory_space() const { return _batch->get_memory_space(); }

// -- Write methods (D-10/ACC-02) --
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(D-10/ACC-02)

not sure what these are, indicators of some sort? LLM bread crumbs?

Comment on lines -130 to -131
* For shared_ptr: The batch is copied to each repository.
* For unique_ptr: This method requires only one operator (single owner semantics).
Copy link
Copy Markdown
Member Author

@dhruv9vats dhruv9vats May 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under the assumption that we want data_batch to be thread-safe, these I believe end up being conflicting requirements, or atleast can't be supported by the same data_batch.

I dont think we can have both

  • lifetime extending RAII handles that allow parallel reads and exclusive writes,
  • and have a data_batch under unique_ptr becauseshared_from_this then throws.

To support this we would at least need a wrapper over the data_batch, or another wrapper around data_batch_core.

@dhruv9vats
Copy link
Copy Markdown
Member Author

/ok to test bffa10d

* @return A mutable_data_batch holding the exclusive lock.
* @throws std::runtime_error if subscriber count is already zero.
*/
[[nodiscard]] static mutable_data_batch readonly_to_mutable(read_only_data_batch&& accessor);
Copy link
Copy Markdown
Member Author

@dhruv9vats dhruv9vats May 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this might block, just having a single path read_only -> idle -> get_mutable / try_get_mutable that retains the choice of whether to wait or just try might be simpler?

* @return Non-owning pointer to the data, or nullptr if empty.
*/
void unsubscribe();
idata_representation* get_data() const;
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this and get_memory_space are made to return const idata_representation* ? if mutability of the underlying pointer content is not needed anywhere?

Comment on lines +248 to +249
for (auto& op : ops) {
_repositories[{op.first, std::string(op.second)}]->add_data_batch(batch);
Copy link
Copy Markdown
Member Author

@dhruv9vats dhruv9vats May 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the primary use case of the unique_ptr<data_batch> was to enforce the "can be added on only one repo", that might be instead handled at the repo/repo-manager level. If it was a consequence, we should think of use cases where we need to guarantee unique ownership, and the shared ownership model does not suffice. (Sirius current, AFAIK, does not make use of unique_(data_batch / repo / repo_manager))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactor that introduced read_only_data_batch and mutable_data_batch basically made it so that we cant have unique data_batch, and I think we decided that was ok. Any code or comments that are still here that indicate that we could have unique data_batch was a mistake

Copy link
Copy Markdown
Contributor

@wmalpica wmalpica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general i like the data_batch_core concept, and how much comment and documentation cleanup this PR brings.
But requiring a make function, using get_read_only vs to_read_only and no copy constructors for read_only_data_batch, plus other changes means that this PR would require a very large number of changes in Sirius. I would recommend postponing that Sirius refactor until after some of the large initiatives currently underway are complete

Comment thread docs/data-management.md
| mut.convert_to<TargetRepresentation>(registry, target_space, stream)
| read_only_data_batch::to_idle(std::move(ro)) [release shared lock]
| mutable_data_batch mut = batch->get_mutable() [exclusive lock]
| mut->convert_to<TargetRepresentation>(registry, target_space, stream)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you have mut->convert_to instead of mut.convert_to? its not a pointer or an optional.

Similarly you have several places where you also have ro->

*/
memory::memory_space* get_memory_space() const;
template <typename TargetRepresentation>
std::shared_ptr<data_batch> clone_to(representation_converter_registry& registry,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the data_batch class should not be able to make clones. That can only be done by the mutable_data_batch and read_only_data_batch classes

Comment on lines +350 to +352
read_only_data_batch clone_read_only_access() const
{
std::optional<read_only_data_batch> maybe_clone = _owner->try_get_read_only();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not in disagreement, but that will add a whole bunch more changes in Sirius. Not sure if its worth it

Comment on lines +248 to +249
for (auto& op : ops) {
_repositories[{op.first, std::string(op.second)}]->add_data_batch(batch);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactor that introduced read_only_data_batch and mutable_data_batch basically made it so that we cant have unique data_batch, and I think we decided that was ok. Any code or comments that are still here that indicate that we could have unique data_batch was a mistake

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants