Tighten data_batch#120
Conversation
Remove unique repository coverage and update docs for the data_batch accessor model. Co-authored-by: OpenAI Codex <codex@openai.com>
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
| class data_batch : public std::enable_shared_from_this<data_batch> { | ||
| public: | ||
| // -- Construction -- | ||
|
|
||
| /** | ||
| * @brief Construct a new data_batch. | ||
| * | ||
| * @param batch_id Unique identifier for this batch (immutable after construction). | ||
| * @param data Owned data representation; must not be null. | ||
| */ | ||
| data_batch(uint64_t batch_id, std::unique_ptr<idata_representation> data); |
There was a problem hiding this comment.
It is incorrect to
data_batch : public std::enable_shared_from_this<data_batch>
and have a public constructor. To be able to use shared_from_this(), the object must already be owned by a shared_ptr, otherwise it throws:
See https://en.cppreference.com/cpp/memory/enable_shared_from_this (specially example misuseGood()).
We rely heavily and primarily on shared_from_this() to extend lifetimes. Thus we must use a factory fn to create data batches and make the constructor private.
|
|
||
| // -- Static transition methods (D-13/D-14/D-15/D-17) -- | ||
| private: | ||
| data_batch_core(uint64_t batch_id, std::unique_ptr<idata_representation> data); |
There was a problem hiding this comment.
This is private so only data_batch can create this core, and thus access is only through the RAII handles.
| class data_batch : public std::enable_shared_from_this<data_batch> { | ||
| friend read_only_data_batch; | ||
| friend mutable_data_batch; | ||
|
|
There was a problem hiding this comment.
The primary motivations to separate data_batch and data_batch_core are:
- Reduce code duplication, so no more:
Tier read_only_data_batch::get_current_tier() { return _batch->get_current_tier(); }
Tier mutable_data_batch::get_current_tier() { return _batch->get_current_tier(); }
memory_space* read_only_data_batch::get_memory_space() { return _batch->memory_space(); }
memory_space* mutable_data_batch::get_memory_space() { return _batch->memory_space(); }And the convert_to / clone_to template declarations.
The RAII handles instead simply act as "stencils", conditionally providing the access/"view" to data_batch_core APIs via operator overloading, instead of a middlemen role.
- Allows separating concerns
- This I don't have a strong opinion on. But this structure allows keeping the
data_batch_corebasically as a POD, and move the thread-safety concerns to the parentdata_batch
- This is the class where future
data_batch_probelike constructs may be added.
| // will allow only read access | ||
| const data_batch_core* operator->() const { return &(_owner->_batch); } | ||
| const data_batch_core& operator*() const { return _owner->_batch; } | ||
|
|
There was a problem hiding this comment.
Simply forwards to data_batch_core
| read_only_data_batch clone_read_only_access() const | ||
| { | ||
| std::optional<read_only_data_batch> maybe_clone = _owner->try_get_read_only(); |
There was a problem hiding this comment.
Explicitness here felt better instead of a copy constructor. So the caller knows exactly what is being copied. though no strong opinion here either.
There was a problem hiding this comment.
i am not in disagreement, but that will add a whole bunch more changes in Sirius. Not sure if its worth it
| /** @brief Get a raw pointer to the memory space. */ | ||
| memory::memory_space* get_memory_space() const { return _batch->get_memory_space(); } | ||
|
|
||
| // -- Write methods (D-10/ACC-02) -- |
There was a problem hiding this comment.
(D-10/ACC-02)
not sure what these are, indicators of some sort? LLM bread crumbs?
| * For shared_ptr: The batch is copied to each repository. | ||
| * For unique_ptr: This method requires only one operator (single owner semantics). |
There was a problem hiding this comment.
Under the assumption that we want data_batch to be thread-safe, these I believe end up being conflicting requirements, or atleast can't be supported by the same data_batch.
I dont think we can have both
- lifetime extending RAII handles that allow parallel reads and exclusive writes,
- and have a
data_batchunderunique_ptrbecauseshared_from_thisthen throws.
To support this we would at least need a wrapper over the data_batch, or another wrapper around data_batch_core.
|
/ok to test bffa10d |
| * @return A mutable_data_batch holding the exclusive lock. | ||
| * @throws std::runtime_error if subscriber count is already zero. | ||
| */ | ||
| [[nodiscard]] static mutable_data_batch readonly_to_mutable(read_only_data_batch&& accessor); |
There was a problem hiding this comment.
since this might block, just having a single path read_only -> idle -> get_mutable / try_get_mutable that retains the choice of whether to wait or just try might be simpler?
| * @return Non-owning pointer to the data, or nullptr if empty. | ||
| */ | ||
| void unsubscribe(); | ||
| idata_representation* get_data() const; |
There was a problem hiding this comment.
Maybe this and get_memory_space are made to return const idata_representation* ? if mutability of the underlying pointer content is not needed anywhere?
| for (auto& op : ops) { | ||
| _repositories[{op.first, std::string(op.second)}]->add_data_batch(batch); |
There was a problem hiding this comment.
If the primary use case of the unique_ptr<data_batch> was to enforce the "can be added on only one repo", that might be instead handled at the repo/repo-manager level. If it was a consequence, we should think of use cases where we need to guarantee unique ownership, and the shared ownership model does not suffice. (Sirius current, AFAIK, does not make use of unique_(data_batch / repo / repo_manager))
There was a problem hiding this comment.
The refactor that introduced read_only_data_batch and mutable_data_batch basically made it so that we cant have unique data_batch, and I think we decided that was ok. Any code or comments that are still here that indicate that we could have unique data_batch was a mistake
wmalpica
left a comment
There was a problem hiding this comment.
In general i like the data_batch_core concept, and how much comment and documentation cleanup this PR brings.
But requiring a make function, using get_read_only vs to_read_only and no copy constructors for read_only_data_batch, plus other changes means that this PR would require a very large number of changes in Sirius. I would recommend postponing that Sirius refactor until after some of the large initiatives currently underway are complete
| | mut.convert_to<TargetRepresentation>(registry, target_space, stream) | ||
| | read_only_data_batch::to_idle(std::move(ro)) [release shared lock] | ||
| | mutable_data_batch mut = batch->get_mutable() [exclusive lock] | ||
| | mut->convert_to<TargetRepresentation>(registry, target_space, stream) |
There was a problem hiding this comment.
Why do you have mut->convert_to instead of mut.convert_to? its not a pointer or an optional.
Similarly you have several places where you also have ro->
| */ | ||
| memory::memory_space* get_memory_space() const; | ||
| template <typename TargetRepresentation> | ||
| std::shared_ptr<data_batch> clone_to(representation_converter_registry& registry, |
There was a problem hiding this comment.
the data_batch class should not be able to make clones. That can only be done by the mutable_data_batch and read_only_data_batch classes
| read_only_data_batch clone_read_only_access() const | ||
| { | ||
| std::optional<read_only_data_batch> maybe_clone = _owner->try_get_read_only(); |
There was a problem hiding this comment.
i am not in disagreement, but that will add a whole bunch more changes in Sirius. Not sure if its worth it
| for (auto& op : ops) { | ||
| _repositories[{op.first, std::string(op.second)}]->add_data_batch(batch); |
There was a problem hiding this comment.
The refactor that introduced read_only_data_batch and mutable_data_batch basically made it so that we cant have unique data_batch, and I think we decided that was ok. Any code or comments that are still here that indicate that we could have unique data_batch was a mistake
Proposes tightening the API on and around the
data_batch.These proposals are motivated by efforts to:
listed approximately in the order of importance given while designing this.
This is an explicit "show me the code" proposal implemented to gather concrete criticism to allow settling on a quality API for the central
data_batchconstruct. There are comments for the reviewer that details why a specific change was made.