ENH: Add bulk getValues/setValues API to AbstractDataStore #1564
joeykleingers wants to merge 2 commits into BlueQuartzSoftware:develop
Add virtual getValues() and setValues() methods to AbstractDataStore for reading/writing contiguous ranges of elements in a single call. DataStore overrides use std::memcpy for maximum throughput. ZarrStore overrides (in FileStore plugin) use chunk-aware bulk access with three optimizations: per-row memcpy, chunk-sticky shared_ptr reuse, and multi-row extension when chunks cover full fast dimensions. Existing methods fill(), copy(), copyFrom(), setTuple(), and fillTuple() are rewritten to use the bulk API internally, so all callers benefit automatically with no code changes. Uses std::make_unique<T[]> for intermediate buffers to avoid the std::vector<bool> bit-packing issue.
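The std::vector<bool> remark refers to the fact that std::vector<bool> is a bit-packed specialization: its operator[] returns a proxy object rather than bool&, so it cannot hand a contiguous bool* to a bulk call. A minimal sketch of the difference, assuming hypothetical helper names (stageFlags, countSet) that are not part of simplnx:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// std::vector<bool>::operator[] yields a proxy, unlike every other std::vector<T>.
static_assert(!std::is_same_v<decltype(std::declval<std::vector<bool>&>()[0]), bool&>);
static_assert(std::is_same_v<decltype(std::declval<std::vector<int>&>()[0]), int&>);

// Hypothetical helper: stage bit-packed flags in a contiguous bool buffer.
std::unique_ptr<bool[]> stageFlags(const std::vector<bool>& flags)
{
  // make_unique<T[]> value-initializes, so every element starts as false.
  auto buffer = std::make_unique<bool[]>(flags.size());
  for(std::size_t i = 0; i < flags.size(); ++i)
  {
    buffer[i] = flags[i];
  }
  return buffer;
}

// Hypothetical consumer that needs a raw pointer, as a bulk setValues-style call would.
std::size_t countSet(const bool* values, std::size_t count)
{
  assert(values != nullptr || count == 0);
  std::size_t total = 0;
  for(std::size_t i = 0; i < count; ++i)
  {
    total += values[i] ? 1 : 0;
  }
  return total;
}
```

The unique_ptr<T[]> buffer behaves the same for bool as for any other element type, which is presumably why it was chosen for the intermediate buffers.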
- Make getValues/setValues pure virtual on AbstractDataStore
- Revert fill/copy defaults to simple per-element loops
- Add NVI copyFromImpl virtual for optimized data transfer
- DataStore: override fill, copy, copyFromImpl with direct access
- DataStore: use std::copy instead of std::memcpy
- EmptyDataStore: add getValues/setValues stubs (throw)
- Revert fillTuple to simple setValue loop (small component count)
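The shape described by this commit can be sketched roughly as follows. This is an illustration under assumptions, not the simplnx code: SketchStore and SketchMemoryStore are hypothetical stand-ins for AbstractDataStore<T> and DataStore<T>, and a raw pointer plus count replaces the span parameter so the sketch builds as C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical, simplified stand-in for AbstractDataStore<T>.
template <typename T>
class SketchStore
{
public:
  virtual ~SketchStore() = default;
  virtual std::size_t size() const = 0;
  virtual T getValue(std::size_t index) const = 0;
  virtual void setValue(std::size_t index, T value) = 0;

  // Bulk API is pure virtual, so every concrete store must provide it.
  virtual void getValues(std::size_t start, T* out, std::size_t count) const = 0;
  virtual void setValues(std::size_t start, const T* in, std::size_t count) = 0;

  // Default fill stays a simple per-element loop.
  virtual void fill(T value)
  {
    for(std::size_t i = 0; i < size(); ++i)
    {
      setValue(i, value);
    }
  }

  // NVI: the public entry point validates once, then defers to the virtual hook.
  void copyFrom(std::size_t dest, const SketchStore& source, std::size_t src, std::size_t count)
  {
    if(count > size() || dest > size() - count || count > source.size() || src > source.size() - count)
    {
      throw std::out_of_range("copyFrom: range exceeds store bounds");
    }
    copyFromImpl(dest, source, src, count);
  }

protected:
  virtual void copyFromImpl(std::size_t dest, const SketchStore& source, std::size_t src, std::size_t count)
  {
    for(std::size_t i = 0; i < count; ++i)
    {
      setValue(dest + i, source.getValue(src + i));
    }
  }
};

// Hypothetical in-memory store: replaces the loops with direct contiguous access.
template <typename T>
class SketchMemoryStore : public SketchStore<T>
{
public:
  explicit SketchMemoryStore(std::size_t count)
  : m_Data(std::make_unique<T[]>(count))
  , m_Size(count)
  {
  }

  std::size_t size() const override { return m_Size; }
  T getValue(std::size_t index) const override { return m_Data[index]; }
  void setValue(std::size_t index, T value) override { m_Data[index] = value; }

  void getValues(std::size_t start, T* out, std::size_t count) const override
  {
    checkRange(start, count);
    std::copy(m_Data.get() + start, m_Data.get() + start + count, out);
  }
  void setValues(std::size_t start, const T* in, std::size_t count) override
  {
    checkRange(start, count);
    std::copy(in, in + count, m_Data.get() + start);
  }
  void fill(T value) override { std::fill_n(m_Data.get(), m_Size, value); }

protected:
  // The source bulk-reads straight into this store's buffer (assumes no overlap).
  void copyFromImpl(std::size_t dest, const SketchStore<T>& source, std::size_t src, std::size_t count) override
  {
    source.getValues(src, m_Data.get() + dest, count);
  }

private:
  void checkRange(std::size_t start, std::size_t count) const
  {
    if(count > m_Size || start > m_Size - count)
    {
      throw std::out_of_range("bulk range exceeds store bounds");
    }
  }
  std::unique_ptr<T[]> m_Data;
  std::size_t m_Size;
};

// Usage: fill, patch a sub-range, then copy a window into a second store.
void bulkRoundTripExample()
{
  SketchMemoryStore<int> a(8);
  a.fill(1);
  const int patch[3] = {5, 6, 7};
  a.setValues(2, patch, 3); // a: 1 1 5 6 7 1 1 1
  SketchMemoryStore<int> b(4);
  b.copyFrom(0, a, 3, 4); // b: 6 7 1 1
  int out[4] = {};
  b.getValues(0, out, 4);
  assert(out[0] == 6 && out[1] == 7 && out[2] == 1 && out[3] == 1);
}
```

Keeping copyFrom non-virtual means the bounds check lives in one place, while a store with contiguous memory only has to override the protected hook.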
JDuffeyBQ
left a comment
Changes look good. One thing is that it looks a bit odd at first to see a getValues() call with no return value. Maybe we could change the function names to make that a bit more obvious. Something like getValuesIntoBuffer(). Not a major issue but I think most of our other functions named with "get" return something so it would help differentiate them.
@joeykleingers @JDuffeyBQ copyIntoBuffer
Summary
- Add getValues(startIndex, span) and setValues(startIndex, span) virtual methods to AbstractDataStore<T> for bulk read/write of contiguous element ranges
- DataStore<T> overrides use std::memcpy for maximum in-memory throughput
- Rewrite fill(), copy(), copyFrom(), setTuple(), and fillTuple() to use the bulk API internally so all existing callers benefit automatically
Companion PR: https://github.com/BlueQuartzSoftware/FileStore/pull/4
Motivation
Per-element out-of-core (OOC) access through ZarrStore::operator[] costs ~50-100ns each even when the chunk is cached (virtual dispatch -> mutex -> N-D position calc -> 6-slot FIFO cache search -> element access -> mutex unlock). Filter-level optimizations have minimized the number of accesses, but the per-access cost is fixed by the storage layer.
The bulk API allows ZarrStore (in the FileStore plugin) to override with chunk-aware implementations that use memcpy per chunk-row instead of per element, with three optimization levels:
- Per-row memcpy
- Chunk-sticky shared_ptr reuse: holds the chunk's shared_ptr across rows, avoiding redundant cache lookups (~2500x fewer lookups per chunk)
- Multi-row extension when chunks cover full fast dimensions
Implementation Details
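The three optimization levels listed under Motivation can be illustrated with a small sketch. It is not the FileStore ZarrStore implementation: ChunkedGrid is a hypothetical 2-D stand-in for an N-D chunked store, fetchChunk simulates the cache lookup, and the lookups and copies counters exist only to make the effect visible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical rows x cols float array split into chunkRows x chunkCols chunks,
// each chunk stored row-major in its own buffer.
struct ChunkedGrid
{
  using Chunk = std::shared_ptr<std::vector<float>>;

  std::size_t rows, cols, chunkRows, chunkCols;
  std::vector<Chunk> chunks; // chunk grid in row-major order
  std::size_t lookups = 0;   // simulated cache lookups
  std::size_t copies = 0;    // memcpy calls issued

  ChunkedGrid(std::size_t r, std::size_t c, std::size_t cr, std::size_t cc)
  : rows(r), cols(c), chunkRows(cr), chunkCols(cc)
  {
    assert(r % cr == 0 && c % cc == 0); // sketch assumes evenly divisible dimensions
    for(std::size_t i = 0; i < (r / cr) * (c / cc); ++i)
    {
      chunks.push_back(std::make_shared<std::vector<float>>(cr * cc));
    }
  }

  std::size_t chunkIndexOf(std::size_t row, std::size_t col) const
  {
    return (row / chunkRows) * (cols / chunkCols) + col / chunkCols;
  }
  std::size_t offsetInChunk(std::size_t row, std::size_t col) const
  {
    return (row % chunkRows) * chunkCols + col % chunkCols;
  }

  // Stands in for the mutex + cache search a real lookup would pay for.
  Chunk fetchChunk(std::size_t chunkIndex)
  {
    ++lookups;
    return chunks[chunkIndex];
  }

  // Per-element write, used here only to populate data.
  void setValue(std::size_t index, float value)
  {
    const std::size_t row = index / cols;
    const std::size_t col = index % cols;
    (*chunks[chunkIndexOf(row, col)])[offsetInChunk(row, col)] = value;
  }

  // Bulk read of the flat range [start, start + count).
  void getValues(std::size_t start, float* out, std::size_t count, bool extendAcrossRows = true)
  {
    assert(count <= rows * cols && start <= rows * cols - count);
    Chunk held; // level 2: keep the last chunk alive and skip repeat lookups
    std::size_t heldIndex = static_cast<std::size_t>(-1);
    // Level 3: when a chunk spans the full fast dimension its rows are contiguous.
    const bool multiRow = extendAcrossRows && chunkCols == cols;
    while(count > 0)
    {
      const std::size_t row = start / cols;
      const std::size_t col = start % cols;
      const std::size_t chunkIndex = chunkIndexOf(row, col);
      if(chunkIndex != heldIndex)
      {
        held = fetchChunk(chunkIndex);
        heldIndex = chunkIndex;
      }
      const std::size_t offset = offsetInChunk(row, col);
      const std::size_t run = multiRow ? held->size() - offset : chunkCols - col % chunkCols;
      const std::size_t n = std::min(run, count);
      std::memcpy(out, held->data() + offset, n * sizeof(float)); // level 1: one copy per run
      ++copies;
      out += n;
      start += n;
      count -= n;
    }
  }
};
```

For a 4 x 4 grid with 2 x 4 chunks, reading all 16 values costs 2 lookups and 4 copies with per-row copying plus the sticky pointer, and 2 lookups and 2 copies once runs are extended across rows; per-element access would pay one lookup per value.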
- Default getValues/setValues on AbstractDataStore loop over getValue/setValue for backward compatibility with any subclass that doesn't override
- Uses std::make_unique<T[]> for intermediate buffers to avoid the std::vector<bool> bit-packing issue
Test Plan
- DataStore Bulk getValues/setValues: full array, partial range, boundary, roundtrip, empty span, out-of-range, single element (7 sections)
- DataStore Bulk fill/copy/copyFrom/setTuple/fillTuple
- DataStore Cross-API Roundtrip: setValue->getValues, setValues->getValue, partial writes
- DataStore Bulk getValues with multi-component: multi-dim tuples + float32