(improvement) perf: remove copies on the read path #734
Draft
mykaul wants to merge 5 commits into scylladb:master
Replace getvalue() with getbuffer() memoryview in _read_frame_header and frame body extraction to avoid full-buffer copies. Add _reset_buffer() helper using getbuffer()[pos:] instead of read() to reduce allocations. Wrap memoryview usage in try/finally to ensure release before mutation. Increase in_buffer_size from 4096 to 65536 to reduce recv() call overhead.
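The `getbuffer()` pattern described above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: the function names `read_frame_header` and `reset_buffer` and the 9-byte header layout are assumptions made for the example. It uses `with` blocks, which release the memoryview on exit in the same way a try/finally with `release()` would.

```python
import io
import struct

# Assumed 9-byte frame header: version, flags, stream id, opcode, body length.
HEADER = struct.Struct(">BBhBi")


def read_frame_header(buf: io.BytesIO):
    """Parse the header at the start of buf without copying the whole buffer."""
    with buf.getbuffer() as view:  # zero-copy view, released when the block exits
        if len(view) < HEADER.size:
            return None  # not enough data buffered yet
        return HEADER.unpack_from(view, 0)


def reset_buffer(buf: io.BytesIO) -> io.BytesIO:
    """Return a new BytesIO holding only the bytes after the current position."""
    pos = buf.tell()
    with buf.getbuffer() as view, view[pos:] as tail:
        fresh = io.BytesIO(tail)  # copies only the unread tail
    fresh.seek(0, io.SEEK_END)  # position at the end, ready for more appends
    return fresh
```

A BytesIO cannot be written to or resized while a memoryview from `getbuffer()` is still alive, which is why both helpers release the view before returning.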
Introduce a lightweight BytesReader class that operates directly on bytes data without BytesIO overhead. Materializes memoryview to bytes once in __init__ instead of checking on every read(). Includes remaining_buffer() method for zero-copy handoff to Cython parsers.
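As an illustration of the idea, a reader of this kind might look like the sketch below. It is a hypothetical reconstruction from the commit message only: the attribute names, the `read()` signature, and the choice to return a memoryview from `remaining_buffer()` are assumptions, not the PR's actual implementation.

```python
class BytesReader:
    """Hypothetical sketch of a minimal sequential reader over bytes."""

    __slots__ = ("_data", "_pos", "_size")

    def __init__(self, data):
        # Materialize a memoryview once here so read() never checks the type.
        self._data = bytes(data) if isinstance(data, memoryview) else data
        self._pos = 0
        self._size = len(self._data)

    def read(self, n=-1):
        """Return the next n bytes, or everything left when n is negative."""
        if n < 0:
            end = self._size
        else:
            end = self._pos + n
            if end > self._size:
                raise EOFError("requested %d bytes, %d left" % (n, self._size - self._pos))
        chunk = self._data[self._pos:end]
        self._pos = end
        return chunk

    def remaining_buffer(self):
        """Return a zero-copy view of the unread bytes (assumed return type)."""
        return memoryview(self._data)[self._pos:]
```

Slicing a `bytes` object still copies the requested chunk, but there is no BytesIO object, no internal position bookkeeping in C, and no per-read type check.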
Add offset parameter to BytesIOReader so it can start reading from the middle of an existing buffer, avoiding the full-body copy at the Python-to-Cython boundary. Update row_parser.pyx to use f.remaining_buffer() for zero-copy handoff with hasattr fallback. Track _initial_offset for error recovery.
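The offset and fallback logic can be approximated in pure Python as below. `OffsetReader`, `rewind()`, and `unread_bytes` are hypothetical stand-ins for illustration; the real BytesIOReader is a Cython class and its exact interface is not shown in this conversation.

```python
import io


class OffsetReader:
    """Hypothetical pure-Python stand-in for a reader that starts at an offset."""

    def __init__(self, buf, offset=0):
        if not 0 <= offset <= len(buf):
            raise ValueError("offset out of range")
        self._buf = buf
        self._pos = offset
        self._initial_offset = offset  # remembered so error handling can rewind

    def read(self, n):
        end = self._pos + n
        if n < 0 or end > len(self._buf):
            raise EOFError("not enough data")
        chunk = bytes(self._buf[self._pos:end])
        self._pos = end
        return chunk

    def rewind(self):
        """Go back to where this reader started (assumed recovery helper)."""
        self._pos = self._initial_offset

    def remaining_buffer(self):
        return memoryview(self._buf)[self._pos:]


def unread_bytes(f):
    """Prefer the zero-copy path; fall back to read() for plain file objects."""
    if hasattr(f, "remaining_buffer"):
        return f.remaining_buffer()
    return f.read()
```

With the offset, the caller can hand over the whole frame buffer plus the position where the body begins, rather than slicing the body out first.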
13 BytesReader tests covering read operations, remaining_buffer(), memoryview materialization, empty data, and EOFError handling. 9 BytesIOReader tests covering offset initialization, boundary conditions, read behavior with offset, and error cases.
…pact Standalone benchmark (no cluster required) that constructs synthetic RESULT/ROWS wire-format bodies and measures ProtocolHandler.decode_message() throughput across 8 scenarios (small to 16MB, narrow to 20-column wide). Pins to a single CPU core via sched_setaffinity for consistent results. Supports both Cython and pure-Python paths, with --cprofile option.
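The pinning and timing part of such a benchmark could look roughly like this sketch. The helper names `pin_to_single_core` and `throughput` are made up for illustration, and the callable being measured is a placeholder rather than `ProtocolHandler.decode_message()` itself.

```python
import os
import time


def pin_to_single_core():
    """Pin this process to one CPU where supported; return the CPU id or None."""
    if not hasattr(os, "sched_setaffinity"):  # only available on some platforms
        return None
    try:
        cpu = min(os.sched_getaffinity(0))  # pick one CPU we may already run on
        os.sched_setaffinity(0, {cpu})
    except OSError:
        return None
    return cpu


def throughput(func, payload, iterations=1000):
    """Return calls per second for func(payload) over a fixed number of runs."""
    start = time.perf_counter()
    for _ in range(iterations):
        func(payload)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return iterations / elapsed
```

Pinning to one core reduces noise from scheduler migrations, which matters when the differences being measured are small.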
Author: #630 would have helped with wide_5k_doubles, of course.
This patch set reduces the memory copies we perform on the read path to improve overall performance, mainly by lowering latency on the driver side when it processes a received payload.
The gain is larger for bigger payloads, of course.
Comparison to master:
(Note: we can see that wide_5k_doubles is bottlenecked on something else, namely the CPU cost of unpacking this payload. This may be optimized in a different PR.)