(improvement) (Cython only) row_parser: cache ParseDesc for prepared statements by mykaul · Pull Request #742 · scylladb/python-driver

mykaul · 2026-03-13T10:51:01Z

Cache the ParseDesc object constructed in recv_results_rows() so that repeated executions of the same prepared statement skip the list comprehensions, ColDesc construction, and make_deserializers() call.

The cache is keyed by id(column_metadata). For prepared statements the result_metadata list is stored on PreparedStatement and reused, so id() is stable. On cache hit we verify object identity (cached_ref is column_metadata) and that session-level settings (column_encryption_policy, protocol_version) still match.

A clear_parse_desc_cache() function is exposed for testing.

Benchmark results (median, pytest-benchmark)

ParseDesc construction only

Columns	Before (original)	After (with cache)
5 cols	3,966 ns	191 ns
10 cols	5,730 ns	175 ns
20 cols	9,266 ns	166 ns
50 cols	19,388 ns	193 ns

Full pipeline (ParseDesc + row parsing)

Scenario	Before (original)	After (with cache)
1 row × 10 col	6,674 ns	1,418 ns
100 rows × 5 col	48,814 ns	43,825 ns
1000 rows × 5 col	449,386 ns	430,058 ns

For small result sets (single-row lookups common with prepared statements), ParseDesc construction is a large fraction of the total response-path cost. Caching eliminates it entirely after the first execution.

All 116 unit tests pass (1 skipped — pre-existing test_datetype issue).

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
I added relevant tests for new features and bug fixes.
All commits compile, pass static checks and pass test.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
I have adjusted the documentation in ./docs/source/.
I added appropriate Fixes: annotations to PR description.

Copilot

Pull request overview

This PR optimizes the Cython row parsing fast-path for prepared statements by caching the ParseDesc built in recv_results_rows(), reducing repeated per-response construction overhead.

Changes:

Add a module-level cache in cassandra/row_parser.pyx keyed by id(column_metadata) to reuse ParseDesc, column_names, and column_types for prepared statement executions.
Expose clear_parse_desc_cache() and call it from Cluster.shutdown().
Add a benchmark/correctness benchmark module to measure and validate the caching behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
cassandra/row_parser.pyx	Introduces ParseDesc caching and adds a public cache-clear helper.
cassandra/cluster.py	Clears the module-level ParseDesc cache during `Cluster.shutdown()`.
benchmarks/test_parse_desc_cache_benchmark.py	Adds benchmarks and cache-behavior checks for the new caching approach.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

cassandra/cluster.py

benchmarks/test_parse_desc_cache_benchmark.py

cassandra/row_parser.pyx

Copilot

Pull request overview

This PR improves the Cython result-row parsing hot path by caching the ParseDesc constructed in recv_results_rows() for repeated executions of the same prepared statement (stable result_metadata object), reducing repeated metadata processing and deserializer construction.

Changes:

Add a bounded module-level cache in cassandra/row_parser.pyx keyed by id(column_metadata) with identity + session-settings validation.
Add cache management helpers (clear_parse_desc_cache(), get_parse_desc_cache_size()) intended for test/benchmark visibility.
Add a new benchmark module to measure ParseDesc construction and end-to-end parsing impact.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
cassandra/row_parser.pyx	Introduces ParseDesc caching for prepared-statement row decoding and adds cache clear/size helpers.
benchmarks/test_parse_desc_cache_benchmark.py	Adds benchmarks (and some correctness assertions) for cached vs uncached ParseDesc construction/parsing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

benchmarks/test_parse_desc_cache_benchmark.py

+@pytest.mark.parametrize("nrows,ncols", [(1, 10), (100, 5), (1000, 5)])
+def test_integration_cython_cached(benchmark, nrows, ncols):
+    """Integration: Cython recv_results_rows with ParseDesc cache hit (prepared stmt)."""
+    col_meta = _build_column_metadata(ncols)


cassandra/row_parser.pyx

+        # Only use the cache for prepared statements (self.column_metadata is
+        # None, so column_metadata comes from result_metadata which is a
+        # stable list stored on PreparedStatement).  Inline metadata from
+        # non-prepared queries creates a fresh list every time and would
+        # cause unbounded cache growth.
+        if self.column_metadata is None and result_metadata is not None:
+            desc, self.column_names, self.column_types = _get_or_build_parse_desc(
+                column_metadata, column_encryption_policy, protocol_version)
+        else:
+            desc, self.column_names, self.column_types = _build_parse_desc(
+                column_metadata, column_encryption_policy, protocol_version)


benchmarks/test_parse_desc_cache_benchmark.py

@@ -0,0 +1,542 @@
+# Copyright DataStax, Inc.


Cache the ParseDesc object constructed in recv_results_rows() so that repeated executions of the same prepared statement skip the list comprehensions, ColDesc construction, and make_deserializers() call. The cache is keyed by id(column_metadata). For prepared statements the result_metadata list is stored on PreparedStatement and reused, so id() is stable. On cache hit we verify object identity (cached_ref is column_metadata) and that session-level settings (column_encryption_policy, protocol_version) still match. Implementation details: - _get_or_build_parse_desc: cached path, used only when column_metadata comes from result_metadata (prepared statements with stable id()). - _build_parse_desc: uncached path, used for inline metadata from non-prepared queries that creates a fresh list every execution. - Cache is bounded to 256 entries; cleared entirely when full. - Returns only (desc, column_names, column_types) to the caller, avoiding exposure of internal cache fields. - Thread-safety: dict get/set are atomic in CPython (GIL), but concurrent cache misses may cause redundant construction (benign). A clear_parse_desc_cache() and get_parse_desc_cache_size() function are exposed for testing. ## Benchmark results (median, pytest-benchmark) ### ParseDesc construction only (reference benchmarks) | Columns | **Before** (original) | **After** (with cache) | |---------|-----------------------|------------------------| | 5 cols | 3,966 ns | 191 ns | | 10 cols | 5,730 ns | 175 ns | | 20 cols | 9,266 ns | 166 ns | | 50 cols | 19,388 ns | 193 ns | ### Full pipeline integration (recv_results_rows through Cython) | Scenario | **Before** (original) | **After** (with cache) | |-------------------|-----------------------|------------------------| | 1 row x 10 col | 40,867 ns | 2,977 ns | | 100 rows x 5 col | 145,584 ns | 73,206 ns | | 1000 rows x 5 col | 1,099,825 ns | 999,517 ns | For small result sets (single-row lookups common with prepared statements), ParseDesc construction is a large fraction of the total response-path cost. Caching eliminates it entirely after the first execution. All 623 unit tests pass (16 skipped - pre-existing).

- Fix copyright header in benchmark file (DataStax -> ScyllaDB) - Add pytest.importorskip guard for pytest-benchmark in benchmark file - Add unit tests for ParseDesc cache under tests/unit/cython: cache hit, miss, protocol version invalidation, clear, bounded eviction, correctness

mykaul marked this pull request as draft March 13, 2026 10:51

mykaul changed the title ~~(improvement) row_parser: cache ParseDesc for prepared statements~~ (improvement) (Cython) row_parser: cache ParseDesc for prepared statements Mar 13, 2026

mykaul changed the title ~~(improvement) (Cython) row_parser: cache ParseDesc for prepared statements~~ (improvement) (Cython only) row_parser: cache ParseDesc for prepared statements Mar 13, 2026

mykaul force-pushed the perf/cache-parse-desc branch from e355833 to eb3ae98 Compare March 13, 2026 12:55

mykaul requested a review from Copilot March 14, 2026 09:44

Copilot started reviewing on behalf of mykaul March 14, 2026 09:44 View session

mykaul mentioned this pull request Mar 14, 2026

Tracking: Vector search (VectorType) performance improvement PRs #746

Open

Copilot AI reviewed Mar 14, 2026

View reviewed changes

cassandra/cluster.py Outdated Show resolved Hide resolved

benchmarks/test_parse_desc_cache_benchmark.py Show resolved Hide resolved

cassandra/row_parser.pyx Outdated Show resolved Hide resolved

mykaul mentioned this pull request Mar 14, 2026

Tracking: General (non-vector) performance improvement PRs #747

Open

mykaul force-pushed the perf/cache-parse-desc branch from ee3c279 to 9f2f3ff Compare March 15, 2026 16:45

mykaul mentioned this pull request Mar 15, 2026

(improvement) LWT prepared statement performance: analysis and improvement plan #751

Open

mykaul requested a review from Copilot March 16, 2026 18:13

Copilot started reviewing on behalf of mykaul March 16, 2026 18:14 View session

Copilot AI reviewed Mar 16, 2026

View reviewed changes

mykaul added 2 commits March 20, 2026 19:08

mykaul force-pushed the perf/cache-parse-desc branch from 9f2f3ff to 203080e Compare March 20, 2026 17:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(improvement) (Cython only) row_parser: cache ParseDesc for prepared statements#742

(improvement) (Cython only) row_parser: cache ParseDesc for prepared statements#742
mykaul wants to merge 2 commits intoscylladb:masterfrom
mykaul:perf/cache-parse-desc

mykaul commented Mar 13, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mykaul commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark results (median, pytest-benchmark)

ParseDesc construction only

Full pipeline (ParseDesc + row parsing)

Pre-review checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mykaul commented Mar 13, 2026 •

edited

Loading