(improvement) (python code path only): cache namedtuple class in named_tuple_factory to avoid …#740

Draft
mykaul wants to merge 1 commit into scylladb:master from mykaul:perf/cache-named-tuple-factory

Conversation


@mykaul mykaul commented Mar 13, 2026

…repeated exec() calls

Cache the Row namedtuple class keyed on tuple(colnames) so Python's namedtuple() (which internally calls exec()) is only invoked once per unique column schema. For prepared statements the column names never change, eliminating redundant class creation on every result set.

Motivation

named_tuple_factory is the default row_factory in the driver. Every call to namedtuple('Row', columns) internally calls exec() to generate a new class, which is expensive: the 10-column, 1-row benchmark below measures roughly 43 to 60 us per call before caching. For prepared statements executing the same query repeatedly, the column names never change, yet we pay the namedtuple() + exec() cost on every result set.
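The cost gap can be seen outside the driver with a small standalone timing sketch. This is not the benchmark used for the numbers in this PR; the column names, the iteration count, and the helper names `build_class` and `lookup_class` are made up for illustration, and absolute timings will vary by machine and Python version.

```python
import timeit
from collections import namedtuple

# Hypothetical 10-column schema, chosen only for illustration.
columns = tuple(f"col{i}" for i in range(10))


def build_class():
    # Creates a brand-new Row class on every call (the uncached path).
    return namedtuple("Row", columns)


# A plain dict holding one pre-built class, keyed on the column tuple.
cache = {columns: namedtuple("Row", columns)}


def lookup_class():
    # Returns the already-built class (the cached path).
    return cache[columns]


n = 2000
build_s = timeit.timeit(build_class, number=n)
lookup_s = timeit.timeit(lookup_class, number=n)

print(f"namedtuple(): {build_s / n * 1e9:.0f} ns per call")
print(f"dict lookup:  {lookup_s / n * 1e9:.0f} ns per call")
```

Note that each uncached call also returns a distinct class object, so two result sets with identical columns end up with different `Row` types.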

Benchmark results

Benchmarks compare the original code (Before) against the new cached implementation (After).

10 columns, 1 row (isolates class creation overhead):

| Variant | Min | Mean | Median | Ops/sec |
|---|---|---|---|---|
| Before (original) | 43,490 ns | 59,976 ns | 47,653 ns | 16.7 Kops/s |
| After (with cache) | 235 ns | 452 ns | 353 ns | 2,210 Kops/s |

5 columns, 100 rows:

| Variant | Min | Mean | Median | Ops/sec |
|---|---|---|---|---|
| Before (original) | 57.4 us | 91.2 us | 65.8 us | 10,969/s |
| After (with cache) | 19.3 us | 25.3 us | 24.0 us | 39,594/s |

10 columns, 100 rows:

| Variant | Min | Mean | Median | Ops/sec |
|---|---|---|---|---|
| Before (original) | 56.7 us | 101.9 us | 75.6 us | 9,813/s |
| After (with cache) | 18.1 us | 21.4 us | 20.4 us | 46,825/s |

Design notes

  • Cache is a plain dict keyed on tuple(colnames) (raw column names before cleaning)
  • Error handling paths (SyntaxError, Exception) preserved unchanged
  • Cache is naturally bounded by the number of distinct column-name tuples, which tracks the number of distinct queries

Tests

All existing unit tests pass (46 passed).

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@mykaul mykaul changed the title from "(improvement) cache namedtuple class in named_tuple_factory to avoid …" to "(improvement) (python code path only): cache namedtuple class in named_tuple_factory to avoid …" Mar 13, 2026
@mykaul mykaul marked this pull request as draft March 13, 2026 10:13