(improvement) (python code path only): cache namedtuple class in named_tuple_factory to avoid … by mykaul · Pull Request #740 · scylladb/python-driver

mykaul · 2026-03-13T10:08:20Z

…repeated exec() calls

Cache the Row namedtuple class keyed on tuple(colnames) so Python's namedtuple() (which internally calls exec()) is only invoked once per unique column schema. For prepared statements the column names never change, eliminating redundant class creation on every result set.

Motivation

named_tuple_factory is the default row_factory in the driver. Every call to namedtuple('Row', columns) internally calls exec() to generate a new class, which is expensive: tens of microseconds per call in the benchmarks below. For prepared statements that execute the same query repeatedly, the column names never change, yet the driver pays the namedtuple() + exec() cost on every result set.
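
To make the redundancy concrete, here is a minimal standalone sketch (not driver code; the column names are made up for illustration) showing that two namedtuple() calls with identical arguments still produce two separate classes:

```python
from collections import namedtuple

# Illustrative column names; any valid identifiers behave the same way.
columns = ("id", "name", "email")

# Two calls with identical arguments still build two separate classes.
RowA = namedtuple("Row", columns)
RowB = namedtuple("Row", columns)

# Nothing is shared between the two calls.
distinct_classes = RowA is not RowB

# Instances are still plain tuples underneath, so they compare by content.
equal_values = RowA(1, "a", "x") == RowB(1, "a", "x")

print(distinct_classes, equal_values)
```

Because the resulting classes are interchangeable for a given column list, reusing one class per column list should not change what callers observe in the rows themselves.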

Benchmark results

Benchmarks compare the original code (Before) against the new cached implementation (After).

10 columns, 1 row (isolates class creation overhead):

| Variant | Min | Mean | Median | Ops/sec |
|---|---|---|---|---|
| Before (original) | 43,490 ns | 59,976 ns | 47,653 ns | 16.7 Kops/s |
| After (with cache) | 235 ns | 452 ns | 353 ns | 2,210 Kops/s |

5 columns, 100 rows:

| Variant | Min | Mean | Median | Ops/sec |
|---|---|---|---|---|
| Before (original) | 57.4 us | 91.2 us | 65.8 us | 10,969/s |
| After (with cache) | 19.3 us | 25.3 us | 24.0 us | 39,594/s |

10 columns, 100 rows:

| Variant | Min | Mean | Median | Ops/sec |
|---|---|---|---|---|
| Before (original) | 56.7 us | 101.9 us | 75.6 us | 9,813/s |
| After (with cache) | 18.1 us | 21.4 us | 20.4 us | 46,825/s |

Design notes

- Cache is a plain dict keyed on tuple(colnames) (raw column names before cleaning)
- Error handling paths (SyntaxError, Exception) preserved unchanged
- Cache size is bounded by the number of distinct column-name tuples, which in practice tracks the number of distinct queries

Tests

All existing unit tests pass (46 passed).

Pre-review checklist

- I have split my patch into logically separate commits.
- All commit messages clearly explain what they change and why.
- I added relevant tests for new features and bug fixes.
- All commits compile, pass static checks and pass tests.
- PR description sums up the changes and reasons why they should be introduced.
- I have provided docstrings for the public items that I want to introduce.
- I have adjusted the documentation in ./docs/source/.
- I added appropriate Fixes: annotations to PR description.


mykaul changed the title ~~(improvement) cache namedtuple class in named_tuple_factory to avoid …~~ (improvement) (python code path only): cache namedtuple class in named_tuple_factory to avoid … Mar 13, 2026

mykaul marked this pull request as draft March 13, 2026 10:13

This was referenced Mar 14, 2026

- Tracking: Vector search (VectorType) performance improvement PRs #746 (Open)
- Tracking: General (non-vector) performance improvement PRs #747 (Open)
- (improvement) LWT prepared statement performance: analysis and improvement plan #751 (Open)

mykaul wants to merge 1 commit into scylladb:master from mykaul:perf/cache-named-tuple-factory
