Skip to content

(improvement) query: add Cython-aware serializer path in BoundStatement.bind()#749

Draft
mykaul wants to merge 5 commits intoscylladb:masterfrom
mykaul:perf/cython-bind-path
Draft

(improvement) query: add Cython-aware serializer path in BoundStatement.bind()#749
mykaul wants to merge 5 commits intoscylladb:masterfrom
mykaul:perf/cython-bind-path

Conversation

@mykaul
Copy link

@mykaul mykaul commented Mar 14, 2026

Summary

  • When Cython serializers (cassandra.serializers from PR (improvement) serializers: add Cython-optimized serialization for VectorType #748) are available and no column encryption policy is active, BoundStatement.bind() uses pre-built Serializer objects cached on the PreparedStatement instead of calling cqltype classmethods
  • The bind loop is split into three code paths: (1) column encryption policy, (2) Cython serializers (new fast path), (3) plain Python fallback
  • The plain Python fallback also benefits by removing the per-value ColDesc construction and ce_policy check that was previously done unconditionally

Dependencies

Performance

End-to-end BoundStatement.bind() benchmarks

Measured on a single CPU core (pinned), Python 3.14, comparing the Cython serializers path vs the plain Python path:

Workload Cython (μs/op) Python (μs/op) Speedup
Vector<float, 128> (1 column) 1.6 14.6 9.4x
Vector<float, 768> (1 column) 3.6 66.1 18.2x
Vector<float, 1536> (1 column) 4.8 122.2 25.7x
10 mixed scalars (5 float + 3 int32 + 2 double) 2.1 2.7 1.3x
1 Vector<float, 1536> + 3 scalars (realistic INSERT) 5.5 145.0 26.1x

Key observations:

  • Vector columns get massive speedups (9–26x) because the Cython serializer replaces the per-element io.BytesIO loop with a pre-allocated C buffer and inline byte-swap
  • Scalar-only workloads still see a modest 1.3x improvement from eliminating per-value ColDesc namedtuple construction and ce_policy checks
  • Speedup scales with vector dimension — higher dimensions save more per-element Python dispatch overhead

Memory savings

The old bind loop created a ColDesc namedtuple (~72 bytes) for every bound value, even when no column encryption policy was set. Both new non-CE paths eliminate this entirely.

Per bind() call, for N columns:

Allocation Old path New path (Cython or plain Python) Eliminated
ColDesc namedtuples N 0 N
ce_policy and ... evaluations N 0 N
ce_policy.column_type(...) ternary N 0 N

Per VectorType column (D-dimensional float/double/int32), additionally:

Allocation Old VectorType.serialize() New Cython SerVectorType Eliminated
io.BytesIO() instance 1 0 1
Intermediate bytes per element D 0 D
buf.getvalue() final copy 1 0 1
BytesIO internal buffer reallocations ~3–4 0 (exact-size malloc) ~3–4
Python method calls in serialize loop 2D (subtype.serialize + buf.write) 0 (C-level loop) 2D

Concrete example: A prepared INSERT with 5 scalar columns + 1 Vector<float, 1536> column eliminates ~1545 transient Python objects and ~88 KB of transient allocations per bind() call, plus ~3076 Python method calls.

Design

PreparedStatement._serializers (lazy cached property)

On first access, calls make_serializers([col.type for col in self.column_metadata]) which returns a list of Cython Serializer objects (one per bind column). Returns None if:

  • Cython serializers module is not compiled/available
  • A column encryption policy is set
  • No column metadata exists

The cache is safe because column_metadata is immutable after PreparedStatement creation. Thread safety is guaranteed by the benign-race pattern (idempotent computation + atomic attribute assignment).

Three-way bind loop

if ce_policy:
    # Path 1: Column encryption — unchanged behavior, creates ColDesc per value
elif serializers is not None:
    # Path 2: Cython fast path — ser.serialize(value, proto_version)
else:
    # Path 3: Plain Python — col_spec.type.serialize(value, proto_version)

Path 2 uses zip(serializers, values, col_meta) to iterate all three in lockstep without index overhead. Path 3 removes the per-value ColDesc construction and ce_policy check from the original code.

Testing

  • All existing unit tests pass (test_query.py, test_types.py)
  • Manually verified: Cython path produces identical serialized bytes to pure Python path for Float, Int32, Double types
  • Verified None, UNSET_VALUE, short value lists, and dict binding all work correctly across all three paths
  • Verified graceful fallback when Cython module is not available (_HAVE_CYTHON_SERIALIZERS = False)

@mykaul mykaul marked this pull request as draft March 14, 2026 11:22
@mykaul mykaul requested a review from Copilot March 14, 2026 19:24
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an optional, Cython-accelerated serialization path for BoundStatement.bind() to reduce per-value overhead (especially for large VectorType columns) when Cython serializers are available and no column encryption policy is enabled.

Changes:

  • Add a new Cython cassandra.serializers extension (with .pyx + .pxd) providing Serializer implementations and a make_serializers() factory.
  • Add lazy caching of per-column serializer objects on PreparedStatement via a _serializers property.
  • Split the bind loop into three branches: column encryption policy, Cython fast path, and pure-Python fallback (with reduced per-value overhead in the fallback).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
cassandra/serializers.pyx Adds Cython serializer implementations (scalar + VectorType) and lookup/factory functions.
cassandra/serializers.pxd Exposes the Serializer cdef interface for Cython usage.
cassandra/query.py Integrates the optional Cython serializer path into PreparedStatement/BoundStatement.bind() with lazy caching and branch selection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an optimized parameter binding path that can leverage Cython Serializer objects (when available) to speed up BoundStatement.bind(), while preserving the existing column-encryption behavior and improving the plain-Python fallback.

Changes:

  • Add PreparedStatement._serializers lazy cache and a three-way bind loop in BoundStatement.bind() (CE policy / Cython fast path / Python fallback).
  • Introduce Cython Serializer implementations for Float/Double/Int32/Vector and a make_serializers() factory.
  • Extend unit tests to exercise the Cython-serializer bind branch via injected stub serializers.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
cassandra/query.py Adds Cython-serializer availability detection, caches serializers on PreparedStatement, and selects the new bind fast path when safe.
cassandra/serializers.pyx Implements Cython serializers (including optimized VectorType) and factory/lookup helpers.
cassandra/serializers.pxd Declares the Cython Serializer interface for cross-module typing.
tests/unit/test_parameter_binding.py Adds tests for the new bind branch and error-wrapping behavior using stub serializers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a Cython-aware fast path to BoundStatement.bind() so prepared statements can use cached per-column serializer objects (when available and no column encryption policy is active), reducing Python dispatch and per-value overhead during binding.

Changes:

  • Add lazy PreparedStatement._serializers cache and split BoundStatement.bind() into CE-policy, Cython-serializer, and pure-Python paths.
  • Introduce Cython Serializer implementations (including optimized VectorType serialization) and a make_serializers() factory.
  • Expand/adjust unit tests to cover the new Cython bind branch via injected stub serializers.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
cassandra/query.py Adds serializer caching and a new bind fast path (plus significant formatting-only edits in the same module).
cassandra/serializers.pyx Introduces Cython serializer implementations and factory/lookup helpers.
cassandra/serializers.pxd Declares the Cython Serializer interface for cross-module use.
tests/unit/test_parameter_binding.py Adds tests to exercise the Cython bind path without requiring compiled Cython.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a Cython-aware fast path to BoundStatement.bind() by reworking the bind loop into distinct branches (column encryption vs Cython serializers vs pure-Python), and introduces the Cython serializer module used by the new path.

Changes:

  • Add lazy cached PreparedStatement._serializers and update BoundStatement.bind() to use Cython Serializer objects when available and CE policy is not active.
  • Factor bind-time serialization error wrapping into a shared helper and extend wrapping to include OverflowError.
  • Add unit tests that exercise the new Cython bind branch via injected stub serializers, plus overflow wrapping coverage.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
cassandra/query.py Adds _serializers cache on PreparedStatement, a shared bind error wrapper, and splits BoundStatement.bind() into CE / Cython / Python paths.
cassandra/serializers.pyx Introduces Cython Serializer implementations (scalar + vector) and serializer factory helpers.
cassandra/serializers.pxd Exposes the Serializer cdef interface for Cython interop.
tests/unit/test_parameter_binding.py Adds unit tests to validate the new Cython bind path behavior and error wrapping.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

mykaul added 4 commits March 19, 2026 20:24
…torType

Add cassandra/serializers.pyx and cassandra/serializers.pxd implementing
Cython-optimized serialization that mirrors the deserializers.pyx architecture.

Implements type-specialized serializers for the three subtypes commonly used
in vector columns:
- SerFloatType: 4-byte big-endian IEEE 754 float
- SerDoubleType: 8-byte big-endian double
- SerInt32Type: 4-byte big-endian signed int32

SerVectorType pre-allocates a contiguous buffer and uses C-level byte swapping
for float/double/int32 vectors, with a generic fallback for other subtypes.
GenericSerializer delegates to the Python-level cqltype.serialize() classmethod.

Range checks for float32 and int32 values prevent silent truncation from
C-level casts, matching the behavior of struct.pack().

Factory functions find_serializer() and make_serializers() allow easy lookup
and batch creation of serializers for column types.

Benchmarks show ~30x speedup over the current io.BytesIO baseline and ~3x
speedup over Python struct.pack for Vector<float, 1536> serialization.

No setup.py changes needed - the existing cassandra/*.pyx glob already picks
up new .pyx files.
…nt.bind()

When Cython serializers (from cassandra.serializers) are available and no
column encryption policy is active, BoundStatement.bind() now uses
pre-built Serializer objects cached on the PreparedStatement instead of
calling cqltype classmethods. This avoids per-value Python method dispatch
overhead and enables the ~30x vector serialization speedup from the Cython
serializers module.

The bind loop is split into three paths:
1. Column encryption policy path (unchanged behavior)
2. Cython serializers path (new fast path)
3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check)

Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy
bind split).
- Chain original exception in _raise_bind_serialize_error (raise from exc)
- Change _check_int32_range to raise struct.error instead of OverflowError,
  matching the behaviour of struct.pack('>i', value)
- Clarify docstrings for _check_float_range/_check_int32_range
- Expand _raise_bind_serialize_error docstring with specific exception types
- Document __getitem__ requirement in vector serialize methods
- Move io and uvint_pack imports to module scope in serializers.pyx
- Add struct import to serializers.pyx for struct.error
- Fix test_plain_path_overflow_error_wrapped docstring (struct.error, not OverflowError)
- Update OverflowSerializer stub to raise struct.error
- Replace name-mangled __serializers with _cached_serializers
@mykaul mykaul force-pushed the perf/cython-bind-path branch from bb267e6 to 5d684d1 Compare March 20, 2026 08:34
@mykaul mykaul self-assigned this Mar 20, 2026
@mykaul mykaul requested a review from Copilot March 20, 2026 12:39
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an optimized binding path that leverages Cython-backed per-column Serializer objects (when available) to reduce Python overhead in BoundStatement.bind(), while keeping existing behavior for column encryption policy and providing a streamlined pure-Python fallback.

Changes:

  • Add lazy, cached Cython serializer lookup on PreparedStatement and route BoundStatement.bind() through a 3-way path: CE-policy, Cython fast path, pure-Python fallback.
  • Introduce Cython serializer implementations (Serializer, scalar serializers, SerVectorType, and factories).
  • Extend unit tests to exercise the Cython fast-path behavior (via injected stub serializers) and UNSET handling scenarios.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
cassandra/query.py Adds optional Cython serializer import, caches per-column serializers on PreparedStatement, and refactors BoundStatement.bind() into CE/Cython/Python paths with pre-allocation.
cassandra/serializers.pyx Introduces Cython-optimized serializers (including VectorType) and factory functions for per-column serializer creation.
cassandra/serializers.pxd Declares the Cython Serializer interface for cross-module typing/cimports.
tests/unit/test_parameter_binding.py Adds tests that inject stub serializers to validate the Cython bind path and UNSET behavior across paths.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +226 to +231
cdef inline bytes _serialize_float(self, object values):
"""Serialize a sequence of floats into a contiguous big-endian buffer.

Note: uses index-based access (values[i]) rather than iteration, so
the input must support __getitem__ (e.g. list or tuple).
"""
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SerVectorType’s optimized _serialize_float/_serialize_double/_serialize_int32 paths require getitem (values[i]) even though the pure-Python VectorType.serialize() only requires len + iteration. This means some inputs that currently serialize fine in Python (any iterable with len but without indexing) will fail only when the Cython fast path is selected. Consider using PySequence_Fast/sequence coercion (or falling back to the generic element-iteration path) so accepted input types are consistent across paths.

Copilot uses AI. Check for mistakes.
# Fill remaining unbound columns with UNSET_VALUE (v4+ feature).
# The pre-allocated list already has slots for these, so index
# assignment works directly without trimming first.
for i in range(idx, col_meta_len):
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the v4+ UNSET fill loop, the loop variable i is unused (for i in range(idx, col_meta_len):). Consider using while idx < col_meta_len: (or for _ in range(...)) to avoid an unused variable and make the intent clearer.

Suggested change
for i in range(idx, col_meta_len):
while idx < col_meta_len:

Copilot uses AI. Check for mistakes.
Comment on lines +676 to +679
message = (
'Received an argument of invalid type for column "%s". '
"Expected: %s, Got: %s; (%s)" % (col_spec.name, col_spec.type, actual_type, exc)
)
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_raise_bind_serialize_error() wraps struct.error/OverflowError as well as TypeError, but the emitted message always says "invalid type". For out-of-range numeric values this is misleading (the type may be correct but the value is invalid). Consider adjusting the message (or branching on exception type) to mention an invalid value/range overflow while still including the column context.

Suggested change
message = (
'Received an argument of invalid type for column "%s". '
"Expected: %s, Got: %s; (%s)" % (col_spec.name, col_spec.type, actual_type, exc)
)
if isinstance(exc, TypeError):
# True type mismatch: the Python type of the bound value is not compatible
message = (
'Received an argument of invalid type for column "%s". '
"Expected: %s, Got: %s; (%s)"
% (col_spec.name, col_spec.type, actual_type, exc)
)
elif isinstance(exc, (struct.error, OverflowError)):
# Value is of the right general type but cannot be represented
# in the target CQL type (e.g. numeric value out of range).
message = (
'Received an out-of-range or otherwise invalid value for column "%s". '
"Expected CQL type: %s, Got value: %r (Python type: %s); (%s)"
% (col_spec.name, col_spec.type, value, actual_type, exc)
)
else:
# Fallback for any unexpected exception types that might be passed in
message = (
'Error binding value for column "%s". Expected: %s, Got: %s; (%s)'
% (col_spec.name, col_spec.type, actual_type, exc)
)

Copilot uses AI. Check for mistakes.
- Replace malloc/free with PyBytes_FromStringAndSize(NULL, ...) pattern in
  vector fast-paths, eliminating extra buffer copy and malloc(0) edge case
- Change _check_int32_range to raise OverflowError instead of struct.error,
  consistent with _check_float_range
- Remove unused imports (struct, malloc, free)
- Differentiate error messages in _raise_bind_serialize_error: 'invalid type'
  for TypeError vs 'value out of range' for OverflowError/struct.error
- Replace unused loop variable with while loop in UNSET_VALUE fill
- Expand __getitem__ docstrings explaining performance rationale
- Fix copyright header in test_parameter_binding.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants