TurboQuant encoding for Vectors #7269
Conversation
Force-pushed from fb6bbcf to 3d7dfed
Merging this PR will not alter performance
I would say this is ready for review now, but only with respect to the structure. I have yet to go through the implementation and make sure things make sense, but I have made it so the structure makes sense and we correctly handle the different floating-point input types as well as null vectors. I think it would be good to get a review now, and maybe we should just merge this and iterate later.

I'll rebase and fix the errors tomorrow.
lwwmanning left a comment
Detailed Review — TurboQuant Encoding (RFC 0033 Stage 1)
This PR implements the TurboQuant lossy vector quantization algorithm (arXiv:2504.19874) as a new encoding in the vortex-tensor crate. It is intended to be Stage 1 of RFC 0033. Overall this is well-engineered work — the algorithm is correctly implemented, the code follows Vortex conventions, tests are thorough, and the documentation is excellent. Requesting changes for a few RFC compliance issues.
RFC 0033 Stage 1 Compliance
| RFC Requirement | Status | Details |
|---|---|---|
| QJL support removed | ✅ | No QJL code present |
| 4 slots (codes, norms, centroids, rotation_signs) | ✅ | Exactly 4 slots |
| Scheme default: 5-bit MSE-only | ❌ | Default is 4-bit (compress.rs:44) |
| Norms dtype: same-or-wider (f64→f64, f32/f16→f32) | ✅ | Correct |
| Scheme minimum: dimension ≥ 128 | ❌ | TurboQuantScheme::matches() accepts dimension ≥ 3 |
| Metadata: protobuf for forward compat | ❌ | Raw single byte, not protobuf; should be switched now to avoid a migration path later |
Strengths
- Clean architecture. Module structure follows established Vortex patterns (vtable macro, data struct with `try_new`/`new_unchecked`, named slot enum, separate compress/decompress). Consistent with BitPacked, RLE, Sparse, etc.
- Correct SRHT implementation. The 3-round structured random Hadamard transform is correctly implemented with XOR-based branchless sign application (auto-vectorizes to `vpxor`/`veor`), an iterative Walsh-Hadamard butterfly, and the proper normalization factor `1/(n·√n)`. Forward/inverse symmetry is verified by tests.
- Thorough validation. `TurboQuantData::validate()` checks codes dtype, norms dtype matching, centroids power-of-2 constraint, rotation signs length, and degenerate/empty invariants. Debug assertions in `new_unchecked` add an extra safety net.
- Smart compute pushdowns. Slice/take operate on per-row children (codes, norms) and clone shared children (centroids, rotation_signs). Quantized cosine similarity and dot product avoid full decompression. L2 norm readthrough is O(1) from stored norms.
- Excellent test coverage (911 lines): roundtrip, MSE quality bounds, edge cases, nullable vectors, serde roundtrip, compute pushdowns, L2 norm readthrough.
- Good documentation. Module docs include theoretical MSE bounds, compression ratio tables, and a working example. Per-function docs explain algorithmic context.
Key Issues (see inline comments)
- Default bit_width should be 5, not 4 — RFC specifies "5-bit MSE-only (32 centroids)"
- Scheme minimum should be dimension ≥ 128 — RFC specifies auto-selection only for d ≥ 128
- Metadata serialization should use protobuf — a raw byte can't be extended backward-compatibly; `FixedShapeTensor` in the same crate uses prost
- Unresolved TODO in `f32_to_t` — should be resolved or documented before merge
- Cosine similarity doc claims "same rotation" — but doesn't validate this; should clarify assumptions
Algorithmic Note
The theoretical MSE bound (Theorem 1 in the paper) is proved for Haar-distributed random orthogonal matrices, not SORF/SRHT. The SRHT is a practical approximation. The RFC explicitly acknowledges this. The tests empirically validate the bound holds with SRHT, which is good — but worth noting in the module docs.
Minor Items
- `new_unchecked` is `pub` — other encodings use `pub(crate)`
- No f16 input roundtrip test
- No quantized dot product test (only cosine similarity)
- No `TurboQuantScheme::compress()` integration test
- Global centroid cache is unbounded (fine in practice, worth documenting)
Generated by Claude Code
vortex-tensor/src/encodings/turboquant/compute/cosine_similarity.rs (outdated; resolved)
vortex-tensor/src/encodings/turboquant/compute/cosine_similarity.rs (outdated; resolved)
lwwmanning left a comment
Follow-up: after further consideration, the default bit_width should be 8 (near-lossless) rather than the RFC's 5. At 8 bits the normalized MSE is ~4e-5 — effectively transparent — while still achieving 3-4x compression on f32 data. This is a safer default for a general-purpose encoding; users who want more aggressive compression can explicitly configure lower bit widths.
lwwmanning left a comment
Claude diffed against RFC 33 and had a few minor comments that look valid/easy to fix before merging
So based on my reading of the review comments, the things we actually want to change are:

Everything else is either wrong or something we can think about later.
Lossy quantization for vector data (e.g., embeddings) based on TurboQuant (https://arxiv.org/abs/2504.19874). Supports both MSE-optimal and inner-product-optimal (Prod with QJL correction) variants at 1-8 bits per coordinate.

Key components:
- Single TurboQuant array encoding with optional QJL correction fields, storing quantized codes, norms, centroids, and rotation signs as children.
- Structured Random Hadamard Transform (SRHT) for O(d log d) rotation, fully self-contained with no external linear algebra library.
- Max-Lloyd centroid computation on the Beta(d/2, d/2) distribution.
- Approximate cosine similarity and dot product computed directly on quantized arrays without full decompression.
- Pluggable TurboQuantScheme for BtrBlocks, exposed via WriteStrategyBuilder::with_vector_quantization().
- Benchmarks covering common embedding dimensions (128, 768, 1024, 1536).

Also refactors CompressingStrategy to a single constructor, and adds vortex_tensor::initialize() for session registration of tensor types, encodings, and scalar functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Will Manning <will@willmanning.io>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
We are going to implement this later as a separate encoding (if we decide to implement it at all because word on the street is that the MSE + QJL is not actually better than MSE on its own). Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
It doesn't really make a lot of sense for us to define this as an encoding for `FixedSizeList`. Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
- Use ExecutionCtx in the TurboQuant compress path and import ExecutionCtx
- Extend dtype imports with Nullability and PType to support extension types
- Wire in extension utilities: extension_element_ptype and extension_list_size for vector extensions
- Remove dimension and bit_width from slice/take compute calls to rely on metadata
- Update TurboQuant mod docs to mention VortexSessionExecute
- Change scheme.compress to use the provided compressor argument (not _compressor)
- Add an extensive TurboQuant test suite (roundtrip, MSE bounds, edge cases, f64 input, serde roundtrip, and dtype checks)
- Align vtable imports to new metadata handling (remove unused DeserializeMetadata/SerializeMetadata references)

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Force-pushed from 77a4288 to 3a2b169
Comments addressed. There is a small issue related to #7269 (comment). This is what my claude instance told me after seeing pretty high errors:

The theoretical guarantee. Theorem 1's MSE bound ((sqrt(3)·π/2) / 4^b) is proved for Haar-distributed random orthogonal matrices — matrices drawn uniformly from the orthogonal group O(d). The proof depends on two properties that Haar matrices give:
- With SRHT, the rotated coordinates are only approximately Beta-distributed, so the centroids are slightly suboptimal for the actual marginals.
- With SRHT, coordinates are uncorrelated but have a different higher-order dependence structure. The 3-round structure (H·D₃·H·D₂·H·D₁) gives good mixing, but it's not equivalent to full randomness.

What we gain in exchange: O(d log d) computation and O(d) storage instead of O(d²) for both. The SRHT butterfly + XOR sign application auto-vectorizes into SIMD. For d=1024, that's a ~1000x speedup over a full matrix multiply. In practice the gap is small, and our tests bear this out empirically.

The RFC's fallback plan, if SRHT ever proves insufficient, is to use a full B×B random orthogonal matrix per block at Stage 2 block sizes (e.g., B=256 → 256KB storage per block). We likely want to implement both (doesn't seem to be too hard) and then compare them later.
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Force-pushed from 3a2b169 to 5285a13
Continuation of #7167, authored by @lwwmanning
Summary
Lossy quantization for vector data (e.g., embeddings) based on TurboQuant (https://arxiv.org/abs/2504.19874). Implements the MSE-only variant (Stage 1 of RFC 0033) at 1-8 bits per coordinate (0 for empty arrays), defaulting to 8-bit near-lossless compression. (It is still slightly lossy beyond quantization error: we use an SRHT rather than a Haar-random orthogonal rotation matrix, so the theoretical bound's Haar assumption does not fully hold.)
Key components:
- `ScalarFn` expression for the rotations.
- `O(d log d)` rotation, fully self-contained with no external linear algebra library. This is what claude came up with, but we can see in testing that while this is practical and more efficient, we lose some of the assumptions that a Haar-random orthogonal matrix gives us. I think this is something we can play around with because it's abstracted into a discrete step of the algorithm.
- Pluggable `TurboQuantScheme` for the cascading compressor.
- Minimum dimension (`TurboQuant::MIN_DIMENSION`) for SRHT quality guarantees.
- (`f32`).
- `vortex_tensor::initialize()` for session registration of tensor types, encodings, and scalar functions.

API Changes
- `TurboQuant` encoding in `vortex-tensor` with `turboquant_encode()` and `TurboQuantConfig`, and new types `TurboQuantData` and `TurboQuantArray`.
- `TurboQuantScheme` for compressor integration.
- `TurboQuant::MIN_DIMENSION` (128) constant.
- `float_from_f32<T: Float + FromPrimitive>` shared helper for infallible f32-to-float conversion.

Testing (claude-generated)