Skip to content

Fix TurboQuant centroid initialization wasting most codes at high dimensions#8076

Closed
mprammer wants to merge 1 commit into
developfrom
mp/tq-centroid-init
Closed

Fix TurboQuant centroid initialization wasting most codes at high dimensions#8076
mprammer wants to merge 1 commit into
developfrom
mp/tq-centroid-init

Conversation

@mprammer
Copy link
Copy Markdown
Contributor

TurboQuant's Lloyd–Max scalar quantizer seeded its centroids uniformly across the full support [-1, 1], but after rotation each coordinate's marginal concentrates around 0 with standard deviation 1/sqrt(dimension). For dimension >= 256 most seeds start in the near-zero-mass tails, where the conditional-mean update's zero-denominator guard freezes them on the first iteration — so a large fraction of codes end up quantizing coordinate values that never occur. At 8 bits and dimension = 1024 that left reconstruction error at ~40x the paper's Theorem 1 bound and stopped it falling as bits increased. The fix seeds centroids on ±sqrt(bit_width) · sigma, where the mass actually is, so no code freezes.

The seed scales with sqrt(bit_width) rather than a fixed multiple of sigma because the optimal spread grows with codebook size: a constant tight enough to be optimal at low bit widths under-covers the tail at high ones, and a constant wide enough for high bit widths wastes resolution lower down. So any single constant is dominated — 2.5σ is best in the mid range, 3.0σ at the top, neither everywhere — while sqrt(bit_width)·sigma ties the best constant at every bit width and is strictly best at 8 bits. Reconstruction error as a multiple of the Theorem 1 bound at dimension = 1024, lower is better (the pattern holds across dimensions):

bits [-1,1] (before) 2.5σ 3.0σ 3.5σ √bits·σ (this PR)
4 1.15 0.89 0.89 0.89 0.89
6 3.66 0.97 1.04 1.18 0.97
8 40.6 1.25 1.21 1.49 1.17

The same seed change lands in both Lloyd–Max copies (vortex-tensor and vortex-turboquant), kept in lockstep by a cross-crate parity test, with a regression test asserting no code freezes across the dimension/bit grid and an #[ignore]d sweep_centroid_init that reproduces the table. Note the codebook itself changes: vortex-tensor arrays store their centroids inline so existing files are unaffected, but the vortex-turboquant extension path derives centroids at decode, so data written through it will dequantize against the new codebook.

🤖 Generated with Claude Code

@mprammer mprammer requested a review from connortsui20 May 24, 2026 23:33
@codspeed-hq

This comment was marked as low quality.

@mprammer mprammer added the changelog/fix A bug fix label May 24, 2026
…ensions

Seed Lloyd-Max centroids on ±sqrt(bit_width)·sigma instead of the full
support [-1, 1], so they start where the rotated-coordinate marginal has
mass and no cell freezes in the zero-mass tails. The same change lands in
both centroid implementations (vortex-tensor and vortex-turboquant), kept
in sync by the cross-crate parity test, with a regression test and an
ignored sweep harness.

Signed-off-by: mprammer <martin@spiraldb.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@connortsui20
Copy link
Copy Markdown
Contributor

If we are going to do this, can this also figure out better boundaries? These are initialization changes, but we should also change the boundaries from -1, 1 to something like -10.5sigma, 10.5sigma (following the vLLM impl)

@mprammer
Copy link
Copy Markdown
Contributor Author

We checked this — the outer boundary is inert. Holding the init fixed (3σ here, to isolate the boundary) and sweeping the edge over ±1 (support), ±9σ, and ±10.5σ gives identical distortion (d=1024, error as a multiple of the Theorem 1 bound, lower is better):

bits ±1 (support) ±9σ ±10.5σ
4 0.89 0.89 0.89
6 1.04 1.04 1.04
8 1.21 1.21 1.21

The marginal sits within a few σ of 0, so the outermost cell's conditional mean is set by the near-zero mass no matter where the far boundary lands, independent of the init. (At d=128, ±1 is already ≈11σ; at higher d the extra span carries no mass.) We left the boundaries at [-1, 1] to keep the diff minimal, but it doesn't matter either way — the init spread is the only lever that moves distortion, which is what this PR changes. The full sweep, including the vLLM 3.5σ/10.5σ config, is reproducible through the ignored sweep_centroid_init test.

@connortsui20
Copy link
Copy Markdown
Contributor

in that case, I want a proof of why we sqrt(bit width) * sigma is optimal (either as a formal mathematical proof or numerically with a chart). It is still super unclear to me why changing the initial parameters has such a massive affect on the distortion, and it is further unclear why this is supposed to be "ideal"

@mprammer mprammer marked this pull request as ready for review May 25, 2026 21:33
@danking
Copy link
Copy Markdown
Contributor

danking commented May 26, 2026

@claude give this a detailed review paying special attention to the accuracy and precision of this method.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

Claude finished @danking's task in 5m 59s —— View job


PR Review: Fix TurboQuant centroid initialization

Tasks:

  • Gather context and understand request
  • Read vortex-tensor/src/encodings/turboquant/centroids.rs
  • Read vortex-turboquant/src/centroids.rs
  • Analyze mathematical correctness of centroid initialization formula
  • Check numerical precision and edge cases
  • Review test coverage
  • Post detailed review

Summary

The fix is mathematically sound and correct as a fix for the stated bug. Seeding centroids in [-1,1] stranded nearly all of them in the near-zero-mass tails at high dimensions, where the zero-denominator guard froze them permanently. The new initialization places every centroid on live probability mass, enabling proper Lloyd-Max convergence. There are, however, several issues worth discussing before merge — a few substantive and several smaller.


1. Mathematical Correctness of the Core Fix ✅

The key relationships are correct:

  • Sigma derivation: For a uniform random unit vector in R^d, each coordinate has Var(X_i) = 1/d, so sigma = 1/sqrt(d) is exact.
  • Distribution model: f(x) ∝ (1-x²)^((d-3)/2) on [-1,1] is the correct marginal for coordinates of a uniform spherical random variable on S^(d-1).
  • Centroid spacing formula: centroids[idx] = -init_half + (2·idx + 1)·init_half / N places N centroids at midpoints of N equal sub-intervals of [-init_half, init_half], which is correct uniform initialization.
  • HalfIntExponent: Euclidean division correctly handles negative numerators (d=2 case: (-1).div_euclid(2) = -1, (-1).rem_euclid(2) = 1 → exponent = −0.5 ✓).

The .min(1.0) clamp is dead code for all valid inputs (MIN_DIMENSION=128, MAX_BIT_WIDTH=8 → sqrt(8)/sqrt(128) ≈ 0.25 < 1.0), but it's harmless defensive coding.


2. The sqrt(bit_width) Coefficient — Needs Justification ⚠️

The PR claims SqrtRate(1.0) (i.e. sqrt(b)·sigma) "ties the best constant at every bit width and is strictly best at 8 bits." This is supported empirically but warrants more scrutiny.

The standard information-theoretic result for optimal Lloyd-Max initialization of a Gaussian uses a loading factor of roughly sqrt(2·ln(N)) = sqrt(2·b·ln 2) ≈ 1.177·sqrt(b). The sweep includes this as SqrtRate(1.1774) but the PR description doesn't show its results in the table — it only shows Fixed(2.5), Fixed(3.0), Fixed(3.5) vs. SqrtRate(1.0).

Request: Add the sqrt(2lnN) row (SqrtRate(1.1774)) to the table in the PR description, as it's the theoretically motivated baseline. If SqrtRate(1.0) beats it empirically, that's a noteworthy finding worth calling out explicitly — it would follow from the compact support of this distribution (the hard boundary at ±1 makes it sub-Gaussian in the tails, so a tighter loading factor is appropriate). That explanation should be in the SqrtRate doc comment, which currently only says "mirroring how a quantizer's optimal loading factor grows with rate" — a phrase that's directionally true but elides the coefficient difference.

Also: the table covers bits ∈ {4,6,8} at dim = 1024. The production range is bits ∈ {1..=8}. Bits 1–3 and 5, 7 are absent. Given that the ratio-to-bound at bits=1 and bits=2 are near-1.0 for all configs (any reasonable seed works), this is probably fine — but the claim "ties the best constant at every bit width" should ideally be supported for the entire supported range.


3. Parity Test Scope is Too Narrow ⚠️

The cross-crate parity test (vortex-turboquant/src/tests/parity.rs) only exercises d=128, b=3 — exactly the parameters where the old initialization was barely broken (at d=128, sigma ≈ 0.088, and with only 8 centroids, Lloyd-Max was still roughly functional). The failure mode this PR fixes was most severe at d≥256, b=8.

There are two structural risks this leaves open:

  1. A future change to DEFAULT_INIT_SPREAD in vortex-tensor might not be caught if the centroids at d=128, b=3 happen to converge to the same result regardless.
  2. The two hard-coded implementations can diverge silently at the high-dimensional case.

vortex-turboquant/src/centroids.rs directly inlines (f64::from(bit_width).sqrt() * sigma).min(1.0), while vortex-tensor uses the DEFAULT_INIT_SPREAD = SqrtRate(1.0) abstraction. These compute identically today, but there's no structural guarantee they stay in sync. Adding a parity test at (d=1024, b=8) would meaningfully tighten this.

// Suggested addition to parity.rs:
let config = TurboQuantConfig::try_new(8, 42, 3)?;
let input = f32_vector_array(1024, 2, 0.125, Validity::NonNullable)?;
// ...

Fix this →


4. Numerical Precision in mean_between_centroids 🔍

The trapezoidal rule uses INTEGRATION_POINTS = 1000 uniformly over [lo, hi]. Two cases deserve scrutiny:

Wide outer cells: After convergence, the outermost cells span [-1, boundary₀] and [boundary_{N-2}, 1]. At d=1024, the boundary near the distribution edge is around ±3σ ≈ ±0.09, so the outer cell covers roughly [-1, -0.09] — about 455 of the 1000 points fall in a region with essentially zero density. This isn't wrong, but it means the effective integration resolution for the conditional mean is concentrated in a narrow region. The result is still accurate because pdf_unnormalized returns ≈0 for all those tail points, but it's numerically inefficient.

Potential NaN for d=2 (not reachable at MIN_DIMENSION=128, but worth noting): pdf_unnormalized(-1.0, exponent) with int_part=-1, has_half=true computes 0.0_f64.powi(-1) * 0.0_f64.sqrt() = inf * 0.0 = NaN. The max(0.0) guard on base doesn't prevent the powi(-1) infinity. This is dead code with the current MIN_DIMENSION=128, but if that constraint were relaxed, it would silently produce NaN centroids.

f64→f32 cast in the quality measurement (measure_quantizer in tests): find_nearest_centroid(x as f32, &boundaries) discards 29 bits of the grid coordinate before lookup. The quality measurement would be slightly more accurate using f64 throughout, though in practice the error is negligible for verifying wasted == 0.


5. Test Coverage Gaps 🔍

  • centroids_within_bounds: Only (d=128, b=1) and (d=128, b=4). Missing d=1024 and b=8.
  • centroids_are_sorted and centroids_are_symmetric: No cases with d=1024 or b=8.
  • production_centroids_have_no_wasted_cells ✅: This is the key new test and is well-designed — it covers d ∈ {128,256,1024,2048} across all bit widths. This test directly encodes the invariant that was violated.
  • No golden/pinned centroid values: Tests are structural (count, order, symmetry, wasted cells) but none assert that the centroids at a specific (d, b) converge to expected values. A subtle regression to a different local optimum (e.g., after a future integration-accuracy change) would go undetected. This is a deliberate trade-off but worth acknowledging.

6. Backward Compatibility — Data Break for vortex-turboquant ⚠️

The PR description notes this but it warrants emphasis: this is a silent data-breaking change for the vortex-turboquant extension path. Centroids are not stored inline (unlike vortex-tensor), so files written with the old code will decode with the new centroids and produce silently wrong values — no error, no warning, wrong reconstruction.

The PR description correctly identifies this, but there's no comment in the source code at the relevant encode/decode site flagging the format version dependency. A code comment near the centroid derivation site explaining the implication would help future maintainers.


7. CodSpeed Benchmark Regression 📊

The reported 16.5% regression in chunked_varbinview_opt_canonical_into[(1000, 10)] is unrelated to this PR (VarBinView array canonicalization has no code path intersection with TurboQuant centroid computation). CodSpeed itself flags "Unknown Walltime execution environment detected," indicating this is a measurement artifact. This regression should not block the PR.


Minor Code Notes

  • NB: Some of these numbers were arbitrarily chosen... comment in vortex-turboquant/src/centroids.rs (line 34) predates this PR but is now misleading since the key constant (init_half) is not arbitrary. Worth a cleanup pass, though out of scope here.
  • The #[cfg_attr(not(test), allow(dead_code))] on InitSpread::Fixed in vortex-tensor is appropriate — this variant is only used by the sweep test.
  • The powi + sqrt optimization for half-integer exponents is a reasonable micro-optimization, but it would benefit from a brief comment quantifying the speedup (the current comment says "significantly faster" which is hard to verify without context).

Verdict

The bug diagnosis is correct and the fix reliably solves it. The production_centroids_have_no_wasted_cells regression test is well-targeted. The main asks before merge:

  1. Show SqrtRate(1.1774) results in the PR description table and add an explanation for why coefficient 1.0 beats the information-theoretic value for compact-support distributions.
  2. Add a parity test case at d=1024, b=8 to cover the case that was actually broken.
  3. (Optional but recommended) Add a comment or migration note at the vortex-turboquant decode site about the codebook change.

Copy link
Copy Markdown
Contributor

@danking danking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is a correct improvement to TurboQuant but it changes the interpretation of already written vectors. I think this is OK as long as no releases of Vortex have been made since the original TQ PR merged.

@danking
Copy link
Copy Markdown
Contributor

danking commented May 26, 2026

Here's my reasoning for approving:

  1. Fix TurboQuant centroid initialization wasting most codes at high dimensions #8076 only changes to where the centroids are initialized. The algorithm still searches for an optimal quantization within [-1, 1]. The initial bins are kinda weird: the first and last bin can be much larger than the other bins: because they cover the gaps between [-1, 1] and the interval from which we sample. I think this is OK, if there happens to be a ton of mass near -1, the first centroid will be pulled towards that mass. The second centroid will then shift to fill that vacuum.

  2. This Stanford course (EE269) argues that you should derive an analytical approximation of the CDF, sample some points in Unif[-1, 1] and send them through CDF^-1 to get decent starting centroids. What Fix TurboQuant centroid initialization wasting most codes at high dimensions #8076 does is use a Uniform distribution on a truncated interval. That's also an analytical approximation (albeit not a great one!). https://web.stanford.edu/class/ee269/Lecture_nonuniform_quantization.pdf

I also think Claude is wrong to bring up sqrt(2 ln N). AFAICT, what it is saying is true for the Normal distribution. In particular, you need to truncate the Normal distribution's support from (-inf, inf) to something reasonable. Claude may be correct that setting the boundaries to sqrt(2 ln N) is "optimal" under some settings for the Normal; however, that's a distraction. We've already chosen the interval: [-1, 1]. As I describe above, we're just trying to choose a decent initialization. And as Martin points out, the optimizer gets stuck if the initial points land in a "too flat" section of the distribution.

@connortsui20
Copy link
Copy Markdown
Contributor

Id like to have a different PR that only changes the centroid initialization, rather than also have this init spread thing that I still don't understand.

@connortsui20
Copy link
Copy Markdown
Contributor

closing in favor of #8116. I don't understand the IntiSpread construct, but maybe it is something we want to have (I think it would be good to have a diagram explanation)?

Regardless, it is in the old vortex-tensor implementation only so merging this wouldn't do anything anyways

connortsui20 added a commit that referenced this pull request May 27, 2026
## Summary

Tracking issue: #7830

Followup of #8076

Changes the centroid initialization for `vortex-turboquant` to use `+-
sqrt(bit_width) * sigma` instead of just [-1, 1]

## Testing

```sh
uv run benchmarks/vector-search-bench/scripts/plot-turboquant-distortion.py \
            --dataset cohere-small-100k:single \
            --dataset openai-medium-500k:single \
            --dataset bioasq-medium-1m \
            --dataset glove-small-100k \
            --dataset gist-small-100k \
            --dataset sift-small-500k \
            --output ~/Downloads/distortion-sqrt-sigma.png
```

<img width="2816" height="950" alt="distortion-3-sigma"
src="https://github.com/user-attachments/assets/cb2db4eb-13a7-4083-949b-ee7f04ab7428"
/>

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/fix A bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants