
TurboQuant: Add distortion benchmark#8070

Open
connortsui20 wants to merge 3 commits into develop from ct/tq-error

Contributor

@connortsui20 connortsui20 commented May 22, 2026

Summary

Tracking issue: #7830

Adds a distortion / error benchmarking option for TurboQuant. Note that this benchmark still runs over the old implementation of TurboQuant, but because the algorithm is identical, that shouldn't affect any numbers.

We will cut over to the new version atomically once everything is implemented in vortex-turboquant.

Testing

distortion

codspeed-hq Bot commented May 22, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 1266 untouched benchmarks


Comparing ct/tq-error (11b5de1) with develop (a2323f1)

@connortsui20 connortsui20 force-pushed the ct/tq-error branch 2 times, most recently from ab18c6f to 309dcfc Compare May 22, 2026 20:26
Contributor Author

connortsui20 commented May 22, 2026

whoops forgot to make sure the binary is always up to date (edit: fixed)

@connortsui20 connortsui20 enabled auto-merge (squash) May 22, 2026 21:15
@connortsui20 connortsui20 force-pushed the ct/tq-error branch 2 times, most recently from a27a167 to 1de73c4 Compare May 27, 2026 08:53
@connortsui20 connortsui20 mentioned this pull request May 27, 2026
8 tasks
Contributor

danking commented May 27, 2026

@claude review with careful attention paid to the validity of the statistical arguments.

@github-actions

This comment was marked as outdated.

Contributor

@danking danking left a comment

I hid ChatGPT's comment because it is quite long and I think not quite correct about some statistical things, but you may find it helpful to read over it if anything I've written below is confusing.

}

/// Per-vector normalized reconstruction MSE. Rows whose original squared norm is below `1e-10`
/// are dropped because their normalized error is numerically undefined.
Contributor

I think we need to switch to the correct definition of mean square error, $\frac{1}{N} \sum_i (x_i - \hat{x}_i)^2$. This also means we don't need to elide low magnitude rows, because MSE is well defined regardless of norm.
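For concreteness, a minimal Python sketch of that definition, assuming `i` indexes the coordinates of a single vector and its reconstruction (the benchmark itself is Rust; `mean_squared_error` is a hypothetical name, not a function from this PR):

```python
def mean_squared_error(x, x_hat):
    """Return (1/N) * sum_i (x_i - x_hat_i)^2 over the N coordinates."""
    if len(x) != len(x_hat):
        raise ValueError("x and x_hat must have the same length")
    if not x:
        raise ValueError("inputs must be non-empty")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# A zero vector needs no special case: nothing is divided by the norm of x.
zero = [0.0, 0.0, 0.0, 0.0]
ones = [1.0, 1.0, 1.0, 1.0]
print(mean_squared_error(zero, ones))  # 1.0
```

Since the only divisor is the length `N`, low-magnitude rows do not need to be filtered out for the value to be defined.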

Contributor Author

ok so I decided to just remove the "normalization" factor of the dimensions since higher dimensions should theoretically result in lower distortion anyways, so now it's just the sum of squared errors.

Contributor Author

oh wait I need to make these all unit-norm first

Comment thread benchmarks/vector-search-bench/src/distortion.rs Outdated
decoded.slice(0..half)?,
decoded.slice(half..2 * half)?,
&mut ctx,
)?;
Contributor

Claude mentions this as well but I think this section has a few errors.

Equation 2 from the TurboQuant paper defines the inner product error as follows: for all pairs of vectors, x and y, in $\mathbb{R}^d$, the following quantity
[Image: equation from the paper]
is upper bounded by (from an unnumbered equation under "Inner Product TurboQuant"):
[Image: equation from the paper]

We need to respect the quantifier and expectation ordering. We need to fix a bit width, an x vector, and a y vector, and then compute the squared cosine distance:

d_prods = []
for seed in random_seeds:
    diff = compute_cosines(y, x) - compute_cosines(y, decode(encode(x, seed)))
    d_prods.append(diff * diff)

d_prod_mean = sum(d_prods) / len(d_prods)

That value, d_prod_mean, in the limit of looping over all seeds, is the quantity that the upper bound above constrains.
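A runnable version of the loop above might look like the following sketch. It is only an illustration: `toy_quantize` is a hypothetical stand-in for `decode(encode(x, seed))` (seeded stochastic rounding, not TurboQuant), and it uses plain inner products rather than cosines, on the assumption that x and y are unit vectors.

```python
import math
import random


def toy_quantize(x, seed, step=0.25):
    """Hypothetical stand-in for decode(encode(x, seed)); NOT TurboQuant.

    Rounds each coordinate up or down to a grid of width `step`, with the
    rounding direction drawn from a generator seeded by `seed`.
    """
    rng = random.Random(seed)
    out = []
    for v in x:
        lo = math.floor(v / step) * step
        frac = (v - lo) / step  # position of v inside its grid cell
        out.append(lo + step if rng.random() < frac else lo)
    return out


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def unit(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]


def estimate_d_prod(x, y, seeds, quantize=toy_quantize):
    """Mean squared inner-product error for ONE fixed (x, y) pair over seeds."""
    exact = dot(y, x)
    errs = [(exact - dot(y, quantize(x, s))) ** 2 for s in seeds]
    return sum(errs) / len(errs)


# x and y are fixed first; only the seed varies inside the expectation.
data_rng = random.Random(0)
x = unit([data_rng.gauss(0.0, 1.0) for _ in range(8)])
y = unit([data_rng.gauss(0.0, 1.0) for _ in range(8)])
d_prod_mean = estimate_d_prod(x, y, range(1000))
print(d_prod_mean)
```

The point of the structure is the quantifier order: the pair (x, y) and the bit width are held fixed, and the average runs over seeds only.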

Contributor

I'm not sure how to choose y. I think choosing a set of random vectors on startup is reasonable. The property should hold for any y you choose.

Another unnumbered equation, this one under "1.1 Problem Definition", argues that if we choose y = x, then the expectation of the difference is just zero. That seems like a separately useful statistic to measure and display.
[Image: equation from the paper]
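One way to measure that statistic is to average the signed (not squared) error over seeds. The sketch below assumes a hypothetical stand-in quantizer, `toy_quantize` (seeded stochastic rounding, which is unbiased by construction); it is not TurboQuant, whose bias would have to be measured with the real encoder.

```python
import math
import random


def toy_quantize(x, seed, step=0.25):
    """Hypothetical stand-in for decode(encode(x, seed)); NOT TurboQuant."""
    rng = random.Random(seed)
    out = []
    for v in x:
        lo = math.floor(v / step) * step
        frac = (v - lo) / step
        out.append(lo + step if rng.random() < frac else lo)
    return out


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def mean_signed_error(x, y, seeds, quantize=toy_quantize):
    """Average of <y, x_hat> - <y, x> over seeds; near zero if unbiased."""
    exact = dot(y, x)
    diffs = [dot(y, quantize(x, s)) - exact for s in seeds]
    return sum(diffs) / len(diffs)


data_rng = random.Random(1)
raw = [data_rng.gauss(0.0, 1.0) for _ in range(8)]
norm = math.sqrt(dot(raw, raw))
x = [v / norm for v in raw]

# Choose y = x, as suggested in the comment above.
bias = mean_signed_error(x, x, range(2000))
print(bias)
```

Reporting this value next to the mean squared error separates systematic bias from variance in the reconstruction.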

Contributor Author

I think it is fine to just have a bunch of random pairs of x and y. A specific random y and then a random set of x should also be fine in the expectation calculation, because everything (should be) independent.

Comment thread benchmarks/vector-search-bench/src/distortion.rs Outdated
Comment thread benchmarks/vector-search-bench/scripts/plot-turboquant-distortion.py Outdated
@connortsui20
Contributor Author

issue 1 is not a real issue because I say "normalized" in the chart but I adjusted it anyways, issue 2 is fake, and I just fixed issue 3 and 4

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@connortsui20 connortsui20 enabled auto-merge (squash) May 27, 2026 16:51
@connortsui20 connortsui20 requested a review from danking May 27, 2026 16:52
Contributor

@danking danking left a comment

There are still some issues with the statistics. I'm happy to jump on a zoom tomorrow morning to talk through it.


//! TurboQuant distortion measurement on real vector datasets.
//!
//! Reports per-vector NMSE (`||x - x'||^2 / ||x||^2 = ||unit(x) - unit(x')||^2`) and per-
Contributor

This equality is only true if the norm of x is equal to the norm of its reconstruction. I think we should just elide the equality.
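A small worked example makes the gap visible. The helper names here are hypothetical and only mirror the two expressions in the doc comment; the first case has a reconstruction with a different norm, the second has equal norms.

```python
import math


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def unit(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


def normalized_sq_error(x, x_rec):
    """||x - x'||^2 / ||x||^2"""
    return sq_dist(x, x_rec) / sum(a * a for a in x)


def unit_sq_error(x, x_rec):
    """||unit(x) - unit(x')||^2"""
    return sq_dist(unit(x), unit(x_rec))


# Norms differ (2 vs 1): the two expressions disagree.
print(normalized_sq_error([2.0, 0.0], [1.0, 0.0]))  # 0.25
print(unit_sq_error([2.0, 0.0], [1.0, 0.0]))        # 0.0

# Norms match (1 vs 1): the two expressions agree.
print(normalized_sq_error([1.0, 0.0], [0.0, 1.0]))  # 2.0
print(unit_sq_error([1.0, 0.0], [0.0, 1.0]))        # 2.0
```

In the first case the reconstruction points in exactly the right direction but has half the length, so the unit-vector form reports no error at all.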


//! TurboQuant distortion measurement on real vector datasets.
//!
//! Reports per-vector NMSE (`||x - x'||^2 / ||x||^2 = ||unit(x) - unit(x')||^2`) and per-
Contributor

A "per-vector NMSE" doesn't make sense to me. NMSE is "normalized mean square error", right? I agree that $||x - x'||^2 / ||x||^2$ is a normalized squared error, but there's no mean/expectation. I think the correct verbiage is:

Suggested change
//! Reports per-vector NMSE (`||x - x'||^2 / ||x||^2 = ||unit(x) - unit(x')||^2`) and per-
//! Reports the normalized mean square error of this TurboQuant implementation over vectors from a configurable dataset [...]

No "per-vector" stuff.

let norm_sq: f32 = orig.iter().map(|&v| v * v).sum();
if norm_sq == 0.0 {
    return 0.0;
}
Contributor

This gives me heebie jeebies. Any zero vector in the test data erroneously reduces our estimate of the reconstruction error.

Suppose, for example, that all my test vectors are the zero vector and all my reconstructions are the all-ones vector. The squared reconstruction error, as defined by the paper, would be $d$ (a distance of $\sqrt{d}$)! That's huge! But this function reports the reconstruction error as zero!

Are there any zero vectors in the test data?

Contributor

Actually, why not just normalize the vectors after loading them? Then everything downstream can follow the paper's formalism exactly (e.g. we can just use mean square error here instead of normalizing). If any vectors are zero vectors, we just throw them out while loading.
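A rough sketch of that loading step, in Python for brevity (the benchmark is Rust, and `normalize_rows` is a hypothetical helper, not something in this PR). It returns the count of dropped rows so the benchmark can report how many zero vectors the dataset contained:

```python
import math


def normalize_rows(rows):
    """Return (unit-norm copies of the non-zero rows, number of rows dropped)."""
    kept = []
    dropped = 0
    for row in rows:
        n = math.sqrt(sum(v * v for v in row))
        if n == 0.0:
            # A zero vector has no direction, so it is dropped at load time
            # instead of being scored as zero error later.
            dropped += 1
            continue
        kept.append([v / n for v in row])
    return kept, dropped


rows = [[3.0, 4.0], [0.0, 0.0], [0.0, 2.0]]
unit_rows, n_dropped = normalize_rows(rows)
print(unit_rows, n_dropped)
```

With every surviving row at unit norm, the downstream error can be a plain squared error with no per-row division and no zero-norm branch.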

let decoded = decoded_ext.into_array();
let decoded_flat = extract_flat_f32(&decoded, &mut ctx)?;

let reconstruction = stats(&reconstruction_nmse(&original, &decoded_flat, dim, n));
Contributor

This is still not equivalent to $D_{mse}$ from the paper. This is, given some specific seed,

$\mathbb{E}_{x \in \mathbb{R}^d}[||x - Q^{-1}(Q(x, \textrm{seed}))||^2]$

The paper defines $D_{mse}$ as, for some specific vector x:

$\mathbb{E}_{\mathrm{seed} \in R}[||x - Q^{-1}(Q(x, \textrm{seed}))||^2]$

I realize you're trying to replicate Figure 3. I think we can do that, but it's a less useful figure than Figure 1. Figure 1 shows the distribution of $D_{prod}$ over multiple bit widths and over all pairings of the 100,000 data vectors and the 1,000 query vectors. I'm not sure why the paper lacks a similar figure for $D_{mse}$. That seems to me a very reasonable figure to generate. Regardless, if you want to construct Figure 3, it should be constructed according to this:

$\mathbb{E}_{x \in \mathbb{R}^d} [ \mathbb{E}_{\mathrm{seed} \in R}[||x - Q^{-1}(Q(x, \textrm{seed}))||^2]]$

Expectations are linear operators, so your for loops can be `for seed: for x:` or `for x: for seed:`; however, you must average over multiple seeds for your plots to fairly characterize Vortex's TurboQuant.
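The loop-order point can be checked with a small sketch. As before, `toy_quantize` is a hypothetical stand-in for `decode(encode(x, seed))` (seeded stochastic rounding, not TurboQuant), and the vectors are synthetic rather than drawn from a real dataset.

```python
import math
import random


def toy_quantize(x, seed, step=0.25):
    """Hypothetical stand-in for decode(encode(x, seed)); NOT TurboQuant."""
    rng = random.Random(seed)
    out = []
    for v in x:
        lo = math.floor(v / step) * step
        frac = (v - lo) / step
        out.append(lo + step if rng.random() < frac else lo)
    return out


def sq_err(x, x_hat):
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def d_mse_x_then_seed(vectors, seeds):
    """for x: for seed: average over seeds per vector, then over vectors."""
    per_x = [sum(sq_err(x, toy_quantize(x, s)) for s in seeds) / len(seeds)
             for x in vectors]
    return sum(per_x) / len(per_x)


def d_mse_seed_then_x(vectors, seeds):
    """for seed: for x: average over vectors per seed, then over seeds."""
    per_seed = [sum(sq_err(x, toy_quantize(x, s)) for x in vectors) / len(vectors)
                for s in seeds]
    return sum(per_seed) / len(per_seed)


data_rng = random.Random(2)
vectors = []
for _ in range(20):
    raw = [data_rng.gauss(0.0, 1.0) for _ in range(8)]
    n = math.sqrt(sum(v * v for v in raw))
    vectors.append([v / n for v in raw])
seeds = list(range(50))

a = d_mse_x_then_seed(vectors, seeds)
b = d_mse_seed_then_x(vectors, seeds)
print(a, b)
```

Both orders give the same number up to floating-point rounding. What changes the estimate is the length of `seeds`: with a single seed, the inner expectation is replaced by one sample.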

Labels

changelog/feature A new feature

3 participants