TurboQuant: Add distortion benchmark#8070
Conversation
Merging this PR will not alter performance
|
ab18c6f to
309dcfc
Compare
|
whoops forgot to make sure the binary is always up to date (edit: fixed) |
a27a167 to
1de73c4
Compare
|
@claude review with careful attention paid to the validity of the statistical arguments. |
This comment was marked as outdated.
This comment was marked as outdated.
danking
left a comment
There was a problem hiding this comment.
I hid ChatGPT's comment because it is quite long and I think not quite correct about some statistical things, but you may find it helpful to read over it if anything I've written below is confusing.
| } | ||
|
|
||
| /// Per-vector normalized reconstruction MSE. Rows whose original squared norm is below `1e-10` | ||
| /// are dropped because their normalized error is numerically undefined. |
There was a problem hiding this comment.
I think we need to switch to the correct definition of mean square error,
There was a problem hiding this comment.
ok so I decided to just remove the "normalization" factor of the dimensions since higher dimensions should theoretically result in lower distortion anyways, so not its just sum of squared errors.
There was a problem hiding this comment.
oh wait I need to make these all unit-norm first
| decoded.slice(0..half)?, | ||
| decoded.slice(half..2 * half)?, | ||
| &mut ctx, | ||
| )?; |
There was a problem hiding this comment.
Claude mentions this as well but I think this section has a few errors.
Equation 2 from the TurboQuant paper defines the inner product error as: for all pair of vectors, x and y, in

is upper bounded by (from an unnumbered equation under "Inner Product TurboQuant"):

We need to respect the quantifier and expectation ordering. We need to fix a bit width, an x vector, and a y vector, and then compute the squared cosine distance:
d_prods = []
for seed in random_seeds:
diff = compute_cosines(y, x) - compute_cosines(y, decode(encode(x, seed)))
d_prods.append(diff * diff)
d_prod_mean = sum(d_prods) / len(d_prods)That value, d_prod_mean, in the limit of looping over all seeds, is bound by the error.
There was a problem hiding this comment.
I'm not sure how to choose y. I think choosing a set of random vectors on startup is reasonable. The property should hold for any y you choose.
Again, another unnumbered equation, this one under "1.1 Problem Definition", but this argues that if we choose y = x, then the expectation of the difference is just zero. That seems like a separately useful statistic to measure and display.

There was a problem hiding this comment.
I think it is fine to just have a bunch of random pairs of x and y, I think a specific random y and then a random set of x is fine in the expectation calculation because everything (should be) independent.
|
issue 1 is not a real issue because I say "normalized" in the chart but I adjusted it anyways, issue 2 is fake, and I just fixed issue 3 and 4 |
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
danking
left a comment
There was a problem hiding this comment.
There are still some issues with the statistics. I'm happy to jump on a zoom tomorrow morning to talk through it.
|
|
||
| //! TurboQuant distortion measurement on real vector datasets. | ||
| //! | ||
| //! Reports per-vector NMSE (`||x - x'||^2 / ||x||^2 = ||unit(x) - unit(x')||^2`) and per- |
There was a problem hiding this comment.
This equality is only true if the norm of x is equal to the norm of its reconstruction. I think we should just elide the equality.
|
|
||
| //! TurboQuant distortion measurement on real vector datasets. | ||
| //! | ||
| //! Reports per-vector NMSE (`||x - x'||^2 / ||x||^2 = ||unit(x) - unit(x')||^2`) and per- |
There was a problem hiding this comment.
A "per-vector NMSE" doesn't make sense to me. NSME is "normalized mean square error", right? I agree that
| //! Reports per-vector NMSE (`||x - x'||^2 / ||x||^2 = ||unit(x) - unit(x')||^2`) and per- | |
| //! Reports the normalized mean square error of this TurboQuant implementation over vectors from a configurable dataset [...] |
No "per-vector" stuff.
| let norm_sq: f32 = orig.iter().map(|&v| v * v).sum(); | ||
| if norm_sq == 0.0 { | ||
| return 0.0; | ||
| } |
There was a problem hiding this comment.
This gives me heebie jeebies. Any zero vector in the test data erroneously reduces our estimate of the reconstruction error.
Suppose, for example, if all my test vectors are the zero vector and all my reconstructions are the all ones vector. The reconstruction error, as defined by the paper, would be
Are there any zero vectors in the test data?
There was a problem hiding this comment.
Actually, why not just normalize the vectors after loading them? Then everything downstream can follow the paper's formalism exactly (e.g. we can just use mean square error here instead of normalizing). If any vectors are zero vectors, we just throw them out while loading.
| let decoded = decoded_ext.into_array(); | ||
| let decoded_flat = extract_flat_f32(&decoded, &mut ctx)?; | ||
|
|
||
| let reconstruction = stats(&reconstruction_nmse(&original, &decoded_flat, dim, n)); |
There was a problem hiding this comment.
This is still not equivalent to
The paper defines
I realize you're trying to replicate Figure 3. I think we can do that, but it's a less useful figure than Figure 1. Figure 1 shows the distribution of
Expectations are linear operators so your for loops can be for seed: for x: or for x: for seed:; however, there you must average over multiple seeds for your plots to fairly characterize Vortex's TurboQuant.
Summary
Tracking issue: #7830
Adds distortion / error benchmarking option for TurboQuant. Note that this benchmark is still over the old implementation of TurboQuant, but because the algorithm is identical it shouldn't affect any numbers.
We will cut over to the new version atomically once everything is implemented in
vortex-turboquant.Testing