
[diskann-garnet] Implement BIN and Q8 quantizers#1050

Open

metajack wants to merge 3 commits into main from jackmoffitt/quant-bootstrap
Conversation

Contributor

@metajack metajack commented May 11, 2026

This implements the BIN and Q8 quantizers for diskann-garnet.

This PR includes the following:

  • Vectors are now stored as Poly<[u8], AlignOfEight> instead of Vec<T>. This means that the same type can be used for both quantized and full precision vectors.
  • A new GarnetQuantizer trait is introduced which allows us to type-erase the different quantizers instead of parameterizing the index/provider types.
  • Two new FFI calls were added to diskann-garnet. insert() now returns a success flag which can also signal that an associated quantizer is ready for training. build_quant_table() is called by Garnet asynchronously after insert() signals readiness for training, to build the quantization tables, and backfill_quant_vectors() is called when training is complete to quantize previously inserted vectors. These new calls allow Garnet to control the training and backfill threads.
  • The accessor is now called DynamicAccessor because it dynamically switches between full precision only and quantized only operation depending on availability and readiness of the associated quantizer.
  • A new visit_used() call was added to the FSM to facilitate walking all stored vectors.
  • The FSM is now lockable, which prevents it from reusing IDs during quantization backfill. This means that once backfilling starts, new inserts are assigned IDs above any that need quantization.
  • vectorset now supports quantization and distance metric settings, and a concurrency bug that caused IDs to be assigned incorrectly during multi-threaded inserts was fixed.

There is one task left to do, which is to store and set up quantization tables when disk persistence is enabled in Garnet, but the rest should be ready for review.
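For illustration, here is a minimal sketch of what BIN and Q8 quantization behind a type-erased trait can look like. Everything below is hypothetical and simplified (BIN is shown as a plain sign quantizer rather than the spherical variant, and the trait, type, and method names are stand-ins, not the actual diskann-garnet API):

```rust
/// Hypothetical trait-object-friendly quantizer: callers see only byte
/// slices, so index code can hold any quantizer behind `Box<dyn Quantizer>`.
pub trait Quantizer: Send + Sync {
    /// Bytes needed to store one quantized vector of dimension `dim`.
    fn quantized_bytes(&self, dim: usize) -> usize;
    /// Quantize a full-precision vector into `out`.
    fn quantize(&self, v: &[f32], out: &mut [u8]);
}

/// Simplified 1-bit quantizer: one sign bit per dimension, packed into bytes.
pub struct BinQuantizer;

impl Quantizer for BinQuantizer {
    fn quantized_bytes(&self, dim: usize) -> usize {
        dim.div_ceil(8)
    }
    fn quantize(&self, v: &[f32], out: &mut [u8]) {
        out.fill(0);
        for (i, &x) in v.iter().enumerate() {
            if x > 0.0 {
                out[i / 8] |= 1 << (i % 8);
            }
        }
    }
}

/// Simplified 8-bit min/max quantizer: maps each value linearly into 0..=255.
pub struct Q8Quantizer {
    pub min: f32,
    pub max: f32,
}

impl Quantizer for Q8Quantizer {
    fn quantized_bytes(&self, dim: usize) -> usize {
        dim
    }
    fn quantize(&self, v: &[f32], out: &mut [u8]) {
        let scale = 255.0 / (self.max - self.min);
        for (o, &x) in out.iter_mut().zip(v) {
            *o = ((x - self.min) * scale).clamp(0.0, 255.0) as u8;
        }
    }
}
```

The point of the byte-slice interface is the one made in the bullet list above: once both quantized and full-precision vectors are "just bytes", the same storage type can back either representation.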

@metajack metajack requested review from a team and Copilot May 11, 2026 14:30

Copilot AI left a comment


Pull request overview

This PR adds support for BIN (spherical 1-bit) and Q8 (MinMax 8-bit) quantization in diskann-garnet, introducing a type-erased quantizer interface and a “dynamic” accessor/strategy that can switch between full-precision and quantized operation as the quantizer becomes available. It also extends the vectorset tool to allow specifying quantization and distance metric during ingestion, and adds RFC documentation describing the bootstrap/backfill approach.

Changes:

  • Add new diskann-garnet quantizer implementations (BIN + Q8) behind a new type-erased GarnetQuantizer trait, plus FFI entrypoints for training/backfill.
  • Rework diskann-garnet provider/accessor/strategy to support dynamic switching and reranking behavior.
  • Update tooling/docs: vectorset CLI options for quantizer/metric, new RFC, and small provider/common helpers.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 11 comments.

Show a summary per file

  • vectorset/src/main.rs: Adds CLI flags for quantizer/metric and forwards them into VADD ingestion.
  • vectorset/src/loader.rs: Refactors dataset loader locking/state handling and removes unused fields.
  • rfcs/00000-quantizer-bootstrap.md: Adds an RFC describing quantization bootstrapping/backfill phases and provider responsibilities.
  • diskann-providers/src/common/mod.rs: Re-exports MinMax distance helper types used by Garnet quantization.
  • diskann-providers/src/common/minmax_repr.rs: Adds a from_bytes convenience for interpreting raw bytes as MinMax elements.
  • diskann-garnet/src/test_utils.rs: Updates tests to match the new GarnetProvider::new signature (quant type parameter).
  • diskann-garnet/src/quantization.rs: New quantizer implementations and type-erased distance/query computer adapters.
  • diskann-garnet/src/provider.rs: Major provider/accessor/search strategy updates for dynamic quantization, training readiness, and backfill.
  • diskann-garnet/src/lib.rs: Updates index creation for quant types, changes the insert() return type, and adds FFI for training/backfill.
  • diskann-garnet/src/garnet.rs: Adds (currently unused) exists_* helpers with explicit dead_code expectations.
  • diskann-garnet/src/fsm.rs: Adds visit_used, reuse locking, and total_used tracking to support safe backfill.
  • diskann-garnet/src/dyn_index.rs: Switches type-erased index operations to use the new dynamic strategy and exposes train/backfill hooks.
  • diskann-garnet/src/alloc.rs: Makes AlignToEight public for cross-module aligned byte container usage.
  • diskann-garnet/Cargo.toml: Adds rand and adjusts tokio features for new code usage.
  • Cargo.lock: Locks new dependency resolution (notably rand).


Comment thread vectorset/src/main.rs
Comment thread diskann-garnet/src/quantization.rs Outdated
Comment thread diskann-garnet/src/provider.rs
Comment thread diskann-garnet/src/provider.rs Outdated
Comment thread diskann-garnet/src/provider.rs Outdated
Comment thread rfcs/00000-quantizer-bootstrap.md
Comment thread rfcs/00000-quantizer-bootstrap.md
Comment thread rfcs/00000-quantizer-bootstrap.md
Comment thread rfcs/00000-quantizer-bootstrap.md
Comment thread rfcs/00000-quantizer-bootstrap.md
Contributor

@hildebrandmw hildebrandmw left a comment


Thanks Jack! It will be great to have quantization land in diskann-garnet.

After seeing the 1000-foot view, the things that stand out to me are:

  • The somewhat awkward representation of the quantizer (Option<Box<dyn>> in GarnetProvider, with RwLock<Option<_>> as the implementation).
  • Heavy use of conditionals checking for quantization throughout vector retrieval, especially with respect to the start point cache. This also necessitated the working-set check and drain for inserts that switch to quantization part-way through.
  • Race conditions in the free space map and quantization.

One idea I had regarding the first two points is to lean more heavily into trait objects with something like

trait Vector {
    // The number of bytes needed per vector
    fn bytes(&self) -> usize;
    fn is_quantized(&self) -> bool;
    // The term to provide to `Garnet` for data retrieval.
    fn term(&self) -> Term;
    fn cached(&self, id: u32) -> Option<&[u8]>;
    fn distance_computer(&self) -> Box<dyn _>;
    fn query_computer(&self, ...) -> Box<dyn _>;
}

The idea being that both full-precision and quantized vectors can live behind the same facade. Then in the Provider, we can have:

// Naming and exact structure are just meant to describe the idea
struct GarnetProvider {
    primary: Arc<dyn Vector>,
    quant: Option<Arc<dyn Vector>>,
    ...
}

The Accessor can then be

struct Accessor<'a> {
    provider: &'a GarnetProvider,
    vectors: &'a dyn Vector,
}

With most of the non-reranking vector needs going through Accessor::vectors. When no quantization is active, vectors = GarnetProvider::primary; otherwise, vectors = GarnetProvider::quant. Then all the vector retrieval code stays pretty much the same without needing to constantly branch on whether or not the vectors are quantized. Further, the issue with the working set goes away because vectors that start their insert as full-precision stay as full-precision for the duration of the insert.
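As a compilable sketch of the selection the previous paragraph describes, the accessor could pick its vector source once at construction time, so later retrievals never re-branch. This is illustrative only; the trait is stripped down to one method and the names mirror the sketch above, not the real code:

```rust
use std::sync::Arc;

// Hypothetical minimal version of the `Vector` facade described above.
trait Vector {
    fn is_quantized(&self) -> bool;
}

struct FullPrecision;
impl Vector for FullPrecision {
    fn is_quantized(&self) -> bool {
        false
    }
}

struct Quantized;
impl Vector for Quantized {
    fn is_quantized(&self) -> bool {
        true
    }
}

struct GarnetProvider {
    primary: Arc<dyn Vector>,
    quant: Option<Arc<dyn Vector>>,
}

struct Accessor<'a> {
    vectors: &'a dyn Vector,
}

impl GarnetProvider {
    /// Choose the vector source once, when the accessor is created;
    /// everything downstream just uses `vectors`.
    fn accessor(&self) -> Accessor<'_> {
        Accessor {
            vectors: self.quant.as_deref().unwrap_or(&*self.primary),
        }
    }
}
```

Because the choice is captured by the accessor's borrow, an insert that begins before quantization activates keeps its full-precision source for its whole duration, which is exactly the working-set property argued for above.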

/// to return the correct status at the end of insert() since the provider code that becomes
/// aware of the state change cannot directly communicate it back to the caller.
pub static QUANTIZER_READY: AtomicBool = const { AtomicBool::new(false) };
}
Contributor


Can we rely on a thread-local for this? What if the same thread is being used to search multiple indexes?

Could this be stored in the Context instead?

Contributor Author


I could store it in the context if I add an internal field to it. I suppose that is slightly nicer.

I'm not sure what issue you are envisioning with the threads. The check is between the guts of insert and the return of insert. There's no way for another chunk of code to run on our thread in the meantime. If something were to happen, the bad behavior is limited to returning SUCCESS_TRAINING_READY, which will cause garnet to dispatch train_quantizer(). That getting called a second time is fine, because train_quantizer has detection for whether training has already started and exits if that's the case.

Contributor


What happens, though, if there are multiple indexes? One has had quantization bootstrapped and the other has not. If a thread inserts on the bootstrapped index and then runs on the non-bootstrapped index, the thread-local will still be set to quantized. Does Garnet not reuse threads across indexes?
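The concern above can be reproduced in miniature. In this sketch, `insert()` is a stand-in (not the real FFI entry point) that sets the per-thread flag when "its" index signals readiness and then reads it at the end without resetting it; the flag then leaks into the next call on the same thread:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

thread_local! {
    // Per-thread flag like the one under discussion. Note that it is
    // scoped to the thread, not to any particular index.
    static QUANTIZER_READY: AtomicBool = const { AtomicBool::new(false) };
}

/// Illustrative stand-in for insert(): sets the flag when this index
/// signals training readiness, then reads it at the end of the call.
/// Because nothing resets the flag, a later call on the same thread
/// (possibly for a different index) observes the stale value.
fn insert(signals_ready: bool) -> bool {
    if signals_ready {
        QUANTIZER_READY.with(|v| v.store(true, Ordering::Release));
    }
    QUANTIZER_READY.with(|v| v.load(Ordering::Acquire))
}
```

Whether this matters in practice depends, as the thread notes, on whether the flag is reset before insert() returns and on whether a spurious SUCCESS_TRAINING_READY is harmless.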

Comment thread diskann-garnet/src/provider.rs
VectorQuantType::NoQuant | VectorQuantType::XPreQ8 => (None, 0, false),
VectorQuantType::Invalid => return Err(GarnetProviderError::InvalidQuantizer),
VectorQuantType::Q8 => {
if TypeId::of::<T>() != TypeId::of::<f32>() {
Contributor


I don't think relying on TypeId is a good idea. It feels like ad-hoc specialization and can cause issues if we add f16 support in the future. I understand that the Redis API doesn't support f16, but it has historically proved itself to be useful. Can this be packaged behind a trait? Or use VectorRepr::as_f32?
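The trait-based alternative suggested above could look roughly like this. `VectorRepr` and `as_f32` here are stand-ins mirroring the suggestion, and `Unit8` is a made-up second representation standing in for a future f16:

```rust
// Hypothetical stand-in for the VectorRepr trait mentioned above.
trait VectorRepr {
    /// Convert one element of this representation to f32, e.g. for
    /// quantizer training.
    fn as_f32(&self) -> f32;
}

impl VectorRepr for f32 {
    fn as_f32(&self) -> f32 {
        *self
    }
}

// An illustrative extra representation: values in [0, 1] stored as a byte.
// A real f16 impl could be added the same way without touching call sites.
struct Unit8(u8);

impl VectorRepr for Unit8 {
    fn as_f32(&self) -> f32 {
        self.0 as f32 / 255.0
    }
}

/// Generic training path: no TypeId checks, works for any representation.
fn to_f32_row<T: VectorRepr>(row: &[T]) -> Vec<f32> {
    row.iter().map(|x| x.as_f32()).collect()
}
```

The generic path stays monomorphic per T, but the dispatch decision lives in the trait impls rather than in runtime TypeId comparisons, so adding a representation cannot silently fall through an ad-hoc check.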

diskann::always_escalate!(GarnetProviderError);

/// The Garnet DataProvider implementation.
pub struct GarnetProvider<T: VectorRepr> {
Contributor


Why parameterize by T anymore? If we ever end up varying T, a bunch of code will get re-monomorphized when what actually changes based on T is relatively small.

&self,
metric_type: Metric,
data: MatrixView<f32>,
) -> Result<(), GarnetQuantizerError> {
Contributor


Should the whole training loop be done under a lock, with training skipped if the inner quantizer is already initialized?

Otherwise, we risk overwriting the quantizer with a different one when that risk can be mitigated locally.
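The "check and train inside one critical section" pattern being suggested can be sketched as follows; `QuantizerSlot` and `QuantTable` are illustrative names, and the training body is a placeholder closure:

```rust
use std::sync::Mutex;

// Placeholder for the trained quantization table.
struct QuantTable;

// Hypothetical holder for the inner quantizer state.
struct QuantizerSlot {
    inner: Mutex<Option<QuantTable>>,
}

impl QuantizerSlot {
    /// Returns true if this call performed training, false if a table was
    /// already present. Because the existence check and the write happen
    /// under the same lock, a concurrent trainer cannot overwrite an
    /// already-initialized table.
    fn train_if_needed(&self, train: impl FnOnce() -> QuantTable) -> bool {
        let mut guard = self.inner.lock().unwrap();
        if guard.is_some() {
            return false;
        }
        *guard = Some(train());
        true
    }
}
```

The cost is that training runs while holding the lock; if that is too long, the usual variant is to mark the slot as "training in progress" under the lock, train outside it, and re-take the lock to publish the result.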

Comment thread diskann-garnet/src/provider.rs Outdated

let mut v = vec![0f32; self.dim];
let mut q = vec![0u8; quantizer.canonical_bytes()];
for id in start_id..=end_id {
Contributor


This entire loop suppresses errors rather than correcting or reporting them; why is that?


let backfill_finished = self.fsm.unlock_reuse();

if backfill_finished {
Contributor


This isn't a reliable signal: it races. For example, if you have two tasks: the first one gets preempted before acquiring the reuse lock. The second one then acquires the lock, processes data, and then releases the lock. This release will see the counter as zero and think everything is good, but the first task never even started.

In general, the external interaction with the free-space map feels brittle. There are many atomic variables interacting with each other. As another example, checking can_reuse in FreeSpaceMap::next_id also doesn't feel right. See my other comment.
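One common fix for the race described above is to register every task before any of them can run, so the pending count can only reach zero after all work is actually done; the count never passes through zero while a task is still unstarted. A sketch with illustrative names:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical tracker: all tasks are counted up front, so "last one
// out" is well defined even if some tasks are preempted before starting.
struct BackfillTracker {
    pending: AtomicUsize,
}

impl BackfillTracker {
    /// Register all `n` tasks before spawning any of them.
    fn start(n: usize) -> Self {
        Self {
            pending: AtomicUsize::new(n),
        }
    }

    /// Called by each task on completion. Returns true only for the task
    /// that finishes last, which is therefore a reliable "backfill
    /// finished" signal.
    fn task_done(&self) -> bool {
        self.pending.fetch_sub(1, Ordering::AcqRel) == 1
    }
}
```

In the preemption scenario above, the count starts at 2, so the task that completes first observes 2 and does not signal; only the genuinely last completion observes 1 and signals.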

Comment thread diskann-garnet/src/fsm.rs
/// previously belonged to a deleted element. The returned ID is marked as used.
pub fn next_id(&self, ctx: Context) -> Result<u32, FsmError> {
if self.has_free_ids.load(Ordering::Acquire) {
if self.can_reuse() && self.has_free_ids.load(Ordering::Acquire) {
Contributor


Is this check correct? self.can_reuse can return true before any quantization happens, causing this check to pass. Then quantization can be initiated, which means we can be in the loop below while quantization is running. I assume this isn't the intended behavior since the check exists, but is there a mechanism by which next_id broadcasts "I'm running"?

&& !quantizer.is_prepared()
&& self.fsm.total_used() > quantizer.required_vectors()
{
QUANTIZER_READY.with(|v| v.store(true, Ordering::Release));
Contributor


How does this avoid mayhem when the same thread is used to search another index?

};

// Ensure we don't kick off training twice.
match self.training_started.compare_exchange(
Contributor


If training fails, training_started is never reset. Should we be able to restart training?
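One way to make failed training restartable, sketched with illustrative names (the compare_exchange claim mirrors the excerpt above, the reset-on-error is the suggested addition):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical training entry point. Returns Ok(true) if this call
/// trained, Ok(false) if another trainer already claimed the slot, and
/// Err(_) if the training body failed, in which case the claim is
/// released so a later call can retry.
fn train(
    started: &AtomicBool,
    body: impl FnOnce() -> Result<(), ()>,
) -> Result<bool, ()> {
    // Claim the training slot; a concurrent trainer makes us a no-op.
    if started
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .is_err()
    {
        return Ok(false);
    }
    match body() {
        Ok(()) => Ok(true),
        Err(e) => {
            // Release the slot so training can be restarted later.
            started.store(false, Ordering::Release);
            Err(e)
        }
    }
}
```

Whether a retry is desirable depends on why training failed, but without the reset the index is permanently stuck in the "training started, never finished" state.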

@metajack metajack force-pushed the jackmoffitt/quant-bootstrap branch from 14b6fe9 to b5a3759 Compare May 11, 2026 20:16

4 participants