[diskann-garnet] Implement BIN and Q8 quantizers #1050
Conversation
Pull request overview
This PR adds support for BIN (spherical 1-bit) and Q8 (MinMax 8-bit) quantization in diskann-garnet, introducing a type-erased quantizer interface and a “dynamic” accessor/strategy that can switch between full-precision and quantized operation as the quantizer becomes available. It also extends the vectorset tool to allow specifying quantization and distance metric during ingestion, and adds RFC documentation describing the bootstrap/backfill approach.
Changes:
- Add new `diskann-garnet` quantizer implementations (BIN + Q8) behind a new type-erased `GarnetQuantizer` trait, plus FFI entrypoints for training/backfill.
- Rework the `diskann-garnet` provider/accessor/strategy to support dynamic switching and reranking behavior.
- Update tooling/docs: `vectorset` CLI options for quantizer/metric, new RFC, and small provider/common helpers.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| vectorset/src/main.rs | Adds CLI flags for quantizer/metric and forwards them into VADD ingestion. |
| vectorset/src/loader.rs | Refactors dataset loader locking/state handling and removes unused fields. |
| rfcs/00000-quantizer-bootstrap.md | Adds an RFC describing quantization bootstrapping/backfill phases and provider responsibilities. |
| diskann-providers/src/common/mod.rs | Re-exports MinMax distance helper types used by Garnet quantization. |
| diskann-providers/src/common/minmax_repr.rs | Adds from_bytes convenience for interpreting raw bytes as MinMax elements. |
| diskann-garnet/src/test_utils.rs | Updates tests to match the new GarnetProvider::new signature (quant type parameter). |
| diskann-garnet/src/quantization.rs | New quantizer implementations and type-erased distance/query computer adapters. |
| diskann-garnet/src/provider.rs | Major provider/accessor/search strategy updates for dynamic quantization, training readiness, and backfill. |
| diskann-garnet/src/lib.rs | Updates index creation for quant types, changes insert() return type, and adds FFI for training/backfill. |
| diskann-garnet/src/garnet.rs | Adds (currently unused) exists_* helpers with explicit dead_code expectations. |
| diskann-garnet/src/fsm.rs | Adds visit_used, reuse locking, and total_used tracking to support safe backfill. |
| diskann-garnet/src/dyn_index.rs | Switches type-erased index operations to use the new dynamic strategy and exposes train/backfill hooks. |
| diskann-garnet/src/alloc.rs | Makes AlignToEight public for cross-module aligned byte container usage. |
| diskann-garnet/Cargo.toml | Adds rand and adjusts tokio features for new code usage. |
| Cargo.lock | Locks new dependency resolution (notably rand). |
hildebrandmw left a comment
Thanks Jack! It will be great to have quantization land in diskann-garnet.
After seeing the 1000-foot view, the things that stand out to me are:
- The somewhat awkward representation of the quantizer (`Option<Box<dyn>>` in `GarnetProvider` with `RwLock<Option<_>>` as the implementation).
- Heavy use of conditionals throughout vector retrieval checking for quantization, especially with respect to the start point cache. This also necessitated the working-set check and drain if we switched to quantization part-way through insert.
- Race conditions in the free space map and quantization.
One idea I had regarding the first two points is to lean more heavily into trait objects with something like:

```rust
trait Vector {
    // The number of bytes needed per vector.
    fn bytes(&self) -> usize;
    fn is_quantized(&self) -> bool;
    // The term to provide to `Garnet` for data retrieval.
    fn term(&self) -> Term;
    fn cached(&self, id: u32) -> Option<&[u8]>;
    fn distance_computer(&self) -> Box<dyn _>;
    fn query_computer(&self, ...) -> Box<dyn _>;
}
```

The idea being that both full-precision and quantized vectors can live behind the same facade. Then in the Provider, we can have:
```rust
// Naming and exact structure are just meant to describe the idea
struct GarnetProvider {
    primary: Arc<dyn Vector>,
    quant: Option<Arc<dyn Vector>>,
    ...
}
```
The Accessor can then be
```rust
struct Accessor<'a> {
    provider: &'a GarnetProvider,
    vectors: &'a dyn Vector,
}
```
Most of the non-reranking vector needs would then go through `Accessor::vectors`. When no quantization is active, `vectors = GarnetProvider::primary`; otherwise, `vectors = GarnetProvider::quant`. Then all the vector retrieval code stays pretty much the same without needing to constantly branch on whether or not the vectors are quantized. Further, the issue with the working set goes away, because vectors that start their insert as full-precision stay full-precision for the duration of the insert.
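If it helps to make the idea concrete, here is a minimal, compressed sketch of how an accessor constructor could select its vector source under that design (the `accessor()` method and the trimmed-down trait are hypothetical, not code from this PR):

```rust
use std::sync::Arc;

// Hypothetical, compressed version of the facade sketched above.
trait Vector {
    fn bytes(&self) -> usize;
    fn is_quantized(&self) -> bool;
    fn cached(&self, id: u32) -> Option<&[u8]>;
}

struct GarnetProvider {
    primary: Arc<dyn Vector>,
    quant: Option<Arc<dyn Vector>>,
}

struct Accessor<'a> {
    provider: &'a GarnetProvider,
    vectors: &'a dyn Vector,
}

impl GarnetProvider {
    // Pick the vector source once; everything downstream goes through
    // `Accessor::vectors` and never branches on quantization state again.
    fn accessor(&self) -> Accessor<'_> {
        let vectors: &dyn Vector = match &self.quant {
            Some(quant) => quant.as_ref(),
            None => self.primary.as_ref(),
        };
        Accessor { provider: self, vectors }
    }
}
```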
```rust
/// to return the correct status at the end of insert() since the provider code that becomes
/// aware of the state change cannot directly communicate it back to the caller.
pub static QUANTIZER_READY: AtomicBool = const { AtomicBool::new(false) };
}
```
Can we rely on a thread-local for this? What if the same thread is being used to search multiple indexes?
Could this be stored in the Context instead?
I could store it in context if I add an internal field to context. I suppose that is slightly nicer.

I'm not sure what issue you are envisioning with the threads. The check is between the guts of insert and the return of insert. There's no way for another chunk of code to run on our thread in the meantime. If something were to happen, the bad behavior is limited to returning `SUCCESS_TRAINING_READY`, which will cause garnet to dispatch `train_quantizer()`. That getting called a second time is fine, because `train_quantizer` has detection for whether training has already started and exits if that's the case.
What happens, though, if there are multiple indexes? One has had quantization bootstrapped and the other has not. If a thread inserts on the bootstrapped index and then runs on the non-bootstrapped index, the thread-local will still be set to quantized. Does garnet not reuse threads for multiple indexes?
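As one possible alternative (purely illustrative, not code from this PR), the readiness flag could live on per-index state instead of in a thread-local, so a thread moving between indexes cannot carry the flag across:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical per-index flag: owned by the index/provider, not the thread.
struct QuantizerReadiness {
    ready: AtomicBool,
}

impl QuantizerReadiness {
    // Called from deep inside insert() when enough vectors have accumulated.
    fn mark_ready(&self) {
        self.ready.store(true, Ordering::Release);
    }

    // Read-and-clear at the end of insert() so the status is reported once,
    // and only for the index it belongs to.
    fn take_ready(&self) -> bool {
        self.ready.swap(false, Ordering::AcqRel)
    }
}
```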
```rust
VectorQuantType::NoQuant | VectorQuantType::XPreQ8 => (None, 0, false),
VectorQuantType::Invalid => return Err(GarnetProviderError::InvalidQuantizer),
VectorQuantType::Q8 => {
    if TypeId::of::<T>() != TypeId::of::<f32>() {
```
I don't think relying on TypeId is a good idea. It feels like ad-hoc specialization and can cause issues if we add f16 support in the future. I understand that the Redis API doesn't support f16, but it has historically proved itself to be useful. Can this be packaged behind a trait? Or use VectorRepr::as_f32?
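For illustration only (none of this is the PR's actual API), the same dispatch could hang off a small conversion trait instead of `TypeId`, which keeps the door open for `f16`:

```rust
// Hypothetical trait-based alternative to the TypeId check. Each element type
// knows how to widen itself to f32; adding f16 later (e.g. via the `half`
// crate) would just be another impl rather than another TypeId branch.
trait ToF32: Copy {
    fn to_f32(self) -> f32;
}

impl ToF32 for f32 {
    fn to_f32(self) -> f32 {
        self
    }
}

impl ToF32 for f64 {
    fn to_f32(self) -> f32 {
        self as f32
    }
}

// Quantizer training can then consume any supported element type uniformly.
fn widen_into<T: ToF32>(src: &[T], dst: &mut [f32]) {
    for (d, s) in dst.iter_mut().zip(src.iter().copied()) {
        *d = s.to_f32();
    }
}
```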
```rust
diskann::always_escalate!(GarnetProviderError);

/// The Garnet DataProvider implementation.
pub struct GarnetProvider<T: VectorRepr> {
```
Why parameterize by T anymore? If we ever end up varying T, a bunch of code will get re-monomorphized when what actually changes based on T is relatively small.
```rust
    &self,
    metric_type: Metric,
    data: MatrixView<f32>,
) -> Result<(), GarnetQuantizerError> {
```
Should the whole training loop be done under a lock, with training skipped if the inner quantizer is already initialized?
Otherwise, we risk overwriting the quantizer with a different one when that risk can be mitigated locally.
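A rough sketch of that shape, with a placeholder type standing in for the trained quantizer state (assumption: the inner quantizer already sits behind an `RwLock<Option<_>>`, as discussed above):

```rust
use std::sync::RwLock;

// Placeholder for the trained quantizer tables.
struct TrainedTables(Vec<f32>);

struct QuantizerSlot {
    inner: RwLock<Option<TrainedTables>>,
}

impl QuantizerSlot {
    // Run the whole training step under the write lock: a concurrent caller
    // blocks, then sees the slot already filled and skips, so an existing
    // quantizer is never overwritten.
    fn train_once(&self, train: impl FnOnce() -> TrainedTables) {
        let mut guard = self.inner.write().unwrap();
        if guard.is_some() {
            return;
        }
        *guard = Some(train());
    }
}
```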
```rust
let mut v = vec![0f32; self.dim];
let mut q = vec![0u8; quantizer.canonical_bytes()];
for id in start_id..=end_id {
```
This entire loop suppresses errors rather than correcting or reporting them; why is that?
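For example (a generic sketch, not tied to this PR's types), the loop could keep going on per-id failures but still surface them to the caller:

```rust
// Hypothetical error-reporting shape for a backfill-style loop: continue past
// individual failures, but return a summary instead of silently dropping them.
fn backfill_range(
    ids: std::ops::RangeInclusive<u32>,
    mut quantize_one: impl FnMut(u32) -> Result<(), String>,
) -> Result<(), String> {
    let mut failed = 0usize;
    let mut first_error = None;
    for id in ids {
        if let Err(e) = quantize_one(id) {
            failed += 1;
            first_error.get_or_insert(e);
        }
    }
    match first_error {
        None => Ok(()),
        Some(e) => Err(format!("{failed} id(s) failed to backfill; first error: {e}")),
    }
}
```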
```rust
let backfill_finished = self.fsm.unlock_reuse();

if backfill_finished {
```
This isn't a reliable signal: it races. For example, if you have two tasks: the first one gets preempted before acquiring the reuse lock. The second one then acquires the lock, processes data, and then releases the lock. This release will see the counter as zero and think everything is good, but the first task never even started.
In general, the external interaction with the free-space map feels brittle. There are many atomic variables interacting with each other. As another example, checking can_reuse in FreeSpaceMap::next_id also doesn't feel right. See my other comment.
```diff
 /// previously belonged to a deleted element. The returned ID is marked as used.
 pub fn next_id(&self, ctx: Context) -> Result<u32, FsmError> {
-    if self.has_free_ids.load(Ordering::Acquire) {
+    if self.can_reuse() && self.has_free_ids.load(Ordering::Acquire) {
```
Is this check correct? `self.can_reuse` can return true before any quantization happens, causing this check to pass. Then quantization can be initiated, which means we can be in the loop below while quantization is in progress. I assume this isn't the intended goal since the check exists, but is there a mechanism by which `next_id` broadcasts "I'm running"?
```rust
    && !quantizer.is_prepared()
    && self.fsm.total_used() > quantizer.required_vectors()
{
    QUANTIZER_READY.with(|v| v.store(true, Ordering::Release));
```
How does this avoid mayhem when the same thread is used to search another index?
```rust
};

// Ensure we don't kick off training twice.
match self.training_started.compare_exchange(
```
If training fails, training_started is never reset. Should we be able to restart training?
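If retrying is desirable, one simple (hypothetical) shape is to release the claim when training errors out:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical sketch: claim the "training started" flag with compare_exchange,
// but give it back on failure so a later insert can trigger training again.
fn maybe_train(
    training_started: &AtomicBool,
    train: impl FnOnce() -> Result<(), String>,
) {
    if training_started
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .is_err()
    {
        return; // another task already claimed training
    }
    if train().is_err() {
        // Reset so training can be retried later.
        training_started.store(false, Ordering::Release);
    }
}
```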
Force-pushed from 14b6fe9 to b5a3759.
This implements the `BIN` and `Q8` quantizers for diskann-garnet. This PR includes the following:

- Vectors are stored as `Poly<[u8], AlignOfEight>` instead of `Vec<T>`. This means that the same type can be used for both quantized and full precision vectors.
- A `GarnetQuantizer` trait is introduced which allows us to type-erase the different quantizers instead of parameterizing the index/provider types.
- `insert()` now returns a success flag which can also signal that an associated quantizer is ready for training.
- `build_quant_table()` is called by Garnet asynchronously after `insert()` signals readiness for training to build quantization tables, and `backfill_quant_vectors` is called when training is complete to quantize previously inserted vectors. These new calls allow Garnet to control the training and backfill threads.
- The new accessor is called `DynamicAccessor` because it dynamically switches between full-precision-only and quantized-only operation depending on the availability and readiness of the associated quantizer.
- A `visit_used()` call was added to the FSM to facilitate walking all stored vectors.

There is one task left to do, which is to store and set up quantization tables when disk persistence is enabled in Garnet, but the rest should be ready for review.
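To make the type-erasure point concrete for readers of this description, here is a rough illustration of what a type-erased quantizer facade can look like; `canonical_bytes` and `is_prepared` appear in the diff, while `encode` and the trait name are stand-ins rather than the actual `GarnetQuantizer` definition:

```rust
/// Illustrative only: a type-erased quantizer facade in the spirit of the
/// `GarnetQuantizer` trait described above (not the actual trait definition).
trait Quantizer: Send + Sync {
    /// Number of bytes one quantized vector occupies.
    fn canonical_bytes(&self) -> usize;
    /// True once training has produced usable tables.
    fn is_prepared(&self) -> bool;
    /// Encode one full-precision vector into `out` (stand-in method).
    fn encode(&self, vector: &[f32], out: &mut [u8]);
}

// The provider can then hold `Box<dyn Quantizer>` / `Arc<dyn Quantizer>`
// instead of being generic over a concrete quantizer type.
```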
Poly<[u8], AlignOfEight>instead ofVec<T>. This means that the same type can be used for both quantized and full precision vectors.GarnetQuantizertrait is inroduced which allows us to type-erase the different quantizers instead of parameterizing the index/provider types.insert()now returns a success flag which can also signal that an associated quantizer is ready for training.build_quant_table()is called by Garnet asynchronously afterinsert()signals readiness for training to build quantization tables, andbackfill_quant_vectorsis called when training is complete to quantize previously inserted vectors. These new calls allow Garnet to control the training and backfill threads.DynamicAccessorbecause it dynamically switches between full precision only and quantized only operation depending on availability and readiness of the associated quantizer.visit_used()call was added to the FSM to facilitate walking all stored vectors.There is one task left to do, which is to store and set up quantization tables when disk persistence is enabled in Garnet, but the rest should be ready for review.