perf: reduce memory use when splitting IVF partitions#6687

Open
wjones127 wants to merge 7 commits into main from 6378-split-index-ram
@wjones127 wjones127 commented May 5, 2026

This PR reduces memory use in two stages of IVF partition splitting:

  1. Training new IVF centroids. Currently, all raw vectors in the partition being split are loaded into memory and used to train the centroids. For 3072-dimensional f32 vectors, that is about 400 MB once a partition has reached 33k rows and triggers a split. We now sample just 512 of the vectors, which should be sufficient to train only 2 centroids.
  2. Shuffle. Currently, all vectors that will be moved, across every partition being split, are held in memory simultaneously in Vec<SplitPlan>. This is the largest source of peak memory use today; if many partitions are being split, it can exceed 100 GB. We now stream these raw vectors through the partition assignment and quantization pipeline, just as we do when building a new index.
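
To make the numbers in item 1 concrete, here is a minimal sketch of the arithmetic and of drawing a fixed-size sample of row offsets. The function names and the xorshift generator are illustrative assumptions, not the code in this PR; only the 512-vector sample size, the 3072 dimension, and the 33k partition size come from the description above.

```rust
/// Bytes needed to hold `num_vectors` dense f32 vectors of dimension `dim`.
fn raw_vector_bytes(num_vectors: usize, dim: usize) -> usize {
    num_vectors * dim * std::mem::size_of::<f32>()
}

/// Pick up to `sample_size` distinct row offsets out of `num_rows` with a
/// partial Fisher-Yates shuffle. The xorshift generator is a dependency-free
/// stand-in for a real RNG (hypothetical, not the PR's sampling code).
fn sample_row_offsets(num_rows: usize, sample_size: usize, seed: u64) -> Vec<usize> {
    let k = sample_size.min(num_rows);
    let mut offsets: Vec<usize> = (0..num_rows).collect();
    let mut state = seed.max(1); // xorshift must not start at zero
    for i in 0..k {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        // Swap a random element of the not-yet-chosen tail into slot `i`.
        let j = i + (state as usize) % (num_rows - i);
        offsets.swap(i, j);
    }
    offsets.truncate(k);
    offsets
}
```

With these definitions, `raw_vector_bytes(33_000, 3072)` is 405,504,000 bytes (the roughly 400 MB quoted above), while `raw_vector_bytes(512, 3072)` is 6,291,456 bytes, about 6 MB per partition being trained.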

This PR also adds progress reporting to optimize_indices, to make this more observable.

Test workload: IVF_PQ append of 160K new rows onto 560K base rows (16 partitions, 3072-dim float32 vectors). This triggers partition splitting because each partition exceeds the 32K row threshold.

Peak RSS: 26.2 GB before, 4.1 GB after.
Runtime: 93s before, 16.5s after (5.6x faster).

[Image: memory_comparison]

Closes #6378

@wjones127 wjones127 added the indexes Related to secondary index implementations label May 5, 2026
wjones127 and others added 5 commits May 5, 2026 12:38
The optimize/append path created `IvfIndexBuilder` with
`NoopIndexBuildProgress`, so progress callbacks were silently
ignored. This adds a `progress` field to `OptimizeOptions` and
passes it through to the builder in all index type variants of
`optimize_vector_indices_v2`. Also adds shuffle stage reporting
in `shuffle_data()`.

Ref #6378

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
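
The idea in the commit above (carry a progress callback in the options instead of hard-coding a no-op) can be sketched roughly as follows. Every name here (`BuildProgress`, `SketchOptions`, `optimize_sketch`, the stage names) is a hypothetical stand-in; the real trait and `OptimizeOptions` in Lance may look quite different.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical progress callback; not the actual Lance trait.
trait BuildProgress: Send + Sync {
    fn stage_start(&self, stage: &str);
    fn stage_complete(&self, stage: &str);
}

/// Discards every event, similar in spirit to a no-op progress reporter.
struct NoopProgress;
impl BuildProgress for NoopProgress {
    fn stage_start(&self, _stage: &str) {}
    fn stage_complete(&self, _stage: &str) {}
}

/// Records events so that a caller (or a test) can observe them.
#[derive(Default)]
struct RecordingProgress {
    events: Mutex<Vec<String>>,
}
impl BuildProgress for RecordingProgress {
    fn stage_start(&self, stage: &str) {
        self.events.lock().unwrap().push(format!("start:{stage}"));
    }
    fn stage_complete(&self, stage: &str) {
        self.events.lock().unwrap().push(format!("done:{stage}"));
    }
}

/// Hypothetical options struct that carries the callback.
struct SketchOptions {
    progress: Arc<dyn BuildProgress>,
}
impl Default for SketchOptions {
    fn default() -> Self {
        // Callers that do not care about progress still get a no-op.
        Self { progress: Arc::new(NoopProgress) }
    }
}

/// Stand-in for the optimize path: it forwards the caller's callback to each
/// stage instead of constructing a no-op internally. Stage names are illustrative.
fn optimize_sketch(options: &SketchOptions) {
    for stage in ["shuffle", "build"] {
        options.progress.stage_start(stage);
        // ... the real work for this stage would run here ...
        options.progress.stage_complete(stage);
    }
}
```

Keeping the no-op as the `Default` means existing callers are unaffected, while a caller that passes its own reporter sees every stage.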
…signment

Previously, `build_split_plan` loaded all raw vectors for every
partition being split, and ran up to `num_cpus` partitions in
parallel. For high-dimensional vectors this caused OOM. Similarly,
`collect_candidate_moves` loaded neighbor partitions in parallel.

This splits the work into two phases:
- Training (parallel, low memory): sample 512 row IDs per partition,
  load only those vectors, train kmeans.
- Assignment (sequential, high memory): load full raw vectors one
  partition at a time. Candidate moves also run sequentially.

Ref #6378

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
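
As an illustration of the training phase described in the commit above, here is a small self-contained sketch: two centroids are trained per partition from an in-memory sample, and partitions are trained in parallel because each sample is small. The initialization, the plain Lloyd iterations, and all function names are assumptions made for this sketch; the PR uses Lance's own kmeans.

```rust
/// Squared Euclidean distance between two vectors of equal length.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Train two centroids on a small sample with a few Lloyd iterations.
/// Initialization (first vector plus the vector farthest from it) is an
/// arbitrary choice for this sketch.
fn train_two_centroids(sample: &[Vec<f32>], iters: usize) -> [Vec<f32>; 2] {
    assert!(sample.len() >= 2, "need at least two vectors to split");
    let dim = sample[0].len();
    let far = sample
        .iter()
        .max_by(|a, b| l2_sq(a, &sample[0]).total_cmp(&l2_sq(b, &sample[0])))
        .unwrap();
    let mut centroids = [sample[0].clone(), far.clone()];
    for _ in 0..iters {
        let mut sums = [vec![0f32; dim], vec![0f32; dim]];
        let mut counts = [0usize; 2];
        for v in sample {
            let c = if l2_sq(v, &centroids[0]) <= l2_sq(v, &centroids[1]) { 0 } else { 1 };
            counts[c] += 1;
            for (s, x) in sums[c].iter_mut().zip(v) {
                *s += *x;
            }
        }
        for c in 0..2 {
            // Keep the previous centroid if a cluster ends up empty.
            if counts[c] > 0 {
                centroids[c] = sums[c].iter().map(|s| s / counts[c] as f32).collect();
            }
        }
    }
    centroids
}

/// Training phase: one scoped thread per partition sample. Memory stays low
/// because each thread only sees its (small) sample.
fn train_partitions_in_parallel(samples: &[Vec<Vec<f32>>]) -> Vec<[Vec<f32>; 2]> {
    std::thread::scope(|scope| {
        let handles: Vec<_> = samples
            .iter()
            .map(|sample| scope.spawn(move || train_two_centroids(sample, 10)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("training thread panicked"))
            .collect()
    })
}
```

The assignment phase is deliberately not parallelized in the commit, since that is where full partitions of raw vectors are resident.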
Previously, splitting oversized IVF partitions during index optimization
loaded all raw vectors for every split partition and their neighbors into
memory simultaneously (~11.5 GB for 30 partitions at 3072 dims). This
refactors the split path to reuse the existing streaming shuffle
infrastructure: train new centroids from samples, then stream affected
partition vectors through the IVF+quantizer transform pipeline into temp
files on disk. Peak memory drops from O(all split vectors) to O(one
batch).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
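
The "O(one batch)" claim in the commit above can be illustrated with a simplified streaming loop: each batch is assigned to its nearest centroid, written to a per-partition sink, and dropped before the next batch is read. This is only a sketch under assumptions: the batch type, `stream_assign`, and the use of `std::io::Write` sinks are hypothetical, and the quantization step of the real pipeline is omitted (only row IDs are written).

```rust
use std::io::{self, Write};

/// Index of the centroid closest to `v` under squared L2 distance.
fn nearest(v: &[f32], centroids: &[Vec<f32>]) -> usize {
    let mut best = 0;
    let mut best_dist = f32::INFINITY;
    for (i, c) in centroids.iter().enumerate() {
        let d: f32 = v.iter().zip(c).map(|(x, y)| (x - y) * (x - y)).sum();
        if d < best_dist {
            best = i;
            best_dist = d;
        }
    }
    best
}

/// Stream batches of (row_id, vector) through partition assignment, writing
/// each row ID to the sink of its partition. Only one batch is resident at a
/// time; the function returns the number of rows routed to each partition.
fn stream_assign<I, W>(
    batches: I,
    centroids: &[Vec<f32>],
    sinks: &mut [W],
) -> io::Result<Vec<usize>>
where
    I: Iterator<Item = Vec<(u64, Vec<f32>)>>,
    W: Write,
{
    assert_eq!(centroids.len(), sinks.len(), "one sink per partition");
    let mut counts = vec![0usize; centroids.len()];
    for batch in batches {
        for (row_id, vector) in &batch {
            let partition = nearest(vector, centroids);
            sinks[partition].write_all(&row_id.to_le_bytes())?;
            counts[partition] += 1;
        }
        // `batch` is dropped here, before the next one is pulled.
    }
    Ok(counts)
}
```

In a test, `Vec<u8>` works as a sink; on disk, each sink would be a temp file, which matches the "temp files on disk" wording above.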
- Simplify three-way partition routing by extracting split_reader once
- Remove dead AssignOp::Remove variant and simplify build_assign_batch
- Add Debug impl for PartitionAdjustment
- Add SPLIT_SAMPLE_SIZE constant for kmeans training sample size
- Include partition index in "centroid not found" error message

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wjones127 wjones127 force-pushed the 6378-split-index-ram branch from 57cd64b to efc816e Compare May 5, 2026 19:42
codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 72.52396% with 86 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/index/vector/builder.rs | 73.44% | 55 Missing and 22 partials ⚠️ |
| rust/lance/src/index/vector/ivf.rs | 50.00% | 5 Missing ⚠️ |
| rust/lance-index/src/optimize.rs | 69.23% | 4 Missing ⚠️ |

Comment thread rust/lance/src/index/vector/builder.rs Outdated
wjones127 and others added 2 commits May 5, 2026 15:03
Extract `apply_centroid_splits` from `compute_split_centroids` to make
the centroid ordering logic directly testable. Add a unit test verifying
that K simultaneous splits on N partitions produce N+K centroids with
unchanged partitions at their original indices and centroid2s appended in
split order.

Replaces the removed `finalize_split_plans_reassigns_filtered_centroid_ids`
test. The other two removed tests' properties are now covered structurally
(global nearest-centroid assignment) and by existing integration tests
(`test_split_multiple_partitions_in_one_optimize`,
`test_partition_split_on_append_multivec`).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
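
The ordering property tested in the commit above (N partitions with K splits give N+K centroids, unchanged partitions keep their indices, and each centroid2 is appended in split order) can be sketched like this. The `Split` struct and the function body are hypothetical, and the sketch assumes that the first new centroid replaces the split partition's centroid in place, which the commit message does not state explicitly.

```rust
/// Hypothetical description of one split: the partition being split and the
/// two centroids trained for it.
#[derive(Clone)]
struct Split {
    partition: usize,
    centroid1: Vec<f32>,
    centroid2: Vec<f32>,
}

/// Sketch of the centroid ordering. Not the PR's `apply_centroid_splits`.
fn apply_centroid_splits_sketch(centroids: &[Vec<f32>], splits: &[Split]) -> Vec<Vec<f32>> {
    let mut out = centroids.to_vec();
    // Assumption: centroid1 takes over the original slot of the split partition.
    for split in splits {
        out[split.partition] = split.centroid1.clone();
    }
    // Each centroid2 is appended at the end, in the order the splits are given.
    for split in splits {
        out.push(split.centroid2.clone());
    }
    out
}
```

Appending instead of inserting keeps every existing partition ID stable, so rows in untouched partitions need no reassignment of IDs.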
@wjones127 wjones127 marked this pull request as ready for review May 5, 2026 23:52
@claude claude Bot left a comment

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Labels

indexes Related to secondary index implementations performance

Development

Successfully merging this pull request may close these issues.

Optimizing indexes with splitting could use a lot of RAM

1 participant