Reconstruct heap graph from disk graph by sam-herman · Pull Request #519 · datastax/jvector

sam-herman · 2025-09-12T22:55:08Z

Description

Multiple database projects, such as C* and OpenSearch/Solr/Lucene, use an LSM mechanism with frequent merges.
Today every merge forces us to reconstruct the entire graph from scratch, which creates high overhead.
A more economical approach is to pick a leading graph that was previously persisted to disk and incrementally add the nodes of the smaller graphs to it.
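
The description gives no code for this step. Below is a minimal sketch of the planning side. It makes two assumptions: the largest segment is chosen as the leading graph, and the nodes of the other segments are appended after it with fresh ordinals. `Segment`, `PendingInsert`, `pickLeading` and `planInserts` are hypothetical names, not jvector API. The sketch shows how the merge cost shrinks from rebuilding every node to inserting only the nodes outside the leading graph:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical merge plan: keep the largest graph as the "leading" graph and
 * list every node from the other graphs that must be inserted into it.
 * None of these names are jvector API.
 */
public class LeadingGraphMergePlan {
    /** A graph segment taking part in the merge (hypothetical). */
    public record Segment(String name, int nodeCount) {}

    /** One node from a non-leading segment, with the ordinal it gets in the merged graph. */
    public record PendingInsert(String segment, int oldOrdinal, int newOrdinal) {}

    /** Assumption: the segment with the most nodes is the one worth keeping as-is. */
    public static Segment pickLeading(List<Segment> segments) {
        return segments.stream()
                .max(Comparator.comparingInt(Segment::nodeCount))
                .orElseThrow(() -> new IllegalArgumentException("no segments to merge"));
    }

    /** The leading graph keeps ordinals 0..n-1; every other node is appended after it. */
    public static List<PendingInsert> planInserts(List<Segment> segments, Segment leading) {
        List<PendingInsert> inserts = new ArrayList<>();
        int nextOrdinal = leading.nodeCount();
        for (Segment s : segments) {
            if (s == leading) {
                continue; // identity check: skip only the chosen leading segment
            }
            for (int ord = 0; ord < s.nodeCount(); ord++) {
                inserts.add(new PendingInsert(s.name(), ord, nextOrdinal++));
            }
        }
        return inserts;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
                new Segment("s0", 1_000), new Segment("s1", 40), new Segment("s2", 25));
        Segment leading = pickLeading(segments);
        List<PendingInsert> inserts = planInserts(segments, leading);
        // prints "leading=s0 inserts=65 firstNewOrdinal=1000"
        System.out.println("leading=" + leading.name() + " inserts=" + inserts.size()
                + " firstNewOrdinal=" + inserts.get(0).newOrdinal());
    }
}
```

In this example only 65 of 1,065 nodes need a graph insertion, and the leading graph's 1,000 nodes and their edges are reused. The actual insertion would go through the library's graph builder, which this sketch deliberately does not model.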

This PR makes some of the required changes to support that behavior in upstream systems such as C* and OpenSearch.

Changes

Serializable Neighbor Distance Cache - Add a serializable cache that stores the neighbor distances of the nodes in the OnDiskGraphIndex, so the OnHeapGraphIndex can be re-created faster when the graph is read back from disk. The cache is separate and optional, so we can choose whether to apply it without any breaking changes to the current OnDiskGraphIndex. It can later be incorporated into the on-disk format if we choose to.
Helper Methods and Constructors - Add helper methods and constructors that make graph-merge use cases easier to implement in other projects.
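
A minimal sketch of what such a cache could look like follows. It assumes the cache stores, for each node, its neighbor ordinals together with their similarity scores, presumably so those scores need not be recomputed when the on-heap graph is rebuilt. `NeighborScoreCache` and its byte layout are hypothetical; this is not the PR's NeighborsCache format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-node cache of neighbor ordinals and their precomputed scores. */
public class NeighborScoreCache {
    /** Neighbors of one node, with scores[i] belonging to ids[i]. */
    public record Neighbors(int[] ids, float[] scores) {
        public Neighbors {
            if (ids.length != scores.length) throw new IllegalArgumentException("length mismatch");
        }
    }

    // TreeMap keeps node order deterministic, so identical caches serialize identically
    private final Map<Integer, Neighbors> byNode = new TreeMap<>();

    public void put(int node, int[] ids, float[] scores) {
        byNode.put(node, new Neighbors(ids.clone(), scores.clone()));
    }

    public Neighbors get(int node) { return byNode.get(node); }

    public int size() { return byNode.size(); }

    /** Layout (assumed): nodeCount, then per node: nodeId, degree, degree x (neighborId, score). */
    public void write(DataOutput out) throws IOException {
        out.writeInt(byNode.size());
        for (Map.Entry<Integer, Neighbors> e : byNode.entrySet()) {
            Neighbors n = e.getValue();
            out.writeInt(e.getKey());
            out.writeInt(n.ids().length);
            for (int i = 0; i < n.ids().length; i++) {
                out.writeInt(n.ids()[i]);
                out.writeFloat(n.scores()[i]);
            }
        }
    }

    /** Reads back exactly the layout produced by write(). */
    public static NeighborScoreCache read(DataInput in) throws IOException {
        NeighborScoreCache cache = new NeighborScoreCache();
        int count = in.readInt();
        for (int c = 0; c < count; c++) {
            int node = in.readInt();
            int degree = in.readInt();
            int[] ids = new int[degree];
            float[] scores = new float[degree];
            for (int i = 0; i < degree; i++) {
                ids[i] = in.readInt();
                scores[i] = in.readFloat();
            }
            cache.byNode.put(node, new Neighbors(ids, scores));
        }
        return cache;
    }

    public static void main(String[] args) throws IOException {
        NeighborScoreCache cache = new NeighborScoreCache();
        cache.put(0, new int[]{1, 2}, new float[]{0.9f, 0.75f});
        cache.put(1, new int[]{0}, new float[]{0.9f});
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        cache.write(new DataOutputStream(bytes));
        NeighborScoreCache copy = read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        // prints "2 nodes, node 0 score[1]=0.75"
        System.out.println(copy.size() + " nodes, node 0 score[1]=" + copy.get(0).scores()[1]);
    }
}
```

Because the cache lives in its own stream, a graph written without it stays readable. A reader would presumably fall back to recomputing the scores when the cache is absent, which matches the PR's point that the cache is optional and non-breaking.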

Testing

Added tests for graph overlap and for the recall of reconstructed graphs.
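
The PR does not define these metrics. The minimal sketch below makes two assumptions: "overlap" means the fraction of the original graph's directed edges that also appear in the reconstructed graph, and recall means the usual recall@k against exact nearest neighbors. `GraphComparisonMetrics`, `recallAtK` and `edgeOverlap` are hypothetical names:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical metrics for comparing an original graph with a reconstructed one. */
public class GraphComparisonMetrics {
    /** recall@k: fraction of the true top-k ids that appear in the returned top-k ids. */
    public static double recallAtK(int[] groundTruth, int[] retrieved, int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        Set<Integer> truth = new HashSet<>();
        for (int i = 0; i < k && i < groundTruth.length; i++) truth.add(groundTruth[i]);
        int hits = 0;
        for (int i = 0; i < k && i < retrieved.length; i++) {
            if (truth.contains(retrieved[i])) hits++;
        }
        return (double) hits / k;
    }

    /**
     * Edge overlap: fraction of directed edges in graph a that also exist in graph b.
     * Both graphs are adjacency lists indexed by node ordinal.
     */
    public static double edgeOverlap(List<int[]> a, List<int[]> b) {
        long total = 0;
        long shared = 0;
        for (int node = 0; node < a.size(); node++) {
            Set<Integer> inB = new HashSet<>();
            if (node < b.size()) {
                for (int n : b.get(node)) inB.add(n);
            }
            for (int n : a.get(node)) {
                total++;
                if (inB.contains(n)) shared++;
            }
        }
        return total == 0 ? 1.0 : (double) shared / total;
    }

    public static void main(String[] args) {
        // 2 of the 4 true neighbors are returned, so this prints 0.5
        System.out.println(recallAtK(new int[]{4, 7, 1, 9}, new int[]{7, 4, 3, 2}, 4));
        List<int[]> original = List.of(new int[]{1, 2}, new int[]{0, 2}, new int[]{0, 1});
        List<int[]> rebuilt = List.of(new int[]{1, 2}, new int[]{0}, new int[]{1, 0});
        // 5 of the 6 original edges survive in the rebuilt graph
        System.out.println(edgeOverlap(original, rebuilt));
    }
}
```

Edge overlap measures how closely the reconstruction reproduces the original topology, while recall measures whether search quality is preserved. A reconstructed graph can differ in some edges and still reach the same recall.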


marianotepper · 2025-10-07T20:20:16Z

Superseded by #536

sam-herman requested review from MarkWolters, jshook, marianotepper and tlwillke as code owners September 20, 2025 01:05

sam-herman mentioned this pull request Sep 23, 2025

Incremental insertion to existing graph opensearch-project/opensearch-jvector#167 (Merged)

sam-herman force-pushed the reconstruct-heap-graph-from-disk-graph branch from e30816a to 349bd07 Compare September 25, 2025 16:54

sam-herman mentioned this pull request Sep 30, 2025

New MutableGraphIndex and ImmutableGraphIndex interfaces #534 (Merged)

jshook and others added 2 commits October 7, 2025 08:01

support OnHeapGraphReconstruction

e043a63

get incremental graph generation working
add serialization for NeighborsCache
add ord-mapping logic from PQ vectors to RAVV
add ord mapping from RAVV to graph creation

rebase fix

279b5aa

Signed-off-by: Samuel Herman <sherman8915@gmail.com>

sam-herman force-pushed the reconstruct-heap-graph-from-disk-graph branch from 7ba0a3a to 279b5aa Compare October 7, 2025 15:31

sam-herman added 3 commits October 7, 2025 08:45

switch interface for convert graph

8cdb9c9

Signed-off-by: Samuel Herman <sherman8915@gmail.com>

remove explicit mentioning of OnDiskGraphIndex in builder

a670cc9

Signed-off-by: Samuel Herman <sherman8915@gmail.com>

add dimension

4cb1353

Signed-off-by: Samuel Herman <sherman8915@gmail.com>

marianotepper closed this Oct 7, 2025

sam-herman deleted the reconstruct-heap-graph-from-disk-graph branch October 7, 2025 20:20

sam-herman wants to merge 5 commits into datastax:main from sam-herman:reconstruct-heap-graph-from-disk-graph


3 participants
