@sam-herman (Contributor) commented Sep 12, 2025

Description

Multiple database projects, such as C* (Apache Cassandra) and OpenSearch/Solr/Lucene, use an LSM (log-structured merge) mechanism with frequent merges.
Today, every merge forces us to reconstruct the entire graph from scratch, which creates a high overhead.
A more economical approach would be to pick a leading graph that was previously persisted to disk and incrementally add the nodes of the smaller graphs to it.

This PR makes some of the required changes to support that behavior in upstream systems such as C* and OpenSearch.
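To make the potential saving concrete, here is a minimal sketch that compares the number of node insertions for a full rebuild with the number needed when one graph is reused as the leader. It assumes the leader is simply the largest segment, which the description above does not specify, and every name in it (`IncrementalMergePlanSketch`, `pickLeader`, and so on) is hypothetical rather than part of this PR.

```java
// Hypothetical sketch: counts node insertions only; it does not build a graph.
final class IncrementalMergePlanSketch {
    /** Index of the segment with the most nodes (assumed leader); the first wins on ties. */
    static int pickLeader(int[] segmentSizes) {
        if (segmentSizes.length == 0) {
            throw new IllegalArgumentException("at least one segment is required");
        }
        int leader = 0;
        for (int i = 1; i < segmentSizes.length; i++) {
            if (segmentSizes[i] > segmentSizes[leader]) {
                leader = i;
            }
        }
        return leader;
    }

    /** Insertions needed when the leader's persisted graph is reused as the starting point. */
    static long incrementalInsertions(int[] segmentSizes) {
        int leader = pickLeader(segmentSizes);
        long total = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            if (i != leader) {
                total += segmentSizes[i];
            }
        }
        return total;
    }

    /** Insertions needed when the merged graph is rebuilt from scratch. */
    static long fullRebuildInsertions(int[] segmentSizes) {
        long total = 0;
        for (int size : segmentSizes) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] sizes = {1_000_000, 20_000, 5_000};
        System.out.println("full rebuild: " + fullRebuildInsertions(sizes));   // 1025000
        System.out.println("incremental:  " + incrementalInsertions(sizes));   // 25000
    }
}
```

The count ignores the cost of loading the leader back onto the heap, which is the part the changes below aim to make cheaper.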

Changes

  1. Serializable Neighbor Distance Cache - Add a serializable cache that stores the node distances alongside the OnDiskGraphIndex, so that the OnHeapGraphIndex can be re-created faster when the graph is read back from disk. The cache is separate and optional, so we can choose whether to apply it without any breaking changes to the current OnDiskGraphIndex. It could later be folded into the on-disk format itself if we choose to.
  2. Helper Methods and Constructors - Add helper methods and constructors to make usage easier for other projects with graph merge use cases.
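As an illustration of the first item, a side-car cache of this kind could look roughly like the sketch below. This is not the PR's `NeighborsCache` and not an actual on-disk layout: the class name, the `Entry` type, and the byte layout (node count, then for each node its ordinal, its degree, and neighbor/distance pairs) are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: not the PR's NeighborsCache and not an actual on-disk layout.
final class NeighborDistanceCacheSketch {
    /** Neighbor ordinals of one node, with the distance to each neighbor. */
    static final class Entry {
        final int[] neighbors;
        final float[] distances;

        Entry(int[] neighbors, float[] distances) {
            if (neighbors.length != distances.length) {
                throw new IllegalArgumentException("one distance per neighbor is required");
            }
            this.neighbors = neighbors.clone();
            this.distances = distances.clone();
        }
    }

    // TreeMap keeps node ordinals sorted, so the serialized bytes are deterministic.
    private final Map<Integer, Entry> entries = new TreeMap<>();

    void put(int node, int[] neighbors, float[] distances) {
        entries.put(node, new Entry(neighbors, distances));
    }

    Entry get(int node) {
        return entries.get(node);
    }

    int size() {
        return entries.size();
    }

    /** Assumed layout: nodeCount, then per node: ordinal, degree, (neighbor, distance) pairs. */
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(entries.size());
            for (Map.Entry<Integer, Entry> e : entries.entrySet()) {
                Entry entry = e.getValue();
                out.writeInt(e.getKey());
                out.writeInt(entry.neighbors.length);
                for (int i = 0; i < entry.neighbors.length; i++) {
                    out.writeInt(entry.neighbors[i]);
                    out.writeFloat(entry.distances[i]);
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }

    /** Reads back exactly what toBytes() wrote. */
    static NeighborDistanceCacheSketch fromBytes(byte[] bytes) {
        NeighborDistanceCacheSketch cache = new NeighborDistanceCacheSketch();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int nodeCount = in.readInt();
            for (int n = 0; n < nodeCount; n++) {
                int node = in.readInt();
                int degree = in.readInt();
                int[] neighbors = new int[degree];
                float[] distances = new float[degree];
                for (int i = 0; i < degree; i++) {
                    neighbors[i] = in.readInt();
                    distances[i] = in.readFloat();
                }
                cache.put(node, neighbors, distances);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return cache;
    }
}
```

Because the cache lives in its own byte stream, a reader that does not know about it can ignore it, which is the sense in which an optional cache avoids breaking changes.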

Testing

Added tests for graph overlap and recall for reconstructed graphs.
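The PR does not spell out how overlap and recall are measured, so the following is only a sketch under assumed definitions: overlap is the fraction of directed edges in the reconstructed graph that also exist in the reference graph, and recall@k is the usual fraction of the top-k ground-truth neighbors that a search returns. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of two quality checks; the definitions are assumptions.
final class GraphQualitySketch {
    /**
     * Fraction of directed edges in `rebuilt` that also exist in `reference`.
     * adjacency[i] lists the neighbor ordinals of node i.
     */
    static double edgeOverlap(int[][] reference, int[][] rebuilt) {
        long shared = 0;
        long total = 0;
        for (int node = 0; node < rebuilt.length; node++) {
            Set<Integer> referenceNeighbors = new HashSet<>();
            if (node < reference.length) {
                for (int neighbor : reference[node]) {
                    referenceNeighbors.add(neighbor);
                }
            }
            for (int neighbor : rebuilt[node]) {
                total++;
                if (referenceNeighbors.contains(neighbor)) {
                    shared++;
                }
            }
        }
        // A graph with no edges trivially has nothing that differs.
        return total == 0 ? 1.0 : (double) shared / total;
    }

    /** Fraction of the first k ground-truth ids found among the first k retrieved ids. */
    static double recallAtK(int[] groundTruth, int[] retrieved, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        Set<Integer> truth = new HashSet<>();
        for (int i = 0; i < Math.min(k, groundTruth.length); i++) {
            truth.add(groundTruth[i]);
        }
        int hits = 0;
        for (int i = 0; i < Math.min(k, retrieved.length); i++) {
            // remove() so that a duplicated retrieved id is not counted twice
            if (truth.remove(retrieved[i])) {
                hits++;
            }
        }
        return (double) hits / k;
    }
}
```

For example, with reference adjacency `{{1, 2}, {0}}` and rebuilt adjacency `{{1, 3}, {0}}`, two of the three rebuilt edges are shared, so `edgeOverlap` returns 2/3.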

jshook and others added 2 commits October 7, 2025 08:01
get incremental graph generation working
add serialization for NeighborsCache
add ord-mapping logic from PQ vectors to RAVV
add ord mapping from RAVV to graph creation

Signed-off-by: Samuel Herman <sherman8915@gmail.com>
@sam-herman sam-herman force-pushed the reconstruct-heap-graph-from-disk-graph branch from 7ba0a3a to 279b5aa Compare October 7, 2025 15:31
@marianotepper (Collaborator) commented

Superseded by #536

@sam-herman sam-herman deleted the reconstruct-heap-graph-from-disk-graph branch October 7, 2025 20:20