Skip to content

Concurrent graph modifications#321

Draft
razdoburdin wants to merge 16 commits into
intel:mainfrom
razdoburdin:seqlock
Draft

Concurrent graph modifications#321
razdoburdin wants to merge 16 commits into
intel:mainfrom
razdoburdin:seqlock

Conversation

@razdoburdin
Copy link
Copy Markdown
Contributor

This PR introduces concurrent graph modifications with seqlock pattern.

@razdoburdin razdoburdin marked this pull request as draft April 27, 2026 15:58
Copy link
Copy Markdown
Member

@rfsaliev rfsaliev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO it would be better to keep both synchronized and non-synchronized graphs.
For example, synchronization is not needed for static Vamana index, but gives overhead.

using index_type = Idx;
using value_type = std::span<Idx>;
using const_value_type = std::span<const Idx>;
using const_value_type = AtomicSpan<const Idx>;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better flexibility and allow user to select syncronized/non-syncronized index kind, I would define a dedicated SyncronizedGraphBase class.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this duplication is really required. Performance penalty for synchronized vs non-synchronized search is only few precents.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @razdoburdin, is duplication too bad in this case? Otherwise, @rfsaliev may have a point on flexibility and it's always better if performance is not affected. But agree with you that pros and cons should be discussed if duplication is a large overhead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to investigate the trade-offs, after the finalization of design of concurrent path. Let's make sure it works well first.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get valuable results, I would recommend to benchmark 'static' VamanaIndex with and without synchronized graph on different platforms - especially on multi-socket systems.

if (is_deleted(dst)) {
const auto& others = graph_.get_node(dst);
all_candidates.insert(others.begin(), others.end());
// SeqLock retry: a concurrent consolidate may be writing dst's
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GraphConsolidator class is parametrized by a graph type, so we can keep both syncronized/non-synchronized behavior by detecting graph type.

@mihaic
Copy link
Copy Markdown
Member

mihaic commented May 2, 2026

While testing this PR in Redis, I initially removed all locking; that made a test invoking compact and delete_entries fail (runGCParallel). Claude says:

It was a specific interaction between delete_entries (which marks nodes as Deleted via atomic CAS) and compact() (which remaps adjacency lists). When compact() builds its old_to_new_id_map from only Valid nodes, any node that gets marked Deleted concurrently won't be in that map. If a Valid node's adjacency list references the now-Deleted node, old_to_new_id_map.at() throws "Couldn't find key."

Tests do pass when search locks are removed.

@mihaic
Copy link
Copy Markdown
Member

mihaic commented May 28, 2026

@razdoburdin, I encountered another crash in Redis. What do you think about the following analysis from Claude?

Root Cause

Race condition between compact() and concurrent delete_entries() during
GC.

The crash path (in dynamic_index.h:977):

  1. runGC() calls consolidate() — prunes currently-Deleted nodes from adjacen
    cy lists
  2. Concurrently, deleteVector() or updateSVSIndex() phase 4 calls delet e_entries() → atomically marks node X as Deleted (was Valid during consolidate,
    so X is still referenced in other nodes' adjacency lists)
  3. runGC() calls compact():
    • nonmissing_indices() collects only Valid slots → node X is excluded (it's now Deleted)
    • old_to_new_id_map doesn't contain X
    • When remapping adjacency lists, valid node Y still references X → old_to_new_id_map.at(X) throws "Couldn't find key"

The translator_mutex_ doesn't help because:

  • delete_entries() holds only a shared lock (line 842)
  • compact() holds no lock during the parallel_for graph remapping (lines 959-984)
  • Only the finishing steps (line 1001) take an exclusive lock

Fix

The minimal fix is making compact() resilient to concurrently-deleted neighbors. A neighbor missing from old_to_new_id_map was concurrently deleted — dropping it from the adjacency list is semantically correct (the next GC cycle would do the same via consolidate).

Now let me also check if the entry_point_ remap in the finishing steps has the same issue (line 1025).

The entry_point remap at line 1025 is inside the exclusive lock so it's safe from new concurrent deletes at that point. However, the same race could affect it if the entry_point was deleted between consolidate and nonmissing_indices.

Summary

The crash: svs::threads::ThreadingException — "Couldn't find key" thrown from compact() in dynamic_index.h:977 during SVS GC.

Root cause: A race condition between compact() (GC) and concurrent delete_entries() (from deleteVector or updateSVSIndex phase 4). After mainIndexGuard was removed:

  1. consolidate() prunes adjacency lists of currently-Deleted nodes
  2. A concurrent delete marks node X as Deleted (was Valid during consolidate, still in valid nodes' adjacency lists)
  3. compact() builds old_to_new_id_map from only Valid nodes (X excluded)
  4. When remapping adjacency lists: old_to_new_id_map.at(X) throws because X isn't in the map

Fix: Changed the std::transform + .at() in compact() to use .find() and skip neighbors not in the map. A missing neighbor is a concurrently-deleted node — dropping it is correct (equivalent to what the next consolidate() would do). The graph remains structurally sound; it simply has slightly fewer edges on the affected nodes until the next full GC cycle.

Proposed change:

diff --git a/include/svs/index/vamana/dynamic_index.h b/include/svs/index/vamana/dynamic_index.h
index 5cfcc7d..7e185ef 100644
--- a/include/svs/index/vamana/dynamic_index.h
+++ b/include/svs/index/vamana/dynamic_index.h
@@ -966,17 +966,17 @@ class MutableVamanaIndex {
                         auto old_id = new_to_old_id_map[new_id];
 
                         const auto& list = graph_.get_node(old_id);
-                        buffer.resize(list.size());
-
-                        // Transform the adjacency list from old to new.
-                        std::transform(
-                            list.begin(),
-                            list.end(),
-                            buffer.begin(),
-                            [&old_to_new_id_map](Idx old_id) {
-                                return old_to_new_id_map.at(old_id);
+                        buffer.clear();
+                        buffer.reserve(list.size());
+
+                        // Remap adjacency list, dropping neighbors that were
+                        // concurrently deleted after consolidate ran.
+                        for (auto neighbor : list) {
+                            auto it = old_to_new_id_map.find(neighbor);
+                            if (it != old_to_new_id_map.end()) {
+                                buffer.push_back(it->second);
                             }
-                        );
+                        }
 
                         temp_graph.replace_node(batch_id, buffer);
                     }

@razdoburdin
Copy link
Copy Markdown
Contributor Author

@razdoburdin, I encountered another crash in Redis. What do you think about the following analysis from Claude?

The problem was deeper. I have made some changes, search isn't affected, but calls of compact() is now serialized vs additions, deletions and consolidations.

@mihaic
Copy link
Copy Markdown
Member

mihaic commented May 30, 2026

@razdoburdin, another crash with VecSim update threshold 1 (not with the default of 1024). Here is what I got from Claude:

Root Cause: Race condition in SVS data_.resize() during concurrent search

The crash is a use-after-free in blocks_ (a std::vector<DenseArray>).

The Data Structure

include/svs/core/data/simple.h:955:

std::vector<array_type> blocks_;   // line 955

Data access (get_datum, line 837-839):

auto [block_id, data_id] = resolve(i);
return getindex(blocks_, block_id).slice(data_id);  // dereferences blocks_[block_id]

The Race

Thread A (search) — no lock held:

  1. Calls get_datum(i) → reads blocks_.data() to index into the vector
  2. Or already holds a reference to blocks_[block_id]

Thread B (add_points, Phase 1) — holds translator_mutex_ exclusive, at line 745 of dynamic_index.h:

  1. data_.resize(new_size) → calls add_block()blocks_.emplace_back(...)
  2. If blocks_ capacity is exceeded, std::vector reallocates: old buffer is freed

Thread A (continues):
3. Indexes into the freed old blocks_ buffer → reads garbage DenseArray → dereferences garbage pointer_SIGSEGV at 0xada00

Why translator_mutex_ Doesn't Help

translator_mutex_ is only acquired by:

  • add_points Phase 1 (exclusive) — to protect the ID translator maps
  • search's translate_to_external() (shared) — AFTER the search completes

Search's data_ access during graph traversal/distance computation holds NO lock at all. The translator_mutex_ was never designed to protect blocks_ from concurrent read.

The Stack Trace Confirms This

At crash time:

  • Thread 3048847 (update): in IDTranslator::insert_translation — this is AFTER data_.resize() already ran (line 745 precedes line 760 in Phase 1). The blocks_ reallocation has already happened.
  • Thread 3048835 (search): in IPImpl::compute — accessing vector data via a stale pointer into the old blocks_ buffer.

Why threshold=1 Crashes but threshold=1024 Doesn't

With threshold=1: add_points is called for every single insert (1M calls during upload + continuous during mixed workload). blocks_ reallocates whenever a block fills up (~every 1024 vectors), but the back-to-back add_points calls mean:

  • The system spends almost all its time cycling through add_points
  • Searches overlap with add_points almost continuously
  • The probability of a search being mid-get_datum() at the exact moment of blocks_ reallocation is high

With threshold=1024: add_points is called only every 1024 vectors (~977 times for 1M vectors). Between calls, there's a long stable window where searches run against static blocks_. The temporal overlap between search and reallocation is much smaller.

The Fix

SVS's internal locking is insufficient for concurrent search + mutation. The comment in VectorSimilarity ("SVS handles its own internal locking for concurrent search + modification") is incorrect for this code path.

Options:

  1. Re-introduce mainIndexGuard — shared during search, exclusive during addVectors. This was the previous working approach.
  2. Add a data-access reader-writer lock inside SVS — search holds shared, data_.resize() in Phase 1 holds exclusive. This would protect only the resize window, not all of Phase 1.
  3. Make blocks_ reallocation-safe — e.g., use a fixed-capacity container, or an indirection layer (pointer-to-blocks) that's atomically swapped (RCU-style).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants