Concurrent graph modifications#321
Conversation
| using index_type = Idx; | ||
| using value_type = std::span<Idx>; | ||
| using const_value_type = std::span<const Idx>; | ||
| using const_value_type = AtomicSpan<const Idx>; |
There was a problem hiding this comment.
For better flexibility and allow user to select syncronized/non-syncronized index kind, I would define a dedicated SyncronizedGraphBase class.
There was a problem hiding this comment.
I am not sure this duplication is really required. Performance penalty for synchronized vs non-synchronized search is only few precents.
There was a problem hiding this comment.
Hi @razdoburdin, is duplication too bad in this case? Otherwise, @rfsaliev may have a point on flexibility and it's always better if performance is not affected. But agree with you that pros and cons should be discussed if duplication is a large overhead.
There was a problem hiding this comment.
I plan to investigate the trade-offs, after the finalization of design of concurrent path. Let's make sure it works well first.
There was a problem hiding this comment.
To get valuable results, I would recommend to benchmark 'static' VamanaIndex with and without synchronized graph on different platforms - especially on multi-socket systems.
| if (is_deleted(dst)) { | ||
| const auto& others = graph_.get_node(dst); | ||
| all_candidates.insert(others.begin(), others.end()); | ||
| // SeqLock retry: a concurrent consolidate may be writing dst's |
There was a problem hiding this comment.
GraphConsolidator class is parametrized by a graph type, so we can keep both syncronized/non-synchronized behavior by detecting graph type.
|
While testing this PR in Redis, I initially removed all locking; that made a test invoking
Tests do pass when search locks are removed. |
|
@razdoburdin, I encountered another crash in Redis. What do you think about the following analysis from Claude? Root CauseRace condition between The crash path (in
The
FixThe minimal fix is making Now let me also check if the The entry_point remap at line 1025 is inside the exclusive lock so it's safe from new concurrent deletes at that point. However, the same race could affect it if the entry_point was deleted between consolidate and nonmissing_indices. SummaryThe crash: Root cause: A race condition between
Fix: Changed the Proposed change: diff --git a/include/svs/index/vamana/dynamic_index.h b/include/svs/index/vamana/dynamic_index.h
index 5cfcc7d..7e185ef 100644
--- a/include/svs/index/vamana/dynamic_index.h
+++ b/include/svs/index/vamana/dynamic_index.h
@@ -966,17 +966,17 @@ class MutableVamanaIndex {
auto old_id = new_to_old_id_map[new_id];
const auto& list = graph_.get_node(old_id);
- buffer.resize(list.size());
-
- // Transform the adjacency list from old to new.
- std::transform(
- list.begin(),
- list.end(),
- buffer.begin(),
- [&old_to_new_id_map](Idx old_id) {
- return old_to_new_id_map.at(old_id);
+ buffer.clear();
+ buffer.reserve(list.size());
+
+ // Remap adjacency list, dropping neighbors that were
+ // concurrently deleted after consolidate ran.
+ for (auto neighbor : list) {
+ auto it = old_to_new_id_map.find(neighbor);
+ if (it != old_to_new_id_map.end()) {
+ buffer.push_back(it->second);
}
- );
+ }
temp_graph.replace_node(batch_id, buffer);
} |
The problem was deeper. I have made some changes, search isn't affected, but calls of compact() is now serialized vs additions, deletions and consolidations. |
|
@razdoburdin, another crash with VecSim update threshold 1 (not with the default of 1024). Here is what I got from Claude: Root Cause: Race condition in SVS
|
This PR introduces concurrent graph modifications with seqlock pattern.