
Docs improvements #1992

Open
aamijar wants to merge 10 commits into rapidsai:main from aamijar:docs-improvements

Conversation

@aamijar
Member

@aamijar aamijar commented Apr 6, 2026

Continued overhaul of cuvs docs:

  1. Separate Multi-GPU Nearest Neighbors docs into subpages.
  2. Add docs for Multi-GPU All-Neighbors.
  3. Add Spectral Clustering and Spectral Embedding Python API docs (refer to cuML).
  4. Add NN-Descent to C API docs.
  5. Change Bruteforce title to Brute Force KNN.
  6. Fix overload grouping on right sidebar.
  7. Fix doc tree depth.
  8. Separate Quantizer docs into subpages for each type.
  9. Various spelling, capitalization, and alphabetical ordering fixes.
  10. Remove non-ASCII characters (such as smart apostrophes).
  11. Fix broken links.
  12. Move PCA C and Python docs to subpages.
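For item 10, stray non-ASCII characters such as smart apostrophes can be located with a small script along these lines (a hypothetical helper, not part of this PR):

```python
def find_non_ascii(text):
    # Report each character outside the ASCII range (e.g. a smart
    # apostrophe, U+2019) with its position and code point.
    return [(i, ch, hex(ord(ch))) for i, ch in enumerate(text) if ord(ch) > 127]

line = "cuVS\u2019s docs"
print(find_non_ascii(line))  # → [(4, '’', '0x2019')]
```

Running this over each ``.rst`` file would flag the exact offsets to fix.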

How to build docs locally

cd /home/coder/cuvs/cpp/doxygen && doxygen Doxyfile
cd /home/coder/cuvs/docs && make html
cd /home/coder/cuvs/docs/build/html && python -m http.server 8080

@aamijar aamijar requested review from a team as code owners April 6, 2026 17:04
@aamijar aamijar self-assigned this Apr 6, 2026
@aamijar aamijar added non-breaking Introduces a non-breaking change doc Improvements or additions to documentation labels Apr 6, 2026
@aamijar aamijar moved this to In Progress in Unstructured Data Processing Apr 6, 2026
Comment thread docs/source/cpp_api.rst Outdated
cpp_api/cluster.rst
cpp_api/distance.rst
cpp_api/neighbors.rst
cpp_api/neighbors_mg.rst
Member

@cjnolet cjnolet Apr 13, 2026


Oh I see- you put these out in the top-level for the C++ itself. Hmm. That's not awful. I think it's easier for users to find it this way. I think this is okay. When you said top-level, I thought you meant the top-level index of the docs themselves.

I still wonder though- would this be better placed inside of the neighbors index? It seems like it'd make more sense there, rather than having it parallel to the other namespaces.

Member Author


Yeah, I was thinking it would be easier for users to find if it were moved up. But we can keep it in neighbors if you prefer.

Member Author


Addressed in bd3b3b1

Comment thread docs/source/python_api.rst Outdated
python_api/cluster.rst
python_api/distance.rst
python_api/neighbors.rst
python_api/neighbors_multi_gpu.rst
Member


I think I'd prefer if this was nested inside "neighbors".

Member Author


Addressed in bd3b3b1

@@ -1,5 +1,6 @@
document.addEventListener("DOMContentLoaded", () => {
const toc = document.querySelector(".bd-toc-nav");
const toc = document.querySelector("#pst-page-toc-nav") ||
Member


Can you briefly describe what this change does?

Member Author

@aamijar aamijar Apr 13, 2026


This fixes point 6, "Fix overload grouping on right sidebar."
For example, instead of showing build a bunch of times, the sidebar shows build once. This used to work without the change, but a dependency update may have made it stop working.
Ref: #1377
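Conceptually, the script collapses sidebar entries that share the same display text. A rough Python analogue of that grouping (illustration only; the actual fix is the JavaScript selector change in the diff above):

```python
from collections import OrderedDict

def collapse_overloads(entries):
    # Keep one sidebar item per display name, remembering how many
    # overloads it stands for (e.g. several "build" links become one).
    grouped = OrderedDict()
    for name, href in entries:
        grouped.setdefault(name, []).append(href)
    return [(name, hrefs[0], len(hrefs)) for name, hrefs in grouped.items()]

toc = [("build", "#b1"), ("build", "#b2"), ("search", "#s1"), ("build", "#b3")]
print(collapse_overloads(toc))  # → [('build', '#b1', 3), ('search', '#s1', 1)]
```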

@coderabbitai

coderabbitai Bot commented May 11, 2026


📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • Documentation
    • Reorganized multi-GPU algorithm documentation into dedicated algorithm-specific pages for improved clarity and accessibility.
    • Added comprehensive documentation for NN-Descent, Spectral Clustering, and expanded multi-GPU support pages across C++ and C APIs.
    • Expanded preprocessing documentation with separate pages for PCA and quantization methods (binary, product, scalar).
    • Improved documentation consistency with standardized capitalization and heading formats.
    • Enhanced navigation depth in table of contents for better discoverability.

Walkthrough

This PR restructures multi-GPU and preprocessing documentation by renaming algorithm-specific Doxygen groups in C++ headers and converting documentation from inline Doxygen sections to modular Sphinx toctree navigation. It includes new algorithm-specific documentation pages for C, C++, and Python APIs, updates site configuration for improved navigation depth, and normalizes typography and capitalization across documentation.

Changes

Multi-GPU Documentation and API Reference Restructuring

Layer / File(s) Summary
C++ Header Doxygen Group Renames
cpp/include/cuvs/neighbors/cagra.hpp, ivf_flat.hpp, ivf_pq.hpp
Multi-GPU Doxygen group identifiers renamed from generic mg_cpp_index_* / mg_cpp_serialize / mg_cpp_deserialize / mg_cpp_distribute to algorithm-specific names (mg_cpp_cagra_*, mg_cpp_ivf_flat_*, mg_cpp_ivf_pq_*) across build, extend, search, serialize, deserialize, and distribute sections.
C API Multi-GPU Documentation Restructure
docs/source/c_api/neighbors_mg.rst, neighbors_mg_all_neighbors_c.rst, neighbors_mg_cagra_c.rst, neighbors_mg_ivf_flat_c.rst, neighbors_mg_ivf_pq_c.rst
Multi-GPU C API documentation reorganized from inline doxygengroup blocks to Sphinx toctree with separate dedicated pages for all-neighbors, CAGRA, IVF-Flat, and IVF-PQ algorithms. Each new page documents algorithm-specific types, parameters, and operations.
C++ API Multi-GPU Documentation Restructure
docs/source/cpp_api/neighbors_mg.rst, neighbors_mg_all_neighbors.rst, neighbors_mg_cagra.rst, neighbors_mg_ivf_flat.rst, neighbors_mg_ivf_pq.rst
Multi-GPU C++ API documentation converted to modular structure with dedicated algorithm pages including parameter documentation, build/extend/search/serialize/deserialize/distribute API sections, and usage examples.
Python API Multi-GPU Documentation Update
docs/source/python_api/neighbors_mg_all_neighbors.rst, neighbors_multi_gpu.rst
Add Python API documentation for multi-GPU all-neighbors with configuration steps and example code; update toctree reference.
NN-Descent C API Documentation
docs/source/c_api/neighbors_nn_descent_c.rst, neighbors.rst
New C API documentation page for NN-Descent algorithm with index parameters, index interface, and build sections.

Preprocessing Documentation Refactoring

Layer / File(s) Summary
C API Preprocessing Restructure
docs/source/c_api/preprocessing.rst, preprocessing_pca.rst, preprocessing_quantize.rst, preprocessing_quantize_binary.rst, preprocessing_quantize_pq.rst, preprocessing_quantize_scalar.rst
C API preprocessing documentation converted from inline sections to modular toctree with separate pages for PCA and quantization groups (binary, PQ, scalar).
C++ API Preprocessing Restructure
docs/source/cpp_api/preprocessing_quantize.rst, preprocessing_quantize_binary.rst, preprocessing_quantize_pq.rst, preprocessing_quantize_scalar.rst
C++ API preprocessing refactored to use toctree with dedicated quantizer documentation pages.
Python API Preprocessing Restructure
docs/source/python_api/preprocessing.rst, preprocessing_pca.rst, preprocessing_quantize.rst, preprocessing_quantize_binary.rst, preprocessing_quantize_pq.rst, preprocessing_quantize_scalar.rst, preprocessing_spectral_embedding.rst
Python API preprocessing documentation reorganized to modular structure with separate pages for PCA, quantization submodules, and spectral embedding.

Documentation Site Configuration and Formatting Updates

Layer / File(s) Summary
Site Configuration and Navigation Updates
docs/source/conf.py, api_docs.rst, c_api.rst, _static/collapse_overloads.js
Update Sphinx navigation_depth to 5, increase toctree maxdepth from 3 to 5, reorder toctree entries (cluster before distance), update TOC element selector fallback in JavaScript.
API Documentation Title and Heading Updates
docs/source/c_api/cluster.rst, neighbors_bruteforce_c.rst, cpp_api/neighbors_bruteforce.rst
Standardize documentation page titles: "Clustering" → "Cluster", "Bruteforce" → "Brute Force KNN" across C and C++ API pages.
Neighbors Documentation Page Ordering and Titles
docs/source/neighbors/neighbors.rst, neighbors/bruteforce.rst, neighbors/vamana.rst
Rename main page from "Nearest Neighbor" to "Nearest Neighbors", reorder toctree to list all-neighbors first, update individual algorithm titles, and correct spelling/wording in Vamana description.
Typography, Capitalization, and Wording Improvements
docs/source/choosing_and_configuring_indexes.rst, comparing_indexes.rst, filtering.rst, getting_started.rst, index.rst, tuning_guide.rst, vector_databases_vs_vector_search.rst
Normalize smart quotes to ASCII, capitalize section headings consistently, improve phrasing in conceptual documentation, standardize resource link capitalization, and expand explanatory content on vector database architectures.
Spectral Clustering Documentation
docs/source/python_api/cluster_spectral.rst, docs/source/python_api/cluster.rst
Add new documentation pages for spectral clustering in Python API, referencing cuML implementations and linking to C++ API documentation.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

improvement

Suggested reviewers

  • robertmaynard
  • KyleFromNVIDIA
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name: Title check
Status: ❓ Inconclusive
Explanation: The title "Docs improvements" is vague and overly broad, lacking specificity about the primary documentation changes made in this substantial PR.
Resolution: Consider a more specific title that highlights the main change, such as "Restructure documentation: split multi-GPU and quantizer docs into subpages" or "Documentation refactoring: improve organization and fix presentation issues".
✅ Passed checks (4 passed)
Check name Status Explanation
Description check ✅ Passed The description provides a clear enumerated list of 12 documentation improvements and includes local build instructions, directly relating to the changeset's documentation-focused modifications.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (12)
docs/source/getting_started.rst (1)

7-17: ⚡ Quick win

Align nav link text with destination page titles for consistency.

Minor consistency nit: using exact page titles in this top list helps readers scan and cross-reference sections faster.

As per coding guidelines, "Consistency: Version numbers, parameter types, and terminology match code."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/getting_started.rst` around lines 7 - 17, Replace the loose
backtick link text "`Supported indexes`_" with the exact destination page title
to maintain consistency; specifically change the line containing "`Supported
indexes`_" so it uses the same title as the neighbors doc (i.e. match the target
title used in ":doc:`Vector Search Index Guide <neighbors/neighbors>`"),
ensuring the nav entry reads exactly the page title and follows the same :doc:
link pattern as the other bullets.
docs/source/choosing_and_configuring_indexes.rst (1)

28-28: ⚡ Quick win

Tighten recall definition to avoid ambiguity.

The current sentence can confuse how recall is computed. A more explicit phrasing would make the metric definition clearer.

✍️ Suggested rewrite
-What do we mean when we say quality of an index? In machine learning terminology, we measure this using recall, which is sometimes used interchangeably to mean accuracy, even though the two are slightly different measures. Recall, when used in vector search, essentially means "out of all of my results, which results would have been included in the exact results?" In vector search, the objective is to find some number of vectors that are closest to a given query vector so recall tends to be more relaxed than accuracy, discriminating only on set inclusion, rather than on exact ordered list matching, which would be closer to an accuracy measure.
+What do we mean when we say quality of an index? In machine learning terminology, we often measure this using recall, which is related to (but not identical to) accuracy. In vector search, recall asks: "out of the true nearest neighbors, how many did the index return?" This focuses on set inclusion rather than exact ordering, whereas stricter accuracy-style measures may also evaluate rank order.

As per coding guidelines, "Clarity: Flag confusing explanations, missing prerequisites, or unclear examples."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/choosing_and_configuring_indexes.rst` at line 28, The paragraph
starting "What do we mean when we say quality of an index?" uses an ambiguous
description of recall; replace it with a precise definition stating that recall
= (number of relevant items retrieved) / (total number of relevant items),
explain that in vector search "relevant" means items that appear in the exact
nearest-neighbor set for the query, and explicitly contrast recall (set
inclusion, not ordering) with accuracy/precision to avoid confusion; update the
sentence(s) that reference "recall" and "vector search" accordingly so the
metric computation and its interpretation are unambiguous.
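The set-inclusion reading of recall suggested above can be made concrete with a short sketch (made-up neighbor IDs, not code from the docs):

```python
def recall(returned_ids, true_ids):
    # Fraction of the exact nearest neighbors that the approximate
    # index returned; result ordering does not matter.
    true = set(true_ids)
    return len(true.intersection(returned_ids)) / len(true)

print(recall([3, 1, 9, 7], [1, 2, 3, 4]))  # → 0.5
```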
docs/source/comparing_indexes.rst (1)

56-56: ⚡ Quick win

Avoid absolute claim about vector distribution uniformity.

“Completely uniform” is too strong and can be inaccurate across real deployments. Suggest softening this statement to avoid misleading guidance.

✍️ Suggested rewrite
-It turns out that most vector databases, like Milvus for example, make many smaller vector search indexing models for a single "index", and the distribution of the vectors across the smaller index models are assumed to be completely uniform.
+Many vector databases (for example, Milvus) build multiple smaller indexing models for a single "index", and often target an approximately uniform distribution of vectors across those models.

As per coding guidelines, "Accuracy: Verify code examples compile and run correctly" and "Clarity: Flag confusing explanations, missing prerequisites, or unclear examples."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/comparing_indexes.rst` at line 56, The sentence claiming that
smaller index shards are "assumed to be completely uniform" is too absolute;
update the wording in the comparing_indexes.rst paragraph (the sentence
containing 'make many smaller vector search indexing models for a single
"index"' and the phrase "completely uniform") to a softened phrasing such as
"often assumed to be roughly or approximately uniform" and add a brief caveat
that real deployments can exhibit non-uniform shard distributions and that
practitioners should validate assumptions empirically (e.g., via sampling or
cluster diagnostics).
docs/source/tuning_guide.rst (1)

8-8: ⚡ Quick win

Clarify the introduction sentence and metric definition.

This line is hard to parse (run-on + sentence fragment), which makes the core objective unclear. A short rewrite would improve readability without changing intent.

✍️ Suggested rewrite
-A Method for tuning and evaluating Vector Search Indexes At Scale in Locally Indexed Vector Databases. For more information on the differences between locally and globally indexed vector databases, please see :doc:`this guide <vector_databases_vs_vector_search>`. The goal of this guide is to give users a scalable and effective approach for tuning a vector search index, no matter how large.  Evaluation of a vector search index "model" that measures recall in proportion to build time so that it penalizes the recall when the build time is really high (should ultimately optimize for finding a lower build time and higher recall).
+This guide presents a method for tuning and evaluating vector search indexes at scale in locally indexed vector databases. For more information on the differences between locally and globally indexed vector databases, see :doc:`this guide <vector_databases_vs_vector_search>`. The goal is to provide a scalable approach that optimizes for high recall and low build time by penalizing configurations with excessive build cost.

As per coding guidelines, "Clarity: Flag confusing explanations, missing prerequisites, or unclear examples."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/tuning_guide.rst` at line 8, The opening sentence "A Method for
tuning and evaluating Vector Search Indexes At Scale in Locally Indexed Vector
Databases..." is a run‑on and unclear; rewrite it into two concise sentences:
one stating the guide's purpose (scalable method for tuning and evaluating
locally indexed vector search indexes) and a second that defines the evaluation
metric succinctly (measure recall while penalizing high build time, e.g.,
optimize for higher recall per unit build time or recall weighted by inverse
build time). Replace the original long paragraph with this clearer two‑sentence
version and ensure the terms "locally indexed vector databases" and "recall
penalized by build time" remain present for clarity.
docs/source/vector_databases_vs_vector_search.rst (1)

42-42: ⚡ Quick win

Improve sentence clarity and fix awkward parenthetical.

This sentence is extremely long (~130 words) and contains an awkward parenthetical phrasing "(sent to, and)" that disrupts readability. The complex structure with multiple nested concepts makes it difficult to follow the explanation of globally partitioned vector search indexes.

Consider breaking this into 2-3 shorter sentences and rephrasing the awkward "(sent to, and)" construction for better clarity. As per coding guidelines, documentation should prioritize clarity to ensure readers can understand technical concepts without confusion.

✏️ Suggested restructuring for clarity
-Some special-purpose vector databases follow this design, such as Yahoo's Vespa and Google's Spanner. A global index is trained to partition the entire database's vectors up front as soon as there are enough vectors to do so (usually these databases are at a large enough scale that a significant number of vectors are bootstrapped initially and so it avoids the cold start problem). Ingested vectors are first run through the global index (clustering, for example, but tree- and graph-based methods have also been used) to determine which partition they belong to and the vectors are then (sent to, and) written  directly to that partition. The individual partitions can contain a graph, tree, or a simple IVF list. These types of indexes have been able to scale to hundreds of billions to trillions of vectors, and since the partitions are themselves often implicitly based on neighborhoods, rather than being based on uniformly random distributed vectors like the locally partitioned architectures, the partitions can be grouped together or intentionally separated to support localized searches or load balancing, depending upon the needs of the system.
+Some special-purpose vector databases follow this design, such as Yahoo's Vespa and Google's Spanner. A global index is trained to partition the entire database's vectors up front as soon as there are enough vectors to do so (usually these databases are at a large enough scale that a significant number of vectors are bootstrapped initially and so it avoids the cold start problem). Ingested vectors are first run through the global index (clustering, for example, but tree- and graph-based methods have also been used) to determine which partition they belong to, and the vectors are then routed to and written directly to that partition. 
+
+The individual partitions can contain a graph, tree, or a simple IVF list. These types of indexes have been able to scale to hundreds of billions to trillions of vectors, and since the partitions are themselves often implicitly based on neighborhoods, rather than being based on uniformly random distributed vectors like the locally partitioned architectures, the partitions can be grouped together or intentionally separated to support localized searches or load balancing, depending upon the needs of the system.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/vector_databases_vs_vector_search.rst` at line 42, The long
sentence starting with "Some special-purpose vector databases..." is hard to
read and contains an awkward parenthetical "(sent to, and)"; split this into 2–3
shorter sentences by ending after "written directly to that partition", rephrase
the parenthetical to something like "and then written directly to that
partition" or remove it entirely, and break the remainder into a separate
sentence describing partition contents (graph/tree/IVF) and another describing
scale and partition grouping behavior so each idea is clear and concise.
docs/source/cpp_api/neighbors_bruteforce.rst (1)

1-4: ⚡ Quick win

Inconsistent terminology between title and description.

The title uses "Brute Force KNN" (line 1), but line 4 uses "bruteforce method" (single word, lowercase). For consistency, consider updating line 4 to match the title's styling, e.g., "The brute force method...".

As per coding guidelines, documentation changes should maintain consistency in terminology.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cpp_api/neighbors_bruteforce.rst` around lines 1 - 4, Update the
wording in the description to use the same terminology and casing as the title
"Brute Force KNN": replace "The bruteforce method" with "The brute force method"
so the phrase matches the title's two-word, lower/upper-case styling; ensure any
other occurrences in this document use "brute force" consistently.
docs/source/neighbors/bruteforce.rst (1)

1-63: ⚡ Quick win

Inconsistent terminology throughout the document.

The title uses "Brute Force KNN" (line 1), but the body text inconsistently uses "Brute-force" (line 4), "brute-force" (lines 11, 20, 21, 24, 25, 45), and "pre-filtered brute-force" (line 24). For consistency, when referring to the algorithm or method by name, the terminology should align with the title styling. Consider using "brute force" (lowercase, no hyphen) when used as a modifier/adjective in context, or standardize on one form throughout.

As per coding guidelines, documentation changes should maintain consistency in terminology.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/neighbors/bruteforce.rst` around lines 1 - 63, The document uses
inconsistent terminology: title "Brute Force KNN" versus "Brute-force",
"brute-force", and "pre-filtered brute-force" in the body; standardize to "brute
force" (lowercase, no hyphen) throughout the body while keeping the title "Brute
Force KNN" as-is. Update the occurrences of "Brute-force" (start paragraph), all
"brute-force" usages (lines referencing exhaustive index, filtering,
pre-filtered text and tuning considerations), and "pre-filtered brute-force" to
"brute force" or "pre-filtered brute force" respectively so the naming is
consistent across the file.
docs/source/c_api/neighbors_bruteforce_c.rst (1)

1-4: ⚡ Quick win

Inconsistent terminology between title and description.

The title uses "Brute Force KNN" (line 1), but line 4 uses "bruteforce method" (single word, lowercase). For consistency, consider updating line 4 to match the title's styling, e.g., "The brute force method...".

As per coding guidelines, documentation changes should maintain consistency in terminology.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/c_api/neighbors_bruteforce_c.rst` around lines 1 - 4, The
document uses inconsistent terminology: the title "Brute Force KNN" vs the
phrase "bruteforce method" in the paragraph; update the sentence containing
"bruteforce method" to match the title's styling and casing (e.g., replace "The
bruteforce method is running the KNN algorithm." with "The brute force method is
running the KNN algorithm.") so the phrase "brute force" matches the header
"Brute Force KNN" and follows documentation style guidelines.
docs/source/cpp_api/preprocessing_quantize_binary.rst (1)

4-6: 💤 Low value

Consider renaming the role to avoid confusion.

The role is named py but configures C++ syntax highlighting. This naming could confuse documentation maintainers and readers who might expect Python code. Consider renaming to cpp or cplusplus for clarity.

However, if this pattern is already established across the documentation, this comment can be deferred to maintain consistency with existing conventions.

💡 Suggested improvement
-.. role:: py(code)
+.. role:: cpp(code)
    :language: c++
    :class: highlight
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cpp_api/preprocessing_quantize_binary.rst` around lines 4 - 6,
The directive defines a Sphinx role named "py" that sets :language: c++, which
is misleading; rename the role identifier from "py" to "cpp" (or "cplusplus") in
the role declaration (the line containing "role:: py(code)") and update any
references to that role across the docs so the role name matches the configured
:language: c++; if the "py" naming is an established global convention, instead
document that convention in a comment near the role declaration to prevent
confusion.
docs/source/cpp_api/neighbors_mg_cagra.rst (1)

6-8: ⚡ Quick win

Clarify the role name to match the language.

The role is named :py(code) but configured for C++ syntax highlighting. This naming is misleading—readers might expect Python code. Consider renaming to :cpp(code) or using a more descriptive name like :cxx(code) for clarity.

📝 Suggested role rename
-.. role:: py(code)
+.. role:: cpp(code)
    :language: c++
    :class: highlight

Then update any usage of :py: to :cpp: elsewhere in the documentation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cpp_api/neighbors_mg_cagra.rst` around lines 6 - 8, The Sphinx
role is currently declared as ":py(code)" but set to use C++ highlighting, which
is misleading; change the role name to ":cpp(code)" (or ":cxx(code)") in this
directive and then update all usages of the ":py:" role for C++ snippets to the
new role throughout the docs so the role name matches the language and retains
the ":language: c++" and ":class: highlight" settings.
docs/source/c_api/preprocessing_quantize.rst (1)

4-6: ⚡ Quick win

Clarify the role name to match the language.

The role is named :py(code) but configured for C syntax highlighting. This naming is misleading—readers might expect Python code. Consider renaming to :c(code) for clarity and consistency.

📝 Suggested role rename
-.. role:: py(code)
+.. role:: c(code)
    :language: c
    :class: highlight

Then update any usage of :py: to :c: elsewhere in the documentation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/c_api/preprocessing_quantize.rst` around lines 4 - 6, The role
declaration currently uses the misleading symbol ":py(code)" while specifying C
highlighting; change the role name to ":c(code)" in the directive and then
update all occurrences of the ":py:" role usage in this document (and any
related docs) to ":c:" so the role name matches the configured C language
highlighter.
docs/source/c_api/preprocessing_quantize_pq.rst (1)

4-6: ⚡ Quick win

Clarify the role name to match the language.

The role is named :py(code) but configured for C syntax highlighting. This naming is misleading—readers might expect Python code. Consider renaming to :c(code) for clarity and consistency.

📝 Suggested role rename
-.. role:: py(code)
+.. role:: c(code)
    :language: c
    :class: highlight

Then update any usage of :py: to :c: elsewhere in the documentation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/c_api/preprocessing_quantize_pq.rst` around lines 4 - 6, The role
declaration currently defines a Python role (:py(code)) but sets :language: c
which is confusing; change the role name from :py(code) to :c(code) in the
directive and then update all documentation references that use the :py: role to
use the :c: role instead so syntax highlighting and naming are consistent
(search for occurrences of ":py(" or ":py:" and replace with ":c(" or ":c:").
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/source/cpp_api/neighbors_mg_all_neighbors.rst`:
- Line 24: The example call uses an unqualified all_neighbors::build which can
fail to compile; update the snippet to use the fully-qualified namespace (e.g.,
lance::neighbors_mg::all_neighbors::build(handle, params, dataset, indices,
distances)) so the example is self-contained and will compile as shown; locate
the example in the neighbors_mg_all_neighbors.rst snippet and replace
all_neighbors::build(...) with the fully-qualified
lance::neighbors_mg::all_neighbors::build(...) (or the project's top-level
namespace used elsewhere) so it matches other examples.

In `@docs/source/cpp_api/neighbors_mg_cagra.rst`:
- Around line 17-76: The RST uses incorrect Doxygen group names (e.g.,
mg_cpp_index_params, mg_cpp_search_params, mg_cpp_cagra_index_build,
mg_cpp_cagra_deserialize, mg_cpp_cagra_distribute); update these to the actual
groups defined in the header (e.g., cagra_cpp_index_params,
cagra_cpp_search_params, cagra_cpp_index_build, etc.) and remove any references
to non-existent groups (such as mg_cpp_cagra_deserialize and
mg_cpp_cagra_distribute) or replace them with the corresponding cagra_cpp_*
group names from cpp/include/cuvs/neighbors/cagra.hpp so the doxygengroup
directives match the header definitions.

In `@docs/source/neighbors/vamana.rst`:
- Line 4: The sentence in the Vamana doc (mentions "Vamana", "cuVS" and
"DiskANN") has two typos: replace "accelreate" with "accelerate" and "idnexes"
with "indexes" so the line reads that cuVS provides a GPU-optimized Vamana to
"accelerate graph construction to build DiskANN indexes"; update those two words
accordingly in the Vamana paragraph.

---

Nitpick comments:
In `@docs/source/c_api/neighbors_bruteforce_c.rst`:
- Around line 1-4: The document uses inconsistent terminology: the title "Brute
Force KNN" vs the phrase "bruteforce method" in the paragraph; update the
sentence containing "bruteforce method" to match the title's styling and casing
(e.g., replace "The bruteforce method is running the KNN algorithm." with "The
brute force method is running the KNN algorithm.") so the phrase "brute force"
matches the header "Brute Force KNN" and follows documentation style guidelines.

In `@docs/source/c_api/preprocessing_quantize_pq.rst`:
- Around line 4-6: The role declaration currently defines a Python role
(:py(code)) but sets :language: c which is confusing; change the role name from
:py(code) to :c(code) in the directive and then update all documentation
references that use the :py: role to use the :c: role instead so syntax
highlighting and naming are consistent (search for occurrences of ":py(" or
":py:" and replace with ":c(" or ":c:").

In `@docs/source/c_api/preprocessing_quantize.rst`:
- Around line 4-6: The role declaration currently uses the misleading symbol
":py(code)" while specifying C highlighting; change the role name to ":c(code)"
in the directive and then update all occurrences of the ":py:" role usage in
this document (and any related docs) to ":c:" so the role name matches the
configured C language highlighter.

In `@docs/source/choosing_and_configuring_indexes.rst`:
- Line 28: The paragraph starting "What do we mean when we say quality of an
index?" uses an ambiguous description of recall; replace it with a precise
definition stating that recall = (number of relevant items retrieved) / (total
number of relevant items), explain that in vector search "relevant" means items
that appear in the exact nearest-neighbor set for the query, and explicitly
contrast recall (set inclusion, not ordering) with accuracy/precision to avoid
confusion; update the sentence(s) that reference "recall" and "vector search"
accordingly so the metric computation and its interpretation are unambiguous.

In `@docs/source/comparing_indexes.rst`:
- Line 56: The sentence claiming that smaller index shards are "assumed to be
completely uniform" is too absolute; update the wording in the
comparing_indexes.rst paragraph (the sentence containing 'make many smaller
vector search indexing models for a single "index"' and the phrase "completely
uniform") to a softened phrasing such as "often assumed to be roughly or
approximately uniform" and add a brief caveat that real deployments can exhibit
non-uniform shard distributions and that practitioners should validate
assumptions empirically (e.g., via sampling or cluster diagnostics).

In `@docs/source/cpp_api/neighbors_bruteforce.rst`:
- Around line 1-4: Update the wording in the description to use the same
terminology and casing as the title "Brute Force KNN": replace "The bruteforce
method" with "The brute force method" so the phrase matches the title's
two-word, lower/upper-case styling; ensure any other occurrences in this
document use "brute force" consistently.

In `@docs/source/cpp_api/neighbors_mg_cagra.rst`:
- Around line 6-8: The Sphinx role is currently declared as ":py(code)" but set
to use C++ highlighting, which is misleading; change the role name to
":cpp(code)" (or ":cxx(code)") in this directive and then update all usages of
the ":py:" role for C++ snippets to the new role throughout the docs so the role
name matches the language and retains the ":language: c++" and ":class:
highlight" settings.

In `@docs/source/cpp_api/preprocessing_quantize_binary.rst`:
- Around line 4-6: The directive defines a Sphinx role named "py" that sets
:language: c++, which is misleading; rename the role identifier from "py" to
"cpp" (or "cplusplus") in the role declaration (the line containing "role::
py(code)") and update any references to that role across the docs so the role
name matches the configured :language: c++; if the "py" naming is an established
global convention, instead document that convention in a comment near the role
declaration to prevent confusion.

In `@docs/source/getting_started.rst`:
- Around line 7-17: Replace the loose backtick link text "`Supported indexes`_"
with the exact destination page title to maintain consistency; specifically
change the line containing "`Supported indexes`_" so it uses the same title as
the neighbors doc (i.e. match the target title used in ":doc:`Vector Search
Index Guide <neighbors/neighbors>`"), ensuring the nav entry reads exactly the
page title and follows the same :doc: link pattern as the other bullets.

In `@docs/source/neighbors/bruteforce.rst`:
- Around line 1-63: The document uses inconsistent terminology: title "Brute
Force KNN" versus "Brute-force", "brute-force", and "pre-filtered brute-force"
in the body; standardize to "brute force" (lowercase, no hyphen) throughout the
body while keeping the title "Brute Force KNN" as-is. Update the occurrences of
"Brute-force" (start paragraph), all "brute-force" usages (lines referencing
exhaustive index, filtering, pre-filtered text and tuning considerations), and
"pre-filtered brute-force" to "brute force" or "pre-filtered brute force"
respectively so the naming is consistent across the file.

In `@docs/source/tuning_guide.rst`:
- Line 8: The opening sentence "A Method for tuning and evaluating Vector Search
Indexes At Scale in Locally Indexed Vector Databases..." is a run-on and
unclear; rewrite it into two concise sentences: one stating the guide's purpose
(scalable method for tuning and evaluating locally indexed vector search
indexes) and a second that defines the evaluation metric succinctly (measure
recall while penalizing high build time, e.g., optimize for higher recall per
unit build time or recall weighted by inverse build time). Replace the original
long paragraph with this clearer two-sentence version and ensure the terms
"locally indexed vector databases" and "recall penalized by build time" remain
present for clarity.

In `@docs/source/vector_databases_vs_vector_search.rst`:
- Line 42: The long sentence starting with "Some special-purpose vector
databases..." is hard to read and contains an awkward parenthetical "(sent to,
and)"; split this into 2–3 shorter sentences by ending after "written directly
to that partition", rephrase the parenthetical to something like "and then
written directly to that partition" or remove it entirely, and break the
remainder into a separate sentence describing partition contents
(graph/tree/IVF) and another describing scale and partition grouping behavior so
each idea is clear and concise.
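As a concrete anchor for the recall wording suggested above, here is a minimal sketch of the metric — the fraction of the exact nearest-neighbor set recovered by the ANN result, with order ignored. The neighbor ids are made up for illustration:

```python
def recall(retrieved, exact):
    """Fraction of the exact nearest-neighbor set recovered by an ANN search.

    Order does not matter: recall measures set inclusion, unlike ranking metrics.
    """
    retrieved, exact = set(retrieved), set(exact)
    return len(retrieved & exact) / len(exact)

# Hypothetical ids: the exact 10-NN for a query vs. what an ANN index returned.
exact_knn = [3, 7, 12, 19, 24, 31, 40, 42, 57, 63]
ann_result = [3, 7, 12, 19, 24, 31, 40, 44, 58, 66]

print(recall(ann_result, exact_knn))  # 7 of the 10 exact neighbors found -> 0.7
```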

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b961e923-c568-4526-8c84-020c4ec8641d

📥 Commits

Reviewing files that changed from the base of the PR and between 90c18a8 and 9c72306.

📒 Files selected for processing (54)
  • cpp/include/cuvs/neighbors/cagra.hpp
  • cpp/include/cuvs/neighbors/ivf_flat.hpp
  • cpp/include/cuvs/neighbors/ivf_pq.hpp
  • docs/source/_static/collapse_overloads.js
  • docs/source/api_docs.rst
  • docs/source/c_api.rst
  • docs/source/c_api/cluster.rst
  • docs/source/c_api/neighbors.rst
  • docs/source/c_api/neighbors_bruteforce_c.rst
  • docs/source/c_api/neighbors_mg.rst
  • docs/source/c_api/neighbors_mg_all_neighbors_c.rst
  • docs/source/c_api/neighbors_mg_cagra_c.rst
  • docs/source/c_api/neighbors_mg_ivf_flat_c.rst
  • docs/source/c_api/neighbors_mg_ivf_pq_c.rst
  • docs/source/c_api/neighbors_nn_descent_c.rst
  • docs/source/c_api/preprocessing.rst
  • docs/source/c_api/preprocessing_pca.rst
  • docs/source/c_api/preprocessing_quantize.rst
  • docs/source/c_api/preprocessing_quantize_binary.rst
  • docs/source/c_api/preprocessing_quantize_pq.rst
  • docs/source/c_api/preprocessing_quantize_scalar.rst
  • docs/source/choosing_and_configuring_indexes.rst
  • docs/source/comparing_indexes.rst
  • docs/source/conf.py
  • docs/source/cpp_api/neighbors_bruteforce.rst
  • docs/source/cpp_api/neighbors_mg.rst
  • docs/source/cpp_api/neighbors_mg_all_neighbors.rst
  • docs/source/cpp_api/neighbors_mg_cagra.rst
  • docs/source/cpp_api/neighbors_mg_ivf_flat.rst
  • docs/source/cpp_api/neighbors_mg_ivf_pq.rst
  • docs/source/cpp_api/preprocessing_quantize.rst
  • docs/source/cpp_api/preprocessing_quantize_binary.rst
  • docs/source/cpp_api/preprocessing_quantize_pq.rst
  • docs/source/cpp_api/preprocessing_quantize_scalar.rst
  • docs/source/filtering.rst
  • docs/source/getting_started.rst
  • docs/source/index.rst
  • docs/source/neighbors/all_neighbors.rst
  • docs/source/neighbors/bruteforce.rst
  • docs/source/neighbors/neighbors.rst
  • docs/source/neighbors/vamana.rst
  • docs/source/python_api/cluster.rst
  • docs/source/python_api/cluster_spectral.rst
  • docs/source/python_api/neighbors_mg_all_neighbors.rst
  • docs/source/python_api/neighbors_multi_gpu.rst
  • docs/source/python_api/preprocessing.rst
  • docs/source/python_api/preprocessing_pca.rst
  • docs/source/python_api/preprocessing_quantize.rst
  • docs/source/python_api/preprocessing_quantize_binary.rst
  • docs/source/python_api/preprocessing_quantize_pq.rst
  • docs/source/python_api/preprocessing_quantize_scalar.rst
  • docs/source/python_api/preprocessing_spectral_embedding.rst
  • docs/source/tuning_guide.rst
  • docs/source/vector_databases_vs_vector_search.rst

params.n_clusters = 8; // partition data into 8 clusters
params.overlap_factor = 2; // each point assigned to 2 clusters

all_neighbors::build(handle, params, dataset, indices, distances);

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix namespace qualification in the C++ example call.

all_neighbors::build(...) is unqualified in this snippet, so the example is not self-contained and may fail to compile as documented.

Proposed fix
-   all_neighbors::build(handle, params, dataset, indices, distances);
+   cuvs::neighbors::all_neighbors::build(handle, params, dataset, indices, distances);

As per coding guidelines, documentation changes must ensure “Accuracy: Verify code examples compile and run correctly.”

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
all_neighbors::build(handle, params, dataset, indices, distances);
cuvs::neighbors::all_neighbors::build(handle, params, dataset, indices, distances);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cpp_api/neighbors_mg_all_neighbors.rst` at line 24, The example
call uses an unqualified all_neighbors::build which can fail to compile; update
the snippet to use the fully-qualified namespace (e.g.,
cuvs::neighbors::all_neighbors::build(handle, params, dataset, indices,
distances)) so the example is self-contained and will compile as shown; locate
the example in the neighbors_mg_all_neighbors.rst snippet and replace
all_neighbors::build(...) with the fully-qualified
cuvs::neighbors::all_neighbors::build(...) (or the project's top-level
namespace used elsewhere) so it matches other examples.

Comment on lines +17 to +76
.. doxygengroup:: mg_cpp_index_params
:project: cuvs
:members:
:content-only:

Search Parameters
-----------------

.. doxygengroup:: mg_cpp_search_params
:project: cuvs
:members:
:content-only:

Index Build
-----------

.. doxygengroup:: mg_cpp_cagra_index_build
:project: cuvs
:members:
:content-only:

Index Extend
------------

.. doxygengroup:: mg_cpp_cagra_index_extend
:project: cuvs
:members:
:content-only:

Index Search
------------

.. doxygengroup:: mg_cpp_cagra_index_search
:project: cuvs
:members:
:content-only:

Index Serialize
---------------

.. doxygengroup:: mg_cpp_cagra_serialize
:project: cuvs
:members:
:content-only:

Index Deserialize
-----------------

.. doxygengroup:: mg_cpp_cagra_deserialize
:project: cuvs
:members:
:content-only:

Index Distribute
----------------

.. doxygengroup:: mg_cpp_cagra_distribute
:project: cuvs
:members:
:content-only:

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify Doxygen group definitions for Multi-GPU CAGRA

# Expected groups to find in the C++ header.
# NB: "GROUPS" is a special read-only bash array (the caller's group IDs);
# assignments to it are silently ignored, so use a lowercase name instead.
groups=(
  "mg_cpp_index_params"
  "mg_cpp_search_params"
  "mg_cpp_cagra_index_build"
  "mg_cpp_cagra_index_extend"
  "mg_cpp_cagra_index_search"
  "mg_cpp_cagra_serialize"
  "mg_cpp_cagra_deserialize"
  "mg_cpp_cagra_distribute"
)

echo "Checking for Doxygen group definitions in CAGRA headers..."
for group in "${groups[@]}"; do
  echo "Searching for group: $group"
  rg -n "@ingroup\s+$group|@defgroup\s+$group" cpp/include/cuvs/neighbors/cagra.hpp
done

Repository: rapidsai/cuvs

Length of output: 142


🏁 Script executed:

# First, check if the header file exists
find . -name "cagra.hpp" -type f

Repository: rapidsai/cuvs

Length of output: 124


🏁 Script executed:

# Search for these group definitions across the codebase
rg "@ingroup mg_cpp_|@defgroup mg_cpp_" cpp/include/

Repository: rapidsai/cuvs

Length of output: 39


🏁 Script executed:

# Also check the RST file structure and referenced groups
cat -n docs/source/cpp_api/neighbors_mg_cagra.rst | head -80

Repository: rapidsai/cuvs

Length of output: 2000


🏁 Script executed:

# Read the C++ header file to see what Doxygen groups are defined
head -200 cpp/include/cuvs/neighbors/cagra.hpp

Repository: rapidsai/cuvs

Length of output: 7814


🏁 Script executed:

# Search for all `@ingroup` and `@defgroup` directives in the header file (without mg_cpp_ prefix)
rg "@ingroup|@defgroup" cpp/include/cuvs/neighbors/cagra.hpp -A 1

Repository: rapidsai/cuvs

Length of output: 683


Fix incorrect Doxygen group references in Multi-GPU CAGRA documentation.

The RST file references non-existent Doxygen groups. The header file defines groups with the prefix cagra_cpp_ (e.g., cagra_cpp_index_params, cagra_cpp_search_params, cagra_cpp_index_build), but the RST file references groups with the prefix mg_cpp_ (e.g., mg_cpp_index_params, mg_cpp_search_params, mg_cpp_cagra_index_build). Additionally, the groups mg_cpp_cagra_deserialize and mg_cpp_cagra_distribute are not defined in the header file and should not be referenced.

Correct the group names to match those defined in cpp/include/cuvs/neighbors/cagra.hpp, or create these groups in the header if they are intended for Multi-GPU-specific functions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cpp_api/neighbors_mg_cagra.rst` around lines 17 - 76, The RST
uses incorrect Doxygen group names (e.g., mg_cpp_index_params,
mg_cpp_search_params, mg_cpp_cagra_index_build, mg_cpp_cagra_deserialize,
mg_cpp_cagra_distribute); update these to the actual groups defined in the
header (e.g., cagra_cpp_index_params, cagra_cpp_search_params,
cagra_cpp_index_build, etc.) and remove any references to non-existent groups
(such as mg_cpp_cagra_deserialize and mg_cpp_cagra_distribute) or replace them
with the corresponding cagra_cpp_* group names from
cpp/include/cuvs/neighbors/cagra.hpp so the doxygengroup directives match the
header definitions.
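The group-name mismatch can also be demonstrated mechanically. The two strings below are abbreviated stand-ins for the RST page and the header's Doxygen comments, not verbatim file contents:

```python
import re

# Abbreviated stand-in for docs/source/cpp_api/neighbors_mg_cagra.rst.
rst = """
.. doxygengroup:: mg_cpp_index_params
.. doxygengroup:: mg_cpp_cagra_deserialize
.. doxygengroup:: mg_cpp_cagra_distribute
"""

# Abbreviated stand-in for the @defgroup comments in cagra.hpp.
header = """
 * @defgroup cagra_cpp_index_params CAGRA index build parameters
 * @defgroup cagra_cpp_search_params CAGRA search parameters
 * @defgroup cagra_cpp_index_build CAGRA index build
"""

referenced = set(re.findall(r"doxygengroup:: (\S+)", rst))
defined = set(re.findall(r"@defgroup (\S+)", header))

# Any referenced group that is never defined renders as an empty section.
undefined = sorted(referenced - defined)
print(undefined)
# -> ['mg_cpp_cagra_deserialize', 'mg_cpp_cagra_distribute', 'mg_cpp_index_params']
```

Running the same cross-check against the real files would confirm which `mg_cpp_*` references need to become `cagra_cpp_*` and which should be dropped.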

Vamana is the underlying graph construction algorithm used to construct indexes for the DiskANN vector search solution. DiskANN and the Vamana algorithm are described in detail in the `published paper <https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf>`_, and a highly optimized `open-source repository <https://github.com/microsoft/DiskANN>`_ includes many features for index construction and search. In cuVS, we provide a version of the Vamana algorithm optimized for GPU architectures to accelreate graph construction to build DiskANN idnexes. At a high level, the Vamana algorithm operates as follows:

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix spelling errors.

Line 4 contains two spelling errors:

  • "accelreate" should be "accelerate"
  • "idnexes" should be "indexes"
📝 Proposed fix for spelling errors
-Vamana is the underlying graph construction algorithm used to construct indexes for the DiskANN vector search solution. DiskANN and the Vamana algorithm are described in detail in the `published paper <https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf>`_, and a highly optimized `open-source repository <https://github.com/microsoft/DiskANN>`_  includes many features for index construction and search. In cuVS, we provide a version of the Vamana algorithm optimized for GPU architectures to accelreate graph construction to build DiskANN idnexes. At a high level, the Vamana algorithm operates as follows:
+Vamana is the underlying graph construction algorithm used to construct indexes for the DiskANN vector search solution. DiskANN and the Vamana algorithm are described in detail in the `published paper <https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf>`_, and a highly optimized `open-source repository <https://github.com/microsoft/DiskANN>`_  includes many features for index construction and search. In cuVS, we provide a version of the Vamana algorithm optimized for GPU architectures to accelerate graph construction to build DiskANN indexes. At a high level, the Vamana algorithm operates as follows:

As per coding guidelines, documentation should be accurate and free of spelling errors.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Vamana is the underlying graph construction algorithm used to construct indexes for the DiskANN vector search solution. DiskANN and the Vamana algorithm are described in detail in the `published paper <https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf>`_, and a highly optimized `open-source repository <https://github.com/microsoft/DiskANN>`_ includes many features for index construction and search. In cuVS, we provide a version of the Vamana algorithm optimized for GPU architectures to accelreate graph construction to build DiskANN idnexes. At a high level, the Vamana algorithm operates as follows:
Vamana is the underlying graph construction algorithm used to construct indexes for the DiskANN vector search solution. DiskANN and the Vamana algorithm are described in detail in the `published paper <https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf>`_, and a highly optimized `open-source repository <https://github.com/microsoft/DiskANN>`_ includes many features for index construction and search. In cuVS, we provide a version of the Vamana algorithm optimized for GPU architectures to accelerate graph construction to build DiskANN indexes. At a high level, the Vamana algorithm operates as follows:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/neighbors/vamana.rst` at line 4, The sentence in the Vamana doc
(mentions "Vamana", "cuVS" and "DiskANN") has two typos: replace "accelreate"
with "accelerate" and "idnexes" with "indexes" so the line reads that cuVS
provides a GPU-optimized Vamana to "accelerate graph construction to build
DiskANN indexes"; update those two words accordingly in the Vamana paragraph.

Labels

  • doc: Improvements or additions to documentation
  • non-breaking: Introduces a non-breaking change

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

2 participants