[Docs] Convert Sphinx docs to Fern#2067

Open: cjnolet wants to merge 135 commits into rapidsai:main from cjnolet:codex/fern-docs-main-refresh

@cjnolet cjnolet commented May 7, 2026

This PR converts the current Sphinx docs to Fern in preparation for the move to docs.nvidia.com. Instead of manually composing the API reference docs, this PR also generates API reference docs for all supported languages directly from the code (as is standard in Fern).

There are a lot of files in this PR. Most of the markdown files were either copied over to the Fern directory format from the old Sphinx docs or auto-generated by the new API reference generation scripts (generate_api_reference.py in the changes). When reviewing this PR, it's probably better to start with the non-markdown files, then build the docs and run them locally.

The docs can be built in the usual way with ./build.sh docs. You can run them locally using the following command:

fern/build_docs.sh dev --port 3000 --backend-port 3001

copy-pr-bot Bot commented May 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cjnolet cjnolet self-assigned this May 7, 2026
@cjnolet cjnolet added the doc (Improvements or additions to documentation) and non-breaking (Introduces a non-breaking change) labels May 7, 2026
@cjnolet cjnolet moved this to In Progress in Unstructured Data Processing May 7, 2026
@cjnolet cjnolet marked this pull request as ready for review May 7, 2026 04:04
@cjnolet cjnolet requested review from a team as code owners May 7, 2026 04:04
@cjnolet cjnolet requested a review from bdice May 7, 2026 04:04

cjnolet commented May 7, 2026

/ok to test 62daba9

coderabbitai Bot commented May 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Replaces Sphinx-based docs with a Fern site: adds Fern config, pages, and build script; removes many Sphinx/RST assets and a Sphinx extension; updates build and CI to invoke Fern; updates doc-related conda dependencies and envs; adds a CODEOWNERS rule for fern/.

Changes

Documentation System Migration

| Layer | File(s) | Summary |
| --- | --- | --- |
| Config / Ownership | `.github/CODEOWNERS`, `fern/fern.config.json`, `fern/docs.yml` | Add CODEOWNERS entry for `fern/`; add Fern site config and site-level settings/navigation. |
| Content Addition (Markdown) | `fern/pages/*.md`, `fern/pages/c_api/*`, `fern/pages/cpp_api/*` | Add many Fern markdown pages: API reference (C/C++/multi‑GPU/etc.), guides (build, interoperability, API basics, advanced topics, cuvs_bench) and other prose. |
| New Build Script / Modes | `fern/build_docs.sh` | Add Fern build wrapper script implementing modes (check, preview, publish, dev) and API reference generation. |
| CI & Build Integration | `ci/build_docs.sh`, `build.sh`, `.github/workflows/pr.yaml`, `.github/workflows/check-c-abi.yaml`, `ci/release/update-version.sh` | `ci/build_docs.sh`: switch to dependency-generator output and invoke Fern; `build.sh`: set `FERN_DOCS_DIR` and call `fern/build_docs.sh` (mode `${FERN_DOCS_MODE:-check}`) and adjust `PARALLEL_LEVEL` defaulting; workflows: add `!fern/**` to changed-files ignore lists and update PR doc link; release script: SPDX year bump and update doc rewrite targets to Fern pages. |
| Dependency / Environment Changes | `dependencies.yaml`, `conda/environments/all_cuda-*/...yaml` | Remove Sphinx/Doxygen/breathe and related doc tooling from docs dependency lists; add nodejs; generated conda envs updated to include gcc_linux-*/go and remove pip nvidia-sphinx-theme. |
| Removal of Legacy Sphinx Assets | `docs/Makefile`, `docs/source/*`, `docs/source/sphinxext/github_link.py`, `docs/source/_static/*` | Remove many RST pages, Makefile targets, static Sphinx assets, and the `github_link.py` extension (linkcode resolver) from `docs/source`. |
| Docs Validation / CLI Checks | `fern/build_docs.sh`, CI invocation points | Introduce Fern validation flow: generate API reference, run `fern check --warnings` and `fern docs md check`, then run `fern generate`/`docs dev` per mode. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

cjnolet commented May 7, 2026

/ok to test 82655fd

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 9

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (40)
fern/pages/c_api/c-api-neighbors-cagra.md-276-276 (1)

276-276: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Duplicate heading text causes broken anchors and markdownlint MD024 warnings.

The heading ## C API for CUDA ANN Graph-based nearest neighbor search appears at Lines 276, 370, 486, 651, and 685. Markdown renderers generate identical anchors for duplicate headings; in-page links to any section after the first will silently resolve to the first occurrence. Each instance corresponds to a distinct Doxygen group (cagra_c_extend_params, cagra_c_search_params, cagra_c_index, cagra_c_index_build, cagra_c_index_search) — use those group names as unique headings.

♻️ Proposed fix
-## C API for CUDA ANN Graph-based nearest neighbor search
+## Extend parameters
 
 _Doxygen group: `cagra_c_extend_params`_
-## C API for CUDA ANN Graph-based nearest neighbor search
+## Search parameters
 
 _Doxygen group: `cagra_c_search_params`_
-## C API for CUDA ANN Graph-based nearest neighbor search
+## Index lifecycle
 
 _Doxygen group: `cagra_c_index`_
-## C API for CUDA ANN Graph-based nearest neighbor search
+## Index build
 
 _Doxygen group: `cagra_c_index_build`_
-## C API for CUDA ANN Graph-based nearest neighbor search
+## Index search
 
 _Doxygen group: `cagra_c_index_search`_

Also applies to: 370-370, 486-486, 651-651, 685-685

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-cagra.md` at line 276, Replace each
duplicate "## C API for CUDA ANN Graph-based nearest neighbor search" heading
with a unique heading that includes the corresponding Doxygen group name so
anchors differ; specifically update all five occurrences to use headings
like "## C API for CUDA ANN Graph-based nearest neighbor search —
cagra_c_extend_params", "… — cagra_c_search_params", "… — cagra_c_index", "… —
cagra_c_index_build", and "… — cagra_c_index_search" (matching the Doxygen group
each section documents) so in-page links and markdownlint MD024 warnings are
resolved.
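
A check like the one described in this finding can be sketched in a few lines of Python. This is a minimal illustration, not part of the PR: `find_duplicate_headings` is a hypothetical helper, and it assumes ATX-style (`#`) headings and treats headings with identical text as duplicates regardless of level.

```python
import re
from collections import defaultdict

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*$")
FENCE = "`" * 3  # built at runtime so this listing contains no literal fence


def find_duplicate_headings(markdown_text):
    """Return {heading text: [1-based line numbers]} for headings that repeat."""
    seen = defaultdict(list)
    in_fence = False
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        # Skip fenced code blocks so '#' comments inside code are not counted.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if match:
            seen[match.group(2)].append(lineno)
    return {text: lines for text, lines in seen.items() if len(lines) > 1}


# Sample input modeled on the headings quoted in the finding above.
sample = "\n".join([
    "## C API for CUDA ANN Graph-based nearest neighbor search",
    "_Doxygen group: `cagra_c_extend_params`_",
    "## C API for CUDA ANN Graph-based nearest neighbor search",
    "_Doxygen group: `cagra_c_search_params`_",
    "## Index build",
])
duplicates = find_duplicate_headings(sample)
print(duplicates)
```

Running something like this over the generated pages before publishing would surface repeated headings without relying on a separate markdownlint pass.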
fern/pages/c_api/c-api-neighbors-refine.md-35-35 (1)

35-35: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Typo: n_queris → n_queries in the queries parameter description.

🐛 Proposed fix
-| `queries` | in | `DLManagedTensor*` | device matrix of the queries [n_queris, dims] |
+| `queries` | in | `DLManagedTensor*` | device matrix of the queries [n_queries, dims] |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-refine.md` at line 35, Fix the typo in the
`queries` parameter description: change "n_queris" to "n_queries" in the table
cell describing `queries` (the DLManagedTensor* device matrix row) so the
dimension reads [n_queries, dims]; update the text in the `queries` parameter
line of the c-api-neighbors-refine documentation accordingly.
fern/pages/c_api/c-api-neighbors-cagra.md-504-510 (1)

504-510: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

cagraError_t is a stale type name — should be cuvsError_t.

The function signature declares cuvsError_t cuvsCagraIndexCreate(...), but the rendered returns description says cagraError_t, which appears to be a leftover from before the cuVS type renaming.

🐛 Proposed fix
 **Returns**
 
 `cuvsError_t`
 
-cagraError_t
+cuvsError_t
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-cagra.md` around lines 504 - 510, The
rendered Returns section incorrectly shows the stale type name `cagraError_t`;
update the documentation for the function `cuvsCagraIndexCreate` (and any
associated return docs for cagra functions) to use the correct type
`cuvsError_t` instead of `cagraError_t`, ensuring the Returns block and any
Source annotation reflect `cuvsError_t`.
fern/pages/c_api/c-api-preprocessing-pca.md-43-44 (1)

43-44: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Truncated field descriptions for copy and whiten.

Both descriptions are cut off mid-sentence, leaving the semantics of these fields incomplete for readers.

-| `copy` | `bool` | If false, data passed to fit are overwritten and running fit(X).transform(X) will |
-| `whiten` | `bool` | When true the component vectors are multiplied by the square root of n_samples and then |
+| `copy` | `bool` | If false, data passed to fit are overwritten and running `fit(X).transform(X)` will not work as expected. Pass `False` to save memory. |
+| `whiten` | `bool` | When true, the component vectors are multiplied by the square root of `n_samples` and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances. |

(Fill in with the correct full description from the source header c/include/cuvs/preprocessing/pca.h:35.)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-preprocessing-pca.md` around lines 43 - 44, The `copy`
and `whiten` field descriptions in c-api-preprocessing-pca.md are truncated;
replace them with the full sentences taken from the PCA C header comments for
those fields (the header's PCA preprocessing comments) so the docs convey the
complete semantics: for `copy` state that when false the data passed to fit are
overwritten and running fit(X).transform(X) will not yield a copy, and for
`whiten` state that when true the component vectors are multiplied by the square
root of n_samples and then divided by the singular values to ensure uncorrelated
outputs with unit variance; update the `copy` and `whiten` lines in the markdown
table accordingly.
fern/pages/c_api/c-api-neighbors-nn-descent.md-17-17 (1)

17-17: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Split the run-on parameter description and fix typo.

Line 17 is hard to parse and includes the typo Minkowskidistance. Please format it as a short bullet list (one bullet per field note) and use Minkowski distance.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-nn-descent.md` at line 17, The parameter
descriptions in c-api-neighbors-nn-descent.md are a run-on line and contain a
typo; split the long inline list into a short bullet list with one bullet per
parameter name (metric, metric_arg, graph_degree, intermediate_graph_degree,
max_iterations, termination_threshold, return_distances, dist_comp_dtype) and
fix the typo "Minkowskidistance" to "Minkowski distance"; ensure each bullet
briefly states the parameter name and its description (e.g., "metric: The
distance metric to use", "metric_arg: The argument used by distance metrics like
Minkowski distance", etc.) and preserve the note about dist_comp_dtype options
(NND_DIST_COMP_AUTO, NND_DIST_COMP_FP32, NND_DIST_COMP_FP16).
fern/pages/c_api/c-api-preprocessing-quantize-binary.md-45-46 (1)

45-46: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the concrete enum type for threshold.

Line 45 currently documents threshold as enum; this should name the concrete type for API clarity.

Proposed fix
-| `threshold` | `enum` |  |
+| `threshold` | `cuvsBinaryQuantizerThreshold` |  |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-preprocessing-quantize-binary.md` around lines 45 -
46, Replace the generic `enum` label for the `threshold` parameter in the table
with the actual concrete enum type used by the API (i.e., the enum type that
defines valid threshold values—replace `enum` with that exact type name, for
example `QuantizeBinaryThreshold`), leaving `sampling_ratio` as `float`; update
the table row for `threshold` so it lists the concrete enum identifier instead
of the word `enum`.
fern/pages/c_api/c-api-neighbors-tiered-index.md-73-77 (1)

73-77: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Complete truncated cuvsTieredIndexParams field descriptions.

Line 73 through Line 77 contain cut-off descriptions ("create an", "if the number"), which leaves key behavior ambiguous.

Proposed fix
-| `min_ann_rows` | `int64_t` | The minimum number of rows necessary in the index to create an |
-| `create_ann_index_on_extend` | `bool` | Whether or not to create a new ann index on extend, if the number |
-| `cagra_params` | `cuvsCagraIndexParams_t` | Optional parameters for building a cagra index |
-| `ivf_flat_params` | `cuvsIvfFlatIndexParams_t` | Optional parameters for building a ivf_flat index |
-| `ivf_pq_params` | `cuvsIvfPqIndexParams_t` | Optional parameters for building a ivf-pq index |
+| `min_ann_rows` | `int64_t` | Minimum number of rows required before creating an ANN index. |
+| `create_ann_index_on_extend` | `bool` | Whether to create a new ANN index during extend when row count conditions are met. |
+| `cagra_params` | `cuvsCagraIndexParams_t` | Optional parameters for building a CAGRA index. |
+| `ivf_flat_params` | `cuvsIvfFlatIndexParams_t` | Optional parameters for building an IVF-Flat index. |
+| `ivf_pq_params` | `cuvsIvfPqIndexParams_t` | Optional parameters for building an IVF-PQ index. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-tiered-index.md` around lines 73 - 77, The
table entries for cuvsTieredIndexParams have truncated descriptions; update the
rows for the fields min_ann_rows, create_ann_index_on_extend, cagra_params,
ivf_flat_params, and ivf_pq_params inside cuvsTieredIndexParams so each has a
complete sentence explaining behavior — e.g., min_ann_rows: "The minimum number
of rows required in the dataset before creating an ANN index.";
create_ann_index_on_extend: "Whether to automatically create a new ANN index
when the dataset is extended past min_ann_rows."; cagra_params: "Optional
parameters used when building a cagra index (cuvsCagraIndexParams_t).";
ivf_flat_params: "Optional parameters used when building an ivf_flat index
(cuvsIvfFlatIndexParams_t)."; ivf_pq_params: "Optional parameters used when
building an ivf-pq index (cuvsIvfPqIndexParams_t)." Ensure these full
descriptions replace the truncated text in the cuvsTieredIndexParams table.
fern/pages/c_api/c-api-neighbors-all-neighbors.md-114-121 (1)

114-121: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clean up malformed build behavior text and res description.

Line 114 and Line 120 read as partially broken text (resources ... / incomplete res description), which makes the API contract harder to follow.

Proposed fix
-resources The function automatically detects whether the dataset is host-resident or device-resident and calls the appropriate implementation. For host datasets, it partitions data into `n_clusters` clusters and assigns each row to `overlap_factor` nearest clusters. For device datasets, `n_clusters` must be 1 (no batching); `overlap_factor` is ignored. Outputs always reside in device memory.
+The function automatically detects whether the dataset is host-resident or device-resident and calls the appropriate implementation. For host datasets, it partitions data into `n_clusters` clusters and assigns each row to `overlap_factor` nearest clusters. For device datasets, `n_clusters` must be 1 (no batching) and `overlap_factor` is ignored. Outputs always reside in device memory.
@@
-| `res` | in | `cuvsResources_t` | Can be a SNMG multi-GPU resources (`cuvsResources_t`) or single-GPU |
+| `res` | in | `cuvsResources_t` | Resource handle for SNMG multi-GPU or single-GPU execution. |
fern/pages/c_api/c-api-cluster-kmeans.md-64-72 (1)

64-72: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct parameter direction for cuvsKMeansParamsCreate.

params is documented as in, but this API allocates/populates it through a pointer. Direction should be out (or inout) for accurate C API semantics.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-cluster-kmeans.md` around lines 64 - 72, The parameter
direction for cuvsKMeansParamsCreate is incorrect: update the documentation for
the function cuvsKMeansParamsCreate to mark the `params` parameter as `out` (or
`inout` if the function both reads and writes the pointed-to struct) instead of
`in`, since the function allocates/populates the cuvsKMeansParams_t through the
pointer; change the table entry for `params` accordingly to reflect correct C
API semantics.
fern/pages/cpp_api/cpp-api-neighbors-ball-cover.md-21-29 (1)

21-29: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Parameter documentation is incomplete/truncated.

index description ends with “instance of” and does not describe the expected type/state completely. Please complete this sentence so callers understand required preconditions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-ball-cover.md` around lines 21 - 29, The
`index` parameter description is truncated; update the documentation for
cuvs::neighbors::ball_cover::index to fully describe `index` (type:
index<int64_t, float>&) by stating that it must be an empty,
default-constructed, not-previously-built instance (no existing data or built
index state) that the function will populate/bind during the build operation,
and mention any required preconditions (e.g., capacity/reservation or that
memory/resources must be uninitialized) so callers know the expected state
before calling.
.github/workflows/check-c-abi.yaml-130-130 (1)

130-130: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use an absolute link in PR comments; current relative path is not reliable.

../fern/pages/c_api/index.md in an issue comment body is likely to resolve incorrectly for readers. Use a full GitHub blob URL (or the published docs URL) instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/check-c-abi.yaml at line 130, The PR comment contains a
relative link string "../fern/pages/c_api/index.md" which may resolve
incorrectly for readers; replace that relative path with an absolute URL (either
the GitHub blob URL like
"https://github.com/<OWNER>/<REPO>/blob/<BRANCH>/fern/pages/c_api/index.md" or
the published docs URL) so the comment always points to the correct
resource—search for the literal "../fern/pages/c_api/index.md" in the
workflow/comment text and substitute the absolute link.
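
One way to build such an absolute link is from the default environment variables that GitHub Actions sets on every run (`GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, `GITHUB_SHA`). The sketch below is only an illustration: `blob_url` is a hypothetical helper, the repository name in the example is a placeholder, and the workflow in this PR may compose its comment body differently.

```python
import os


def blob_url(repo_relative_path, env=None):
    """Build an absolute GitHub blob URL for a path relative to the repo root."""
    env = os.environ if env is None else env
    server = env.get("GITHUB_SERVER_URL", "https://github.com")
    repo = env["GITHUB_REPOSITORY"]        # "owner/name", set by GitHub Actions
    ref = env.get("GITHUB_SHA", "main")    # falls back to a branch name (assumption)
    return f"{server}/{repo}/blob/{ref}/{repo_relative_path.lstrip('/')}"


# Placeholder values; a real run would read them from the environment.
example_env = {"GITHUB_REPOSITORY": "example-org/example-repo", "GITHUB_SHA": "abc123"}
link = blob_url("fern/pages/c_api/index.md", example_env)
print(link)
```

Note that on `pull_request` events `GITHUB_SHA` refers to the merge commit, so a branch name or the published docs URL may be a more stable target.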
fern/pages/cpp_api/cpp-api-neighbors-brute-force.md-47-47 (1)

47-47: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix spelling typos in API descriptions.

precaculated and Whether ot not should be corrected (precalculated, Whether or not) to keep generated reference text clean.

Also applies to: 317-317

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-brute-force.md` at line 47, Fix typos in
the API description text for the brute-force neighbors index: replace
"precaculated" with "precalculated" and "Whether ot not" with "Whether or not"
in the description string that begins "Constructs a brute force index from a
dataset..." (also update the same typos at the other occurrence around line 317
in the same document cpp-api-neighbors-brute-force.md) so generated reference
text is correct.
fern/pages/api_basics.md-45-46 (1)

45-46: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use a Markdown heading here.

C followed by ^ is reStructuredText syntax, so this won't render as a section heading in Fern.

Suggested fix
-C
-^
+### C
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/api_basics.md` around lines 45 - 46, Replace the reStructuredText
snippet "C" followed by "^" with a proper Markdown heading: locate the literal
lines containing "C" and the caret marker and convert them into a Markdown
section header (e.g., prefix the title with one or more '#' characters such as
"## C" or a more descriptive heading) so the section renders correctly in Fern;
ensure you remove the caret line after adding the Markdown heading.
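
Since many pages were copied over from Sphinx, leftover RST headings like this one could be converted mechanically. The following is a rough sketch under stated assumptions: `convert_rst_headings` is a hypothetical helper, and the mapping from underline character to heading level is assumed (RST itself assigns levels by order of first appearance). Here `^` maps to `###` to match the suggested fix.

```python
# Assumed mapping from RST underline character to Markdown heading level.
LEVELS = {"=": 1, "-": 2, "^": 3, "~": 4}


def convert_rst_headings(text):
    """Rewrite 'Title' + underline pairs as Markdown ATX headings."""
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        title = lines[i]
        underline = lines[i + 1] if i + 1 < len(lines) else ""
        is_heading = (
            bool(title.strip())
            and bool(underline)
            and underline[0] in LEVELS
            and set(underline) == {underline[0]}        # one repeated character
            and len(underline) >= len(title.rstrip())   # at least as long as the title
        )
        if is_heading:
            out.append("#" * LEVELS[underline[0]] + " " + title.strip())
            i += 2  # consume both the title and its underline
        else:
            out.append(title)
            i += 1
    return "\n".join(out)


converted = convert_rst_headings("Intro\n\nC\n^\n\nbody")
print(converted)
```

A caveat: a Markdown thematic break (`---`) directly under a paragraph line would also match, so any output from a pass like this needs review rather than blind application.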
fern/pages/c_api/c-api-neighbors-ivf-flat.md-52-54 (2)

52-54: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Duplicate cuvsError_t rendering artifact throughout

Across many sections, the Returns block emits the return type twice — once formatted in backticks and once as bare text:

`cuvsError_t`

cuvsError_t

For example, lines 52–54 (cuvsIvfFlatIndexParamsCreate), lines 74–76 (cuvsIvfFlatIndexParamsDestroy), lines 116–118, 138–140, 164–166, 249–251, 283–285, and 408–411. The bare trailing text will render as stray paragraph content on the published page. This should be removed at the doc-generator template level.

Also applies to: 74-76, 116-118, 138-140, 164-166, 249-251, 283-285, 408-411

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-ivf-flat.md` around lines 52 - 54, The
documentation generator is emitting the return type twice (once as inline code
and once as bare text) causing stray paragraphs; update the template that
renders the Returns block to output the return type only once by removing the
redundant bare-text emission (for example, fix the Returns rendering logic used
by functions like cuvsIvfFlatIndexParamsCreate and cuvsIvfFlatIndexParamsDestroy
and other affected symbols such as the functions at lines referenced) so the
template emits either `cuvsError_t` (inline) or the bare type but not both.

52-54: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stray bare cuvsError_t after every formatted Returns entry (rendering artifact)

In at least 8 sections (cuvsIvfFlatIndexParamsCreate, Destroy, SearchParamsCreate, SearchParamsDestroy, IndexCreate, GetCenters, Build, Extend), the Returns block emits the return type twice — once formatted and once as a bare paragraph:

`cuvsError_t`

cuvsError_t

The trailing plain text will render as an orphaned paragraph on the published page. This should be removed at the doc-generator template level.

Also applies to: 74-76, 116-118, 138-140, 164-166, 249-251, 283-285, 408-411

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-ivf-flat.md` around lines 52 - 54, The
generated docs are emitting the return type twice (once as inline code and once
as a bare paragraph) for several functions (cuvsIvfFlatIndexParamsCreate,
cuvsIvfFlatIndexParamsDestroy, cuvsIvfFlatSearchParamsCreate,
cuvsIvfFlatSearchParamsDestroy, cuvsIvfFlatIndexCreate, cuvsIvfFlatGetCenters,
cuvsIvfFlatBuild, cuvsIvfFlatExtend); update the doc-generator template that
renders the Returns block to stop appending the raw return-type text after the
formatted code span—remove the extra raw/text emission or collapse it so only
the single formatted `cuvsError_t` is emitted for each Returns entry (ensure
change is applied in the template handling of return-type rendering rather than
editing individual docs).
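
A template-level fix of the kind suggested here might look like the sketch below. It is not taken from generate_api_reference.py, whose internals are not shown in this thread; `render_returns` and its parameters are hypothetical names, and the `**Returns**` layout is copied from the snippets quoted in these findings.

```python
def render_returns(return_type, description=""):
    """Render a Returns block, skipping a description that only repeats the type."""
    lines = ["**Returns**", "", f"`{return_type}`"]
    text = description.strip()
    # Emit the description only when it adds something beyond the type name.
    if text and text.strip("`") != return_type:
        lines += ["", text]
    return "\n".join(lines)


# A description that merely repeats the type is dropped.
deduped = render_returns("cuvsError_t", "cuvsError_t")
# A description with real content is kept.
kept = render_returns("cuvsError_t", "Error code describing the outcome of the call.")
print(deduped)
print(kept)
```

The comparison is deliberately exact, so a stale name such as `cagraError_t` would still be emitted and remain visible for review instead of being silently hidden.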
fern/pages/c_api/c-api-distance-pairwise-distance.md-26-26 (2)

26-26: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Dangling "Usage example:" with no content

Line 26 emits Usage example: but provides no actual example. This is a recurring generation artifact across multiple pages in this PR. Either include the example or remove the label to avoid confusing users.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-distance-pairwise-distance.md` at line 26, The doc
contains a dangling heading "Usage example:" with no example content; either add
a concrete usage snippet demonstrating the pairwise distance API (showing call
pattern, parameters, and expected result) under the "Usage example:" label in
c-api-distance-pairwise-distance.md or remove the "Usage example:" line entirely
so the page doesn't promise missing content; search other generated pages for
the same "Usage example:" literal and apply the same fix consistently.

26-26: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Dangling Usage example: label with no content

Line 26 emits Usage example: but nothing follows it before the Parameters table. This is a doc-generation artifact (the Doxygen @usage_example tag rendered but no content was authored). Either add the example or remove the label to avoid confusing readers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-distance-pairwise-distance.md` at line 26, Remove the
dangling "Usage example:" label or replace it with a real usage snippet for the
related pairwise-distance API: locate the "Usage example:" line in the
c-api-distance-pairwise-distance.md doc (immediately before the Parameters
table) and either delete that heading or add a short, concrete example showing
how to call the pairwise-distance function (including expected inputs and a
brief result) so the section is not empty.
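
Because this artifact recurs across pages, a small scan can list every occurrence. The sketch below assumes that a real example always begins with a fenced code block on the next non-blank line; `find_dangling_labels` is a hypothetical helper, not part of the PR.

```python
FENCE = "`" * 3  # built at runtime so this listing contains no literal fence


def find_dangling_labels(markdown_text, label="Usage example:"):
    """Return 1-based line numbers where `label` is not followed by a code fence."""
    lines = markdown_text.splitlines()
    dangling = []
    for index, line in enumerate(lines):
        if line.strip() != label:
            continue
        # The next non-blank line decides whether an example follows.
        following = next((l.strip() for l in lines[index + 1:] if l.strip()), "")
        if not following.startswith(FENCE):
            dangling.append(index + 1)
    return dangling


sample = "\n".join([
    "Usage example:",        # dangling: followed by the Parameters section
    "",
    "**Parameters**",
    "",
    "Usage example:",        # fine: followed by a fenced block
    FENCE + "c",
    "cuvsResources_t res;",
    FENCE,
])
dangling_lines = find_dangling_labels(sample)
print(dangling_lines)
```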
fern/pages/cpp_api/cpp-api-cluster-kmeans.md-22-39 (2)

22-39: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

cuvs::cluster::kmeans::params fields table is malformed — enum enumerators rendered as struct fields

The Fields table for params (lines 25–27, 33) renders InitMethod enumerators (KMeansPlusPlus,, Random,, Array) and a field default initializer (raft::random::RngState rng_state{0}) in both the Name and Type columns with identical text:

| `KMeansPlusPlus,` | `KMeansPlusPlus,` | Sample the centroids ... |
| `Random,`         | `Random,`         | ... |
| `Array`           | `Array`           | ... |
| `raft::random::RngState rng_state{0}` | `raft::random::RngState rng_state{0}` | Seed ... |

This is clearly a doc-generator artifact where the InitMethod nested enum values are being treated as struct fields. The Name and Type should differ; InitMethod enumerators should be in a separate Values sub-table, and rng_state should show the field name separately from its type.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-cluster-kmeans.md` around lines 22 - 39, The
params fields table for cuvs::cluster::kmeans::params is malformed: the nested
enum InitMethod's enumerators (KMeansPlusPlus, Random, Array) were rendered as
struct fields and rng_state shows its initializer in both Name and Type columns.
Fix the Markdown so the params table lists actual struct fields (e.g., a row
with Name: init, Type: InitMethod, Description: Method for initialization), add
a separate "Values" sub-table for InitMethod containing the enumerators
KMeansPlusPlus, Random, Array with their descriptions, and change the rng_state
row to show Name: rng_state and Type: raft::random::RngState (omit the
initializer from the Type column and put default value in the Description or a
Default column).

22-39: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

cuvs::cluster::kmeans::params fields table is malformed — InitMethod enumerators rendered as struct fields with identical Name and Type

The Fields table renders three InitMethod enum values and a field default-initializer in both the Name and Type columns:

| `KMeansPlusPlus,`                     | `KMeansPlusPlus,`                     | ... |
| `Random,`                             | `Random,`                             | ... |
| `Array`                               | `Array`                               | ... |
| `raft::random::RngState rng_state{0}` | `raft::random::RngState rng_state{0}` | ... |

A struct field's Name and Type should never be the same string. The InitMethod enumerators belong in a separate nested Values sub-table, and rng_state should separate its field name from its type raft::random::RngState (with default {0} noted separately).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-cluster-kmeans.md` around lines 22 - 39, The
fields table for cuvs::cluster::kmeans::params is malformed: the InitMethod
enumerators (KMeansPlusPlus, Random, Array) are rendered as struct fields
instead of enum values and rng_state shows name and type combined; fix by
removing KMeansPlusPlus/Random/Array rows from the main Fields table and instead
add a nested "Values" sub-table under the InitMethod entry listing those enum
members, and change the rng_state row to separate the field name (rng_state)
from its type (raft::random::RngState) with the default initializer {0} noted in
a Default or Description column.
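
The symptom called out here, identical Name and Type cells, is easy to detect automatically even if the root cause sits in the generator's parsing. The sketch below is a hypothetical validation helper (`find_suspicious_rows`), assuming pipe-delimited Markdown tables without escaped pipes inside cells; the `n_clusters` row is an illustrative well-formed entry, not quoted from the page.

```python
def find_suspicious_rows(markdown_table):
    """Return the first cell of each table row whose first two cells are identical."""
    suspicious = []
    for line in markdown_table.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            continue
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        # Skip short rows and the header separator row (cells made of '-', ':' or spaces).
        if len(cells) < 2 or set(cells[0]) <= set("-: "):
            continue
        if cells[0] == cells[1]:
            suspicious.append(cells[0])
    return suspicious


sample_table = "\n".join([
    "| Name | Type | Description |",
    "| --- | --- | --- |",
    "| `KMeansPlusPlus,` | `KMeansPlusPlus,` | Sample the centroids ... |",
    "| `n_clusters` | `int` | Illustrative well-formed row |",
    "| `raft::random::RngState rng_state{0}` | `raft::random::RngState rng_state{0}` | Seed ... |",
])
flagged = find_suspicious_rows(sample_table)
print(flagged)
```

A check like this only flags the rows; separating enumerators from struct fields still has to happen where the header is parsed.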
fern/pages/c_api/c-api-neighbors-mg-common.md-13-21 (2)

13-21: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

cuvsMultiGpuDistributionMode is undocumented (no Values table)

Unlike cuvsMultiGpuReplicatedSearchMode and cuvsMultiGpuShardedMergeMode, the cuvsMultiGpuDistributionMode section contains only the typedef enum { ... } stub with no Values table. Users have no way to discover valid enumerators from the reference page. If the enum values are finalized in the source header, the Values table should be added before this goes live.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-mg-common.md` around lines 13 - 21, The
cuvsMultiGpuDistributionMode enum is missing its Values table in the generated
docs; update the documentation for cuvsMultiGpuDistributionMode to include a
Values section listing each enumerator and its description pulled from the
source header (c/include/cuvs/neighbors/mg_common.h), ensuring each enum member
name and meaning matches the finalized definitions in that header so users can
discover valid options.

13-21: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

cuvsMultiGpuDistributionMode is documented as an opaque stub — no enumerator values

Unlike cuvsMultiGpuReplicatedSearchMode and cuvsMultiGpuShardedMergeMode, which both include a Values table, the cuvsMultiGpuDistributionMode section (lines 13–21) only shows typedef enum { ... } with no values listed. Users cannot determine valid enumerators from the reference page. If these values are finalized in the header, the Values table should be added before this page goes live.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-mg-common.md` around lines 13 - 21, The
documentation for cuvsMultiGpuDistributionMode is missing its enumerator values;
locate the enum definition in the header (c/include/cuvs/neighbors/mg_common.h)
for cuvsMultiGpuDistributionMode and update the docs to include a "Values" table
mirroring the format used by cuvsMultiGpuReplicatedSearchMode and
cuvsMultiGpuShardedMergeMode, listing each enumerator name and its description
exactly as declared in the header.
fern/pages/cpp_api/cpp-api-neighbors-common.md-63-79 (2)

63-79: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Empty and missing enum values across three enum sections

Three issues:

  1. filtering::FilterType (lines 75–77): All three enumerators (None, Bitmap, Bitset) show an empty Value column. If these are integer constants they should be filled in.

  2. ivf::distribution_mode (lines 195–201): No Values table at all — same incomplete-stub pattern as cuvsMultiGpuDistributionMode in c-api-neighbors-mg-common.md. Users have no way to discover valid enumerators.

  3. ivf::replicated_search_mode / ivf::sharded_merge_mode (lines 213–235): Enumerators are listed but their Value column is empty, while the corresponding C API page (c-api-neighbors-mg-common.md) correctly documents LOAD_BALANCER=0, ROUND_ROBIN=1, MERGE_ON_ROOT_RANK=0, TREE_MERGE=1. This inconsistency will confuse readers comparing the two APIs.

Also applies to: 195-201, 213-235

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-common.md` around lines 63 - 79, Fill in
the missing enum values by reading the actual C++ enum definitions and updating
the docs: for filtering::FilterType, populate the Values column with each
enumerator's underlying integer (or explicit value) from
cpp/include/cuvs/neighbors/common.hpp; for ivf::distribution_mode add a complete
Values table listing each enumerator and its numeric value; for
ivf::replicated_search_mode and ivf::sharded_merge_mode fill the Value column
with the corresponding integers (ensure they match the C API equivalents like
LOAD_BALANCER=0 / ROUND_ROBIN=1 and MERGE_ON_ROOT_RANK=0 / TREE_MERGE=1 where
applicable) so the C++ docs are consistent with c-api-neighbors-mg-common.md.

63-79: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Empty enum values and incomplete ivf::distribution_mode section

Three issues with enum documentation:

  1. filtering::FilterType (lines 75–77): All three enumerators show an empty Value column. If these are meaningful integer constants, their values should be filled in.

  2. ivf::distribution_mode (lines 195–201): No Values table at all — same stub pattern as cuvsMultiGpuDistributionMode in c-api-neighbors-mg-common.md. Users have no way to know the valid values.

  3. ivf::replicated_search_mode / ivf::sharded_merge_mode (lines 213–219, 228–235): The enumerators are listed but their Value column is empty, while the corresponding C API page (c-api-neighbors-mg-common.md) correctly documents LOAD_BALANCER=0, ROUND_ROBIN=1, MERGE_ON_ROOT_RANK=0, TREE_MERGE=1. The inconsistency is confusing.

Also applies to: 195-201, 213-219, 228-235

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-common.md` around lines 63 - 79, The
enum docs are missing integer values: open the C++ header declarations (e.g.
cuvs/neighbors/common.hpp) and fill the documentation tables for
filtering::FilterType, ivf::distribution_mode, ivf::replicated_search_mode, and
ivf::sharded_merge_mode with the exact enumerator names and their integer values
as declared in the header (for example ensure replicated_search_mode lists
LOAD_BALANCER=0 and ROUND_ROBIN=1 and sharded_merge_mode lists
MERGE_ON_ROOT_RANK=0 and TREE_MERGE=1 to match the C API); update
filtering::FilterType values from the header as well and add a Values table for
ivf::distribution_mode reflecting the header's enumerators and values to keep
the pages consistent.
fern/pages/c_api/c-api-neighbors-ivf-flat.md-198-209 (2)

198-209: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing direction and description for GetNLists / GetDim parameters

Both cuvsIvfFlatIndexGetNLists (lines 200–203) and cuvsIvfFlatIndexGetDim (lines 222–224) document their parameters with empty Direction and Description columns:

| `index`   |  | `cuvsIvfFlatIndex_t` |  |
| `n_lists` |  | `int64_t*`           |  |

At minimum, n_lists and dim are output parameters (out) and should say so — this distinction matters for callers managing memory.

Also applies to: 220-230
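
Blank metadata cells like these can be detected mechanically. The sketch below assumes the parameter table columns are `Name | Direction | Type | Description`, which is inferred from the two-row excerpt above and not confirmed against the generated page. The function names are hypothetical.

```python
def split_row(line):
    """Split one pipe-delimited Markdown table row into stripped cells."""
    text = line.strip()
    if text.startswith("|"):
        text = text[1:]
    if text.endswith("|"):
        text = text[:-1]
    return [cell.strip() for cell in text.split("|")]


def params_missing_metadata(table_lines, required=("Direction", "Description")):
    """Map each parameter name to the required columns that are empty for it."""
    header = split_row(table_lines[0])
    missing = {}
    for line in table_lines[2:]:  # skip the header row and the separator row
        row = dict(zip(header, split_row(line)))
        empty = [column for column in required if not row.get(column)]
        if empty:
            missing[row[header[0]]] = empty
    return missing


# Header row is an assumption; the two data rows are the ones quoted above.
sample = [
    "| Name | Direction | Type | Description |",
    "| --- | --- | --- | --- |",
    "| `index`   |  | `cuvsIvfFlatIndex_t` |  |",
    "| `n_lists` |  | `int64_t*`           |  |",
]
report = params_missing_metadata(sample)
print(report)
```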

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-ivf-flat.md` around lines 198 - 209, Update
the parameter tables for cuvsIvfFlatIndexGetNLists and cuvsIvfFlatIndexGetDim to
mark the parameters' Direction and Description: set `n_lists` and `dim` as out
parameters (Direction: out) and describe them as "pointer to receive the number
of lists" and "pointer to receive the index dimensionality" respectively; also
document `index` as an input parameter (Direction: in) with a brief description
like "IVF flat index handle" to clarify ownership and memory responsibilities
for callers.

198-209: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing Direction and Description for accessor function parameters

cuvsIvfFlatIndexGetNLists (lines 200–203) and cuvsIvfFlatIndexGetDim (lines 222–224) document their parameters with entirely blank Direction and Description columns:

| `index`   |  | `cuvsIvfFlatIndex_t` |  |
| `n_lists` |  | `int64_t*`           |  |

At minimum, n_lists and dim are output parameters and should be marked out. Without direction metadata, callers must inspect the header to understand ownership/semantics.

Also applies to: 220-230

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-ivf-flat.md` around lines 198 - 209, The
parameter table for the accessor functions cuvsIvfFlatIndexGetNLists and
cuvsIvfFlatIndexGetDim is missing Direction and Description; update the
parameter rows so that `n_lists` and `dim` are marked as `out` and provide short
descriptions (e.g., "output pointer receiving number of lists" and "output
pointer receiving dimensionality"), and ensure `index` rows indicate direction
(likely `in`) and a brief description (e.g., "IVF-flat index handle"); apply the
same fixes for both function parameter tables so callers know
ownership/semantics.
fern/pages/cpp_api/cpp-api-neighbors-common.md-81-97 (2)

81-97: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Ambiguous ### filtering::operator headings — three sections with identical names

Lines 81, 111, and 133 all produce the heading ### filtering::operator, which is an incomplete identifier. These are clearly different operator() overloads (one with query_ix/cluster_ix/sample_ix, one three-arg, one two-arg), but the heading gives no way to distinguish them. Consider qualifying them:

  • ### filtering::operator() (query_ix, cluster_ix, sample_ix)
  • ### filtering::operator() — two-argument form

Without disambiguation, anchor-based links and in-page navigation are broken for all three.

Also applies to: 111-131, 133-147
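
To illustrate the collision, here is a small sketch that derives anchors from headings and reports duplicates. The `slugify` rule approximates a GitHub-style slug and is an assumption; Fern's actual anchor algorithm may differ, and the sketch does not skip `#` lines inside code fences.

```python
import re
from collections import Counter


def slugify(heading):
    """Approximate a GitHub-style anchor slug (assumed rule, not Fern's)."""
    text = heading.lstrip("#").strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)  # drop punctuation such as ':' and '('
    return text.replace(" ", "-")


def colliding_anchors(markdown_lines):
    """Return every slug that is produced by more than one heading line."""
    slugs = Counter(slugify(line) for line in markdown_lines if line.startswith("#"))
    return sorted(slug for slug, count in slugs.items() if count > 1)


# Hypothetical page outline mirroring the three identical headings.
page = [
    "### filtering::operator",
    "Some description text.",
    "### filtering::operator",
    "### filtering::operator",
    "### filtering::FilterType",
]
collisions = colliding_anchors(page)
print(collisions)
```

Under this slug rule, qualifying each heading with its parameter list produces distinct anchors, which is the property the suggested renames rely on.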

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-common.md` around lines 81 - 97, The
three identical headings named "filtering::operator" should be made unique by
including the operator signature or parameter hints in the heading so anchors
and navigation are distinct; update each heading for the different overloads of
filtering::operator (e.g., "filtering::operator() (query_ix, cluster_ix,
sample_ix)", "filtering::operator() — three-argument form", and
"filtering::operator() — two-argument form") and ensure the Markdown headings
that reference the symbol operator() in the file are changed accordingly so each
overload maps to a unique anchor.

81-97: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Three sections titled ### filtering::operator — ambiguous and breaks anchor navigation

Lines 81, 111, and 133 all produce the identical heading ### filtering::operator. These document three distinct operator() overloads (three-arg with cluster_ix, three-arg wrapper, two-arg form), but the heading gives no way to distinguish them. All three will share the same anchor ID, breaking in-page navigation and deep-linking. Suggest qualifying the heading, e.g.:

  • ### filtering::operator() (query_ix, cluster_ix, sample_ix)
  • ### filtering::operator() — two-argument form

Also applies to: 111-131, 133-147

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-common.md` around lines 81 - 97, The
three identical headings for filtering::operator collide and break anchors;
update each heading for the distinct operator() overloads in the document so
they are unique and descriptive — e.g., change the first filtering::operator
heading to include the full signature or params like "filtering::operator()
(query_ix, cluster_ix, sample_ix)", the second to "filtering::operator()
(query_ix, cluster_ix, sample_ix) — wrapper" (or include the exact signature
shown), and the third to "filtering::operator() — two-argument form (query_ix,
sample_ix)"; locate the sections that document filtering::operator / operator()
(the entries referencing cpp/include/cuvs/neighbors/common.hpp and the
operator() signatures) and update their Markdown headings accordingly so each
generates a unique anchor.
fern/pages/cpp_api/cpp-api-neighbors-all-neighbors.md-41-41 (2)

41-41: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Truncated field/parameter descriptions and prose fragmented by code blocks

Several descriptions are cut off mid-sentence:

  • Line 41: n_clusters description ends with "If set to 1, algorithm creates" (rest is missing)
  • Lines 76 / 111: core_distances description ends with "Requires distances matrix to" (rest is missing)

Both build overload descriptions are also split by their code blocks:

"Builds an approximate all-neighbors knn graph (find nearest neighbors for all the"[code block]"training vectors) Usage example: compute core_distances..."

This makes the prose incoherent for any reader who scans the description text. These are Doxygen brief/detail split artifacts. The fix should apply at the source-comment level so the full description renders either fully before or fully after the signature.

Also applies to: 52-65, 76-76, 87-100, 111-111

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-all-neighbors.md` at line 41, The doc
comments split by code blocks are truncating parameter and prose (e.g., the
parameter descriptions for n_clusters and core_distances and the two build
overload descriptions), so fix the source-level Doxygen comments by moving full
sentences either entirely before the function signature (use a complete `@brief`)
or entirely after it (use `@details`), avoiding inserting code blocks between a
brief and its detail; update the comments for the functions/overloads named
build and the parameter docs for n_clusters and core_distances so each
description is a single uninterrupted paragraph (or use explicit `@param` tags
with complete text) to prevent the renderer from fragmenting the prose.

41-41: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Multiple truncated and split descriptions throughout the page

Several field/parameter descriptions are cut off mid-sentence:

  • Line 41: n_clusters description ends abruptly: "If set to 1, algorithm creates"
  • Lines 76 / 111: core_distances description ends abruptly: "Requires distances matrix to"

Additionally, both build overload descriptions are split by their code blocks — the sentence begins above the block and concludes below it, producing incoherent prose for readers who skip the signature:

"Builds an approximate all-neighbors knn graph (find nearest neighbors for all the" [code] "training vectors) Usage example: ..."

These are likely doc-generator artifacts (Doxygen brief/detail split with code insertion). The fix should be applied at the Doxygen comment level so the complete description appears either fully before or fully after the signature.

Also applies to: 52-65, 76-76, 87-100, 111-111

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-all-neighbors.md` at line 41, The
parameter descriptions are being split by the generated signature/code blocks;
fix the Doxygen comments so each full sentence appears either entirely before or
entirely after the function signature to avoid truncation — specifically update
the Doxygen comment for the parameter n_clusters (complete its sentence so it no
longer ends with "If set to 1, algorithm creates") and for core_distances
(complete "Requires distances matrix to..."), and consolidate the descriptive
paragraph for both overloads of build so the prose ("Builds an approximate
all-neighbors knn graph...") is contiguous rather than split by the code block;
adjust the brief/detail placement in the Doxygen block for the build method and
the affected parameter tags to ensure the full descriptions render together.
fern/pages/cpp_api/cpp-api-cluster-kmeans.md-96-96 (2)

96-96: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Internal TODO comment is leaking into the public API reference

Line 96 contains a development note that should never appear in published documentation:

TODO: Evaluate replacing the extent type with int64_t. Reference issue: #1961

This text originates from a Doxygen comment in the source header. It should be removed from the header comment (or excluded by the doc generator) before this page goes live.
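
Until the header comment is cleaned up, a post-generation scan could flag leaked notes. This is a minimal sketch: the marker set (`TODO`, `FIXME`) and the function name are assumptions, and the sample text is a stand-in for the generated page.

```python
import re

# Markers treated as internal notes; this set is an assumption for illustration.
NOTE_PATTERN = re.compile(r"\b(TODO|FIXME)\b")


def find_leaked_notes(markdown_text):
    """Return (line_number, text) pairs for lines that contain an internal marker."""
    return [
        (number, line.strip())
        for number, line in enumerate(markdown_text.splitlines(), start=1)
        if NOTE_PATTERN.search(line)
    ]


page = (
    "Supports out-of-core computation.\n"
    "TODO: Evaluate replacing the extent type with int64_t. Reference issue: #1961\n"
)
leaks = find_leaked_notes(page)
print(leaks)
```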

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-cluster-kmeans.md` at line 96, Remove the internal
TODO text ("TODO: Evaluate replacing the extent type with int64_t. Reference
issue: https://github.com/rapidsai/cuvs/issues/1961") from the public Doxygen
comment for the overload that “supports out-of-core computation” (the header
comment describing params.streaming_batch_size); either delete the TODO line in
that header comment or mark it with a doc-generator exclusion tag so it is not
emitted into the generated cpp-api-cluster-kmeans.md documentation.

96-96: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Internal TODO comment is leaking into the public API reference

Line 96 contains a development note that must not appear in published documentation:

TODO: Evaluate replacing the extent type with int64_t. Reference issue: #1961

This text originates from a Doxygen comment in the source header and has been carried verbatim into the generated page. Remove or exclude it (via a Doxygen tag filter or a @internal annotation) before this site goes live.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-cluster-kmeans.md` at line 96, The generated doc
page contains an internal TODO ("Evaluate replacing the extent type with
int64_t. Reference issue: https://github.com/rapidsai/cuvs/issues/1961") that
must be removed from public output; locate the corresponding Doxygen comment
that documents the KMeans overload (the comment referencing "extent type" /
"streaming_batch_size" in the KMeans cluster API) and either delete the TODO
text or mark it as internal (e.g., add an `@internal` or `@cond`/`@endcond` around the
note) or configure a Doxygen tag filter to exclude such TODO lines, then
regenerate docs so the TODO no longer appears on the cpp-api-cluster-kmeans
page.
fern/pages/cpp_api/cpp-api-distance-distance.md-27-27 (1)

27-27: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Dangling Usage example: labels with no content (recurring generation artifact)

Seven of the eight overloads in this file (and the C API pairwise distance page) end their description with Usage example: immediately before the Parameters table, with no example code following. This is a Doxygen tag rendered verbatim by the generator. Either populate the examples or remove the label so users aren't left looking for content that isn't there.

Also applies to: 60-60, 93-93, 126-126, 159-159, 192-192, 224-224, 256-256
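
A scan like the following could locate every dangling label in one pass. It is a sketch under the assumption that a populated example always starts with a code fence on the next non-blank line. The sample lines are made up for illustration.

```python
FENCE = "`" * 3  # Markdown code fence marker


def dangling_usage_labels(markdown_lines):
    """Return 1-based line numbers where 'Usage example:' has no code fence after it."""
    dangling = []
    for index, line in enumerate(markdown_lines):
        if not line.rstrip().endswith("Usage example:"):
            continue
        # Find the next non-blank line, or an empty string at end of file.
        following = next(
            (later.strip() for later in markdown_lines[index + 1:] if later.strip()),
            "",
        )
        if not following.startswith(FENCE):
            dangling.append(index + 1)
    return dangling


# Hypothetical page fragment: the first label dangles, the second is populated.
sample = [
    "Computes pairwise distances. Usage example:",
    "",
    "| Name | Type |",
    "Builds an index. Usage example:",
    "",
    FENCE + "cpp",
    "// example body",
    FENCE,
]
found = dangling_usage_labels(sample)
print(found)
```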

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-distance-distance.md` at line 27, Several
overloads contain a dangling literal "Usage example:" label immediately before
their Parameters table with no example content; remove each empty "Usage
example:" line (or alternatively add a real code example block) for every
overload in this distance API doc and the corresponding C API pairwise distance
page so the Doxygen output doesn't render an empty tag; search for the exact
string "Usage example:" in the distance docs and update the surrounding overload
sections (the overloaded distance functions) accordingly.
fern/pages/cpp_api/cpp-api-cluster-kmeans.md-284-316 (1)

284-316: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix inertia parameter type for int8_t fit overload — should be float not int8_t

The int8_t fit overload at line 480 documents inertia as raft::host_scalar_view<int8_t>, but this is incorrect. Inertia represents the sum of squared distances to the nearest cluster center, which would overflow int8_t (range: -128 to 127) for nearly any real dataset. All other overloads correctly use the corresponding numeric type for inertia: float for the float overload (line 278) and double for the double overload (line 430). Correct the type to raft::host_scalar_view<float>.
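
A quick worked example shows why an int8_t inertia cannot hold realistic values. The points and the single center below are invented for illustration, and the `inertia` helper is a plain-Python stand-in, not the cuVS implementation.

```python
INT8_MAX = 127  # largest value representable in int8_t


def inertia(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    total = 0.0
    for point in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(point, center))
            for center in centers
        )
    return total


# Three 2-D points whose coordinates all fit in int8_t (-128..127).
points = [(100, -100), (90, 110), (-120, 5)]
centers = [(0, 0)]

value = inertia(points, centers)
print(value, value > INT8_MAX)
```

Even with three samples whose coordinates fit in int8_t, the sum of squared distances is several orders of magnitude above 127.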

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-cluster-kmeans.md` around lines 284 - 316, The doc
for the cuvs::cluster::kmeans::fit overload that takes int8_t data incorrectly
lists the inertia parameter as raft::host_scalar_view<int8_t>; update that
overload's signature and parameter table so inertia is
raft::host_scalar_view<float> (match numeric type used for inertia in other
overloads), i.e., locate the cuvs::cluster::kmeans::fit int8_t overload and
change the inertia type and its description to reflect float.
fern/pages/build.md-29-30 (1)

29-30: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix broken table-of-contents anchor.

Line 29 links to #build-documentation, but no matching heading exists; the current section is “Preview documentation” at Line 226.

Suggested fix
-- [Build Documentation](`#build-documentation`)
+- [Preview documentation](`#preview-documentation`)

Also applies to: 226-227
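
Broken in-page links of this kind can be found by comparing link targets against heading slugs. The sketch below assumes a GitHub-style slug rule (Fern's may differ) and uses a made-up page fragment based on the two headings named above.

```python
import re


def slugify(heading):
    """Approximate a GitHub-style anchor slug (assumed rule, not Fern's)."""
    text = heading.lstrip("#").strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)
    return text.replace(" ", "-")


def broken_anchor_links(markdown_text):
    """Return in-page link targets that match no heading slug on the page."""
    lines = markdown_text.splitlines()
    anchors = {slugify(line) for line in lines if line.startswith("#")}
    targets = re.findall(r"\]\(#([^)]+)\)", markdown_text)
    return [target for target in targets if target not in anchors]


page = (
    "- [Build Documentation](#build-documentation)\n"
    "- [Preview documentation](#preview-documentation)\n"
    "\n"
    "## Preview documentation\n"
)
broken = broken_anchor_links(page)
print(broken)
```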

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/build.md` around lines 29 - 30, The table-of-contents link "[Build
Documentation]" points to a non-existent anchor "#build-documentation"; update
the link target to match the actual heading slug (e.g., change the anchor to
"#preview-documentation" or rename the heading to "Build Documentation") so the
link resolves; update the link text or the heading title consistently (reference
the TOC link "[Build Documentation]" and the "Preview documentation" heading) to
ensure the anchor and heading match.
fern/pages/c_api/c-api-neighbors-hnsw.md-17-29 (1)

17-29: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Include NONE in cuvsHnswHierarchy values table.

Line 17 documents NONE, but the table omits it, making the enum docs inconsistent.

Suggested fix
 | Name | Value |
 | --- | --- |
+| `NONE` | `0` |
 | `CPU` | `1` |
 | `GPU` | `2` |

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-hnsw.md` around lines 17 - 29, The
documentation for the enum cuvsHnswHierarchy is inconsistent: add an entry for
NONE to the "Values" table to match the NOTE above; update the table under the
cuvsHnswHierarchy enum so it includes a row for `NONE` with the appropriate
value (e.g., `0` or whatever the enum defines) and a short description
indicating it builds a base-layer-only HNSW index so the enum table matches the
surrounding text.
fern/pages/c_api/c-api-core-c-api.md-426-427 (1)

426-427: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove duplicated wording in memory-resource description.

Line 426 repeats “available memory” and reads as malformed text.

Suggested fix
-bypass unnecessary synchronizations by allocating a chunk of device memory up front and carving that up for temporary memory allocations within algorithms. Be aware that this function will change the memory resource for the whole process and the new memory resource will be used until explicitly changed. available memory available memory
+bypass unnecessary synchronizations by allocating a chunk of device memory up front and carving that up for temporary memory allocations within algorithms. Be aware that this function will change the memory resource for the whole process and the new memory resource will be used until explicitly changed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-core-c-api.md` around lines 426 - 427, The
documentation sentence in the c-api-core description contains
duplicated/malformed wording ("available memory available memory"); edit the
paragraph that starts with "bypass unnecessary synchronizations by allocating a
chunk of device memory up front..." to remove the repeated phrase and correct
the sentence so it reads smoothly (e.g., end with "available memory" only once
and ensure the note about changing the process memory resource remains a single
coherent sentence). Locate that exact text fragment to make the change.
fern/pages/c_api/c-api-neighbors-vamana.md-169-170 (1)

169-170: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix malformed metric sentence in cuvsVamanaBuild description.

The text on Line 169 reads as partially broken (“supported: - L2 Usage example:”), which makes the supported-metrics section unclear. Please format this as a proper list or sentence.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-vamana.md` around lines 169 - 170, The
sentence describing supported distance metrics in the cuvsVamanaBuild
documentation is malformed ("supported: - L2 Usage example:"); update the
cuvsVamanaBuild doc so the metrics are clearly listed or phrased as a proper
sentence (e.g., "The following distance metrics are supported: L2, ...") and
ensure "Usage example:" begins on a new line/section; adjust the surrounding
text that references index_params and the supported metrics so the list
formatting or punctuation is consistent and readable.
fern/pages/c_api/c-api-preprocessing-quantize-scalar.md-25-25 (1)

25-25: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document quantile semantics in the params table.

Line 25 leaves the quantile description blank, so users can’t infer valid range/behavior from the reference page. Please add a concise definition (and expected range if applicable).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-preprocessing-quantize-scalar.md` at line 25, Add a
concise description for the `quantile` parameter in the params table on the
c-api-preprocessing-quantize-scalar page: explain that `quantile` is the target
quantile (e.g., 0.0–1.0) used to compute the scalar clipping threshold, state
the valid range (inclusive 0.0 to 1.0), and briefly note behavior for edge
values (0 => min, 1 => max) or rounding semantics if applicable so callers know
expected inputs and effects.
fern/pages/c_api/c-api-preprocessing-quantize-pq.md-32-33 (1)

32-33: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix truncated description for max_train_points_per_pq_code.

On Line 32, the description cuts off at “Using more data”, which leaves the field behavior unclear in the rendered API table. Please complete the sentence so users understand the tradeoff.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-preprocessing-quantize-pq.md` around lines 32 - 33,
The table row for the field max_train_points_per_pq_code has a truncated
description; update the description for max_train_points_per_pq_code to finish
the sentence and explain the tradeoff (e.g., that using more data improves PQ
codebook quality/accuracy but increases training time and memory usage), so
readers understand the behavior and cost of increasing this uint32_t parameter.
fern/pages/cpp_api/cpp-api-cluster-spectral.md-49-59 (1)

49-59: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Repair broken parameter descriptions around label ranges.

Lines 49 and 77 contain dangling text (n_clusters-1)), and the labels descriptions on Lines 58/86/114 are cut off (“0 to …”). This renders incomplete API guidance for all fit_predict overloads.

Also applies to: 77-87, 105-115

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-cluster-spectral.md` around lines 49 - 59, The
docs have broken parameter text and truncated label ranges for the spectral
clustering fit_predict overloads; remove the stray dangling "n_clusters-1)"
lines and update each labels parameter description to read:
"raft::device_vector_view<int, int> | Device vector of size n_samples to store
cluster assignments (0 to n_clusters-1)". Apply this change to every fit_predict
overload's parameter table (the entries referencing `labels`,
`raft::device_vector_view<int, int>`, and the corresponding tables near the
`fit_predict`/`params`/`connectivity_graph` sections) so the label range is
complete and the stray fragment is deleted.

Comment thread ci/build_docs.sh
Comment thread fern/pages/api_interoperability.md
Comment thread fern/pages/build.md
Comment thread fern/pages/c_api/c-api-core-c-api.md
Comment thread fern/pages/c_api/c-api-neighbors-hnsw.md Outdated
Comment thread fern/pages/c_api/c-api-neighbors-ivf-pq.md Outdated
Comment thread fern/pages/cpp_api/cpp-api-cluster-agglomerative.md
Comment thread fern/pages/cpp_api/cpp-api-cluster-kmeans.md Outdated
Comment thread fern/pages/cpp_api/cpp-api-neighbors-cagra.md Outdated
@cjnolet
Member Author

cjnolet commented May 7, 2026

/ok to test 16bb8f0

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@fern/pages/cpp_api/cpp-api-neighbors-brute-force.md`:
- Line 317: Fix the typo in the documentation string for the has_norms field by
changing "Whether ot not this index has dataset norms" to "Whether or not this
index has dataset norms" in the cpp-api-neighbors-brute-force.md entry for
has_norms so the description reads correctly.
- Line 47: The docs for the brute force index constructor contain a typo:
"precaculated" should be "precalculated"; update the sentence in the C++ API
neighbor brute-force index constructor description (the line describing
ownership of "precaculated norms") to read "precalculated norms" so the
user-facing API docs and reference output use the correct spelling.
- Line 597: Remove the stray trailing token "output" from the serialization
description sentences in the cpp-api-neighbors-brute-force.md page so each
sentence ends cleanly (e.g., "loading an index saved with a previous version of
cuvs is not guaranteed to work."). Update both occurrences noted in the review
to eliminate the orphaned word and ensure the overload docs read clearly.

In `@fern/pages/cpp_api/cpp-api-neighbors-cagra.md`:
- Around line 1072-1088: The parameter docs for additional_dataset are
incorrect: the function signature uses raft::device_matrix_view<const uint8_t,
int64_t, raft::row_major> (a device/GPU view), but the description says "host
memory"; update the Parameter table entry for additional_dataset to state that
it refers to device (GPU) memory (and keep mentions of row_major/dtype/stride as
appropriate) so the prose matches the signature used by the cagra::extend_params
overload and the additional_dataset symbol.
- Around line 310-313: The constructor snippets for index(raft::resources const&
res, cuvs::distance::DistanceType metric =
cuvs::distance::DistanceType::L2Expanded) and the other overload terminate at
the initializer list comma and are therefore incomplete; update the
generator/emission for the constructors (symbol: index(...)) so the snippet
emits a complete C++ declaration by either closing the initializer list and
providing an empty body "{}" or terminating with a semicolon ";" as appropriate
(e.g., ": cuvs::neighbors::index() {}" or ";"), ensuring both overloads at the
reported locations are fixed.
- Around line 875-876: The "extend" overload documentation contains duplicated,
fragmented paragraphs; fix the two overload descriptions for the extend function
by consolidating and clarifying buffer requirements: for the data/dataset
overload state that num_rows must equal original_rows + additional_rows, cols
must be the data dimensionality, stride must match the original index dataset,
and that the view will be copied into the output index and the caller must keep
the source dataset alive while the index exists; for the graph/adjacency
overload state that num_rows must equal original_rows + additional_rows, cols
must equal the graph degree, the view is copied into the output index and the
caller must keep the source graph buffer alive; remove the duplicated sentences
and apply the same cleanup to the other repeated blocks noted (the other "Usage
example" fragments).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f3af8da5-5d11-4b01-ac97-20fa150942a7

📥 Commits

Reviewing files that changed from the base of the PR and between 82655fd and 1cb8968.

📒 Files selected for processing (24)
  • fern/pages/cpp_api/cpp-api-cluster-agglomerative.md
  • fern/pages/cpp_api/cpp-api-cluster-kmeans.md
  • fern/pages/cpp_api/cpp-api-cluster-spectral.md
  • fern/pages/cpp_api/cpp-api-distance-distance.md
  • fern/pages/cpp_api/cpp-api-neighbors-all-neighbors.md
  • fern/pages/cpp_api/cpp-api-neighbors-brute-force.md
  • fern/pages/cpp_api/cpp-api-neighbors-cagra.md
  • fern/pages/cpp_api/cpp-api-neighbors-common.md
  • fern/pages/cpp_api/cpp-api-neighbors-dynamic-batching.md
  • fern/pages/cpp_api/cpp-api-neighbors-hnsw.md
  • fern/pages/cpp_api/cpp-api-neighbors-ivf-flat.md
  • fern/pages/cpp_api/cpp-api-neighbors-ivf-pq.md
  • fern/pages/cpp_api/cpp-api-neighbors-nn-descent.md
  • fern/pages/cpp_api/cpp-api-neighbors-refine.md
  • fern/pages/cpp_api/cpp-api-neighbors-scann.md
  • fern/pages/cpp_api/cpp-api-neighbors-vamana.md
  • fern/pages/cpp_api/cpp-api-preprocessing-quantize-binary.md
  • fern/pages/cpp_api/cpp-api-preprocessing-quantize-pq.md
  • fern/pages/cpp_api/cpp-api-preprocessing-quantize-scalar.md
  • fern/pages/cpp_api/cpp-api-preprocessing-spectral-embedding.md
  • fern/pages/cpp_api/cpp-api-selection-select-k.md
  • fern/pages/cpp_api/cpp-api-stats-silhouette-score.md
  • fern/pages/developer_guide.md
  • fern/scripts/generate_api_reference.py
✅ Files skipped from review due to trivial changes (5)
  • fern/pages/cpp_api/cpp-api-neighbors-all-neighbors.md
  • fern/pages/cpp_api/cpp-api-neighbors-common.md
  • fern/pages/cpp_api/cpp-api-cluster-spectral.md
  • fern/pages/cpp_api/cpp-api-cluster-agglomerative.md
  • fern/pages/cpp_api/cpp-api-distance-distance.md

Comment thread fern/pages/cpp_api/cpp-api-neighbors-brute-force.md
Comment thread fern/pages/cpp_api/cpp-api-neighbors-brute-force.md
Comment thread fern/pages/cpp_api/cpp-api-neighbors-brute-force.md Outdated
Comment thread fern/pages/cpp_api/cpp-api-neighbors-cagra.md
Comment thread fern/pages/cpp_api/cpp-api-neighbors-cagra.md Outdated
Comment thread fern/pages/cpp_api/cpp-api-neighbors-cagra.md Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 19

♻️ Duplicate comments (7)
fern/pages/cpp_api/cpp-api-neighbors-cagra.md (4)

973-974: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

extend usage text is corrupted and duplicated across overloads.

The “Usage example” prose is fragmented/repeated and mixes dataset/graph buffer requirements, which makes buffer contracts hard to trust.

Suggested fix (single canonical paragraph, reused per overload)
- part. The data will be copied from the current index in this function. The num rows must be ...
+ If `new_dataset_buffer_view` is provided, `num_rows` must equal
+ `original_rows + additional_rows`, `cols` must equal dataset dimensionality,
+ and `stride` must match the original index dataset. This view is stored in the
+ output index, so the caller must keep its backing memory alive.
+
+ If `new_graph_buffer_view` is provided, `num_rows` must equal
+ `original_rows + additional_rows` and `cols` must equal graph degree.
+ This view is stored in the output index, so the caller must keep its backing
+ memory alive.

Also applies to: 1009-1010, 1045-1046, 1081-1082, 1117-1118, 1153-1154, 1189-1190, 1225-1226

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-cagra.md` around lines 973 - 974, The
"extend" usage text in the cpp-api-neighbors-cagra.md overloads is
duplicated/corrupted; replace the repeated fragments in the Usage example for
each extend overload with a single canonical paragraph that states: the
data/graph buffer will be copied from the current index into the output index,
num_rows must equal the sum of the original and additional datasets, cols must
be the dataset dimension for vector-data overloads (or the graph degree for
graph-buffer overloads), the stride must match the original index dataset
stride, the view will be stored in the output index, and the caller is
responsible for keeping the provided dataset/graph buffer alive for the lifetime
of the index (use this same paragraph for every extend overload's Usage example
to remove duplication and fix the corrupted prose).

316-319: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Constructor snippets are still emitted as incomplete C++ declarations.

Both constructor snippets terminate at an initializer-list comma and never close, so the API reference is syntactically invalid and misleading.

Suggested fix (generator/output)
 index(raft::resources const& res,
 cuvs::distance::DistanceType metric = cuvs::distance::DistanceType::L2Expanded)
-: cuvs::neighbors::index(),
+: cuvs::neighbors::index() {}
 template <typename data_accessor, typename graph_accessor>
 index(raft::resources const& res,
 cuvs::distance::DistanceType metric,
 ...
 graph_accessor> knn_graph)
-: cuvs::neighbors::index(),
+: cuvs::neighbors::index() {}

Also applies to: 339-348

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-cagra.md` around lines 316 - 319, The
emitted constructor snippets for the class index (the signature starting with
index(raft::resources const& res, cuvs::distance::DistanceType metric =
cuvs::distance::DistanceType::L2Expanded) : cuvs::neighbors::index(), and the
similar block around lines 339-348) are incomplete and end with a trailing comma
in the initializer list; update the generator so these snippets produce a
syntactically complete C++ declaration by either closing the initializer list
and adding an empty body or a terminating semicolon (e.g., remove the trailing
comma and emit "{}" or ";"), ensuring the constructor snippet for index is
rendered as a full declaration instead of stopping at the comma.

1197-1197: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

additional_dataset memory location contradicts the signature.

raft::device_matrix_view<const uint8_t, ...> is documented here as “host memory”; this should say device/GPU memory.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-cagra.md` at line 1197, The table row
documents additional_dataset as host memory but the type
raft::device_matrix_view<const uint8_t, int64_t, raft::row_major> is a
device/GPU view; update the description for the additional_dataset parameter to
say device (GPU) memory instead of host memory and ensure wording matches other
docs (e.g., "additional dataset on device/GPU memory" or "device (GPU) memory").

1248-1248: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Search section title/signature are still malformed.

The heading is sample_filter instead of cuvs::neighbors::cagra::search, and the declaration is missing the closing );.

Suggested fix
-### sample_filter
+### cuvs::neighbors::cagra::search
 ...
 const cuvs::neighbors::filtering::base_filter& sample_filter =
 cuvs::neighbors::filtering::none_sample_filter{}
+);

Also applies to: 1253-1261

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-cagra.md` at line 1248, The section
header and signature for the C++ API are incorrect: replace the malformed
heading `sample_filter` with the correct function scope
`cuvs::neighbors::cagra::search` and fix the function declaration to include the
missing closing `);` so the declaration reads as the full prototype for
`cuvs::neighbors::cagra::search(...);`. Apply the same corrections wherever
the same malformed header/signature appears, including the block at lines
1253-1261.
fern/pages/cpp_api/cpp-api-neighbors-brute-force.md (3)

317-317: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Typo: "Whether ot not" → "Whether or not".

This issue was previously reported and remains unfixed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-brute-force.md` at line 317, Fix the
typographical error in the documentation string "Whether ot not this index has
dataset norms" by changing "ot" to "or" so it reads "Whether or not this index
has dataset norms"; locate the exact sentence in the cpp-api neighbors brute
force doc (the string "Whether ot not this index has dataset norms") and update
it accordingly.

47-47: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Typo: "precaculated" → "precalculated".

This issue was previously reported and remains unfixed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-brute-force.md` at line 47, The
documentation contains a typo: replace "precaculated" with "precalculated" in
the description sentence of the brute force index in
cpp-api-neighbors-brute-force.md (the paragraph that begins "Constructs a brute
force index from a dataset..."), ensuring the sentence reads "...and take
ownership of any precalculated norms." to fix the spelling mistake.

599-599: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove stray token "output".

These standalone words appear after complete sentences and read like generation artifacts. This issue was previously reported and remains unfixed.

Also applies to: 635-635

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-brute-force.md` at line 599, Remove the
stray standalone token "output" occurrences left in the markdown (specifically
the orphan words at the end of sentences noted around the
cpp-api-neighbors-brute-force.md content), deleting those isolated tokens (e.g.,
the "output" tokens found near lines referenced in review) so the sentences read
cleanly; search the document for any other standalone "output" tokens and remove
them as well.
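
A minimal sketch of the cleanup described above, assuming the orphan token trails a completed sentence on the same line. `strip_orphan_output` is a hypothetical helper, not part of the existing generator. It removes the token only when it directly follows a sentence-ending period, so ordinary uses of the word are left alone.

```python
import re

# A sentence-ending period followed by a lone trailing "output" token.
_ORPHAN_OUTPUT = re.compile(r"(?<=\.)\s+output\s*$")


def strip_orphan_output(line: str) -> str:
    """Drop a stray 'output' token left after a complete sentence (hypothetical helper)."""
    return _ORPHAN_OUTPUT.sub("", line)
```

If the stray token instead sits on its own line in the generated Markdown, the pattern would need to be adapted accordingly.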
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@fern/pages/c_api/c-api-cluster-kmeans.md`:
- Line 150: Update the parameter description for `centroids` to use the C API
naming: replace the C++ enum reference `InitMethod::Array` with the C-facing
enum/value `cuvsKMeansInitMethod` (with the `Array` variant) so the sentence
reads that when init is `cuvsKMeansInitMethod::Array` (or equivalent C API
naming `Array`) the `centroids` argument is used as initial cluster centers;
ensure you keep the rest of the description unchanged and still mention
`centroids` is inout, must be on device, and has shape [dim = n_clusters x
n_features].
- Around line 73-78: The Returns sections in
fern/pages/c_api/c-api-cluster-kmeans.md contain duplicated return payload text
(two adjacent `cuvsError_t` entries); edit the two affected Returns blocks so
each has a single `cuvsError_t` line instead of two (remove the repeated
`cuvsError_t` occurrences), ensuring both Returns sections (the earlier block
around the first Returns and the later block around the second Returns) show
only one return type/value.
- Around line 69-72: The parameter table for cuvsKMeansParamsCreate wrongly
marks the `params` argument as `in`; update its direction to `out` (or `inout`
if your project convention uses that) because cuvsKMeansParamsCreate allocates
and populates the cuvsKMeansParams_t handle via the pointer; adjust the
documentation row for `params` to reflect `out` and keep the type
`cuvsKMeansParams_t*` and the description unchanged.

In `@fern/pages/c_api/c-api-neighbors-cagra.md`:
- Line 282: The duplicate H2 "C API for CUDA ANN Graph-based nearest neighbor
search" headings cause markdownlint MD024; locate every occurrence of that exact
heading string (e.g., the headings currently rendered at the duplicated spots)
and make them unique or demote duplicates to a lower level (e.g., change some to
H3/H4 or add clarifying suffixes like "—Examples", "—API Reference", "—Usage")
so each heading text is distinct while preserving document structure and
semantics.
- Around line 517-520: The docs for cuvsCagraIndexCreate incorrectly list the
return type as cagraError_t; update the documentation entry for
cuvsCagraIndexCreate to use the correct return type cuvsError_t to match the
function signature and API conventions, ensuring any prose or examples
referencing cagraError_t are also changed to cuvsError_t so the doc aligns with
the actual API.
- Around line 298-299: The table rows for parameter descriptions were truncated
— specifically the `max_chunk_size` parameter row (shows "This is the knob to")
and the later row that ends with "aka" are incomplete; open the markdown table
in c-api-neighbors-cagra.md, find the table cell for `max_chunk_size` and the
cell that ends with "aka", and replace each truncated fragment with the full
descriptive text (complete sentences explaining what the parameter controls,
expected units/range, and examples if applicable), ensuring no stray line breaks
split a table cell so the markdown table renders properly; verify other
parameter rows in the same table are intact.

In `@fern/pages/c_api/c-api-neighbors-hnsw.md`:
- Around line 25-29: The enum documentation for cuvsHnswHierarchy is missing the
NONE value; update the table (the rows showing CPU and GPU) to include a row for
NONE using the exact numeric/string value defined in the header for
cuvsHnswHierarchy::NONE so the table matches the documented behavior and the
earlier mentions of NONE in the text.

In `@fern/pages/c_api/c-api-neighbors-ivf-flat.md`:
- Around line 323-324: Update the C-API docs to use the canonical symbol names
and fix the wording: replace any occurrences of ivfFlatBuild with
cuvsIvfFlatBuild and ivfFlatIndex/ivfFlatIndex_t with cuvsIvfFlatIndex_t (e.g.,
in the table entries for the `index` parameter and any other references such as
at lines 383-384), and change the phrase "loaded disk" to "loaded from disk"
where it appears (ensure you update the `index` parameter description that
mentions the index returned by cuvsIvfFlatBuild and any similar descriptions).
- Around line 46-49: The parameter direction annotations for allocator/getter
pointer arguments are incorrect: update the table entries so pointer outputs are
marked `out` (or `inout` if the function both reads and writes) — e.g., change
the `index_params` parameter for functions using `cuvsIvfFlatIndexParams_t*` to
`out` (or `inout` where the function accepts pre-filled structs), and apply the
same fix for the other listed occurrences (lines referencing the same pattern
for the other functions/types). Locate the parameter rows in the markdown tables
(look for the `index_params` and other pointer parameters) and replace
`in`/blank with the appropriate `out`/`inout` direction to accurately reflect
which pointer args are allocator/getter outputs.
- Around line 27-33: The table descriptions were truncated for fields like
add_data_on_build, n_lists, kmeans_n_iters, kmeans_trainset_fraction,
adaptive_centers and conservative_memory_allocation (references to
ivf_flat::build) — update each cell to a complete sentence that explains the
parameter and its effect (e.g., add_data_on_build: "Whether to add the dataset
vectors to the index at build time (true adds data to index storage; false keeps
data external)"; n_lists: "Number of inverted lists (clusters) used by the IVF
index"; kmeans_n_iters: "Number of iterations to run k-means when training
cluster centers during index build"; kmeans_trainset_fraction: "Fraction of the
dataset to sample for k-means center training"; adaptive_centers: "If true,
adaptively update cluster centers during indexing; if false, centers are trained
once in ivf_flat::build"; conservative_memory_allocation: "If true, allocate
memory conservatively per cluster to minimize peak usage, otherwise allocate
extra space for performance"). Ensure each description is a full sentence with
no trailing ellipses and mentions the relevant behavior or default where
applicable.

In `@fern/pages/c_api/c-api-neighbors-tiered-index.md`:
- Line 25: Update the parameter direction for the allocation output pointers
from in to out in the docs for the create APIs: change the `index` parameter
entries for both `cuvsTieredIndexCreate` and `cuvsTieredIndexParamsCreate` from
`in` to `out` (the type remains `cuvsTieredIndex_t*` / allocation output);
ensure the table rows that currently show `| index | in | cuvsTieredIndex_t* |
... |` are corrected to `out` for both functions.
- Line 142: Update the dtype constraint to use equality comparison rather than
assignment: change the expression "kDLDataType.bits = 32" to "kDLDataType.bits
== 32" so it matches the surrounding style and the other constraint
"kDLDataType.code == kDLFloat" (look for the occurrence of kDLDataType.bits in
the c-api-neighbors-tiered-index.md snippet).
- Line 31: Remove the duplicated standalone return-type lines ("cuvsError_t")
that appear immediately after the rendered "**Returns**" sections in this
document; specifically delete the stray lines that repeat `cuvsError_t`
(occurrences noted at the current block and also at the other sections
indicated) so each function/section only shows the `**Returns**` header with the
inline `` `cuvsError_t` `` and no extra bare type line; search for the repeated
`cuvsError_t` tokens in this markdown and remove those redundant lines (leaving
the existing Returns lines intact) to eliminate the template/export duplication.

In `@fern/pages/cpp_api/cpp-api-cluster-kmeans.md`:
- Line 756: The generated docs contain stray fragments "in the input."
immediately following the fit_predict method signatures (e.g., the fit_predict
headings) which break prose; remove those dangling fragments after each
fit_predict signature occurrence (including the other noted occurrences) so the
signature lines stand alone and the surrounding sentences read correctly,
ensuring no other surrounding text is altered when you delete the orphaned
phrase.
- Around line 141-147: The parameter table for the kmeans overload is missing
Direction and Description entries for parameters handle, params, X,
sample_weight, centroids, inertia, and n_iter; update each of those rows to
include the same Direction and Description pattern used elsewhere in the
generated overload docs (e.g., indicate the direction as "in", "out", or
"inout" and give a concise description like "Raft resources handle", "Clustering
parameters", "Input feature matrix X", "Optional sample weights", "Output
centroids on device", "Output inertia (sum of squared distances)", and "Output
number of iterations performed"), and apply the identical fill-in approach to
the other block noted (the additional overload block with the same missing
cells).

In `@fern/pages/cpp_api/cpp-api-neighbors-all-neighbors.md`:
- Around line 52-69: The generated docs around the build(...) overloads are
broken into split sentences and empty headings—fix the template/source mapping
that emits these overload blocks so the prose is reconstructed into complete
sentences, remove any empty "Usage example:" headings (or add real examples),
and make the parameter requirements explicit (for function build, clarify
whether distances and core_distances are optional and what providing
core_distances implies about mutual reachability). Specifically update the
template that renders the build signature and parameter descriptions
(references: build, all_neighbors_params, distances, core_distances) so the
"Requires distances matrix to Default:" fragment is replaced with a clear
requirement statement and both overloads render identically correct text.
- Line 41: Finish the truncated description for the `n_clusters` table row in
cpp-api-neighbors-all-neighbors.md: explain that `n_clusters` is the number of
clusters (batches) to split the data into and that if set to 1 the algorithm
will not batch the data (i.e., creates a single cluster and processes the entire
dataset as one batch), and optionally note how larger values control
memory/parallelism tradeoffs or affect performance. Update the sentence to
complete that behavior and keep wording consistent with surrounding parameter
descriptions.

In `@fern/pages/cpp_api/cpp-api-neighbors-cagra.md`:
- Around line 125-126: The description for extend_params.max_chunk_size is
truncated; update the documentation for extend_params.max_chunk_size to a
complete sentence such as: "The maximum size of each chunk when dividing the
additional dataset before adding it to the graph; this setting controls how
large each chunk can be and therefore affects memory usage and indexing
granularity." Ensure this text replaces the truncated fragment in the
cpp-api-neighbors-cagra.md entry for extend_params.max_chunk_size.
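
For the duplicate-heading finding (MD024) listed above for `c-api-neighbors-cagra.md`, a check along these lines could locate repeated headings before deciding how to rename or demote them. This is a sketch under stated assumptions: `duplicate_headings` is a hypothetical helper, it only handles ATX-style (`#`) headings, and it skips fenced code blocks delimited by triple backticks.

```python
import re
from collections import Counter

# ATX heading: 1-6 '#' characters, the title, and an optional closing '#' run.
_HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def duplicate_headings(markdown: str) -> dict[str, int]:
    """Return heading texts that occur more than once, with their counts."""
    counts: Counter[str] = Counter()
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on opening/closing fence
            continue
        if in_fence:
            continue  # '#' lines inside code blocks are not headings
        match = _HEADING.match(line)
        if match:
            counts[match.group(2)] += 1
    return {title: n for title, n in counts.items() if n > 1}
```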


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5fa117ef-5b19-4ff6-8f20-e6b4b6b025d2

📥 Commits

Reviewing files that changed from the base of the PR and between 1cb8968 and c336b1f.

📒 Files selected for processing (42)
  • fern/pages/c_api/c-api-cluster-kmeans.md
  • fern/pages/c_api/c-api-core-c-api.md
  • fern/pages/c_api/c-api-neighbors-brute-force.md
  • fern/pages/c_api/c-api-neighbors-cagra.md
  • fern/pages/c_api/c-api-neighbors-hnsw.md
  • fern/pages/c_api/c-api-neighbors-ivf-flat.md
  • fern/pages/c_api/c-api-neighbors-ivf-pq.md
  • fern/pages/c_api/c-api-neighbors-nn-descent.md
  • fern/pages/c_api/c-api-neighbors-tiered-index.md
  • fern/pages/c_api/c-api-neighbors-vamana.md
  • fern/pages/cpp_api/cpp-api-cluster-agglomerative.md
  • fern/pages/cpp_api/cpp-api-cluster-kmeans.md
  • fern/pages/cpp_api/cpp-api-cluster-spectral.md
  • fern/pages/cpp_api/cpp-api-distance-distance.md
  • fern/pages/cpp_api/cpp-api-neighbors-all-neighbors.md
  • fern/pages/cpp_api/cpp-api-neighbors-ball-cover.md
  • fern/pages/cpp_api/cpp-api-neighbors-brute-force.md
  • fern/pages/cpp_api/cpp-api-neighbors-cagra.md
  • fern/pages/cpp_api/cpp-api-neighbors-dynamic-batching.md
  • fern/pages/cpp_api/cpp-api-neighbors-epsilon-neighborhood.md
  • fern/pages/cpp_api/cpp-api-neighbors-hnsw.md
  • fern/pages/cpp_api/cpp-api-neighbors-ivf-flat.md
  • fern/pages/cpp_api/cpp-api-neighbors-ivf-pq.md
  • fern/pages/cpp_api/cpp-api-neighbors-nn-descent.md
  • fern/pages/cpp_api/cpp-api-neighbors-refine.md
  • fern/pages/cpp_api/cpp-api-neighbors-scann.md
  • fern/pages/cpp_api/cpp-api-neighbors-vamana.md
  • fern/pages/cpp_api/cpp-api-preprocessing-pca.md
  • fern/pages/cpp_api/cpp-api-preprocessing-quantize-binary.md
  • fern/pages/cpp_api/cpp-api-preprocessing-quantize-pq.md
  • fern/pages/cpp_api/cpp-api-preprocessing-quantize-scalar.md
  • fern/pages/cpp_api/cpp-api-preprocessing-spectral-embedding.md
  • fern/pages/cpp_api/cpp-api-selection-select-k.md
  • fern/pages/python_api/python-api-neighbors-brute-force.md
  • fern/pages/python_api/python-api-neighbors-cagra.md
  • fern/pages/python_api/python-api-neighbors-filters.md
  • fern/pages/python_api/python-api-neighbors-hnsw.md
  • fern/pages/python_api/python-api-neighbors-ivf-flat.md
  • fern/pages/python_api/python-api-neighbors-ivf-pq.md
  • fern/pages/python_api/python-api-neighbors-tiered-index.md
  • fern/pages/python_api/python-api-neighbors-vamana.md
  • fern/scripts/generate_api_reference.py
✅ Files skipped from review due to trivial changes (6)
  • fern/pages/c_api/c-api-neighbors-vamana.md
  • fern/pages/cpp_api/cpp-api-cluster-agglomerative.md
  • fern/pages/cpp_api/cpp-api-cluster-spectral.md
  • fern/pages/c_api/c-api-neighbors-brute-force.md
  • fern/pages/cpp_api/cpp-api-neighbors-ball-cover.md
  • fern/pages/c_api/c-api-neighbors-nn-descent.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • fern/pages/cpp_api/cpp-api-distance-distance.md
  • fern/pages/c_api/c-api-core-c-api.md

Comment on lines +69 to +72
| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| `params` | in | `cuvsKMeansParams_t*` | cuvsKMeansParams_t to allocate |


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix parameter direction for output handle in cuvsKMeansParamsCreate.

Line 71 marks params as in, but this API allocates/populates the handle via pointer, so it should be documented as out (or inout if that is your convention). Incorrect direction here can lead to misuse.

Proposed doc fix
-| `params` | in | `cuvsKMeansParams_t*` | cuvsKMeansParams_t to allocate |
+| `params` | out | `cuvsKMeansParams_t*` | Pointer to allocated `cuvsKMeansParams_t` handle. |
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 | Name | Direction | Type | Description |
 | --- | --- | --- | --- |
-| `params` | in | `cuvsKMeansParams_t*` | cuvsKMeansParams_t to allocate |
+| `params` | out | `cuvsKMeansParams_t*` | Pointer to allocated `cuvsKMeansParams_t` handle. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-cluster-kmeans.md` around lines 69 - 72, The parameter
table for cuvsKMeansParamsCreate wrongly marks the `params` argument as `in`;
update its direction to `out` (or `inout` if your project convention uses that)
because cuvsKMeansParamsCreate allocates and populates the cuvsKMeansParams_t
handle via the pointer; adjust the documentation row for `params` to reflect
`out` and keep the type `cuvsKMeansParams_t*` and the description unchanged.
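
One way to verify this finding across the generated pages is to scan the parameter tables for allocator-style rows. The sketch below is a heuristic and not part of the PR: the function name `flag_allocator_rows` and the rule it applies (a pointer-typed row marked `in` whose description ends in "to allocate") are assumptions made for illustration, and the second sample row is invented.

```python
def flag_allocator_rows(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, parameter_name) for table rows that look like
    allocator outputs but are documented with direction `in`."""
    flagged = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if not line.startswith("|"):
            continue
        # Split a 4-column row: Name | Direction | Type | Description.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue
        name, direction, ctype, description = cells
        is_pointer = ctype.strip("`").endswith("*")
        if direction == "in" and is_pointer and description.endswith("to allocate"):
            flagged.append((lineno, name.strip("`")))
    return flagged


# Illustrative input: the first data row is the one quoted in this comment,
# the second is a made-up row that should not be flagged.
sample = """| Name | Direction | Type | Description |
| --- | --- | --- | --- |
| `params` | in | `cuvsKMeansParams_t*` | cuvsKMeansParams_t to allocate |
| `handle` | in | `example_t` | hypothetical input handle |
"""
print(flag_allocator_rows(sample))  # [(3, 'params')]
```

Rows flagged this way still need a manual check against the header before the direction is changed.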

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 8

♻️ Duplicate comments (20)
fern/pages/c_api/c-api-neighbors-tiered-index.md (3)

25-25: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Set create-API output pointer direction to out

Line 25 and Line 93 document allocator output pointers as in; these should be out to match allocation semantics.

Suggested minimal doc fix
-| `index` | in | `cuvsTieredIndex_t*` | cuvsTieredIndex_t to allocate |
+| `index` | out | `cuvsTieredIndex_t*` | cuvsTieredIndex_t to allocate |

-| `index_params` | in | `cuvsTieredIndexParams_t*` | cuvsTieredIndexParams_t to allocate |
+| `index_params` | out | `cuvsTieredIndexParams_t*` | cuvsTieredIndexParams_t to allocate |

Also applies to: 93-93

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-tiered-index.md` at line 25, The documented
direction for the create APIs' allocator output pointers is incorrect: change the
direction from `in` to `out` for the `index` parameter (type
`cuvsTieredIndex_t*`, line 25) and for the `index_params` parameter (type
`cuvsTieredIndexParams_t*`, line 93) so the docs reflect that each function
allocates and assigns the pointer; update only the Direction cell of both
parameter table rows and keep the rest of each description intact.

142-143: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use comparison notation for dtype constraints (== instead of =)

Line 142 and Line 143 currently read like assignment expressions. Use == for consistency and correctness in constraint notation.

Suggested minimal doc fix
-1. `kDLDataType.code == kDLFloat` and `kDLDataType.bits = 32`
-2. `kDLDataType.code == kDLFloat` and `kDLDataType.bits = 16`
+1. `kDLDataType.code == kDLFloat` and `kDLDataType.bits == 32`
+2. `kDLDataType.code == kDLFloat` and `kDLDataType.bits == 16`
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-tiered-index.md` around lines 142 - 143, The
dtype constraint expressions currently use assignment-like notation; change
`kDLDataType.bits = 32` and `kDLDataType.bits = 16` to `kDLDataType.bits == 32`
and `kDLDataType.bits == 16` so both constraints use the same comparison style
as the existing `kDLDataType.code == kDLFloat`.
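
If the same notation slip appears on other generated pages, the rewrite can be done mechanically. This is a minimal sketch under the assumption that it is applied only to the constraint lines themselves; the helper name `use_comparison_notation` is hypothetical, and running it over whole pages would also rewrite legitimate uses such as `[dim = n_clusters x n_features]`.

```python
import re


def use_comparison_notation(text: str) -> str:
    """Rewrite a lone '=' as '==', leaving '==', '!=', '<=' and '>=' untouched."""
    return re.sub(r"(?<![=!<>])=(?!=)", "==", text)


line = "1. `kDLDataType.code == kDLFloat` and `kDLDataType.bits = 32`"
print(use_comparison_notation(line))
# 1. `kDLDataType.code == kDLFloat` and `kDLDataType.bits == 32`
```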

31-31: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove duplicated standalone return-type lines

Each affected section already has **Returns** with `cuvsError_t`; the extra bare cuvsError_t lines are redundant and should be deleted.

Also applies to: 99-99, 121-121, 158-158, 226-226, 260-260

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-tiered-index.md` at line 31, Remove the
redundant standalone "cuvsError_t" lines that duplicate the existing
"**Returns** `cuvsError_t`" entries in the C API doc sections; search for bare
lines containing only cuvsError_t (they appear immediately adjacent to each
Returns block) and delete those duplicate lines so only the "**Returns**
`cuvsError_t`" remains for each function/section.
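
The cleanup described here could be scripted rather than done by hand. The following is a sketch only: `drop_duplicate_returns` is a hypothetical helper, and it assumes the duplicated line is a bare copy of the return type that follows the back-ticked one inside the same **Returns** block.

```python
def drop_duplicate_returns(markdown: str, return_type: str = "cuvsError_t") -> str:
    """Remove a bare `return_type` line that repeats the back-ticked return
    type already listed under a **Returns** heading."""
    kept = []
    in_returns = False   # currently inside a **Returns** block
    seen_typed = False   # back-ticked return type already seen in this block
    for line in markdown.split("\n"):
        stripped = line.strip()
        if stripped == "**Returns**":
            in_returns, seen_typed = True, False
        elif in_returns and stripped == f"`{return_type}`":
            seen_typed = True
        elif in_returns and seen_typed and stripped == return_type:
            # Drop the duplicate and the blank line that preceded it.
            if kept and kept[-1].strip() == "":
                kept.pop()
            in_returns = False
            continue
        elif stripped:
            # Any other non-blank content ends the Returns block.
            in_returns = False
        kept.append(line)
    return "\n".join(kept)


before = "**Returns**\n\n`cuvsError_t`\n\ncuvsError_t\n\n### next"
print(drop_duplicate_returns(before))
```

A bare `cuvsError_t` outside a **Returns** block is left alone, so signature lines and prose are not affected.
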
fern/pages/cpp_api/cpp-api-neighbors-cagra.md (4)

1180-1198: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

additional_dataset memory location description conflicts with the overload type.

Line 1180 uses raft::device_matrix_view<...>, but Line 1197 says “host memory”. Please align prose with the device overload to avoid misuse.

#!/bin/bash
# Verify device overload and conflicting "host memory" description.
rg -n -C2 'raft::device_matrix_view<const uint8_t, int64_t, raft::row_major> additional_dataset|additional dataset on host memory' fern/pages/cpp_api/cpp-api-neighbors-cagra.md
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-cagra.md` around lines 1180 - 1198, The
documentation text incorrectly states that additional_dataset is "on host
memory" while the function signature uses raft::device_matrix_view<const
uint8_t, int64_t, raft::row_major> additional_dataset (a device view); update
the prose to indicate device memory (or provide the correct overload/signature)
so it matches the parameter type — change the description for additional_dataset
to reference device memory/GPU device_view and ensure any notes about lifetime
and stride mention the device-buffer semantics for the raft::device_matrix_view
symbol.

1248-1261: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Search section is mislabeled and the signature block is still incomplete.

The heading at Line 1248 should describe search, and the declaration in Line 1253-1261 is missing the closing );.

#!/bin/bash
# Verify mislabeled heading + missing function terminator in search section.
rg -n -C3 '^### none_sample_filter$|^void search\(|none_sample_filter\{\}$|^\s*\);$' fern/pages/cpp_api/cpp-api-neighbors-cagra.md
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-cagra.md` around lines 1248 - 1261, The
section heading incorrectly reads "none_sample_filter" and the search function
declaration is missing its closing ");"; update the heading to describe the
search function (e.g., "search") and close the function signature by adding the
missing closing parentheses and semicolon for the prototype that begins with
"void search(...)" and references "cuvs::neighbors::cagra::search_params",
"cuvs::neighbors::cagra::index", and the default
"cuvs::neighbors::filtering::none_sample_filter{}" so the declaration is
syntactically complete.
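
A generator-side check for this class of truncation might look like the sketch below. It is an assumption-laden illustration: `declaration_problems` is a hypothetical helper, it counts parentheses naively (ignoring string literals and comments), and the sample declaration is abbreviated for the example rather than copied from the page.

```python
def declaration_problems(snippet: str) -> list[str]:
    """List obvious signs that a function declaration snippet was cut short."""
    problems = []
    depth = 0
    for ch in snippet:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
    if depth > 0:
        problems.append(f"{depth} unclosed '('")
    elif depth < 0:
        problems.append(f"{-depth} unmatched ')'")
    if not snippet.rstrip().endswith(";"):
        problems.append("missing terminating ';'")
    return problems


# Abbreviated, hypothetical declaration used only to exercise the check.
truncated = "void search(raft::resources const& res,\n            int k"
print(declaration_problems(truncated))          # ["1 unclosed '('", "missing terminating ';'"]
print(declaration_problems(truncated + ");"))   # []
```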

973-974: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

extend usage text is still corrupted and duplicated across overloads.

These blocks are fragmented/repeated and lose the buffer contract details, making overload behavior hard to understand reliably.

Also applies to: 1009-1010, 1045-1046, 1081-1082, 1117-1118, 1153-1154, 1189-1190, 1225-1226

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-cagra.md` around lines 973 - 974, The
"extend" usage description is duplicated and corrupted across multiple
overloads, losing the distinct buffer contract details; update each overload's
documentation for the extend option so each overload has a single, correct
paragraph describing the buffer contract (num_rows, cols/graph-degree, stride
and lifetime responsibility) and remove the repeated sentences — ensure the
descriptions for the overload that accepts the dataset buffer reference mention
"num_rows must be the sum of original and additional datasets, cols must equal
the dataset dimension, stride must match the original index dataset" and for the
graph-specific overload mention "num_rows must be the sum..., cols must be the
graph degree", and apply these corrections to all affected blocks (occurrences
around the duplicated fragments).

316-319: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Constructor snippets are still emitted as incomplete C++ declarations.

Line 319 and Line 348 end at an initializer-list comma, so both overload snippets are non-compilable as shown. This remains a generator output bug.

#!/bin/bash
# Verify constructor snippets are syntactically incomplete in this page.
rg -n -C2 'index\(raft::resources const& res,|: cuvs::neighbors::index\(\),$' fern/pages/cpp_api/cpp-api-neighbors-cagra.md

Also applies to: 347-348

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-cagra.md` around lines 316 - 319, The
constructor snippets are being emitted truncated at the initializer-list comma
(e.g., the overload starting with "index(raft::resources const& res,
cuvs::distance::DistanceType metric = cuvs::distance::DistanceType::L2Expanded)"
and the following ": cuvs::neighbors::index(),"), producing non-compilable C++;
update the snippet generator to emit the full constructor declaration including
the complete initializer list and function body or the terminating semicolon for
each overload (fix the generator template that formats constructors so it
concatenates the initializer list and closing tokens instead of stopping at the
comma), and validate output for symbols like index(...) and ":
cuvs::neighbors::index()" to ensure snippets end with a closing "{}", ";", or
complete initializer list.
fern/pages/c_api/c-api-cluster-kmeans.md (3)

73-78: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove duplicated Returns payload text.

Both return sections repeat cuvsError_t twice (Line 75 + Line 77, and Line 97 + Line 99). Keep a single return type/value to avoid noisy generated docs.

📝 Proposed doc fix
 **Returns**

 `cuvsError_t`
-
-cuvsError_t
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-cluster-kmeans.md` around lines 73 - 78, The Returns
sections in c-api-cluster-kmeans.md currently repeat the return type
`cuvsError_t` twice; remove the duplicate line so each Returns block contains a
single `cuvsError_t` entry (i.e., edit the Returns paragraphs that list
`cuvsError_t` and delete the redundant occurrence so only one `cuvsError_t`
remains).

71-71: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix parameter direction for output handle in cuvsKMeansParamsCreate.

Line 71 marks params as in, but this API allocates/populates the handle via pointer, so it should be documented as out. Incorrect direction here can lead to misuse.

📝 Proposed doc fix
-| `params` | in | `cuvsKMeansParams_t*` | cuvsKMeansParams_t to allocate |
+| `params` | out | `cuvsKMeansParams_t*` | Pointer to allocated `cuvsKMeansParams_t` handle. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-cluster-kmeans.md` at line 71, The parameter direction
for the output handle in the cuvsKMeansParamsCreate API is incorrect; update the
API docs for cuvsKMeansParamsCreate so the `params` parameter is marked as `out`
(not `in`) to reflect that the function allocates/populates the
cuvsKMeansParams_t* handle; locate the parameter table entry referencing
`params` and change its direction from `in` to `out` and ensure any surrounding
description mentions that the function returns/allocates the handle via this
output pointer.

150-150: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use C API naming in the centroids init description.

Line 150 references InitMethod::Array, which reads like a C++ enum path. In this C API page, prefer the C-facing enum/value wording (e.g., cuvsKMeansInitMethod with value Array) for consistency.

📝 Proposed doc fix
-| `centroids` | inout | `DLManagedTensor*` | [in] When init is InitMethod::Array, use centroids as the initial cluster centers. [out] The generated centroids from the kmeans algorithm are stored at the address pointed by 'centroids'. Must be on device. [dim = n_clusters x n_features] |
+| `centroids` | inout | `DLManagedTensor*` | [in] When init is `Array` (a `cuvsKMeansInitMethod` value), use centroids as the initial cluster centers. [out] The generated centroids from the kmeans algorithm are stored at the address pointed by 'centroids'. Must be on device. [dim = n_clusters x n_features] |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-cluster-kmeans.md` at line 150, The docs for the
centroids parameter use C++-style InitMethod::Array; update the wording to the C
API naming: refer to the init method as cuvsKMeansInitMethod with value Array
(e.g., "when init is cuvsKMeansInitMethod = Array") and keep the rest of the
centroids description intact so callers of the centroids parameter see the
C-facing enum name and value; you can locate this text by searching for the
`centroids` parameter and the `InitMethod::Array` mention.
fern/pages/cpp_api/cpp-api-neighbors-brute-force.md (3)

317-317: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix typo: "ot" → "or".

The description for has_norms contains a typo that should be corrected before publishing.

✏️ Suggested fix
-Whether ot not this index has dataset norms
+Whether or not this index has dataset norms
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-brute-force.md` at line 317, Fix the
typo in the description for the field/function named has_norms: change "Whether
ot not this index has dataset norms" to "Whether or not this index has dataset
norms" so the documentation reads correctly; locate the has_norms description in
the cpp-api-neighbors-brute-force.md content and update the wording accordingly.

47-47: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix typo: "precaculated" → "precalculated".

This typo appears in the constructor description for the brute force index and will be visible in the published API reference.

✏️ Suggested fix
-Constructs a brute force index from a dataset. This lets us precompute norms for the dataset, providing a speed benefit over doing this at query time. This index will copy the host dataset onto the device, and take ownership of any precaculated norms.
+Constructs a brute force index from a dataset. This lets us precompute norms for the dataset, providing a speed benefit over doing this at query time. This index will copy the host dataset onto the device, and take ownership of any precalculated norms.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-brute-force.md` at line 47, In the brute
force index constructor description (the line starting "Constructs a brute force
index from a dataset..."), fix the typo "precaculated" to "precalculated" so the
sentence reads that the index will "take ownership of any precalculated norms";
update only that word in the markdown file to preserve the rest of the
description.

592-592: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove stray "output" token.

Both serialization descriptions have an orphaned word "output" that appears after the sentence ends, which reads like a generation artifact.

✏️ Suggested fix

Remove the standalone word "output" that appears after the sentence "The serialization format can be subject to changes, therefore loading an index saved with a previous version of cuvs is not guaranteed to work."

Apply at both lines 592 and 628.

Also applies to: 628-628

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-neighbors-brute-force.md` at line 592, Remove the
stray standalone word "output" that follows the sentence "The serialization
format can be subject to changes, therefore loading an index saved with a
previous version of cuvs is not guaranteed to work." in both serialization
description blocks (the two occurrences referenced); simply delete the orphaned
"output" token so the sentence ends cleanly.
fern/pages/c_api/c-api-neighbors-hnsw.md (2)

25-29: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

cuvsHnswHierarchy table is missing NONE and conflicts with the page contract.

The enum values table omits NONE, but Line 17 and later behavior notes depend on it. Add the NONE row using the exact header value so docs are self-consistent.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-hnsw.md` around lines 25 - 29, The enum
values table for cuvsHnswHierarchy is missing the required NONE entry; update
the markdown table under cuvsHnswHierarchy to include a row with the exact
header name `NONE` and its corresponding value so the table matches the contract
referenced by the surrounding text and behavior notes (ensure the header string
is exactly `NONE` to preserve consistency).

166-169: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return type for cuvsHnswIndexCreate is inconsistent in the Returns block.

The signature uses cuvsError_t, but the block still lists HnswError_t.

Suggested fix
- HnswError_t
+ cuvsError_t
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-hnsw.md` around lines 166 - 169, The Returns
block for cuvsHnswIndexCreate is inconsistent with the function signature: the
signature returns cuvsError_t while the Returns section lists HnswError_t;
update the documentation so the Returns block matches the actual return type
(replace HnswError_t with cuvsError_t) or, if the intended type is HnswError_t,
change the function signature to return HnswError_t—ensure the symbol
cuvsHnswIndexCreate and the types cuvsError_t / HnswError_t are corrected
consistently.
fern/pages/c_api/c-api-neighbors-cagra.md (2)

517-520: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

cuvsCagraIndexCreate Returns block uses the wrong error type.

The documented return type is cagraError_t, but the function signature and API convention here are cuvsError_t.

Suggested fix
- cagraError_t
+ cuvsError_t
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-cagra.md` around lines 517 - 520, The
Returns block for cuvsCagraIndexCreate incorrectly documents the return type as
cagraError_t; update the documentation to use cuvsError_t to match the function
signature and API convention (search for cuvsCagraIndexCreate and replace
cagraError_t with cuvsError_t in its Returns section), and review nearby entries
to ensure no other mismatched cagraError_t references remain.

282-282: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Repeated H2 titles trigger MD024; make each section heading unique.

These duplicated ## C API for CUDA ANN Graph-based nearest neighbor search headings are still present and likely to fail markdown lint.

Also applies to: 381-381, 497-497, 666-666, 705-705

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-cagra.md` at line 282, Multiple identical H2
headings "C API for CUDA ANN Graph-based nearest neighbor search" are causing
MD024 lint failures; rename each duplicate heading to be unique (e.g., add a
distinguishing suffix like "— Usage", "— Parameters", "— Examples", or a numeric
suffix) so they remain descriptive but distinct; locate every heading occurrence
that exactly matches the string "C API for CUDA ANN Graph-based nearest neighbor
search" and update the text to a unique variant while preserving the intended
section content and hierarchy.
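
To locate every repeated heading before renaming, a small scan is enough. This sketch is not markdownlint itself: `duplicate_headings` is a hypothetical helper that treats any line starting with `#` outside a fenced block as a heading and compares full lines, which is a simplification of what MD024 checks.

```python
from collections import defaultdict

FENCE = "`" * 3  # fence marker, built this way to keep the example easy to embed


def duplicate_headings(markdown: str) -> dict[str, list[int]]:
    """Map each repeated heading line to the 1-based line numbers where it occurs."""
    seen = defaultdict(list)
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if not in_fence and line.startswith("#"):
            seen[line.rstrip()].append(lineno)
    return {text: nums for text, nums in seen.items() if len(nums) > 1}


page = (
    "## C API for CUDA ANN Graph-based nearest neighbor search\n\n"
    "text\n\n"
    "## C API for CUDA ANN Graph-based nearest neighbor search\n\n"
    "## Other\n"
)
print(duplicate_headings(page))
```
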
fern/pages/c_api/c-api-neighbors-ivf-pq.md (1)

626-632: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

cuvsIvfPqBuildPrecomputed contract text is still malformed/truncated.

This section still has broken prose and cut-off parameter descriptions (params, rotation_matrix) plus orphaned text, which obscures required tensor shapes/extents for a strict API.

Also applies to: 639-645

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-ivf-pq.md` around lines 626 - 632, The c_api
contract for cuvsIvfPqBuildPrecomputed is truncated and has malformed prose
(missing/unfinished descriptions for params, rotation_matrix, pq_centers and
index_params); update the docblock for cuvsIvfPqBuildPrecomputed to include
complete, precise parameter contracts: describe shapes/extents for each matrix
argument (e.g., matrices must have rows matching dim and per-matrix extents),
define rotation_matrix expected shape or null, fully specify params (what fields
are required and types), state how index_params.codebook_kind determines
pq_centers shape and that index_params.metric and
index_params.conservative_memory_allocation are stored in the index, and add the
consistency checks between index_params, dim, and matrix extents to the prose so
users know the exact tensor shapes and validation performed.
fern/pages/c_api/c-api-neighbors-ivf-flat.md (2)

323-324: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use canonical C-API symbol names and fix “loaded disk” wording.

ivfFlatBuild/ivfFlatIndex should reference cuvsIvfFlatBuild/cuvsIvfFlatIndex_t, and “loaded disk” should be “loaded from disk”.

Also applies to: 383-384

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-ivf-flat.md` around lines 323 - 324, Update
the C-API symbol names and wording: replace any references to ivfFlatBuild and
ivfFlatIndex with the canonical C API symbols cuvsIvfFlatBuild and
cuvsIvfFlatIndex_t (e.g., in the parameter table entries for `index`) and change
the phrase “loaded disk” to “loaded from disk”; apply the same replacements at
both affected locations (the lines around the `index`/`queries` entries and the
other occurrence noted).

46-49: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Parameter direction annotations for output pointers are still incorrect/blank.

Create/getter APIs that write through pointer args should use out (or inout where applicable), not in/empty direction cells.

Also applies to: 110-113, 158-161, 202-204, 221-224

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/c_api/c-api-neighbors-ivf-flat.md` around lines 46 - 49, The table
entries incorrectly mark output pointer parameters as "in"; update parameter
direction annotations for any create/getter APIs that write through pointer
arguments (e.g., change `index_params` of type `cuvsIvfFlatIndexParams_t*` from
`in` to `out`) and similarly fix other pointer params in this file to `out` or
`inout` as appropriate so that all functions which populate or return values via
pointer args reflect the correct direction.
🧹 Nitpick comments (1)
fern/pages/cpp_api/cpp-api-cluster-kmeans.md (1)

369-370: ⚡ Quick win

Align centroids direction with description in balanced fit overload tables.

These rows are marked inout but description only specifies [out]; please make them consistent.

Also applies to: 397-398, 425-426

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@fern/pages/cpp_api/cpp-api-cluster-kmeans.md` around lines 369 - 370, The
table rows for the parameter named `centroids` are marked as `inout` but their
descriptions and usage indicate they are output-only; update the parameter
direction from `inout` to `out` for `centroids` in the balanced `fit` overload
tables (and any other occurrences of the same `centroids` table row in this
document), and confirm that the description remains the `[out]` wording; also
scan for any other parameter rows (e.g., `inertia`) with a direction/description
mismatch and make their direction consistent with their description.
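
The consistency scan suggested above can be approximated with a short script. This is a sketch under stated assumptions: `direction_mismatches` is a hypothetical helper, it expects four-column rows (Name, Direction, Type, Description) with no `|` inside a cell, and it infers the intended direction only from `[in]` and `[out]` tags in the description. The sample rows are illustrative, not copied from the page.

```python
import re


def direction_mismatches(markdown: str) -> list[tuple[int, str, str, str]]:
    """Return (line_number, name, documented, implied) for table rows whose
    Direction cell disagrees with the [in]/[out] tags in the description."""
    mismatches = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue
        name, documented, _ctype, description = cells
        tags = set(re.findall(r"\[(in|out)\]", description))
        if not tags:
            continue  # nothing to compare against
        implied = "inout" if tags == {"in", "out"} else next(iter(tags))
        if documented != implied:
            mismatches.append((lineno, name.strip("`"), documented, implied))
    return mismatches


rows = (
    "| `centroids` | inout | `example_view` | [out] The generated centroids. |\n"
    "| `inertia` | out | `example_scalar` | [out] Sum of squared distances. |\n"
)
print(direction_mismatches(rows))  # [(1, 'centroids', 'inout', 'out')]
```
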
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@fern/pages/c_api/c-api-neighbors-hnsw.md`:
- Line 50: The table cell contains the arithmetic expression "2 * (n_rows /
npartitions) * dim * sizeof(T)" which uses bare * characters that trigger
markdownlint MD037; update that expression by escaping each asterisk (e.g.,
replace * with \*) or wrap the entire expression in inline code formatting so
the asterisks are not interpreted as emphasis—target the table cell text
containing "2 * (n_rows / npartitions) * dim * sizeof(T)" in
c-api-neighbors-hnsw.md and apply one of these fixes.

In `@fern/pages/c_api/c-api-neighbors-tiered-index.md`:
- Line 180: The description of search_params is awkward and split across
sentences; replace it with one self-contained, grammatical sentence stating that
search_params must be the parameter struct matching the tiered index type
(cuvsCagraSearchParams_t for CAGRA, cuvsIvfFlatSearchParams_t for IVF-Flat, or
cuvsIvfPqSearchParams_t for IVF-PQ) and that it is passed to the underlying ANN
index search routine.

In `@fern/pages/c_api/c-api-preprocessing-pca.md`:
- Around line 65-70: Multiple "Returns" blocks in c-api-preprocessing-pca.md
currently list `cuvsError_t` twice; edit each affected Returns section (the
blocks around lines 65-70, 87-92, 129-134, 173-178, 209-214, 245-250) to remove
the duplicated `cuvsError_t` entry so only a single return type remains,
preserving the surrounding markdown formatting and any brief description
present; locate the duplicated entries by searching for the literal "`Returns`"
header and the repeated "`cuvsError_t`" lines and delete the redundant line in
each block.

In `@fern/pages/c_api/c-api-preprocessing-quantize-pq.md`:
- Around line 51-56: In the Returns sections of the
c-api-preprocessing-quantize-pq.md doc the return type "cuvsError_t" is
duplicated (appearing once as inline code and again as plain text); update each
Returns block (the ones that list cuvsError_t) to remove the redundant
plain-text line and leave only the inline code form `cuvsError_t` so the API
output is consistent — adjust every Returns block that currently contains both
"cuvsError_t" occurrences accordingly.
- Line 26: Update the `pq_dim` description to remove the TODO and fix the
grammar: change the sentence to something like "When zero, an optimal value is
selected using a heuristic. Note: currently `dim` must be a multiple of
`pq_dim`." Ensure this replaces the existing line for `pq_dim` so the constraint
is clear and complete (referencing the `pq_dim` parameter and the `dim` field).

In `@fern/pages/cpp_api/cpp-api-cluster-kmeans.md`:
- Line 903: The parameter doc for centroids is wrong: in the balanced
fit_predict overloads the signature requires raft::device_matrix_view<float,
int64_t> centroids (not optional). Update the description for the centroids
parameter in the relevant docs (references: centroids, fit_predict overloads,
raft::device_matrix_view<float, int64_t>) to remove "Optional" and clearly state
it is a required in/out parameter used as initial centers when InitMethod::Array
and receives the resulting centroids [dim = n_clusters x n_features]; apply the
same change to the other occurrence noted.
- Around line 962-963: The doc incorrectly states the output shape of X_new for
KMeans::transform as [n_samples x n_features]; update the documentation for the
transform result (symbol: X_new) to indicate the correct cluster-distance space
shape [n_samples x n_clusters] and adjust the type/description for the entry
using raft::device_matrix_view<float, int> (occurrences around the transform
table and the second place noted at lines 990-991) so both locations
consistently state [dim = n_samples x n_clusters].

In `@fern/pages/cpp_api/cpp-api-neighbors-brute-force.md`:
- Line 414: Replace the unhyphenated phrase "brute force index" with the
hyphenated "brute-force index" in the three return-value descriptions that
currently use the unhyphenated form (the occurrences of "brute force index"
within the return descriptions at the three noted spots); ensure the exact
phrase "brute-force index" is used to match the rest of the document for
consistent compound-adjective style.

---

Duplicate comments:
In `@fern/pages/c_api/c-api-cluster-kmeans.md`:
- Around line 73-78: The Returns sections in c-api-cluster-kmeans.md currently
repeat the return type `cuvsError_t` twice; remove the duplicate line so each
Returns block contains a single `cuvsError_t` entry (i.e., edit the Returns
paragraphs that list `cuvsError_t` and delete the redundant occurrence so only
one `cuvsError_t` remains).
- Line 71: The parameter direction for the output handle in the
cuvsKMeansParamsCreate API is incorrect; update the API docs for
cuvsKMeansParamsCreate so the `params` parameter is marked as `out` (not `in`)
to reflect that the function allocates/populates the cuvsKMeansParams_t* handle;
locate the parameter table entry referencing `params` and change its direction
from `in` to `out` and ensure any surrounding description mentions that the
function returns/allocates the handle via this output pointer.
- Line 150: The docs for the centroids parameter use C++-style
InitMethod::Array; update the wording to the C API naming: refer to the init
method as cuvsKMeansInitMethod with value Array (e.g., "when init is
cuvsKMeansInitMethod = Array") and keep the rest of the centroids description
intact so callers of the centroids parameter see the C-facing enum name and
value; you can locate this text by searching for the `centroids` parameter and
the `InitMethod::Array` mention.

In `@fern/pages/c_api/c-api-neighbors-cagra.md`:
- Around line 517-520: The Returns block for cuvsCagraIndexCreate incorrectly
documents the return type as cagraError_t; update the documentation to use
cuvsError_t to match the function signature and API convention (search for
cuvsCagraIndexCreate and replace cagraError_t with cuvsError_t in its Returns
section), and review nearby entries to ensure no other mismatched cagraError_t
references remain.
- Line 282: Multiple identical H2 headings "C API for CUDA ANN Graph-based
nearest neighbor search" are causing MD024 lint failures; rename each duplicate
heading to be unique (e.g., add a distinguishing suffix like "— Usage", "—
Parameters", "— Examples", or a numeric suffix) so they remain descriptive but
distinct; locate every heading occurrence that exactly matches the string "C API
for CUDA ANN Graph-based nearest neighbor search" and update the text to a
unique variant while preserving the intended section content and hierarchy.
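
One way to resolve the MD024 finding above is to append a numeric suffix to each repeated heading. The sketch below is illustrative only: `dedupe_headings` is a hypothetical helper, not something in `generate_api_reference.py`, and it assumes ATX-style (`#`) headings and triple-backtick code fences.

```python
import re
from collections import Counter

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def dedupe_headings(markdown: str) -> str:
    """Append ' (2)', ' (3)', ... to repeated heading text so headings are unique."""
    seen = Counter()
    out = []
    in_fence = False
    for line in markdown.splitlines():
        # Lines inside fenced code blocks are never treated as headings.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            hashes, text = match.groups()
            seen[text] += 1
            if seen[text] > 1:
                line = f"{hashes} {text} ({seen[text]})"
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if markdown.endswith("\n") else "")
```

A numeric suffix is the least descriptive of the options the comment lists. Descriptive suffixes such as "Usage" or "Parameters" would need per-section knowledge that this sketch does not have.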

In `@fern/pages/c_api/c-api-neighbors-hnsw.md`:
- Around line 25-29: The enum values table for cuvsHnswHierarchy is missing the
required NONE entry; update the markdown table under cuvsHnswHierarchy to
include a row with the exact header name `NONE` and its corresponding value so
the table matches the contract referenced by the surrounding text and behavior
notes (ensure the header string is exactly `NONE` to preserve consistency).
- Around line 166-169: The Returns block for cuvsHnswIndexCreate is inconsistent
with the function signature: the signature returns cuvsError_t while the Returns
section lists HnswError_t; update the documentation so the Returns block matches
the actual return type (replace HnswError_t with cuvsError_t) or, if the
intended type is HnswError_t, change the function signature to return
HnswError_t—ensure the symbol cuvsHnswIndexCreate and the types cuvsError_t /
HnswError_t are corrected consistently.

In `@fern/pages/c_api/c-api-neighbors-ivf-flat.md`:
- Around line 323-324: Update the C-API symbol names and wording: replace any
references to ivfFlatBuild and ivfFlatIndex with the canonical C API symbols
cuvsIvfFlatBuild and cuvsIvfFlatIndex_t (e.g., in the parameter table entries
for `index`) and change the phrase “loaded disk” to “loaded from disk”; apply
the same replacements at both affected locations (the lines around the
`index`/`queries` entries and the other occurrence noted).
- Around line 46-49: The table entries incorrectly mark output pointer
parameters as "in"; update parameter direction annotations for any create/getter
APIs that write through pointer arguments (e.g., change `index_params` of type
`cuvsIvfFlatIndexParams_t*` from `in` to `out`) and similarly fix other pointer
params in this file to `out` or `inout` as appropriate so that all functions
which populate or return values via pointer args reflect the correct direction.

In `@fern/pages/c_api/c-api-neighbors-ivf-pq.md`:
- Around line 626-632: The c_api contract for cuvsIvfPqBuildPrecomputed is
truncated and has malformed prose (missing/unfinished descriptions for params,
rotation_matrix, pq_centers and index_params); update the docblock for
cuvsIvfPqBuildPrecomputed to include complete, precise parameter contracts:
describe shapes/extents for each matrix argument (e.g., matrices must have rows
matching dim and per-matrix extents), define rotation_matrix expected shape or
null, fully specify params (what fields are required and types), state how
index_params.codebook_kind determines pq_centers shape and that
index_params.metric and index_params.conservative_memory_allocation are stored
in the index, and add the consistency checks between index_params, dim, and
matrix extents to the prose so users know the exact tensor shapes and validation
performed.

In `@fern/pages/c_api/c-api-neighbors-tiered-index.md`:
- Line 25: The documented direction for the create API's allocator output
pointer is incorrect: change the parameter direction for the `index` parameter
(type `cuvsTieredIndex_t*`) from `in` to `out` in both occurrences (documented
at the current Line 25 and Line 93 locations) so the docs reflect that the
function allocates/assigns the pointer; update the parameter table entries for
`index` to use `out` and keep the rest of the description intact.
- Around line 142-143: The dtype constraint expressions use assignment-like
notation; change `kDLDataType.bits = 32` and `kDLDataType.bits = 16` to the
equality comparisons `kDLDataType.bits == 32` and `kDLDataType.bits == 16`
respectively, so they match the existing `kDLDataType.code == kDLFloat` style.
- Line 31: Remove the redundant standalone "cuvsError_t" lines that duplicate
the existing "**Returns** `cuvsError_t`" entries in the C API doc sections;
search for bare lines containing only cuvsError_t (they appear immediately
adjacent to each Returns block) and delete those duplicate lines so only the
"**Returns** `cuvsError_t`" remains for each function/section.

In `@fern/pages/cpp_api/cpp-api-neighbors-brute-force.md`:
- Line 317: Fix the typo in the description for the field/function named
has_norms: change "Whether ot not this index has dataset norms" to "Whether or
not this index has dataset norms" so the documentation reads correctly; locate
the has_norms description in the cpp-api-neighbors-brute-force.md content and
update the wording accordingly.
- Line 47: In the brute force index constructor description (the line starting
"Constructs a brute force index from a dataset..."), fix the typo "precaculated"
to "precalculated" so the sentence reads that the index will "take ownership of
any precalculated norms"; update only that word in the markdown file to preserve
the rest of the description.
- Line 592: Remove the stray standalone word "output" that follows the sentence
"The serialization format can be subject to changes, therefore loading an index
saved with a previous version of cuvs is not guaranteed to work." in both
serialization description blocks (the two occurrences referenced); simply delete
the orphaned "output" token so the sentence ends cleanly.

In `@fern/pages/cpp_api/cpp-api-neighbors-cagra.md`:
- Around line 1180-1198: The documentation text incorrectly states that
additional_dataset is "on host memory" while the function signature uses
raft::device_matrix_view<const uint8_t, int64_t, raft::row_major>
additional_dataset (a device view); update the prose to indicate device memory
(or provide the correct overload/signature) so it matches the parameter type —
change the description for additional_dataset to reference device memory/GPU
device_view and ensure any notes about lifetime and stride mention the
device-buffer semantics for the raft::device_matrix_view symbol.
- Around line 1248-1261: The section heading incorrectly reads
"none_sample_filter" and the search function declaration is missing its closing
");"; update the heading to describe the search function (e.g., "search") and
close the function signature by adding the missing closing parentheses and
semicolon for the prototype that begins with "void search(...)" and references
"cuvs::neighbors::cagra::search_params", "cuvs::neighbors::cagra::index", and
the default "cuvs::neighbors::filtering::none_sample_filter{}" so the
declaration is syntactically complete.
- Around line 973-974: The "extend" usage description is duplicated and
corrupted across multiple overloads, losing the distinct buffer contract
details; update each overload's documentation for the extend option so each
overload has a single, correct paragraph describing the buffer contract
(num_rows, cols/graph-degree, stride and lifetime responsibility) and remove the
repeated sentences — ensure the descriptions for the overload that accepts the
dataset buffer reference mention "num_rows must be the sum of original and
additional datasets, cols must equal the dataset dimension, stride must match
the original index dataset" and for the graph-specific overload mention
"num_rows must be the sum..., cols must be the graph degree", and apply these
corrections to all affected blocks (occurrences around the duplicated
fragments).
- Around line 316-319: The constructor snippets are being emitted truncated at
the initializer-list comma (e.g., the overload starting with
"index(raft::resources const& res, cuvs::distance::DistanceType metric =
cuvs::distance::DistanceType::L2Expanded)" and the following ":
cuvs::neighbors::index(),"), producing non-compilable C++; update the snippet
generator to emit the full constructor declaration including the complete
initializer list and function body or the terminating semicolon for each
overload (fix the generator template that formats constructors so it
concatenates the initializer list and closing tokens instead of stopping at the
comma), and validate output for symbols like index(...) and ":
cuvs::neighbors::index()" to ensure snippets end with a closing "{}", ";", or
complete initializer list.
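
The validation step suggested for truncated constructor snippets could start as a simple heuristic. This is a rough sketch under stated assumptions: `looks_truncated` is a hypothetical name, the check only looks at the final character and at bracket balance, and it ignores template angle brackets, string literals, and comments.

```python
def looks_truncated(snippet: str) -> bool:
    """Heuristically flag a C++ declaration snippet that was cut off early."""
    text = snippet.strip()
    # A complete declaration or definition should end with ';' or '}'.
    if not text.endswith((";", "}")):
        return True
    # Round, curly, and square brackets must be balanced and properly nested.
    pairs = {")": "(", "}": "{", "]": "["}
    stack = []
    for ch in text:
        if ch in "({[":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return True
    return bool(stack)
```

Applied to the overload quoted in the comment, which stops at `: cuvs::neighbors::index(),`, the check reports truncation because the snippet ends with a comma.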

---

Nitpick comments:
In `@fern/pages/cpp_api/cpp-api-cluster-kmeans.md`:
- Around line 369-370: The table rows for the parameter named `centroids` are
marked as `inout` but their descriptions and usage indicate they are
output-only; update the parameter direction from `inout` to `out` for
`centroids` in the balanced `fit` overload tables (and any other occurrences of
the same `centroids` table row in this document), and confirm that the
description remains the `[out]` wording; also scan for any other parameter rows
(e.g., `inertia`) with a direction/description mismatch and make their direction
consistent with their description.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 843f35d6-9fae-43e9-8783-16eed135478e

📥 Commits

Reviewing files that changed from the base of the PR and between c336b1f and e31b03e.

📒 Files selected for processing (26)
  • fern/pages/c_api/c-api-cluster-kmeans.md
  • fern/pages/c_api/c-api-neighbors-cagra.md
  • fern/pages/c_api/c-api-neighbors-hnsw.md
  • fern/pages/c_api/c-api-neighbors-ivf-flat.md
  • fern/pages/c_api/c-api-neighbors-ivf-pq.md
  • fern/pages/c_api/c-api-neighbors-tiered-index.md
  • fern/pages/c_api/c-api-neighbors-vamana.md
  • fern/pages/c_api/c-api-preprocessing-pca.md
  • fern/pages/c_api/c-api-preprocessing-quantize-binary.md
  • fern/pages/c_api/c-api-preprocessing-quantize-pq.md
  • fern/pages/c_api/c-api-preprocessing-quantize-scalar.md
  • fern/pages/cpp_api/cpp-api-cluster-agglomerative.md
  • fern/pages/cpp_api/cpp-api-cluster-kmeans.md
  • fern/pages/cpp_api/cpp-api-cluster-spectral.md
  • fern/pages/cpp_api/cpp-api-neighbors-all-neighbors.md
  • fern/pages/cpp_api/cpp-api-neighbors-brute-force.md
  • fern/pages/cpp_api/cpp-api-neighbors-cagra.md
  • fern/pages/cpp_api/cpp-api-neighbors-dynamic-batching.md
  • fern/pages/cpp_api/cpp-api-neighbors-hnsw.md
  • fern/pages/cpp_api/cpp-api-neighbors-ivf-pq.md
  • fern/pages/cpp_api/cpp-api-neighbors-nn-descent.md
  • fern/pages/cpp_api/cpp-api-neighbors-scann.md
  • fern/pages/cpp_api/cpp-api-neighbors-vamana.md
  • fern/pages/cpp_api/cpp-api-preprocessing-quantize-pq.md
  • fern/pages/cpp_api/cpp-api-preprocessing-quantize-scalar.md
  • fern/scripts/generate_api_reference.py
✅ Files skipped from review due to trivial changes (5)
  • fern/pages/cpp_api/cpp-api-neighbors-all-neighbors.md
  • fern/pages/cpp_api/cpp-api-cluster-agglomerative.md
  • fern/pages/c_api/c-api-preprocessing-quantize-binary.md
  • fern/pages/c_api/c-api-preprocessing-quantize-scalar.md
  • fern/pages/cpp_api/cpp-api-cluster-spectral.md

@cjnolet
Member Author

cjnolet commented May 9, 2026

/ok to test 08543ab

@cjnolet
Member Author

cjnolet commented May 10, 2026

/ok to test e4b5679

@copy-pr-bot

copy-pr-bot Bot commented May 10, 2026

/ok to test e4b5679

@cjnolet, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

@cjnolet
Member Author

cjnolet commented May 10, 2026

/ok to test 154f402

@aamijar aamijar mentioned this pull request May 11, 2026
@cjnolet
Member Author

cjnolet commented May 11, 2026

/ok to test 0ed5bcd

@cjnolet cjnolet requested a review from a team as a code owner May 11, 2026 22:57

Labels

doc: Improvements or additions to documentation
non-breaking: Introduces a non-breaking change

Projects

Status: In Progress

1 participant