
add config examples to use PQ and SQ indexing and search for wiki-1M with Cohere embeddings #1047

Open
harsha-simhadri wants to merge 5 commits into main from harshasi/add_wiki1M_examples

@harsha-simhadri
Contributor

add config examples to use PQ and SQ indexing and search for wiki-1M with Cohere embeddings

Copilot AI left a comment

Pull request overview

Adds new benchmark configuration examples intended to demonstrate product quantization (PQ) and spherical quantization workflows for the wikipedia-1M Cohere embedding dataset.

Changes:

  • Added a PQ graph-index build+search example JSON for wikipedia-1M.
  • Added an exhaustive spherical-quantization example JSON (currently named as a wiki1M graph-index example).

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.

Changed files:

  • diskann-benchmark/example/graph-index-spherical-quantization-wiki1M.json: adds an exhaustive spherical-quantization benchmark config (but currently references siftsmall test data and the exhaustive tag).
  • diskann-benchmark/example/graph-index-product-quantization-wiki1M.json: adds a PQ graph-index benchmark config for wikipedia-1M (but currently uses an unregistered job type).

Comment thread on diskann-benchmark/example/graph-index-product-quantization-wiki1M.json (outdated)
harsha-simhadri and others added 4 commits May 10, 2026 15:48
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The PR accidentally renamed spherical-exhaustive.json to
graph-index-spherical-quantization-wiki1M.json, breaking the
spherical_quantization_intergration test which references the old name.

- Restore spherical-exhaustive.json with its original content
- Update graph-index-spherical-quantization-wiki1M.json with proper content

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@codecov-commenter
Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.47%. Comparing base (3d3ed4c) to head (40a16e2).

@@            Coverage Diff             @@
##             main    #1047      +/-   ##
==========================================
- Coverage   90.60%   89.47%   -1.13%     
==========================================
  Files         461      461              
  Lines       85494    85494              
==========================================
- Hits        77462    76498     -964     
- Misses       8032     8996     +964     
Flag        Coverage       Δ
miri        89.47% <ø>     (-1.13%) ⬇️
unittests   89.32% <ø>     (-1.25%) ⬇️

Flags with carried-forward coverage are not shown.
40 files have indirect coverage changes.
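The percentages in the report can be cross-checked from the raw hit and line counts. Below is a minimal Python sketch that assumes Codecov rounds percentages *down* to two decimals. That is Codecov's default, but the project's own codecov.yml is not part of this PR, so treat the setting as an assumption.

```python
def coverage_pct(hits: int, lines: int, precision: int = 2) -> float:
    """Percentage of covered lines, rounded down to `precision` decimals.

    Assumption: mirrors Codecov's default `round: down` / `precision: 2`.
    """
    scale = 10 ** precision
    # Integer floor division keeps the truncation exact (no float artifacts).
    return (hits * 100 * scale // lines) / scale


# Counts copied from the Codecov report (base 3d3ed4c vs. head 40a16e2).
LINES = 85494
BASE_HITS, HEAD_HITS = 77462, 76498
BASE_MISSES, HEAD_MISSES = 8032, 8996

# Hits and misses partition the same 85,494 tracked lines on both sides.
assert BASE_HITS + BASE_MISSES == LINES
assert HEAD_HITS + HEAD_MISSES == LINES

base = coverage_pct(BASE_HITS, LINES)  # 90.6
head = coverage_pct(HEAD_HITS, LINES)  # 89.47
delta = round(head - base, 2)          # -1.13

print(f"base={base:.2f}% head={head:.2f}% delta={delta:+.2f}%")
print(f"lines moved from hit to miss: {BASE_HITS - HEAD_HITS}")  # 964
```

This PR adds only JSON example files, and Codecov reports every modified coverable line as covered. So the 964 lines that moved from hit to miss presumably come from the files with indirect coverage changes, not from anything this PR adds.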

@@ -0,0 +1,55 @@
{
  "search_directories": [
    "../big-ann-benchmarks/data/wikipedia_cohere" ],
Contributor

Minor: In the spirit of self‑service, consider including guidance on how to download the dataset, so that even an AI agent can follow the steps and retrieve the required data.
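One way to act on this suggestion is to pair the download instructions with a pre-flight check that reports which dataset directories are still missing. The Python sketch below is hypothetical: `missing_search_directories` is not part of diskann-benchmark. It also assumes that relative `search_directories` entries are resolved against the current working directory, which this PR does not confirm.

```python
import json
from pathlib import Path
from typing import List, Optional


def missing_search_directories(
    config_path: Path, base_dir: Optional[Path] = None
) -> List[Path]:
    """Return the `search_directories` entries that are not existing directories.

    Hypothetical helper. Assumption: relative entries are resolved against
    `base_dir`, which defaults to the current working directory.
    """
    config = json.loads(config_path.read_text())
    base = base_dir if base_dir is not None else Path.cwd()
    missing: List[Path] = []
    # A config without the key has nothing to check.
    for entry in config.get("search_directories", []):
        path = Path(entry)
        # Absolute paths are used as-is; relative ones are joined onto `base`.
        resolved = path if path.is_absolute() else base / path
        if not resolved.is_dir():
            missing.append(resolved)
    return missing
```

Suppose a config carries the `search_directories` entry shown in the hunk above. Calling `missing_search_directories` on it would then return `../big-ann-benchmarks/data/wikipedia_cohere` (joined onto the working directory) until the wikipedia_cohere dataset has been downloaded there.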

Labels: None yet
Projects: None yet

4 participants