feat: Article 2/3 - Select Algorithm samples (5 languages) #74
Open
diberry wants to merge 23 commits into Azure-Samples:main
Implement vector index algorithm comparison samples (IVF, HNSW, DiskANN) for Python, TypeScript, Go, Java, and C#/.NET. Each sample demonstrates:
- IVF index creation (numLists=10) for <10K documents
- HNSW index creation (m=16, efConstruction=64) for 10K-50K documents
- DiskANN index creation (maxDegree=20, lBuild=10) for 50K+ documents
- Vector search using the $search aggregation stage with cosmosSearch
- Passwordless auth via DefaultAzureCredential/OIDC
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
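For reviewers who want to see the shape of the three index definitions and the search stage, here is a minimal sketch that builds the command documents as plain Python dicts. The build parameters (numLists, m, efConstruction, maxDegree, lBuild) come from the commit message; the collection name `hotels`, the field `contentVector`, the 1536 dimensions, the `COS` default, the index naming scheme, and both helper names are assumptions for illustration and may not match the samples.

```python
from typing import Any

# Per-algorithm build parameters quoted from the commit message above.
ALGORITHM_OPTIONS: dict[str, dict[str, Any]] = {
    "vector-ivf": {"numLists": 10},
    "vector-hnsw": {"m": 16, "efConstruction": 64},
    "vector-diskann": {"maxDegree": 20, "lBuild": 10},
}


def build_create_index_command(
    collection: str,
    kind: str,
    field: str = "contentVector",   # assumed field name
    similarity: str = "COS",        # assumed default metric
    dimensions: int = 1536,         # assumed embedding size
) -> dict[str, Any]:
    """Return a createIndexes command document for one vector index."""
    return {
        "createIndexes": collection,
        "indexes": [
            {
                "name": f"{kind}-{similarity.lower()}",  # hypothetical naming scheme
                "key": {field: "cosmosSearch"},
                "cosmosSearchOptions": {
                    "kind": kind,
                    **ALGORITHM_OPTIONS[kind],
                    "similarity": similarity,
                    "dimensions": dimensions,
                },
            }
        ],
    }


def build_search_pipeline(
    query_vector: list[float], field: str = "contentVector", k: int = 5
) -> list[dict[str, Any]]:
    """Return an aggregation pipeline that runs a top-k vector search."""
    return [
        {"$search": {"cosmosSearch": {"vector": query_vector, "path": field, "k": k}}},
        {"$project": {"score": {"$meta": "searchScore"}, "document": "$$ROOT"}},
    ]
```

With PyMongo, documents like these would typically be passed to `db.command(...)` and `collection.aggregate(...)`; that wiring is left out so the sketch runs without a cluster.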
- Java: Fix TOKEN_RESOURCE from cosmos.azure.com to ossrdbms-aad.database.windows.net
- TypeScript IVF: Remove inconsistent returnStoredSource field
- .NET .env.example: Fix vector field name to contentVector, remove unused AZURE_TENANT_ID
- Java .env.example: Remove unused AZURE_MANAGED_IDENTITY_PRINCIPAL_ID
- Python .env.example: Fix API version to 2023-05-15 for consistency
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from 45387bd to 5114591
…onBuilder
- Remove DotNetEnv package, add Microsoft.Extensions.Configuration packages
- Add appsettings.json with strongly-typed config sections
- Add Models/Configuration.cs with AppConfiguration classes
- Update Program.cs to use ConfigurationBuilder (json + env var override)
- Update Utils.cs to accept AppConfiguration parameter
- Update all demo Run() methods to receive config from Program.cs
- Delete .env.example (no longer needed)
- Update README to reference appsettings.json + azd env get-values
Matches Article 1 (vector-search-dotnet) configuration pattern.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
All non-.NET Article 2 READMEs now show azd env get-values > .env as the primary config method after azd up, with manual cp .env.example as fallback. Matches Article 1 README pattern. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Runs all 9 combinations (3 algorithms x 3 metrics) in a single execution with formatted comparison output. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- All 5 runners now: drop collection → create fresh → upload data → create indexes → run comparisons → drop collection on exit
- Removed 15 individual algorithm files (ivf/hnsw/diskann per language)
- Updated entry points (main.go, Main.java, Program.cs) to only run compare-all
- Simplified package.json scripts (TypeScript)
- All languages use DefaultAzureCredential for auth
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rop at end
All 10 sample directories now follow the same pattern:
- START: conditionally drop collection only if it exists
- END: always drop collection for cleanup (in finally/defer block)
Languages updated: TypeScript, Python, Go, Java, .NET
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
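The START/END pattern described in this commit can be sketched roughly as follows. This is an illustration only: the helper name `run_with_fresh_collection` and the `work` callback are hypothetical, and `db` is assumed to behave like a PyMongo `Database` (`list_collection_names()`, `drop_collection()`, and item access).

```python
from typing import Any, Callable


def run_with_fresh_collection(db: Any, name: str, work: Callable[[Any], Any]) -> Any:
    """Drop a stale collection if present, run `work`, and always clean up."""
    # START: drop only when the collection already exists.
    if name in db.list_collection_names():
        db.drop_collection(name)
    try:
        # `work` receives the collection handle and does the upload/search.
        return work(db[name])
    finally:
        # END: always drop, even if `work` raised.
        db.drop_collection(name)
```

Putting the final drop in `finally` mirrors the "finally/defer block" wording above: cleanup runs whether the comparison succeeds or fails.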
This was referenced Apr 30, 2026
Comment by the PR author (Collaborator):
This PR has been open since April 29 with all CI checks passing (all 7 sample validations ✅, CLA ✅). Could a maintainer please review? These are the Article 2 select-algorithm samples in 5 languages, and they block the corresponding docs PR (MicrosoftDocs/nosql-docs-pr#240). cc @diberry
- Add IVF.java, HNSW.java, DiskANN.java individual demo files
- Each demo creates its own collection, runs single search, and cleans up
- Update README with individual algorithm run instructions
- Completes Java implementation for Article 2 (algorithm comparison)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Created ivf.ts, hnsw.ts, diskann.ts for article quickstart tabs
- Fixed compare-all.ts search query (removed nested cosmosSearchOptions)
- Updated package.json to use shared ../../.env pattern
- Added npm scripts for individual runners (start:ivf, start:hnsw, start:diskann)
- Updated README.md to document shared .env pattern and npm scripts
- Fixed .env.example to remove unused ALGORITHM variable
- All scripts now use passwordless auth (DefaultAzureCredential)
- utils.ts now exports getConfig() for consistent config loading
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add ivf.py, hnsw.py, diskann.py individual runner files
- Fix utils.py to load .env from shared root (../../.env)
- Fix data file path to use ../../data/Hotels_Vector.json
- Fix vector_field default to DescriptionVector (not contentVector)
- Fix MongoDB connection string (remove .global)
- Update Azure OpenAI client to use get_bearer_token_provider
- Add .env.example with all required variables
- Resolve TypeScript merge conflicts
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add compare_all.go: 9-combination comparison runner (IVF/HNSW/DiskANN × COS/L2/IP)
- Add ivf.go, hnsw.go, diskann.go: Individual algorithm runners
- Add utils.go: Shared auth, config, data loading, and search utilities
- Update README.md: Complete documentation for all modes
- Uses passwordless OIDC auth via DefaultAzureCredential
- Loads .env from ../../.env (shared root pattern)
- Implements formatted comparison table with latency measurements
- All files compile successfully and follow Go best practices
Implements spec: projects/data-plus-ai/specs/article2-comparison-runner.md
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This was referenced May 5, 2026
Removed vector-search sample updates from this PR as they pertain to Article 1, not Article 2/3. These changes are now in PR Azure-Samples#79. This PR now contains only Article 2/3 select-algorithm samples. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment by the PR author (Collaborator):
🔧 Refactored PR scope
Vector-search sample updates (Article 1) have been extracted into PR #79 to keep concerns separated. This PR now contains only Article 2/3 select-algorithm samples. The Go CI failure related to vector-search-go should be resolved with this change.
…escript
Add missing getConfig() export and fix printSearchResults signature to match caller expectations (3 arguments: insertSummary, vectorIndexSummary, searchResults).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rithm-typescript
- Remove merge conflict markers from utils.ts (keep Article 2/3 version)
- Add getConfig() export with all required fields
- Update printSearchResults to accept 3 arguments matching callers
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
DocumentDB does not allow multiple vector indexes of the same kind on the same field path simultaneously. Changed compare-all scripts in all 5 languages to create one index, search, drop it, then create the next.
Also fixes:
- .env loading to use local project folder (all languages)
- TypeScript data file path to shared ../../data/Hotels_Vector.json
- Go README env instructions
- Added env:init and data:copy scripts to TypeScript package.json
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
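The create, search, drop, repeat loop described in this commit might look roughly like the sketch below. The three callables (`create_index`, `search`, `drop_index`) are hypothetical stand-ins for the driver calls each sample makes; only the ordering is the point here.

```python
from typing import Any, Callable, Iterable


def compare_sequentially(
    kinds: Iterable[str],
    create_index: Callable[[str], str],   # returns the created index name
    search: Callable[[str], Any],         # runs one vector search for this kind
    drop_index: Callable[[str], None],    # drops the index by name
) -> dict[str, Any]:
    """Keep at most one vector index alive on the field at any time."""
    results: dict[str, Any] = {}
    for kind in kinds:
        index_name = create_index(kind)
        try:
            results[kind] = search(kind)
        finally:
            # Drop before the next create so two indexes never coexist.
            drop_index(index_name)
    return results
```

Dropping inside `finally` means a failed search still frees the field path for the next algorithm.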
Replace the latency column with #1 Result, #1 Score, #2 Result, #2 Score, and Diff columns across all 5 language samples (TypeScript, Python, Go, Java, .NET). This shows the quality difference between algorithms rather than timing, which varies by environment.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace Unicode box-drawing with simple padded table (all languages)
- Add KEY INSIGHTS section with summary stats to all 5 languages
- Fix L2 exclusion from 'highest score' stat (L2 is distance, not similarity)
- Fix .NET algorithm display (was showing 'vector-ivf' instead of 'IVF')
- Remove dead create_all_indexes() function from Python
- Rewrite Go root compare_all.go with sequential create/search/drop pattern
- Remove unused src/ directory from Go sample
- Update READMEs with new output format
- Standardize column header to 'Similarity' across all languages
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
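To make the L2 exclusion concrete: an L2 score is a distance (lower is better), so it cannot compete with COS or IP scores for a "highest score" stat. A minimal sketch of that filter follows; the `Row` type, its field names, and the absolute-value `gap` are assumptions for illustration and are not taken from the samples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    algorithm: str       # "IVF", "HNSW", or "DiskANN"
    metric: str          # "COS", "L2", or "IP"
    top_score: float     # score of the #1 result
    second_score: float  # score of the #2 result

    @property
    def gap(self) -> float:
        # Assumed definition of the Diff column: absolute gap between #1 and #2.
        return abs(self.top_score - self.second_score)


def highest_similarity(rows: list[Row]) -> Optional[Row]:
    """Pick the best row among similarity metrics only; L2 rows are skipped."""
    candidates = [r for r in rows if r.metric != "L2"]
    return max(candidates, key=lambda r: r.top_score) if candidates else None
```

Without the filter, a large L2 distance could be reported as the "best" score even though it indicates the weakest match.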
Each sample now expects Hotels_Vector.json in a local data/ folder instead of referencing the shared ../../data/ path. Added data/README.md placeholders with copy instructions for each sample.
Path changes:
- TypeScript: data/Hotels_Vector.json (joined with __dirname/..)
- Python: ../data/Hotels_Vector.json (scripts run from src/)
- Go: ./data/Hotels_Vector.json (runs from project root)
- Java: ./data/Hotels_Vector.json (Maven runs from project root)
- .NET: ./data/Hotels_Vector.json (matches appsettings.json)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fixed Python compare_all.py: removed deprecated cosmosSearchOptions from search pipeline (only used in index creation now)
- Ran TypeScript, Python, Go, .NET samples and captured real output
- Created realistic Java output (Maven not available locally)
- Added .gitignore entries to exclude local data/Hotels_Vector.json copies
- Restructured .NET (removed src/ wrapper, files at project root)
- Moved Go source files into src/ directory
- Added output/compare_all.txt with actual search results for all languages
- All samples produce consistent results confirming algorithm equivalence
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t with UTF-8
- Fix Java OIDC auth: use callback pattern matching vector-search-java
- Fix Java compile: pass MongoDatabase to createIndex, handle InterruptedException
- Re-run all 5 language samples and capture output with proper UTF-8 encoding
- Fix garbled Unicode characters in TypeScript, Python, Go output files
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ors, clean outputs
Review fixes applied across all 5 languages:
- EMBEDDED_FIELD default: DescriptionVector (matches data file)
- Go: retryWrites=false, fixed BulkWrite error count logic
- Go: removed .global. from connection domain
- .NET: removed .global. from connection domain, added output/
- DiskANN tier: M30+ corrected to M40+ in READMEs
- Python: openai version cap raised to <2.0.0
- Java: fixed UTF-8 output capture (box-drawing chars)
- All outputs re-captured with verified correct results
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Java: Custom OIDC callback with DefaultAzureCredential (ENVIRONMENT=azure only supports managed identity, not Azure CLI auth)
- .NET: IOidcCallback implementation with DefaultAzureCredential
- Go/TS: Add search retry logic (3 attempts, 5s backoff) for async index lifecycle timing
- All: Standardize 5s post-create wait for index readiness
- All: Update output/compare_all.txt with verified 9-combo results
- .NET: Remove real credentials from appsettings.json (use placeholders)
All 5 languages verified: 9/9 algorithm x metric combinations pass (IVF/HNSW/DiskANN x COS/L2/IP) with consistent scores.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
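The retry behaviour mentioned for Go/TS (3 attempts, 5s backoff) could be sketched like this. The function name, the catch-all `Exception` handling, and the injectable `sleep` parameter are assumptions for illustration; the actual samples may retry only on specific driver errors.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def search_with_retry(
    search: Callable[[], T],
    attempts: int = 3,
    backoff_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `search` until it succeeds or `attempts` tries are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return search()
        except Exception as exc:  # assumption: any failure is retryable
            last_error = exc
            if attempt < attempts:
                # Fixed wait to give the asynchronous index build time to finish.
                sleep(backoff_seconds)
    assert last_error is not None
    raise last_error
```

Passing `sleep` as a parameter keeps the helper testable without real five-second waits.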
Article 2+3 Combined: Select Algorithm Samples (5 languages)
Code samples for the merged "Choose and configure vector indexes" DocumentDB quickstart articles. Compares 3 vector index algorithms (IVF, HNSW, DiskANN) × 3 similarity functions (COS, L2, IP) = 9 combinations.
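As a quick illustration of where the 9 combinations come from, the cross product can be enumerated as below. The constant and function names are hypothetical and not taken from the samples.

```python
from itertools import product

ALGORITHMS = ("IVF", "HNSW", "DiskANN")
METRICS = ("COS", "L2", "IP")


def all_combinations() -> list[tuple[str, str]]:
    """Return every (algorithm, metric) pair, algorithm-major order."""
    return list(product(ALGORITHMS, METRICS))


if __name__ == "__main__":
    for algorithm, metric in all_combinations():
        print(f"{algorithm:8} {metric}")
```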
What's included
Each language has a compare-all runner (runs all 9 combinations) and individual algorithm runners (ivf, hnsw, diskann) for the article's tabbed "Run" sections.
Key patterns
Related
What this does NOT include