Skip to content

feat(migrator): [7/7] Migration performance benchmarks with throughput, latency, memory measurement, and visualization#566

Open
nkanu17 wants to merge 6 commits intofeat/migrate-cli-docsfrom
feat/migrate-benchmarks
Open

feat(migrator): [7/7] Migration performance benchmarks with throughput, latency, memory measurement, and visualization#566
nkanu17 wants to merge 6 commits intofeat/migrate-cli-docsfrom
feat/migrate-benchmarks

Conversation

@nkanu17
Copy link
Copy Markdown
Collaborator

@nkanu17 nkanu17 commented Apr 1, 2026

Summary

Benchmark scripts for measuring migration performance across different index sizes and configurations. Includes real-world AG News dataset benchmarks, synthetic end-to-end timing benchmarks with memory tracking, retrieval quality comparisons (FLAT vs HNSW, FP32 vs FP16), and matplotlib visualization tooling for benchmark results.

Files

  • tests/benchmarks/ - migration_benchmark, index_migrator_real_benchmark, retrieval_benchmark, visualize_results

Stack

  1. [1/7] Migration foundation > feat(migrator): [1/7] Migration foundation with models, schema-aware planner, validation, and shared utilities #560
  2. [2/7] Sync executor with reliability and quantization > feat(migrator): [2/7] Sync executor with reliability checkpointing, crash-safe resume, and quantization support #561
  3. [3/7] Async migration > feat(migrator): [3/7] Async migration with non-blocking planner, executor, validator, and readiness utilities #562
  4. [4/7] Batch migration > feat(migrator): [4/7] Batch migration with multi-index glob selection, ordered execution, and state tracking #563
  5. [5/7] Interactive wizard > feat(migrator): [5/7] Interactive migration wizard with step-by-step field operations and algorithm-aware validation #564
  6. [6/7] CLI and documentation > feat(migrator): [6/7] CLI commands and documentation with rvl migrate subcommand, user guide, and concept docs #565
  7. [7/7] Benchmarks (this PR)

Note

Low Risk
Adds new benchmark/visualization scripts under tests/benchmarks and makes only whitespace/formatting-only tweaks in migration executors/models, so production behavior should be unchanged; main risk is added optional deps/CI noise if these scripts are accidentally executed in test runs.

Overview
Adds a new tests/benchmarks/ suite to measure migration performance and retrieval quality at scale, including an end-to-end migration timing benchmark (sync/async), a “real data” AG News HNSW/FP32→FLAT/FP16 quantization migration benchmark, and a retrieval benchmark comparing FP32 vs FP16 across HNSW vs FLAT with overlap/recall and latency/QPS metrics.

Also adds a matplotlib-based visualize_results.py helper to turn retrieval benchmark JSON output into charts, and includes minor formatting-only refactors in MigrationExecutor, AsyncMigrationExecutor, and RenameOperations.has_operations.

Written by Cursor Bugbot for commit 63d7c94. This will update automatically on new commits. Configure here.

Copilot AI review requested due to automatic review settings April 1, 2026 22:39
@jit-ci
Copy link
Copy Markdown

jit-ci bot commented Apr 1, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99c262d1de

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds benchmark and visualization tooling under tests/benchmarks/ to measure migrator and retrieval performance (throughput/latency/memory) across multiple index configurations and dataset scales, plus scripts to render result charts.

Changes:

  • Add an end-to-end migration timing benchmark (sync + async) with index-size scaling and memory reporting.
  • Add retrieval benchmarks comparing FLAT vs HNSW and FP32 vs FP16, including overlap/recall-style metrics.
  • Add matplotlib-based result visualization for benchmark JSON outputs.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.

File Description
tests/benchmarks/visualize_results.py New charting script to generate memory/precision/QPS/latency/recall plots from benchmark JSON results.
tests/benchmarks/retrieval_benchmark.py New retrieval benchmark runner to build/query multiple index configs and output structured metrics.
tests/benchmarks/migration_benchmark.py New migration benchmark runner to populate synthetic corpora and time migrations (sync/async), including timing breakdowns.
tests/benchmarks/index_migrator_real_benchmark.py New “real dataset” benchmark (AG News + sentence-transformers) to measure migration + query quality/memory on realistic text data.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@nkanu17
Copy link
Copy Markdown
Collaborator Author

nkanu17 commented Apr 1, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99c262d1de

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@nkanu17
Copy link
Copy Markdown
Collaborator Author

nkanu17 commented Apr 1, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99c262d1de

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

nkanu17 added a commit that referenced this pull request Apr 2, 2026
- Fix SentenceTransformer local_files_only fallback for first run
- Remove unused imports (string, struct)
- Fix dead store enum_s (prefix with _)
- Guard ZeroDivisionError in chart_memory_savings
- Fix compute_recall to return 0.0 for empty ground truth
- Wrap logger handler in try/finally to prevent leak on exception
nkanu17 added a commit that referenced this pull request Apr 2, 2026
Scoped cleanup to current database only to avoid affecting
other databases on shared Redis instances.
@nkanu17 nkanu17 force-pushed the feat/migrate-cli-docs branch from 751a0e5 to 49ce997 Compare April 2, 2026 00:30
nkanu17 added a commit that referenced this pull request Apr 2, 2026
- Remove leftover numbered list fragment in migrate-indexes.md (#565 #20)
- Extract duplicated progress_callback into shared _make_progress_callback
  method in migrate.py (#565 #21)
- Add rvl migrate section to CLI reference docs (#565 #19)
- Add try/except for matplotlib import in visualize_results.py (#566 #14)
@nkanu17 nkanu17 force-pushed the feat/migrate-benchmarks branch from 99c262d to 26ce024 Compare April 2, 2026 00:30
@nkanu17
Copy link
Copy Markdown
Collaborator Author

nkanu17 commented Apr 2, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26ce0246e7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

nkanu17 added 6 commits April 1, 2026 23:57
…hput, latency, memory measurement, and visualization

Benchmark scripts for measuring migration performance across different
index sizes and configurations. Includes real-world AG News dataset
benchmarks, synthetic end-to-end timing benchmarks with memory tracking,
retrieval quality comparisons (FLAT vs HNSW, FP32 vs FP16), and
matplotlib visualization tooling for benchmark results.
- Fix SentenceTransformer local_files_only fallback for first run
- Remove unused imports (string, struct)
- Fix dead store enum_s (prefix with _)
- Guard ZeroDivisionError in chart_memory_savings
- Fix compute_recall to return 0.0 for empty ground truth
- Wrap logger handler in try/finally to prevent leak on exception
Scoped cleanup to current database only to avoid affecting
other databases on shared Redis instances.
- Remove leftover numbered list fragment in migrate-indexes.md (#565 #20)
- Extract duplicated progress_callback into shared _make_progress_callback
  method in migrate.py (#565 #21)
- Add rvl migrate section to CLI reference docs (#565 #19)
- Add try/except for matplotlib import in visualize_results.py (#566 #14)
- Clean up source index on failure in benchmark finally block
- Include queried_top_k in latency metrics for clarity when effective_top_k differs
@nkanu17 nkanu17 force-pushed the feat/migrate-cli-docs branch from 49ce997 to a1d54c9 Compare April 2, 2026 03:58
Copilot AI review requested due to automatic review settings April 2, 2026 03:58
@nkanu17 nkanu17 force-pushed the feat/migrate-benchmarks branch from 26ce024 to 63d7c94 Compare April 2, 2026 03:58
@nkanu17
Copy link
Copy Markdown
Collaborator Author

nkanu17 commented Apr 2, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63d7c94e5c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +152 to +158
label, title, description = row
text = f"{title}. {description}".strip()
records.append(
{
"doc_id": f"ag-news-{idx}",
"text": text,
"label": AG_NEWS_LABELS[int(label) - 1],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip CSV header rows before parsing AG News labels

--dataset-csv is documented as a CSV with label,title,description columns, but this loop parses every row as data and immediately casts label to int. On common AG News CSVs with a header row, the first row (label,title,description) raises ValueError, so the benchmark fails before loading any records. Please detect/skip headers (or use csv.DictReader) before converting labels.

Useful? React with 👍 / 👎.

Comment on lines +324 to +326
with tempfile.TemporaryDirectory() as tmpdir:
target_schema_path = Path(tmpdir) / "target_schema.yaml"
plan_path = Path(tmpdir) / "migration_plan.yaml"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Persist migration plan outside temporary directory

The benchmark report returns plan_path, but it is created under TemporaryDirectory; that directory is deleted when the function exits, so the reported path is always invalid for post-run inspection/debugging. This makes the output misleading and prevents users from reusing the generated plan artifact.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

query_embeddings = all_embeddings[size : size + query_count]
dims = int(all_embeddings.shape[1])

client.flushall()
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

client.flushall() will delete all keys in all DBs on the target Redis instance. For a benchmark script that takes --redis-url, this is risky if someone points it at a shared or non-local Redis. Consider switching to FLUSHDB on a dedicated DB (e.g., default redis://localhost:6379/15), and/or require an explicit --force/--i-understand-this-wipes-data flag (and refuse to run unless the host is localhost unless forced).

Suggested change
client.flushall()
# Clear only the currently selected Redis database for this benchmark run
client.flushdb()

Copilot uses AI. Check for mistakes.
print(f" Size: {size:,} documents")
print(f"{'='*60}")

client.flushdb()
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This benchmark unconditionally calls client.flushdb() for each size. Since --redis-url is user-provided, this can wipe unrelated data if run against a non-dedicated DB. Add a safety guard (e.g., require --flushdb/--force and default to a dedicated DB index) or at least refuse to run unless the Redis URL points to localhost unless explicitly forced.

Copilot uses AI. Check for mistakes.
Comment on lines +222 to +225
cursor, keys = client.scan(cursor, match=f"{INDEX_PREFIX}*", count=5000)
if keys:
client.delete(*keys)
if cursor == 0:
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

populate_index() deletes keys by scanning INDEX_PREFIX and calling DEL. This can delete unrelated data if INDEX_PREFIX collides with existing keys (especially if someone points --redis-url at a non-dedicated DB). Consider making the prefix configurable via CLI (defaulting to something clearly benchmark-specific and unique per run), and add a --force guard before any destructive deletes.

Copilot uses AI. Check for mistakes.
@nkanu17
Copy link
Copy Markdown
Collaborator Author

nkanu17 commented Apr 2, 2026

@codex review

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

info = index.info()
while time.perf_counter() < deadline:
info = index.info()
pct = float(info.get("percent_indexed", 1) or 1)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Falsy percent_indexed treated as fully indexed

Medium Severity

The expression info.get("percent_indexed", 1) or 1 defaults a falsy percent_indexed value (e.g., numeric 0 or 0.0) to 1, meaning "100% indexed." This is inverted from the production code in redisvl/migration/utils.py and async_utils.py, which correctly use or 0 to default to "not ready." If percent_indexed is ever numeric zero, the function returns immediately without waiting, causing benchmarks to run against partially-indexed data and produce inaccurate results.

Additional Locations (1)
Fix in Cursor Fix in Web

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63d7c94e5c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

query_embeddings = all_embeddings[size : size + query_count]
dims = int(all_embeddings.shape[1])

client.flushall()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Replace FLUSHALL with scoped cleanup in benchmark setup

benchmark_scale calls client.flushall() before each run, which removes keys from all databases on the Redis server, not just this benchmark's data. If someone points --redis-url at a shared/non-throwaway instance, this benchmark will wipe unrelated datasets; use FLUSHDB for the selected DB or delete only the benchmark key prefixes instead.

Useful? React with 👍 / 👎.

Comment on lines +324 to +326
with tempfile.TemporaryDirectory() as tmpdir:
target_schema_path = Path(tmpdir) / "target_schema.yaml"
plan_path = Path(tmpdir) / "migration_plan.yaml"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Persist plan artifact outside temporary directory

run_quantization_migration records plan_path in the returned benchmark payload, but that path is created under TemporaryDirectory; once the context exits, the file is deleted. This makes the reported artifact path invalid for later inspection/repro and can mislead downstream analysis that expects a readable migration plan file.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +124 to +126
return parser.parse_args()


Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several metrics calculations assume query_count > 0 (e.g., statistics.mean/min/max on empty lists in run_queries/compute_overlap/compute_recall). Since --query-count currently allows 0, the script will crash with StatisticsError. Add an explicit argument validation (e.g., in parse_args() or main()) to require query_count >= 1 (and similarly validate sizes/top_k).

Suggested change
return parser.parse_args()
args = parser.parse_args()
if args.query_count < 1:
parser.error("--query-count must be >= 1")
if any(size < 1 for size in args.sizes):
parser.error("--sizes must contain positive integers")
if args.top_k < 1:
parser.error("--top-k must be >= 1")
if args.recall_k_max < 1:
parser.error("--recall-k-max must be >= 1")
return args

Copilot uses AI. Check for mistakes.
"plan_duration_seconds": round(duration, 3),
"supported": plan.diff_classification.supported,
"blocked_reasons": plan.diff_classification.blocked_reasons,
"plan_path": str(plan_path),
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert_planner_allows_algorithm_change() also returns plan_path within a TemporaryDirectory that is removed on exit, so the returned plan_path is invalid. Persist the plan somewhere durable (e.g., alongside the benchmark --output file) or remove it from the return payload.

Suggested change
"plan_path": str(plan_path),

Copilot uses AI. Check for mistakes.
query_embeddings = all_embeddings[size : size + query_count]
dims = int(all_embeddings.shape[1])

client.flushall()
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This benchmark uses client.flushall(), which wipes all databases on the target Redis instance (more destructive than FLUSHDB) and can surprise users if they point at a shared/non-dev server. Consider switching to flushdb(), or add an explicit opt-in flag/confirmation (and/or a dedicated DB selection) with a prominent warning in the CLI help/output.

Suggested change
client.flushall()
client.flushdb()

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants