feat(migrator): [7/7] Migration performance benchmarks with throughput, latency, memory measurement, and visualization #566
Conversation
🛡️ Jit Security Scan Results: ✅ No security findings were detected in this PR
Security scan by Jit
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 99c262d1de
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review"
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Pull request overview
Adds benchmark and visualization tooling under tests/benchmarks/ to measure migrator and retrieval performance (throughput/latency/memory) across multiple index configurations and dataset scales, plus scripts to render result charts.
Changes:
- Add an end-to-end migration timing benchmark (sync + async) with index-size scaling and memory reporting.
- Add retrieval benchmarks comparing FLAT vs HNSW and FP32 vs FP16, including overlap/recall-style metrics.
- Add matplotlib-based result visualization for benchmark JSON outputs.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| tests/benchmarks/visualize_results.py | New charting script to generate memory/precision/QPS/latency/recall plots from benchmark JSON results. |
| tests/benchmarks/retrieval_benchmark.py | New retrieval benchmark runner to build/query multiple index configs and output structured metrics. |
| tests/benchmarks/migration_benchmark.py | New migration benchmark runner to populate synthetic corpora and time migrations (sync/async), including timing breakdowns. |
| tests/benchmarks/index_migrator_real_benchmark.py | New “real dataset” benchmark (AG News + sentence-transformers) to measure migration + query quality/memory on realistic text data. |
@codex review
💡 Codex Review
Reviewed commit: 99c262d1de
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
@codex review
💡 Codex Review
Reviewed commit: 99c262d1de
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
- Fix SentenceTransformer local_files_only fallback for first run
- Remove unused imports (string, struct)
- Fix dead store enum_s (prefix with _)
- Guard ZeroDivisionError in chart_memory_savings
- Fix compute_recall to return 0.0 for empty ground truth
- Wrap logger handler in try/finally to prevent leak on exception
Scoped cleanup to current database only to avoid affecting other databases on shared Redis instances.
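The compute_recall fix mentioned in the commit list above (returning 0.0 for an empty ground-truth set instead of raising ZeroDivisionError) can be sketched like this; the signature and name are illustrative, not the benchmark's actual code:

```python
def compute_recall(retrieved: list, ground_truth: list) -> float:
    """Fraction of ground-truth ids present in the retrieved set.

    Returns 0.0 when the ground truth is empty, rather than dividing by
    zero, matching the fix described in the commit message above.
    """
    if not ground_truth:
        return 0.0
    hits = len(set(retrieved) & set(ground_truth))
    return hits / len(ground_truth)
```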
Force-pushed 751a0e5 to 49ce997 (compare)
- Remove leftover numbered list fragment in migrate-indexes.md (#565 #20)
- Extract duplicated progress_callback into shared _make_progress_callback method in migrate.py (#565 #21)
- Add rvl migrate section to CLI reference docs (#565 #19)
- Add try/except for matplotlib import in visualize_results.py (#566 #14)
Force-pushed 99c262d to 26ce024 (compare)
@codex review
💡 Codex Review
Reviewed commit: 26ce0246e7
…hput, latency, memory measurement, and visualization

Benchmark scripts for measuring migration performance across different index sizes and configurations. Includes real-world AG News dataset benchmarks, synthetic end-to-end timing benchmarks with memory tracking, retrieval quality comparisons (FLAT vs HNSW, FP32 vs FP16), and matplotlib visualization tooling for benchmark results.
- Clean up source index on failure in benchmark finally block
- Include queried_top_k in latency metrics for clarity when effective_top_k differs
Force-pushed 49ce997 to a1d54c9 (compare)
Force-pushed 26ce024 to 63d7c94 (compare)
@codex review
💡 Codex Review
Reviewed commit: 63d7c94e5c
```python
label, title, description = row
text = f"{title}. {description}".strip()
records.append(
    {
        "doc_id": f"ag-news-{idx}",
        "text": text,
        "label": AG_NEWS_LABELS[int(label) - 1],
```
Skip CSV header rows before parsing AG News labels
--dataset-csv is documented as a CSV with label,title,description columns, but this loop parses every row as data and immediately casts label to int. On common AG News CSVs with a header row, the first row (label,title,description) raises ValueError, so the benchmark fails before loading any records. Please detect/skip headers (or use csv.DictReader) before converting labels.
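A header-tolerant loader along the lines the reviewer suggests might look like the sketch below; the function name, record shape, and AG News label ordering are illustrative assumptions, not the PR's actual code:

```python
import csv
import io

# Standard AG News class names, indexed by (label - 1); an assumption here.
AG_NEWS_LABELS = ["World", "Sports", "Business", "Sci/Tech"]


def load_records(csv_text: str) -> list:
    """Parse label,title,description rows, tolerating an optional header row.

    The first row is skipped when its label column is not numeric, so a
    header such as "label,title,description" no longer raises ValueError.
    """
    records = []
    for idx, row in enumerate(csv.reader(io.StringIO(csv_text))):
        label, title, description = row
        if idx == 0 and not label.strip().isdigit():
            continue  # header row, not data
        text = f"{title}. {description}".strip()
        records.append(
            {
                "doc_id": f"ag-news-{idx}",
                "text": text,
                "label": AG_NEWS_LABELS[int(label) - 1],
            }
        )
    return records
```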
```python
with tempfile.TemporaryDirectory() as tmpdir:
    target_schema_path = Path(tmpdir) / "target_schema.yaml"
    plan_path = Path(tmpdir) / "migration_plan.yaml"
```
Persist migration plan outside temporary directory
The benchmark report returns plan_path, but it is created under TemporaryDirectory; that directory is deleted when the function exits, so the reported path is always invalid for post-run inspection/debugging. This makes the output misleading and prevents users from reusing the generated plan artifact.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
```python
query_embeddings = all_embeddings[size : size + query_count]
dims = int(all_embeddings.shape[1])

client.flushall()
```
client.flushall() will delete all keys in all DBs on the target Redis instance. For a benchmark script that takes --redis-url, this is risky if someone points it at a shared or non-local Redis. Consider switching to FLUSHDB on a dedicated DB (e.g., default redis://localhost:6379/15), and/or require an explicit --force/--i-understand-this-wipes-data flag (and refuse to run unless the host is localhost unless forced).
```diff
-client.flushall()
+# Clear only the currently selected Redis database for this benchmark run
+client.flushdb()
```
```python
print(f" Size: {size:,} documents")
print(f"{'='*60}")

client.flushdb()
```
This benchmark unconditionally calls client.flushdb() for each size. Since --redis-url is user-provided, this can wipe unrelated data if run against a non-dedicated DB. Add a safety guard (e.g., require --flushdb/--force and default to a dedicated DB index) or at least refuse to run unless the Redis URL points to localhost unless explicitly forced.
```python
cursor, keys = client.scan(cursor, match=f"{INDEX_PREFIX}*", count=5000)
if keys:
    client.delete(*keys)
if cursor == 0:
```
populate_index() deletes keys by scanning INDEX_PREFIX and calling DEL. This can delete unrelated data if INDEX_PREFIX collides with existing keys (especially if someone points --redis-url at a non-dedicated DB). Consider making the prefix configurable via CLI (defaulting to something clearly benchmark-specific and unique per run), and add a --force guard before any destructive deletes.
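A safety guard along the lines these comments suggest could be a small pre-flight check that refuses destructive operations on non-local hosts unless forced. ensure_safe_flush is a hypothetical helper, not code from the PR:

```python
from urllib.parse import urlparse

# Hosts considered safe for destructive benchmark cleanup by default.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def ensure_safe_flush(redis_url: str, force: bool = False) -> None:
    """Refuse a destructive flush unless the URL targets localhost or the
    caller explicitly opted in (e.g. via a --force CLI flag)."""
    host = urlparse(redis_url).hostname or ""
    if host not in LOCAL_HOSTS and not force:
        raise SystemExit(
            f"Refusing to flush non-local Redis host {host!r}; "
            "pass --force to override."
        )
```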
@codex review
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```python
info = index.info()
while time.perf_counter() < deadline:
    info = index.info()
    pct = float(info.get("percent_indexed", 1) or 1)
```
Falsy percent_indexed treated as fully indexed
Medium Severity
The expression info.get("percent_indexed", 1) or 1 defaults a falsy percent_indexed value (e.g., numeric 0 or 0.0) to 1, meaning "100% indexed." This is inverted from the production code in redisvl/migration/utils.py and async_utils.py, which correctly use or 0 to default to "not ready." If percent_indexed is ever numeric zero, the function returns immediately without waiting, causing benchmarks to run against partially-indexed data and produce inaccurate results.
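A corrected wait loop that matches the production "or 0" convention might look like this sketch (wait_for_indexing is illustrative; the PR's helper may differ):

```python
import time


def wait_for_indexing(index, timeout_s: float = 30.0, poll_s: float = 0.5) -> bool:
    """Poll index.info() until percent_indexed reaches 1.0 or the deadline.

    Falsy or missing percent_indexed defaults to 0 ("not ready"), the same
    convention as the production migration utils, instead of 1 ("done").
    """
    deadline = time.perf_counter() + timeout_s
    while time.perf_counter() < deadline:
        info = index.info()
        pct = float(info.get("percent_indexed", 0) or 0)
        if pct >= 1.0:
            return True
        time.sleep(poll_s)
    return False
```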
💡 Codex Review
Reviewed commit: 63d7c94e5c
```python
query_embeddings = all_embeddings[size : size + query_count]
dims = int(all_embeddings.shape[1])

client.flushall()
```
Replace FLUSHALL with scoped cleanup in benchmark setup
benchmark_scale calls client.flushall() before each run, which removes keys from all databases on the Redis server, not just this benchmark's data. If someone points --redis-url at a shared/non-throwaway instance, this benchmark will wipe unrelated datasets; use FLUSHDB for the selected DB or delete only the benchmark key prefixes instead.
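A prefix-scoped cleanup, as the reviewer suggests, can be sketched with SCAN + DEL. delete_by_prefix is a hypothetical helper; it works with any client exposing redis-py style scan/delete methods:

```python
def delete_by_prefix(client, prefix: str, batch: int = 5000) -> int:
    """Delete only keys under a benchmark-specific prefix via SCAN.

    Unlike FLUSHALL (which wipes every database on the server) or FLUSHDB
    (which wipes the whole selected database), this removes only keys the
    benchmark owns. Returns the number of keys deleted.
    """
    deleted = 0
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor, match=f"{prefix}*", count=batch)
        if keys:
            deleted += client.delete(*keys)
        if cursor == 0:
            return deleted
```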
```python
with tempfile.TemporaryDirectory() as tmpdir:
    target_schema_path = Path(tmpdir) / "target_schema.yaml"
    plan_path = Path(tmpdir) / "migration_plan.yaml"
```
Persist plan artifact outside temporary directory
run_quantization_migration records plan_path in the returned benchmark payload, but that path is created under TemporaryDirectory; once the context exits, the file is deleted. This makes the reported artifact path invalid for later inspection/repro and can mislead downstream analysis that expects a readable migration plan file.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
```python
return parser.parse_args()
```
Several metrics calculations assume query_count > 0 (e.g., statistics.mean/min/max on empty lists in run_queries/compute_overlap/compute_recall). Since --query-count currently allows 0, the script will crash with StatisticsError. Add an explicit argument validation (e.g., in parse_args() or main()) to require query_count >= 1 (and similarly validate sizes/top_k).
```diff
-return parser.parse_args()
+args = parser.parse_args()
+if args.query_count < 1:
+    parser.error("--query-count must be >= 1")
+if any(size < 1 for size in args.sizes):
+    parser.error("--sizes must contain positive integers")
+if args.top_k < 1:
+    parser.error("--top-k must be >= 1")
+if args.recall_k_max < 1:
+    parser.error("--recall-k-max must be >= 1")
+return args
```
```python
"plan_duration_seconds": round(duration, 3),
"supported": plan.diff_classification.supported,
"blocked_reasons": plan.diff_classification.blocked_reasons,
"plan_path": str(plan_path),
```
assert_planner_allows_algorithm_change() also returns plan_path within a TemporaryDirectory that is removed on exit, so the returned plan_path is invalid. Persist the plan somewhere durable (e.g., alongside the benchmark --output file) or remove it from the return payload.
```diff
-"plan_path": str(plan_path),
```
```python
query_embeddings = all_embeddings[size : size + query_count]
dims = int(all_embeddings.shape[1])

client.flushall()
```
This benchmark uses client.flushall(), which wipes all databases on the target Redis instance (more destructive than FLUSHDB) and can surprise users if they point at a shared/non-dev server. Consider switching to flushdb(), or add an explicit opt-in flag/confirmation (and/or a dedicated DB selection) with a prominent warning in the CLI help/output.
```diff
-client.flushall()
+client.flushdb()
```


Summary
Benchmark scripts for measuring migration performance across different index sizes and configurations. Includes real-world AG News dataset benchmarks, synthetic end-to-end timing benchmarks with memory tracking, retrieval quality comparisons (FLAT vs HNSW, FP32 vs FP16), and matplotlib visualization tooling for benchmark results.
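The throughput/latency/memory measurement described here can be illustrated with a minimal stdlib-only harness. Names and output keys below are assumptions for illustration; the PR's scripts define their own reporting format, and tracemalloc tracks Python allocations only, not process RSS:

```python
import statistics
import time
import tracemalloc


def benchmark(fn, queries):
    """Run fn over each query, reporting QPS, latency stats, and peak memory.

    Latency is measured per query with time.perf_counter; peak memory is the
    peak Python-allocated bytes reported by tracemalloc during the run.
    """
    tracemalloc.start()
    latencies = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        fn(q)
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    lat_sorted = sorted(latencies)
    return {
        "qps": len(queries) / total,
        "mean_ms": statistics.mean(latencies) * 1000,
        "p95_ms": lat_sorted[int(0.95 * (len(lat_sorted) - 1))] * 1000,
        "peak_mem_bytes": peak,
    }
```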
Files
tests/benchmarks/: migration_benchmark, index_migrator_real_benchmark, retrieval_benchmark, visualize_results
Note
Low Risk
Adds new benchmark/visualization scripts under tests/benchmarks and makes only whitespace/formatting-only tweaks in migration executors/models, so production behavior should be unchanged; the main risk is added optional deps/CI noise if these scripts are accidentally executed in test runs.
Overview
Adds a new tests/benchmarks/ suite to measure migration performance and retrieval quality at scale, including an end-to-end migration timing benchmark (sync/async), a "real data" AG News HNSW/FP32→FLAT/FP16 quantization migration benchmark, and a retrieval benchmark comparing FP32 vs FP16 across HNSW vs FLAT with overlap/recall and latency/QPS metrics. Also adds a matplotlib-based visualize_results.py helper to turn retrieval benchmark JSON output into charts, and includes minor formatting-only refactors in MigrationExecutor, AsyncMigrationExecutor, and RenameOperations.has_operations.
Written by Cursor Bugbot for commit 63d7c94. This will update automatically on new commits.