feat(migrator): [4/7] Batch migration with multi-index glob selection, ordered execution, and state tracking#563
feat(migrator): [4/7] Batch migration with multi-index glob selection, ordered execution, and state tracking#563nkanu17 wants to merge 6 commits intofeat/migrate-asyncfrom
Conversation
🛡️ Jit Security Scan Results✅ No security findings were detected in this PR
Security scan by Jit
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e05293097f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
There was a problem hiding this comment.
Pull request overview
Adds batch migration capabilities to redisvl.migration, enabling users to plan and execute migrations across multiple RediSearch indexes using a shared schema patch, with ordered execution, checkpoint/resume, and batch-level reporting.
Changes:
- Introduces
BatchMigrationPlannerfor multi-index selection (explicit list, file, glob pattern) and per-index applicability detection. - Introduces
BatchMigrationExecutorfor sequential batch execution with checkpoint state tracking, resume, failure policies, per-index reports, and batch summary reporting. - Adds unit + integration test coverage for planning, execution, checkpointing/resume, failure policies, and progress callbacks.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
redisvl/migration/batch_planner.py |
New batch planner to select indexes and generate a BatchPlan from a shared schema patch. |
redisvl/migration/batch_executor.py |
New batch executor to apply batch plans with checkpointing, resume, and reporting. |
redisvl/migration/__init__.py |
Exports batch planner/executor and batch models from the migration package. |
tests/unit/test_batch_migration.py |
Unit tests for batch planning and execution behavior using mocks. |
tests/integration/test_batch_migration_integration.py |
End-to-end batch migration integration tests against real Redis. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e05293097f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e05293097f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- Fix status mismatch: executor writes 'success' to match BatchState.success_count - Pass rename_operations to get_vector_datatype_changes - Validate failure_policy early (reject unknown values) - Make update_fields applicability rename-aware - Fix progress position during resume (correct offset) - Fix fail-fast: leave remaining in state for checkpoint resume - Atomic checkpoint writes (write to .tmp then rename) - Sanitize index_name in report filenames (path traversal) - Add assert guard for fnmatch pattern type
Remove unused Path, MagicMock, and patch imports.
0087dcf to
8642ec7
Compare
e052930 to
634cfa1
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 634cfa110f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- Pass existing snapshot to create_plan_from_patch to avoid double Redis round-trip - Use _get_client() instead of _redis_client for lazy async client initialization - Remap datatype_changes keys to post-rename field names before quantization - Only resume from completed checkpoint when source index is actually gone
…ion, ordered execution, and state tracking Batch planner and executor for migrating multiple indexes in a single operation. Supports glob-based index selection, ordered execution with per-index state tracking, checkpoint/resume semantics, and batch-level reporting with fail-fast or continue-on-error policies. Includes batch unit and integration tests.
- Fix status mismatch: executor writes 'success' to match BatchState.success_count - Pass rename_operations to get_vector_datatype_changes - Validate failure_policy early (reject unknown values) - Make update_fields applicability rename-aware - Fix progress position during resume (correct offset) - Fix fail-fast: leave remaining in state for checkpoint resume - Atomic checkpoint writes (write to .tmp then rename) - Sanitize index_name in report filenames (path traversal) - Add assert guard for fnmatch pattern type
Remove unused Path, MagicMock, and patch imports.
- Add rename target collision validation in batch applicability check - Propagate infrastructure errors (ConnectionError, TimeoutError) instead of silently marking as not applicable
8642ec7 to
7a1ef9a
Compare
634cfa1 to
cecfd9d
Compare
|
@codex review |
1 similar comment
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cecfd9d979
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for field in shared_patch.changes.add_fields: | ||
| field_name = field.get("name") | ||
| if field_name and field_name in field_names: | ||
| existing_adds.append(field_name) |
There was a problem hiding this comment.
Account for renames before rejecting add_fields
The applicability check rejects any add_fields whose name exists in the current schema, but it does this before considering rename_fields. That incorrectly marks valid patches as non-applicable when a field is renamed away and then re-added under the old name (the core planner applies renames before adds in merge_patch). In this case the batch planner will silently skip indexes that should migrate, producing incomplete batch plans.
Useful? React with 👍 / 👎.
| report_file = report_dir / f"{safe_name}_report.yaml" | ||
| write_yaml(report.model_dump(exclude_none=True), str(report_file)) |
There was a problem hiding this comment.
Include unique suffix in per-index report filenames
Per-index report paths are derived only from a sanitized index name, so multiple executions for the same name (which is supported because duplicate names are preserved in planning) overwrite the same YAML file. This loses per-attempt state and makes multiple BatchIndexState entries point to one report artifact, which breaks auditability/debugging for repeated entries or any sanitized-name collision.
Useful? React with 👍 / 👎.
| return self.apply( | ||
| batch_plan, | ||
| state_path=state_path, | ||
| report_dir=report_dir, |
There was a problem hiding this comment.
Persist explicit resume plan path into checkpoint state
When resume() is given batch_plan_path, it uses that path to load the plan but never stores it back into checkpoint state and then calls apply() without batch_plan_path. If the original state had an empty plan_path, a second interruption still leaves checkpoint metadata empty and a later resume() without a CLI argument fails with "No batch plan path available," which breaks multi-interruption resume workflows.
Useful? React with 👍 / 👎.

Summary
Batch planner and executor for migrating multiple indexes in a single operation. Supports glob-based index selection, ordered execution with per-index state tracking, checkpoint/resume semantics, and batch-level reporting with fail-fast or continue-on-error policies.
Files
redisvl/migration/batch_executor.py,batch_planner.pyStack
Note
Medium Risk
Adds new batch migration flow that orchestrates multiple index migrations with on-disk checkpointing and resume, which can affect operational behavior and failure handling. Risk is mainly around correctness of state tracking/reporting and file I/O during long-running migrations.
Overview
Adds batch migration support to apply a single
SchemaPatchacross multiple indexes, including glob/file-based index selection, per-index applicability checks (missing fields, rename collisions, blocked diffs), and early validation offailure_policy.Introduces
BatchMigrationExecutorto run migrations sequentially with checkpointed state (BatchStateYAML), optional resume (including retrying previously failed indexes), per-index report files, progress callbacks, and batch-level reporting withfail_fastvscontinue_on_errorbehavior.Exports the new batch APIs from
redisvl.migration, adds comprehensive unit/integration tests for planning, apply/resume, and failure policies, and includes a small formatting-only tweak inasync_executor.Written by Cursor Bugbot for commit cecfd9d. This will update automatically on new commits. Configure here.