
feat(migrator): [4/7] Batch migration with multi-index glob selection, ordered execution, and state tracking#563

Open
nkanu17 wants to merge 6 commits into feat/migrate-async from feat/migrate-batch

@nkanu17
Collaborator

@nkanu17 nkanu17 commented Apr 1, 2026

Summary

Batch planner and executor for migrating multiple indexes in a single operation. Supports glob-based index selection, ordered execution with per-index state tracking, checkpoint/resume semantics, and batch-level reporting with fail-fast or continue-on-error policies.

Files

  • redisvl/migration/batch_executor.py, batch_planner.py
  • Batch unit and integration tests

Stack

  1. [1/7] Migration foundation: feat(migrator): [1/7] Migration foundation with models, schema-aware planner, validation, and shared utilities #560
  2. [2/7] Sync executor with reliability and quantization: feat(migrator): [2/7] Sync executor with reliability checkpointing, crash-safe resume, and quantization support #561
  3. [3/7] Async migration
  4. [4/7] Batch migration (this PR)
  5. [5/7] Interactive wizard
  6. [6/7] CLI and documentation
  7. [7/7] Benchmarks

Note

Medium Risk
Adds a new batch migration flow that orchestrates multiple index migrations with on-disk checkpointing and resume, which can affect operational behavior and failure handling. The main risks are the correctness of state tracking and reporting, and file I/O during long-running migrations.

Overview
Adds batch migration support to apply a single SchemaPatch across multiple indexes, including glob/file-based index selection, per-index applicability checks (missing fields, rename collisions, blocked diffs), and early validation of failure_policy.
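Early validation of failure_policy means an unknown value is rejected before any index is planned or touched. A minimal sketch, assuming the two policy names used in this PR (fail_fast, continue_on_error); the helper and constant names are hypothetical, not the module's actual API:

```python
# Hypothetical constant: the two policies described in this PR.
ALLOWED_FAILURE_POLICIES = frozenset({"fail_fast", "continue_on_error"})


def validate_failure_policy(policy: str) -> str:
    """Reject unknown failure policies before any planning or execution starts."""
    if policy not in ALLOWED_FAILURE_POLICIES:
        allowed = ", ".join(sorted(ALLOWED_FAILURE_POLICIES))
        raise ValueError(f"Unknown failure_policy {policy!r}; expected one of: {allowed}")
    return policy
```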

Introduces BatchMigrationExecutor to run migrations sequentially with checkpointed state (BatchState YAML), optional resume (including retrying previously failed indexes), per-index report files, progress callbacks, and batch-level reporting with fail_fast vs continue_on_error behavior.
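The sequential, checkpointed loop with the two failure policies can be sketched as follows. This is an illustration under stated assumptions, not BatchMigrationExecutor itself: SketchBatchState and run_batch are hypothetical, the per-index migration is an injected callable, and "checkpointing" is a callback the caller would wire to a file write.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class SketchBatchState:
    """Hypothetical stand-in for BatchState: pending indexes plus ordered outcomes."""
    remaining: List[str]
    # A list (not a dict) so duplicate index names in a plan keep separate entries.
    results: List[Tuple[str, str]] = field(default_factory=list)


def run_batch(
    state: SketchBatchState,
    migrate_one: Callable[[str], None],
    failure_policy: str = "fail_fast",
    checkpoint: Optional[Callable[[SketchBatchState], None]] = None,
) -> SketchBatchState:
    """Migrate indexes in plan order, checkpointing after each one."""
    while state.remaining:
        name = state.remaining.pop(0)
        try:
            migrate_one(name)
            status = "success"
        except Exception:
            status = "failed"
        state.results.append((name, status))
        if checkpoint is not None:
            checkpoint(state)  # persist progress before moving to the next index
        if status == "failed" and failure_policy == "fail_fast":
            break  # untouched indexes stay in `remaining` so a resume can pick them up
    return state
```

Under continue_on_error the loop drains every index and records each failure; under fail_fast it stops at the first failure but leaves the rest of the plan in the checkpoint.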

Exports the new batch APIs from redisvl.migration, adds comprehensive unit/integration tests for planning, apply/resume, and failure policies, and includes a small formatting-only tweak in async_executor.

Written by Cursor Bugbot for commit cecfd9d.

@jit-ci

jit-ci bot commented Apr 1, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e05293097f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@nkanu17
Collaborator Author

nkanu17 commented Apr 1, 2026

@codex review


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Contributor

Copilot AI left a comment


Pull request overview

Adds batch migration capabilities to redisvl.migration, enabling users to plan and execute migrations across multiple RediSearch indexes using a shared schema patch, with ordered execution, checkpoint/resume, and batch-level reporting.

Changes:

  • Introduces BatchMigrationPlanner for multi-index selection (explicit list, file, glob pattern) and per-index applicability detection.
  • Introduces BatchMigrationExecutor for sequential batch execution with checkpoint state tracking, resume, failure policies, per-index reports, and batch summary reporting.
  • Adds unit + integration test coverage for planning, execution, checkpointing/resume, failure policies, and progress callbacks.
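The three selection modes (explicit list, file, glob pattern) can be illustrated with a small resolver. A sketch only: select_indexes is a hypothetical helper, the one-name-per-line file format and the sorted output order for glob matches are assumptions, and fnmatchcase is used so matching stays case-sensitive like Redis index names.

```python
import fnmatch
from pathlib import Path
from typing import Iterable, List, Optional


def select_indexes(
    available: Iterable[str],
    explicit: Optional[List[str]] = None,
    indexes_file: Optional[str] = None,
    pattern: Optional[str] = None,
) -> List[str]:
    """Resolve exactly one selector into an ordered list of index names."""
    chosen = [s for s in (explicit, indexes_file, pattern) if s is not None]
    if len(chosen) != 1:
        raise ValueError("Provide exactly one of explicit, indexes_file, or pattern")
    if explicit is not None:
        return list(explicit)  # keep the caller's order, duplicates included
    if indexes_file is not None:
        # Assumed format: one index name per line, blank lines ignored.
        lines = Path(indexes_file).read_text().splitlines()
        return [ln.strip() for ln in lines if ln.strip()]
    assert isinstance(pattern, str)  # guard before handing the pattern to fnmatch
    return sorted(name for name in available if fnmatch.fnmatchcase(name, pattern))
```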

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Summary per file:

  • redisvl/migration/batch_planner.py: New batch planner to select indexes and generate a BatchPlan from a shared schema patch.
  • redisvl/migration/batch_executor.py: New batch executor to apply batch plans with checkpointing, resume, and reporting.
  • redisvl/migration/__init__.py: Exports batch planner/executor and batch models from the migration package.
  • tests/unit/test_batch_migration.py: Unit tests for batch planning and execution behavior using mocks.
  • tests/integration/test_batch_migration_integration.py: End-to-end batch migration integration tests against real Redis.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Reviewed commit: e05293097f


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.



@nkanu17
Collaborator Author

nkanu17 commented Apr 1, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Reviewed commit: e05293097f


nkanu17 added a commit that referenced this pull request Apr 2, 2026
- Fix status mismatch: executor writes 'success' to match BatchState.success_count
- Pass rename_operations to get_vector_datatype_changes
- Validate failure_policy early (reject unknown values)
- Make update_fields applicability rename-aware
- Fix progress position during resume (correct offset)
- Fix fail-fast: leave remaining in state for checkpoint resume
- Atomic checkpoint writes (write to .tmp then rename)
- Sanitize index_name in report filenames (path traversal)
- Add assert guard for fnmatch pattern type
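Two of the fixes above, atomic checkpoint writes and sanitized report filenames, can be sketched with the standard library. Assumptions: the allowed filename character set is a guess, and this sketch uses a unique tempfile.mkstemp name in the target directory rather than a fixed ".tmp" suffix, so concurrent writers cannot clobber each other's temp file. os.replace is atomic on POSIX when source and target are on the same filesystem.

```python
import os
import re
import tempfile
from pathlib import Path


def safe_report_name(index_name: str) -> str:
    """Map an index name to a filename component that cannot escape the report dir."""
    # Keep letters, digits, '.', '_' and '-'; every other character (including '/') becomes '_'.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", index_name)
    # Strip leading/trailing dots so results like '..' cannot survive.
    return cleaned.strip(".") or "index"


def atomic_write_text(path: str, text: str) -> None:
    """Write to a temp file next to the target, then rename it over the target."""
    target = Path(path)
    fd, tmp = tempfile.mkstemp(dir=target.parent, prefix=target.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())  # make sure bytes hit disk before the rename
        os.replace(tmp, target)  # readers see either the old file or the new one
    except BaseException:
        os.unlink(tmp)  # do not leave partial temp files behind
        raise
```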
nkanu17 added a commit that referenced this pull request Apr 2, 2026
Remove unused Path, MagicMock, and patch imports.
@nkanu17 nkanu17 force-pushed the feat/migrate-async branch from 0087dcf to 8642ec7 on April 2, 2026 00:30
@nkanu17 nkanu17 force-pushed the feat/migrate-batch branch from e052930 to 634cfa1 on April 2, 2026 00:30
@nkanu17
Collaborator Author

nkanu17 commented Apr 2, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Reviewed commit: 634cfa110f


nkanu17 added 6 commits April 1, 2026 23:56
- Pass existing snapshot to create_plan_from_patch to avoid double Redis round-trip
- Use _get_client() instead of _redis_client for lazy async client initialization
- Remap datatype_changes keys to post-rename field names before quantization
- Only resume from completed checkpoint when source index is actually gone
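Remapping datatype_changes keys to post-rename field names matters because quantization runs against the new schema, where the old names no longer exist. A minimal sketch, assuming datatype_changes is a plain {field_name: datatype} dict and renames are (old, new) pairs; both shapes and the helper name are hypothetical, and chained renames (a to b, b to c) are not handled.

```python
from typing import Dict, List, Tuple


def remap_to_post_rename(
    datatype_changes: Dict[str, str],
    renames: List[Tuple[str, str]],
) -> Dict[str, str]:
    """Rekey {field_name: new_datatype} from pre-rename names to post-rename names."""
    old_to_new = dict(renames)
    # Fields that were not renamed keep their original key.
    return {old_to_new.get(name, name): dtype for name, dtype in datatype_changes.items()}
```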
…ion, ordered execution, and state tracking

Batch planner and executor for migrating multiple indexes in a single
operation. Supports glob-based index selection, ordered execution with
per-index state tracking, checkpoint/resume semantics, and batch-level
reporting with fail-fast or continue-on-error policies.

Includes batch unit and integration tests.
- Fix status mismatch: executor writes 'success' to match BatchState.success_count
- Pass rename_operations to get_vector_datatype_changes
- Validate failure_policy early (reject unknown values)
- Make update_fields applicability rename-aware
- Fix progress position during resume (correct offset)
- Fix fail-fast: leave remaining in state for checkpoint resume
- Atomic checkpoint writes (write to .tmp then rename)
- Sanitize index_name in report filenames (path traversal)
- Add assert guard for fnmatch pattern type
Remove unused Path, MagicMock, and patch imports.
- Add rename target collision validation in batch applicability check
- Propagate infrastructure errors (ConnectionError, TimeoutError) instead of silently marking as not applicable
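The second bullet distinguishes "this index cannot take the patch" from "we could not reach Redis". A sketch of that split, assuming a hypothetical probe callable that returns None when applicable or a skip reason otherwise. Note that the builtin ConnectionError and TimeoutError are used here as stand-ins: redis-py's redis.exceptions.ConnectionError and TimeoutError do not inherit from the builtins, so real code would need to catch those explicitly.

```python
from typing import Callable, Optional


def check_applicability(index_name: str, probe: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return None if the patch applies, a skip reason if not; re-raise infrastructure errors."""
    try:
        return probe(index_name)
    except (ConnectionError, TimeoutError):
        # A dropped connection must fail planning loudly instead of
        # quietly removing the index from the batch.
        raise
    except Exception as exc:
        return f"not applicable: {exc}"
```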
@nkanu17 nkanu17 force-pushed the feat/migrate-async branch from 8642ec7 to 7a1ef9a on April 2, 2026 03:58
@nkanu17 nkanu17 force-pushed the feat/migrate-batch branch from 634cfa1 to cecfd9d on April 2, 2026 03:58
@nkanu17
Collaborator Author

nkanu17 commented Apr 2, 2026

@codex review

@nkanu17
Collaborator Author

nkanu17 commented Apr 2, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Reviewed commit: cecfd9d979


Comment on lines +236 to +239
for field in shared_patch.changes.add_fields:
    field_name = field.get("name")
    if field_name and field_name in field_names:
        existing_adds.append(field_name)

P1: Account for renames before rejecting add_fields

The applicability check rejects any add_fields whose name exists in the current schema, but it does this before considering rename_fields. That incorrectly marks valid patches as non-applicable when a field is renamed away and then re-added under the old name (the core planner applies renames before adds in merge_patch). In this case the batch planner will silently skip indexes that should migrate, producing incomplete batch plans.
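One way to address this is to apply the renames to a copy of the current field names first and only then test add_fields for collisions, mirroring the rename-before-add order the comment attributes to merge_patch. A sketch under assumptions: the old_name/new_name keys on rename entries and the helper name are hypothetical.

```python
from typing import Dict, List, Set


def blocking_adds(
    field_names: Set[str],
    rename_fields: List[Dict[str, str]],
    add_fields: List[Dict[str, str]],
) -> List[str]:
    """Return add_fields names that still collide after renames are applied."""
    names = set(field_names)  # do not mutate the caller's set
    for rename in rename_fields:  # renames run before adds
        old, new = rename["old_name"], rename["new_name"]
        if old in names:
            names.discard(old)
            names.add(new)
    return [f["name"] for f in add_fields if f.get("name") in names]
```

With this ordering, renaming title to headline and re-adding title no longer blocks the index, while adding headline in the same patch still does.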


Comment on lines +227 to +228
report_file = report_dir / f"{safe_name}_report.yaml"
write_yaml(report.model_dump(exclude_none=True), str(report_file))

P2: Include unique suffix in per-index report filenames

Per-index report paths are derived only from a sanitized index name, so multiple executions for the same name (which is supported because duplicate names are preserved in planning) overwrite the same YAML file. This loses per-attempt state and makes multiple BatchIndexState entries point to one report artifact, which breaks auditability/debugging for repeated entries or any sanitized-name collision.
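A simple way to keep report paths unique is to fold the entry's position in the batch plan into the filename, so repeated names and names that sanitize to the same string still get distinct files. A sketch only: report_path is a hypothetical helper, and the zero-padded position prefix is one design choice among several (a short hash of the raw name would also work).

```python
import re
from pathlib import Path


def report_path(report_dir: str, index_name: str, position: int) -> Path:
    """Build a per-index report path that stays unique for duplicate or colliding names."""
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", index_name).strip(".") or "index"
    # The plan position disambiguates entries whose sanitized names collide.
    return Path(report_dir) / f"{position:04d}_{safe_name}_report.yaml"
```

Zero-padding keeps the files sorted in execution order in a directory listing.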


Comment on lines +191 to +194
return self.apply(
    batch_plan,
    state_path=state_path,
    report_dir=report_dir,

P2: Persist explicit resume plan path into checkpoint state

When resume() is given batch_plan_path, it uses that path to load the plan but never stores it back into checkpoint state and then calls apply() without batch_plan_path. If the original state had an empty plan_path, a second interruption still leaves checkpoint metadata empty and a later resume() without a CLI argument fails with "No batch plan path available," which breaks multi-interruption resume workflows.
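The fix the comment asks for amounts to writing the resolved plan path back into the checkpoint before execution continues. A sketch with hypothetical names (CheckpointSketch, resolve_plan_path); the caller would still need to save the updated state to disk, for example with an atomic write, before calling apply().

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckpointSketch:
    """Hypothetical subset of checkpoint state that matters for resume."""
    plan_path: str = ""


def resolve_plan_path(state: CheckpointSketch, batch_plan_path: Optional[str] = None) -> str:
    """Choose the plan path for resume and store it so later resumes can find it."""
    path = batch_plan_path or state.plan_path
    if not path:
        raise ValueError("No batch plan path available")
    state.plan_path = path  # survives a second interruption once the state is saved
    return path
```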

