feat(migrator): [1/6] add sync migration engine by nkanu17 · Pull Request #549 · redis/redis-vl-python

nkanu17 · 2026-03-31T17:32:29Z

Summary

Introduces the core synchronous index migration engine for RedisVL. This enables users to programmatically plan and execute index schema migrations using a drop-recreate strategy with automatic data reindexing.

Overview

The migration engine follows a plan -> execute -> validate workflow:

from redisvl.migration import MigrationPlanner, MigrationExecutor, MigrationValidator

# Plan: compare source index against target schema
planner = MigrationPlanner(redis_url="redis://localhost:6379")
plan = planner.plan("my_index", target_schema)

# Execute: drop-recreate with data preservation
executor = MigrationExecutor(redis_url="redis://localhost:6379")
report = executor.execute(plan)

# Validate: confirm schema and data integrity
validator = MigrationValidator(redis_url="redis://localhost:6379")
result = validator.validate(plan)

What is included

Core modules (`redisvl/migration/`)

| Module | Purpose |
| --- | --- |
| `models.py` | Pydantic models: `MigrationPlan`, `MigrationReport`, `SchemaPatch`, `FieldRename`, `RenameOperations` |
| `planner.py` | Diff-based planning -- compares live index against target schema, produces a `MigrationPlan` |
| `executor.py` | Drop-recreate execution -- enumerates docs, drops index, recreates with new schema, reindexes |
| `validation.py` | Post-migration validation -- schema comparison, document count, key sampling |
| `utils.py` | Shared utilities: schema comparison, YAML I/O, key enumeration, index listing |

Supporting changes

redisvl/cli/utils.py -- Refactored add_index_parsing_options to extract add_redis_connection_options (backward compatible, needed by migration CLI in later PR)
redisvl/redis/connection.py -- Added HNSW parameter parsing (m, ef_construction) in convert_index_info_to_schema
.gitignore -- Added migration temp files and dev directories
AGENTS.md -- Added project context file

Tests

tests/unit/test_migration_planner.py -- Comprehensive planner unit tests (~890 lines)
tests/integration/test_migration_v1.py -- End-to-end migration integration tests
tests/integration/test_field_modifier_ordering_integration.py -- Field modifier ordering tests

Design decisions

Drop-recreate strategy: Chosen for V1 simplicity. Redis does not support in-place schema ALTER, so we enumerate all documents, drop the index, recreate with the new schema, and reindex. Data keys are preserved (only the index metadata is dropped).
Schema diff planning: The planner compares the live FT.INFO output against the target IndexSchema to produce a precise diff (added/removed/updated fields, rename operations). This avoids unnecessary migrations when schemas already match.
Pydantic models throughout: All plans and reports are Pydantic BaseModel instances for validation, serialization (YAML/JSON), and type safety.
Rename support built-in: Field renames are first-class operations in the plan, executed as key-level field copy+delete during reindexing.
Validation as separate concern: MigrationValidator runs independently after execution to verify schema correctness, document counts, and key-level field sampling.
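The schema diff planning described above can be illustrated with a small sketch. This is not the planner's actual code: `diff_fields` and the dict-based field representation are hypothetical, chosen only to show how added, removed, and updated fields fall out of a name-keyed comparison, and how an empty diff signals that no migration is needed.

```python
from typing import Any, Dict, List


def diff_fields(
    source: Dict[str, Dict[str, Any]],
    target: Dict[str, Dict[str, Any]],
) -> Dict[str, List[str]]:
    """Classify fields by name into added, removed, and updated.

    Both arguments map a field name to its attribute dict. This shape is an
    assumption for illustration, not the RedisVL schema model.
    """
    added = sorted(name for name in target if name not in source)
    removed = sorted(name for name in source if name not in target)
    updated = sorted(
        name for name in source
        if name in target and source[name] != target[name]
    )
    return {"added": added, "removed": removed, "updated": updated}


source_fields = {
    "title": {"type": "text"},
    "embedding": {"type": "vector", "datatype": "float32"},
    "legacy": {"type": "tag"},
}
target_fields = {
    "title": {"type": "text"},
    "embedding": {"type": "vector", "datatype": "float16"},
    "category": {"type": "tag"},
}

diff = diff_fields(source_fields, target_fields)
# A migration is only needed when at least one bucket is non-empty.
needs_migration = any(diff.values())
```

Rename detection is deliberately left out of this sketch; a name-keyed diff on its own would report a renamed field as one removal plus one addition.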

Part of a stack

This is PR 1 of 6 in the index migrator feature:

This PR -- Sync migration core
Async migration support
Batch migration support
Interactive migration wizard
CLI + documentation
Crash-safe quantization & disk space estimation (PR feat(migrator): [6/6] crash-safe quantization and disk space estimation #548)

- Remove unused imports: Union, ClusterPipeline, AsyncClusterPipeline, logging, cast, Optional, os, lazy_import, SyncRedisCluster, Mapping, Awaitable, warnings
- Fix unused exception variables in index.py exception handlers
- Clean up HybridResult import used only for feature detection

- Add rvl migrate subcommand (helper, list, plan, apply, validate)
- Implement MigrationPlanner for schema diff classification
- Implement MigrationExecutor with drop_recreate mode
- Support vector quantization (float32 <-> float16) during migration
- Add MigrationValidator for post-migration validation
- Show error messages prominently on migration failure
- Add migration temp files to .gitignore

- Add MigrationWizard for guided schema changes
- Support add/update/remove field operations
- Algorithm-specific datatype prompts (SVS-VAMANA vs HNSW/FLAT)
- SVS-VAMANA params: GRAPH_MAX_DEGREE, COMPRESSION
- HNSW params: M, EF_CONSTRUCTION
- Normalize SVS_VAMANA -> SVS-VAMANA input
- Preview patch as YAML before finishing

- Add conceptual guide: how migrations work (Diataxis explanation)
- Add task guide: step-by-step migration walkthrough (Diataxis how-to)
- Expand field-attributes.md with migration support matrix
- Add vector datatypes table with algorithm compatibility
- Update navigation indexes to include new guides
- Normalize SVS-VAMANA naming throughout docs

- Unit tests for MigrationPlanner diff classification
- Unit tests for MigrationWizard (41 tests incl. adversarial inputs)
- Integration test for drop_recreate flow
- Field modifier ordering integration tests (INDEXEMPTY, INDEXMISSING, etc.)

Add async/await execution for index migrations, enabling non-blocking operation for large quantization jobs and async application integration.

New functionality:
- CLI: --async flag for rvl migrate apply
- Python API: AsyncMigrationPlanner, AsyncMigrationExecutor, AsyncMigrationValidator
- Batched quantization with pipelined HSET operations
- Non-blocking readiness polling with asyncio.sleep()

What becomes async:
- SCAN operations (yields between batches of 500 keys)
- Pipelined HSET writes (100-1000 ops per batch)
- Index readiness polling (asyncio.sleep vs time.sleep)

What stays sync:
- CLI prompts (user interaction)
- YAML file I/O (local filesystem)

Documentation:
- Sync vs async execution guidance in concepts/index-migrations.md
- Async usage examples in how_to_guides/migrate-indexes.md

Tests:
- 4 unit tests for AsyncMigrationPlanner
- 4 unit tests for AsyncMigrationExecutor
- 1 integration test for full async flow
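The non-blocking readiness polling mentioned in this commit can be sketched as follows. This is a minimal illustration, not the code from the PR: `wait_until_ready`, its parameters, and the fake readiness check are all hypothetical. The point is only that `asyncio.sleep` yields control to the event loop between polls, where `time.sleep` would block it.

```python
import asyncio
from typing import Awaitable, Callable, Tuple


async def wait_until_ready(
    check: Callable[[], Awaitable[bool]],
    poll_interval: float = 0.01,
    timeout: float = 1.0,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        if await check():
            return True
        if loop.time() >= deadline:
            return False
        # Yield to the event loop instead of blocking the thread.
        await asyncio.sleep(poll_interval)


async def demo() -> Tuple[bool, int]:
    calls = 0

    async def fake_index_ready() -> bool:
        # Stand-in for a real readiness probe; reports ready on the third poll.
        nonlocal calls
        calls += 1
        return calls >= 3

    ready = await wait_until_ready(fake_index_ready)
    return ready, calls


ready, calls = asyncio.run(demo())
```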

Document Enumeration Optimization:
- Use FT.AGGREGATE WITHCURSOR for efficient key enumeration
- Falls back to SCAN only when index has hash_indexing_failures
- Pre-enumerate keys before drop for reliable re-indexing

CLI Simplification:
- Remove redundant --allow-downtime flag from apply/batch-apply
- Plan review is now the safety mechanism

Batch Migration:
- Add BatchMigrationExecutor and BatchMigrationPlanner
- Support for multi-index migration with failure policies
- Resumable batch operations with state persistence

Bug Fixes:
- Fix mypy type errors in planner, wizard, validation, and CLI

Documentation:
- Update concepts and how-to guides for new workflow
- Remove --allow-downtime references from all docs

- Add FieldRename and RenameOperations models
- Add _extract_rename_operations to detect index/prefix/field renames
- Update classify_diff to support rename detection
- Update tests for prefix change (now supported, not blocked)

- Add _rename_keys for prefix changes via RENAME command
- Add _rename_field_in_hash and _rename_field_in_json for field renames
- Execute renames before drop/recreate for safe enumeration
- Support both HASH and JSON storage types

- Add rename operations (rename index, change prefix, rename field)
- Add vector field removal with [WARNING] indicator
- Add index_empty, ef_runtime, epsilon prompts
- Add phonetic_matcher and withsuffixtrie for text/tag fields
- Update menu to 8 options
- All 40 supported operations now in wizard

- Add UNRELIABLE_*_ATTRS constants for attributes Redis doesn't return
- Add _strip_unreliable_attrs() to normalize schemas before comparison
- Update canonicalize_schema() with strip_unreliable parameter
- Handle NUMERIC+SORTABLE auto-UNF normalization
- Update validation.py and async_validation.py to use strip_unreliable=True
- Remove withsuffixtrie from wizard (parser breaks)
- All 38 comprehensive integration tests pass with strict validation
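The normalization idea in this commit (drop attributes that cannot be read back from Redis before comparing schemas) can be sketched as below. The attribute name `attr_not_returned` and the helpers `strip_unreliable` and `fields_match` are placeholders for illustration; the real `UNRELIABLE_*_ATTRS` constants and `_strip_unreliable_attrs()` are not reproduced here.

```python
from typing import Any, Dict, FrozenSet

# Placeholder attribute name for illustration only; the real
# UNRELIABLE_*_ATTRS constants from the PR are not shown here.
UNRELIABLE_ATTRS: FrozenSet[str] = frozenset({"attr_not_returned"})


def strip_unreliable(field: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the field without attributes that cannot be read back."""
    return {k: v for k, v in field.items() if k not in UNRELIABLE_ATTRS}


def fields_match(expected: Dict[str, Any], live: Dict[str, Any]) -> bool:
    """Compare two field definitions after dropping unreliable attributes."""
    return strip_unreliable(expected) == strip_unreliable(live)


# The target schema sets an attribute that the live index never reports.
expected = {"name": "title", "type": "text", "attr_not_returned": True}
live = {"name": "title", "type": "text"}
```

Without the stripping step, a strict comparison of `expected` and `live` would report a mismatch even though the migration succeeded.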

- Transform key_sample to use new prefix when validating after prefix change
- Sync and async validators detect plan.rename_operations.change_prefix
- Update integration test to use full validation (result['succeeded'])
- All 38 comprehensive tests pass with strict validation
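The key sample transformation described here amounts to rewriting the sampled keys so they point at the post-migration names. A minimal sketch, assuming a single string prefix on each side; `remap_key_prefix` is a hypothetical helper, not a function from the PR:

```python
from typing import List


def remap_key_prefix(keys: List[str], old_prefix: str, new_prefix: str) -> List[str]:
    """Rewrite keys that start with old_prefix so they start with new_prefix."""
    remapped = []
    for key in keys:
        if key.startswith(old_prefix):
            remapped.append(new_prefix + key[len(old_prefix):])
        else:
            # Keys outside the old prefix are left untouched.
            remapped.append(key)
    return remapped


sample = ["doc:1", "doc:2", "other:3"]
remapped = remap_key_prefix(sample, "doc:", "item:")
```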

- Add _run_functional_checks() to sync and async validators
- Wildcard search (FT.SEARCH "*") verifies index is operational
- Automatically runs after every migration (no user config needed)
- Verifies doc count matches expected count from source
- All 40 migration integration tests pass

- Add parsing for m and ef_construction from FT.INFO in parse_vector_attrs
- Normalize float weights to int in schema comparison (_strip_unreliable_attrs)
- Fixes false validation failures after HNSW migrations

- Tests algorithm changes (HNSW<->FLAT)
- Tests datatype changes (float32, float16, bfloat16, int8, uint8)
- Tests distance metric changes (cosine, l2, ip)
- Tests HNSW tuning parameters (m, ef_construction, ef_runtime, epsilon)
- Tests combined changes (algorithm + datatype + metric)

Requires Redis 8.0+ for INT8/UINT8 datatype tests

- Add field rename, prefix change, index rename to supported changes
- Update quantization docs to include bfloat16/int8/uint8 datatypes
- Add Redis 8.0+ requirement notes for INT8/UINT8
- Add Redis 8.2+ and Intel AVX-512 notes for SVS-VAMANA
- Add batch migration CLI commands to CLI reference
- Remove prefix/field rename from blocked changes lists

- Fix wizard algorithm case sensitivity (schema stores lowercase 'hnsw')
- Remove non-existent --skip-count-check flag from docs

- CRITICAL: merge_patch now applies rename_fields to merged schema
- HIGH: BatchState.success_count uses correct 'succeeded' status
- HIGH: CLI helper text shows prefix/rename as supported
- HIGH: Planner docstring updated for current capabilities
- HIGH: batch_plan_path stored in state for resume support
- MEDIUM: Fixed --output to --plan-out in batch migration docs
- MEDIUM: Fixed --indexes to use comma-separated format in docs
- MEDIUM: Added validation to block multi-prefix migrations
- MEDIUM: Updated migration plan YAML example to match model
- MEDIUM: Added skipped_count property and [SKIP] status display

- Add reliability.py: idempotent dtype detection, checkpoint persistence, BGSAVE safety net, bounded undo buffer with async rollback
- Add DiskSpaceEstimate/VectorFieldEstimate models and estimate_disk_space()
- Wire reliability into both sync and async executors (_quantize_vectors)
- Add --resume flag to rvl migrate apply for checkpoint-based resume
- Add rvl migrate estimate subcommand for pre-migration cost analysis
- Update progress labels to 6 steps (enumerate, bgsave, drop, quantize, create, re-index)
- Planner returns dims in datatype change metadata for idempotent detection
- 39 new unit tests (90 total migration tests passing)

- Fix bfloat16/uint8 idempotent detection using dtype byte-width families so float16<->bfloat16 and int8<->uint8 are treated as equivalent
- Validate checkpoint index_name matches source index before resuming
- Force checkpoint_path to the load path, not the stored value
- Record all batch keys in checkpoint (including skipped) to avoid re-scanning on resume
- Fix misleading AOF wording when aof_enabled is not set

…uracy
- Fix #2: docs_processed increments by full batch size (including skipped) so progress reaches 100% even when vectors are already quantized
- Fix #4: is_already_quantized prevents skipping same-width dtype conversions (e.g. float16->bfloat16) since encodings differ
- Fix #5: apply() detects checkpoint on resume and bypasses index validation, BGSAVE, field renames, drop, and key renames (all already done pre-crash); enumerates keys via SCAN with plan prefix instead
- Add IM-16 (auto-detect AOF) and IM-17 (compact checkpoint) to backlog
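The byte-width reasoning behind Fix #4 can be sketched like this. It is a simplified illustration under stated assumptions, not the PR's `is_already_quantized`: the signature is hypothetical, and the `DTYPE_BYTES` table below lists the standard element sizes for these dtypes rather than copying the constant from `models.py`. A stored vector's length reveals its element width, but two dtypes of the same width are indistinguishable by length, so a same-width conversion must never be skipped.

```python
from typing import Dict

# Standard element sizes in bytes; assumed to resemble the PR's DTYPE_BYTES.
DTYPE_BYTES: Dict[str, int] = {
    "float32": 4,
    "float16": 2,
    "bfloat16": 2,
    "int8": 1,
    "uint8": 1,
}


def is_already_quantized(
    blob: bytes, dims: int, source_dtype: str, target_dtype: str
) -> bool:
    """Guess from the blob length whether a vector is already in the target dtype."""
    source_width = DTYPE_BYTES[source_dtype]
    target_width = DTYPE_BYTES[target_dtype]
    if source_width == target_width:
        # Same width (e.g. float16 -> bfloat16): length cannot tell them apart
        # and the encodings differ, so always convert.
        return False
    return len(blob) == dims * target_width


dims = 4
fp32_blob = bytes(dims * 4)  # 16 bytes: still float32
fp16_blob = bytes(dims * 2)  # 8 bytes: already half precision
```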

…rom models

Remove _BYTES_PER_ELEMENT from reliability.py and import the identical DTYPE_BYTES constant from models.py to maintain a single source of truth.

…py errors

- record_batch() no longer appends to processed_keys list
- save() excludes processed_keys from serialized YAML
- get_remaining_keys() uses completed_keys offset for compact checkpoints, with backward compat for legacy processed_keys
- Add tests for compact checkpoint resume, save exclusion, checkpoint-before-HGET ordering, quantize return counts, scan pattern builder, key normalization, and AOF detection
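The compact checkpoint idea (store a count of completed keys instead of the full key list) can be sketched as follows. This is an illustration only: the function signature is hypothetical, and it assumes the key enumeration is returned in the same order on resume, which the offset approach depends on.

```python
from typing import List, Optional


def get_remaining_keys(
    all_keys: List[str],
    completed_keys: int = 0,
    processed_keys: Optional[List[str]] = None,
) -> List[str]:
    """Return the keys that still need processing after a resume.

    Legacy checkpoints carry an explicit processed_keys list; compact
    checkpoints carry only a completed_keys offset into a stable ordering.
    """
    if processed_keys:
        done = set(processed_keys)
        return [key for key in all_keys if key not in done]
    return all_keys[completed_keys:]


keys = ["doc:1", "doc:2", "doc:3", "doc:4"]
compact_remaining = get_remaining_keys(keys, completed_keys=2)
legacy_remaining = get_remaining_keys(keys, processed_keys=["doc:1", "doc:3"])
```

The trade-off is checkpoint size versus ordering assumptions: an offset is constant-size, while an explicit list tolerates reordering but grows with the number of processed keys.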

- Use is_file() instead of exists() in batch_executor load methods
- Add exclude_none to wizard preview model_dump for consistency
- Fix validate timestamps: capture before/after validation runs
- Use client.info('persistence') instead of full info() in BGSAVE poll
- Remove misleading isdigit() comment in wizard test

nkanu17 · 2026-03-31T20:13:49Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 281c162eeb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

redisvl/migration/executor.py

redisvl/migration/batch_planner.py

nkanu17 · 2026-03-31T20:41:52Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 281c162eeb

redisvl/migration/executor.py

redisvl/migration/planner.py

… persist
- Use RENAMENX instead of RENAME for prefix migrations (sync+async)
- Unwrap JSONPath list results in _rename_field_in_json (sync+async)
- Remap update_fields through rename_operations in merge_patch and classify_diff
- Validate/normalize prefix type (list -> string) in planner
- Persist state and clear current_index when batch executor skips indexes
- Heuristic rename detection continues even with explicit renames
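The JSONPath unwrapping mentioned in this commit can be sketched as below. It assumes the client returns a list of matches for `$`-style paths, an empty list when the path is missing, and `None` when the key is missing; `unwrap_jsonpath_result` is a hypothetical helper, not the code inside `_rename_field_in_json`.

```python
from typing import Any, Tuple


def unwrap_jsonpath_result(result: Any) -> Tuple[bool, Any]:
    """Return (found, value) for the result of a JSONPath read."""
    if result is None:
        # Key does not exist.
        return False, None
    if isinstance(result, list):
        if not result:
            # Path matched nothing in the document.
            return False, None
        # Take the first match; a field rename targets a single path.
        return True, result[0]
    # Non-list results are passed through unchanged.
    return True, result
```

Returning a `found` flag alongside the value lets a caller distinguish a missing field from a field whose stored value is JSON `null`.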

nkanu17 · 2026-03-31T20:53:56Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 281c162eeb

redisvl/migration/planner.py

redisvl/migration/batch_planner.py

…ype enforcement
- Wizard now rebuilds working schema after each action so prompts reflect staged renames, removes, and adds
- When switching to SVS-VAMANA with incompatible datatype, force selection or default to float32

nkanu17 · 2026-03-31T21:44:37Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aabcf4e048

redisvl/migration/async_executor.py

redisvl/migration/executor.py

… predictions
- Document the 6-step drop-recreate migration sequence in user guide
- Clarify that source index is dropped BEFORE quantization begins
- Correct memory prediction: peak is baseline FP32 size (~57 GB for 10M), not double (80+ GB), because index is already dropped during quantization
- Update 10M predictions: 64-128 GB RAM (not 128+ GB), 50-90 min timeline
- Add FLAT vs HNSW target considerations for large-scale migrations
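The ~57 GB baseline can be reproduced with simple arithmetic. The commit does not state the vector dimensionality; the sketch below assumes 1536 dimensions because that value makes the raw float32 payload for 10M vectors come out at roughly 57 GiB. It counts raw vector bytes only and ignores index structures, key overhead, and other fields.

```python
# Assumed inputs: 10M vectors as stated in the commit, 1536 dims as a guess.
num_vectors = 10_000_000
dims = 1536

GIB = 1024 ** 3

fp32_bytes = num_vectors * dims * 4  # float32: 4 bytes per element
fp16_bytes = num_vectors * dims * 2  # float16: 2 bytes per element

fp32_gib = fp32_bytes / GIB
fp16_gib = fp16_bytes / GIB

print(f"float32 payload: {fp32_gib:.1f} GiB")
print(f"float16 payload: {fp16_gib:.1f} GiB")
```

Under these assumptions the float32 payload is about 57.2 GiB and the float16 payload about 28.6 GiB, which is consistent with the commit's point that the peak is the FP32 baseline once the source index has been dropped.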

…unknown dtype comparison

nkanu17 · 2026-04-01T17:00:36Z

@codex review

nkanu17 · 2026-04-01T18:06:19Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6734794854

redisvl/migration/executor.py

redisvl/migration/batch_planner.py

nkanu17 · 2026-04-01T19:43:01Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6734794854

redisvl/migration/executor.py

redisvl/migration/async_executor.py

…ount change

…ove dev artifacts

nkanu17 · 2026-04-01T21:04:31Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7eccaabf80

redisvl/migration/executor.py

redisvl/migration/batch_planner.py

…plicate wizard field additions

nkanu17 · 2026-04-01T21:27:01Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d3c0ee56d4

redisvl/migration/executor.py

redisvl/migration/batch_planner.py

nkanu17 · 2026-04-01T21:45:42Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d3c0ee56d4

redisvl/migration/executor.py

redisvl/migration/batch_planner.py

nkanu17 · 2026-04-01T22:17:25Z

Closing to recreate with proper stacked diffs. Code is preserved on feat/index-migrator-v0-lc-checkpoint-backup.

nkanu17 added 30 commits March 23, 2026 13:20

docs(index-migrator): add planning workspace and repo guidance

1cb752f

chore: add nitin_docs and nitin_scripts to gitignore

8d17a0f

refactor(migrate): remove unused imports

33ebf54

feat:add batch indexing

183ffc5

docs(migrations): fix typo in downtime considerations section

9fe44d9

feat(migrate): add rename operations to planner

e1add9c

fix(migrate): HNSW param parsing and weight normalization for validation

ab8a017

fix(migrate): address code review feedback

4eee541

st

e7efc86

refactor(migrate): deduplicate dtype-bytes mapping, use DTYPE_BYTES f…

95c9c69

…rom models Remove _BYTES_PER_ELEMENT from reliability.py and import the identical DTYPE_BYTES constant from models.py to maintain a single source of truth.

fix(migrate): move storage_type assignment before try block to fix my…

79ebd5b

…py errors

chatgpt-codex-connector bot reviewed Mar 31, 2026

redisvl/migration/executor.py (outdated)

redisvl/migration/executor.py (outdated)

redisvl/migration/batch_planner.py

chatgpt-codex-connector bot reviewed Mar 31, 2026

redisvl/migration/executor.py

redisvl/migration/planner.py (outdated)

chatgpt-codex-connector bot reviewed Mar 31, 2026

redisvl/migration/planner.py

redisvl/migration/batch_planner.py

chatgpt-codex-connector bot reviewed Mar 31, 2026

redisvl/migration/async_executor.py (outdated)

redisvl/migration/executor.py

nkanu17 added 2 commits April 1, 2026 11:44

fix(migrate): update ready-state baseline on doc count change, guard …

6734794

…unknown dtype comparison

chatgpt-codex-connector bot reviewed Apr 1, 2026

redisvl/migration/executor.py (outdated)

redisvl/migration/batch_planner.py

docs(migrate): document deferred codex comments round N+2

a14ecb5

chatgpt-codex-connector bot reviewed Apr 1, 2026

redisvl/migration/executor.py

redisvl/migration/async_executor.py

redisvl/migration/async_executor.py (outdated)

nkanu17 added 2 commits April 1, 2026 15:51

fix(migrate): fix async wait_for_index_ready baseline update on doc c…

ce740ff

…ount change

fix(migrate): fix HSET rename counting, re-raise pipeline errors, rem…

7eccaab

…ove dev artifacts

chatgpt-codex-connector bot reviewed Apr 1, 2026

redisvl/migration/executor.py (outdated)

redisvl/migration/executor.py

redisvl/migration/batch_planner.py

fix(migrate): handle JSON.GET empty list for missing paths, reject du…

d3c0ee5

…plicate wizard field additions

chatgpt-codex-connector bot reviewed Apr 1, 2026

redisvl/migration/executor.py

redisvl/migration/batch_planner.py

chatgpt-codex-connector bot reviewed Apr 1, 2026

redisvl/migration/executor.py

redisvl/migration/batch_planner.py

nkanu17 closed this Apr 1, 2026
