Add --chunk-concurrent-size for parallel row-copy #1688
Open
dnovitski wants to merge 1 commit into
Port of PR github#1398 by @shaohk: allows multiple row-copy chunks to execute in parallel within each iteration using errgroup.

Key changes:
- Add IterationRangeValues struct for thread-safe range passing
- Serialize range calculation with CalculateNextIterationRangeEndValuesLock
- Rewrite iterateChunks to spawn N goroutines per queue item via errgroup
- Return SQL warnings from ApplyIterationInsertQuery (eliminates race on shared MigrationLastInsertSQLWarnings field)
- Increase DB connection pool when concurrency > default pool size
- Add --chunk-concurrent-size CLI flag (default 1, no behavior change)

Co-authored-by: shaohk <shaohk@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Port of #1398 by @shaohk to current master, with correctness improvements.
Adds a `--chunk-concurrent-size` flag that allows multiple row-copy chunks to execute in parallel within each iteration using `errgroup`. Default is 1 (no behavior change).
Motivation
On large tables with fast storage (NVMe/SSD), the single-threaded row-copy loop can become a bottleneck. This flag enables parallel chunk copying to improve migration throughput.
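As a rough illustration of the flag semantics described above (default 1, and a larger connection pool when the concurrency exceeds the default pool size), here is a minimal sketch using only the standard library. The names `parseConcurrency`, `poolSizeFor`, and `defaultPoolSize`, the value 3, and the rejection of values below 1 are assumptions made for this example, not code from the PR.

```go
package main

import (
	"flag"
	"fmt"
)

// defaultPoolSize is a placeholder for the tool's default connection pool
// size. The value 3 is an assumption for this sketch.
const defaultPoolSize = 3

// poolSizeFor grows the pool only when the requested concurrency exceeds the
// default, so each concurrent chunk copy can hold its own connection.
func poolSizeFor(concurrency, defaultSize int) int {
	if concurrency > defaultSize {
		return concurrency
	}
	return defaultSize
}

// parseConcurrency reads --chunk-concurrent-size from args, defaulting to 1.
// Rejecting values below 1 is an assumption of this sketch.
func parseConcurrency(args []string) (int, error) {
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	n := fs.Int("chunk-concurrent-size", 1, "row-copy chunks to run in parallel per iteration")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *n < 1 {
		return 0, fmt.Errorf("chunk-concurrent-size must be >= 1, got %d", *n)
	}
	return *n, nil
}

func main() {
	n, err := parseConcurrency([]string{"--chunk-concurrent-size=8"})
	if err != nil {
		panic(err)
	}
	// With the assumed default of 3, a concurrency of 8 raises the pool to 8.
	fmt.Println(n, poolSizeFor(n, defaultPoolSize)) // prints "8 8"
}
```

With no flag given, `parseConcurrency` returns 1 and `poolSizeFor` leaves the pool at its default, which corresponds to the "no behavior change" case.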
Performance Results
1M rows, `ADD COLUMN extra_col INT DEFAULT 0`, Docker MySQL 8.0, chunk-size=1000:
(results table not preserved)
Benefits scale with table size and storage throughput.
Key Design Decisions
- `concurrency=1` matches master's retry semantics exactly (range calc inside retry loop for hook-based chunk size reduction); `concurrency>1` pre-calculates ranges under mutex for safe parallel execution
- `CalculateNextIterationRangeEndValues(advanceCursor bool)` is protected by mutex and returns an `*IterationRangeValues` struct with isolated Min/Max per goroutine
- `ApplyIterationInsertQuery` returns SQL warnings (eliminates race on shared `MigrationLastInsertSQLWarnings` field)
- DB connection pool is increased when `--chunk-concurrent-size` exceeds default pool size
Changes from original #1398
- `ApplyIterationInsertQuery` returns SQL warnings instead of writing to shared field
- `IncludeMinValues` handling for first iteration
- … `context.Background()`)
Testing
- `TestRetryBatchCopyWithHooks` passes (hook-based chunk size reduction works correctly)
Checklist
- … `doc/command-line-flags.md`)
Based on work by @shaohk in #1398.
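For readers who want a picture of the design decisions listed above, the following is a minimal, self-contained sketch of the two ideas: chunk ranges are handed out under a mutex so each goroutine gets its own Min/Max, and each chunk copy returns its warnings rather than writing them to a shared field. It is not the PR's code. `rangeValues`, `rangeCursor`, `runIteration`, and the integer key are hypothetical stand-ins, and `sync.WaitGroup` is swapped in for `errgroup` so the example runs with the standard library alone.

```go
package main

import (
	"fmt"
	"sync"
)

// rangeValues is a hypothetical stand-in for IterationRangeValues: every
// goroutine receives its own copy of Min/Max.
type rangeValues struct{ Min, Max int }

// rangeCursor hands out consecutive chunk ranges over an integer key.
type rangeCursor struct {
	mu        sync.Mutex
	next      int // first key not yet handed out
	last      int // last key to copy (inclusive)
	chunkSize int
}

// nextRange advances the cursor under the mutex; ok is false when exhausted.
func (c *rangeCursor) nextRange() (r rangeValues, ok bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.next > c.last {
		return rangeValues{}, false
	}
	max := c.next + c.chunkSize - 1
	if max > c.last {
		max = c.last
	}
	r = rangeValues{Min: c.next, Max: max}
	c.next = max + 1
	return r, true
}

type chunkResult struct {
	rows     int
	warnings []string
	err      error
}

// runIteration pre-calculates up to `concurrency` ranges, copies them in
// parallel, and merges the per-chunk results after all goroutines finish.
// On error it returns the totals gathered before the first failed chunk.
func runIteration(c *rangeCursor, concurrency int,
	copyChunk func(rangeValues) (int, []string, error)) (int, []string, error) {
	var ranges []rangeValues
	for len(ranges) < concurrency {
		r, ok := c.nextRange()
		if !ok {
			break
		}
		ranges = append(ranges, r)
	}
	results := make([]chunkResult, len(ranges))
	var wg sync.WaitGroup
	for i, r := range ranges {
		wg.Add(1)
		go func(i int, r rangeValues) {
			defer wg.Done()
			rows, warnings, err := copyChunk(r)
			results[i] = chunkResult{rows, warnings, err} // one slot per goroutine
		}(i, r)
	}
	wg.Wait()
	total := 0
	var all []string
	for _, res := range results {
		if res.err != nil {
			return total, all, res.err
		}
		total += res.rows
		all = append(all, res.warnings...)
	}
	return total, all, nil
}

func main() {
	cursor := &rangeCursor{next: 1, last: 2500, chunkSize: 1000}
	// Fake copy: reports the range width as rows and one simulated warning.
	fakeCopy := func(r rangeValues) (int, []string, error) {
		var w []string
		if r.Min <= 1500 && 1500 <= r.Max {
			w = append(w, fmt.Sprintf("chunk %d-%d: simulated warning", r.Min, r.Max))
		}
		return r.Max - r.Min + 1, w, nil
	}
	total := 0
	for {
		rows, warnings, err := runIteration(cursor, 2, fakeCopy)
		if err != nil {
			panic(err)
		}
		if rows == 0 {
			break
		}
		total += rows
		fmt.Println("iteration copied", rows, "rows; warnings:", warnings)
	}
	fmt.Println("total rows:", total)
}
```

Unlike `errgroup`, this sketch does not cancel sibling goroutines when one chunk fails. It only shows why per-goroutine ranges and returned warnings remove the need for shared mutable state.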