feat binlog apply optimization #1378
1. Support for ignoring binlog events that exceed chunk boundary values.
2. Support for binlog merge processing.
When there is only one column in a unique index, merging of DML binlog events is permitted.
…tionRangeMaxValues is nil Add handling for the case when MigrationIterationRangeMaxValues is nil
Correctness Issues Found + Fix PR
I reviewed this PR in detail and found several correctness bugs in the merge DML implementation. I've submitted a fix as PR #1687 (built on top of this branch).
Critical Issues
Fix (PR #1687)
Performance Results
The fix maintains the performance benefits:
See #1687 for full details and benchmarks.
Port the merge-DML batching optimization from PR github#1378 to current master, adapting it to the refactored builder-pattern API.

When --is-merge-dml-event is enabled and the unique key is memory-comparable (all numeric columns):
- Deduplicates DML events by unique key (latest event wins)
- Cancels INSERT+DELETE pairs for same key (net no-op)
- Batches remaining INSERTs/UPDATEs as multi-row REPLACE INTO
- Batches remaining DELETEs as DELETE WHERE (pk) IN (...)
- Skips events beyond migration range (already copied by row-copy)

Uses BuildColumnsPreparedValues for proper per-column conversion tokens (convert_tz, ELT, etc.) preventing data corruption for timezone, enum, and JSON columns.

Co-authored-by: shaohoukun <shaohoukun@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add --is-merge-dml-event flag that batches and deduplicates binlog DML events before applying them to the ghost table, significantly reducing SQL round-trips during high-write migrations.

When enabled and the unique key is memory-comparable (numeric columns):
- Deduplicates DML events by unique key (latest event wins)
- Reduces INSERT+DELETE sequences to DELETE (safe against row-copy races)
- Batches INSERTs/UPDATEs as multi-row REPLACE INTO
- Batches DELETEs as DELETE WHERE (pk) IN (...)
- Skips events beyond migration range (not yet copied by row-copy)
- Disables merge for tables with secondary unique indexes

Safety: strict numeric type validation in formatNumericValue prevents SQL injection. Type detection uses exact base-type parsing (not substring). Uses BuildColumnsPreparedValues for proper per-column conversion tokens.

Original implementation by shaohoukun in PR github#1378, adapted to current master's builder-pattern API with correctness and security hardening.

Co-authored-by: shaohoukun <shaohoukun@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
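The commit message names formatNumericValue and "exact base-type parsing" but the code itself is not shown here. The following is only a minimal sketch of what such checks could look like. The helper names `baseType`, `isNumericType` and `formatNumeric`, and the exact list of accepted types, are assumptions made for illustration and are not taken from the PR.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// baseType returns the lowercase base type of a MySQL column type string,
// e.g. "bigint(20) unsigned" -> "bigint". Hypothetical helper.
func baseType(columnType string) string {
	t := strings.ToLower(strings.TrimSpace(columnType))
	if i := strings.IndexAny(t, "( "); i >= 0 {
		t = t[:i]
	}
	return t
}

// isNumericType compares the base type exactly, so that "point" is not
// mistaken for "int" the way a substring match would. The accepted list
// is an assumption for this sketch.
func isNumericType(columnType string) bool {
	switch baseType(columnType) {
	case "tinyint", "smallint", "mediumint", "int", "integer", "bigint", "float", "double":
		return true
	}
	return false
}

// formatNumeric renders a key value as a SQL literal only if its Go type is
// numeric. Any other type (strings in particular) is rejected, so no
// attacker-controlled text can reach the generated statement.
func formatNumeric(v interface{}) (string, error) {
	switch n := v.(type) {
	case int:
		return strconv.Itoa(n), nil
	case int32:
		return strconv.FormatInt(int64(n), 10), nil
	case int64:
		return strconv.FormatInt(n, 10), nil
	case uint32:
		return strconv.FormatUint(uint64(n), 10), nil
	case uint64:
		return strconv.FormatUint(n, 10), nil
	case float64:
		if math.IsNaN(n) || math.IsInf(n, 0) {
			return "", fmt.Errorf("non-finite key value")
		}
		return strconv.FormatFloat(n, 'g', -1, 64), nil
	default:
		return "", fmt.Errorf("non-numeric key value of type %T", v)
	}
}

func main() {
	fmt.Println(isNumericType("bigint(20) unsigned"), isNumericType("point"))
	s, err := formatNumeric(int64(42))
	fmt.Println(s, err)
	_, err = formatNumeric("1 OR 1=1")
	fmt.Println(err)
}
```

Rejecting by Go type rather than by inspecting the string form keeps the check simple: a value either is a number or never gets formatted.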
Add --is-merge-dml-event flag that batches and deduplicates binlog DML events before applying them to the ghost table, significantly reducing SQL round-trips during high-write migrations.

When enabled and the unique key is memory-comparable (numeric columns):
- Deduplicates DML events by unique key (latest event wins)
- Reduces INSERT+DELETE sequences to DELETE (safe against row-copy races)
- Batches INSERTs/UPDATEs as multi-row REPLACE INTO
- Batches DELETEs as DELETE WHERE (pk) IN (...)
- Skips events beyond migration range (not yet copied by row-copy)
- Disables merge for tables with secondary unique indexes

Safety: strict numeric type validation in formatNumericValue prevents SQL injection. Type detection uses exact base-type parsing (not substring). Uses BuildColumnsPreparedValues for proper per-column conversion tokens.

Original implementation by shaohoukun in PR github#1378, adapted to current master's builder-pattern API with correctness and security hardening.

Co-authored-by: shaohk <shaohoukun@meituan.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
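The deduplication rule in the commit messages above (latest event per unique key wins, and an INSERT followed by a DELETE is reduced to a DELETE) can be sketched roughly as follows. The types `DMLKind` and `DMLEvent` and the function `mergeEvents` are hypothetical names for this illustration, not the PR's actual API. The sketch assumes a single-column integer key and that updates do not change the key value.

```go
package main

import "fmt"

// DMLKind and DMLEvent are simplified stand-ins for binlog DML events.
type DMLKind int

const (
	InsertDML DMLKind = iota
	UpdateDML
	DeleteDML
)

type DMLEvent struct {
	Kind DMLKind
	Key  int64         // value of the single-column numeric unique key
	Row  []interface{} // new row image; nil for deletes
}

// mergeEvents keeps only the latest event per key. Keys whose latest event is
// a DELETE go into the delete batch; all others go into the replace batch.
// An INSERT followed by a DELETE therefore ends up as a DELETE, which is
// harmless if the row never reached the ghost table.
// Assumption: UPDATE events do not modify the key column.
func mergeEvents(events []DMLEvent) (deletes []int64, replaces []DMLEvent) {
	latest := make(map[int64]DMLEvent, len(events))
	order := make([]int64, 0, len(events)) // first-seen key order, for stable output
	for _, ev := range events {
		if _, seen := latest[ev.Key]; !seen {
			order = append(order, ev.Key)
		}
		latest[ev.Key] = ev
	}
	for _, key := range order {
		ev := latest[key]
		if ev.Kind == DeleteDML {
			deletes = append(deletes, key)
		} else {
			replaces = append(replaces, ev)
		}
	}
	return deletes, replaces
}

func main() {
	events := []DMLEvent{
		{Kind: InsertDML, Key: 1, Row: []interface{}{1, "a"}},
		{Kind: UpdateDML, Key: 1, Row: []interface{}{1, "b"}},
		{Kind: InsertDML, Key: 2, Row: []interface{}{2, "c"}},
		{Kind: DeleteDML, Key: 2},
	}
	deletes, replaces := mergeEvents(events)
	fmt.Println(deletes, len(replaces))
}
```

Because REPLACE INTO overwrites the whole row, the last row image for a key is sufficient and the intermediate events can be dropped.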
Description
Support for ignoring binlog events that exceed chunk boundary values.
A binlog event is ignored if its unique-key value is greater than the maximum value of the current chunk iteration and less than the maximum boundary value of the row copy. Such rows have not been copied yet, so the row copy will pick up their current state later.
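As a rough illustration of this rule, the check could look like the sketch below. The function name `shouldSkip` and its parameters are assumptions for this example. In particular, applying the event when the iteration maximum is still nil is a conservative choice made here, not necessarily what the PR does.

```go
package main

import "fmt"

// shouldSkip reports whether a binlog event for the given key can be ignored
// because row-copy has not reached that key yet.
//   iterationMax: upper bound of the chunk currently being copied; nil if no
//                 chunk range has been computed yet.
//   copyMax:      upper bound of the whole row-copy range.
// Assumption: with a nil iterationMax the event is applied, never skipped.
func shouldSkip(key int64, iterationMax *int64, copyMax int64) bool {
	if iterationMax == nil {
		return false
	}
	return key > *iterationMax && key < copyMax
}

func main() {
	iterMax := int64(100)
	fmt.Println(shouldSkip(50, &iterMax, 1000))   // already copied: apply
	fmt.Println(shouldSkip(500, &iterMax, 1000))  // not copied yet: skip
	fmt.Println(shouldSkip(2000, &iterMax, 1000)) // outside copy range: apply
	fmt.Println(shouldSkip(500, nil, 1000))       // no range yet: apply
}
```

Keys at or above the copy boundary are never skipped, since the row copy will not visit them and the binlog event is the only way they reach the ghost table.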
Support for binlog merge processing.
When the columns of the unique key selected for chunking are int or float, binlog events are merged in a map: all delete operations are merged into one DELETE statement, and all insert and update operations are merged into one REPLACE statement. These statements are then executed in a database transaction.
script/cibuild returns with no formatting errors, build errors or unit test errors.
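To make the merged statements in the description concrete, here is a small sketch of how one DELETE and one multi-row REPLACE could be assembled. The function names `buildDeleteIn` and `buildReplaceInto` and the table name `_t_gho` are made up for this example. It uses plain `?` placeholders, whereas the commit messages say the real code goes through BuildColumnsPreparedValues to get per-column conversion tokens.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildDeleteIn renders a single DELETE for all merged delete keys.
// Keys are int64, so formatting them cannot introduce SQL text.
// Callers should skip the statement when keys is empty, since "IN ()" is invalid.
func buildDeleteIn(table, keyColumn string, keys []int64) string {
	vals := make([]string, len(keys))
	for i, k := range keys {
		vals[i] = strconv.FormatInt(k, 10)
	}
	return fmt.Sprintf("DELETE FROM `%s` WHERE `%s` IN (%s)",
		table, keyColumn, strings.Join(vals, ","))
}

// buildReplaceInto renders a single multi-row REPLACE with one placeholder
// tuple per row and returns the flattened arguments in matching order.
func buildReplaceInto(table string, columns []string, rows [][]interface{}) (string, []interface{}) {
	quoted := make([]string, len(columns))
	for i, c := range columns {
		quoted[i] = "`" + c + "`"
	}
	tuple := "(" + strings.TrimSuffix(strings.Repeat("?,", len(columns)), ",") + ")"
	tuples := make([]string, len(rows))
	args := make([]interface{}, 0, len(rows)*len(columns))
	for i, row := range rows {
		tuples[i] = tuple
		args = append(args, row...)
	}
	query := fmt.Sprintf("REPLACE INTO `%s` (%s) VALUES %s",
		table, strings.Join(quoted, ","), strings.Join(tuples, ","))
	return query, args
}

func main() {
	fmt.Println(buildDeleteIn("_t_gho", "id", []int64{1, 2, 3}))
	query, args := buildReplaceInto("_t_gho", []string{"id", "name"},
		[][]interface{}{{1, "a"}, {2, "b"}})
	fmt.Println(query, args)
}
```

With database/sql, both statements would typically be run between `Begin` and `Commit` on the same `*sql.Tx`, so that the batch is applied atomically.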