Improve performance for tables with composite primary keys #1686
coding-chimp wants to merge 4 commits into
col2StartOp := string(minRangeComparisonSign)
fromClause := fmt.Sprintf("%s.%s force index (%s)", databaseName, originalTableName, uniqueKey)

if sameFirstColumnValue(rangeStartArgs, rangeEndArgs) {
sameFirstColumnValue reports whether the first column's range start and end bounds are equal. When true, the range fits within a single col1 partition, and a simpler query must replace the 3-part UNION.
This is not just a performance shortcut; it's a correctness requirement. Look at what the UNION produces when col1_start == col1_end:
- Part 1: col1 = x AND col2 > start_col2. There is no upper bound on col2, so it includes every row past start_col2, including rows above end_col2.
- Part 2: col1 > x AND col1 < x. This returns nothing.
- Part 3: col1 = x AND col2 <= end_col2. There is no lower bound, so it includes rows below start_col2.
The UNION ALL result is essentially every row where col1 = x. The outer ORDER BY ... LIMIT 1 OFFSET chunkSize-1 then picks the Nth row, which is the wrong boundary.
The simplified query fixes this:
WHERE col1 = ? AND col2 > ? AND col2 <= ?
Both bounds are explicit, and because col1 is pinned with equality, MySQL can use the composite index directly as a seek on (col1, col2).
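To make the argument above concrete, here is a small in-memory sketch. It is not code from this PR: the row type and the unionRange and singlePartitionRange helpers are hypothetical stand-ins that evaluate the predicates described above against a slice instead of a MySQL table.

```go
package main

import "fmt"

// row is a hypothetical stand-in for a table row keyed by (col1, col2).
type row struct{ col1, col2 int }

// unionRange mimics the three-part UNION ALL for the key range (start, end].
func unionRange(rows []row, start, end row) []row {
	var out []row
	for _, r := range rows { // part 1: col1 = start.col1 AND col2 > start.col2
		if r.col1 == start.col1 && r.col2 > start.col2 {
			out = append(out, r)
		}
	}
	for _, r := range rows { // part 2: col1 > start.col1 AND col1 < end.col1
		if r.col1 > start.col1 && r.col1 < end.col1 {
			out = append(out, r)
		}
	}
	for _, r := range rows { // part 3: col1 = end.col1 AND col2 <= end.col2
		if r.col1 == end.col1 && r.col2 <= end.col2 {
			out = append(out, r)
		}
	}
	return out
}

// singlePartitionRange mimics the simplified predicate:
// col1 = x AND col2 > start.col2 AND col2 <= end.col2.
func singlePartitionRange(rows []row, start, end row) []row {
	var out []row
	for _, r := range rows {
		if r.col1 == start.col1 && r.col2 > start.col2 && r.col2 <= end.col2 {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := []row{{7, 1}, {7, 2}, {7, 3}, {7, 4}, {7, 5}}
	start, end := row{7, 2}, row{7, 4} // same first-column value on both bounds

	fmt.Println("union:     ", unionRange(rows, start, end))
	fmt.Println("simplified:", singlePartitionRange(rows, start, end))
}
```

With both bounds in the col1 = 7 partition, the union sketch returns rows outside the requested range (and returns the in-range rows twice, since they match both part 1 and part 3), while the simplified predicate returns only the rows with col2 in (2, 4].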
return result, explodedArgs, nil
}

func buildUniqueKeyRangeEndTwoColumnViaTemptable(
The UNION optimization is arguably unnecessary for the ViaTemptable code path, since it'll only be called once at the end of the migration, when the query is unlikely to be slow. However, it provides consistency with ViaOffset and doesn't add much code.
A Pull Request should be associated with an Issue.
Related issue: #1669
Description
This PR introduces an optimized query pattern for tables with composite primary keys. As described in the linked issue, gh-ost's queries against these tables can currently be very slow. We've seen a single query take up to 16 minutes, resulting in slow migrations that might not even finish when gh-ost falls behind in binlog streaming. The new query pattern has a constant query time (~30ms in our testing) regardless of the table's data shape. We're only introducing it for two-column composite keys because the complexity would grow exponentially with more columns.
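As a rough illustration of the shape of such a query (not the code in this PR), the sketch below builds a three-part UNION ALL over a two-column key. The helper name buildTwoColumnRangeUnion, its parameters, and the select * projection are assumptions made for the example; the comparison operators follow the three parts described in the review comment, and details such as the forced index, ordering, LIMIT/OFFSET and identifier quoting are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// buildTwoColumnRangeUnion is a hypothetical helper, not gh-ost's API.
// It covers the key range (start, end] on (col1, col2), assuming
// start1 < end1, with three selects that each constrain the leading
// key column by equality or by a simple range.
// It returns the SQL text and the arguments in placeholder order.
func buildTwoColumnRangeUnion(
	table, col1, col2 string,
	start1, start2, end1, end2 interface{},
) (string, []interface{}) {
	parts := []string{
		// Part 1: the tail of the first col1 partition.
		fmt.Sprintf("select * from %s where %s = ? and %s > ?", table, col1, col2),
		// Part 2: every col1 partition strictly between the two bounds.
		fmt.Sprintf("select * from %s where %s > ? and %s < ?", table, col1, col1),
		// Part 3: the head of the last col1 partition.
		fmt.Sprintf("select * from %s where %s = ? and %s <= ?", table, col1, col2),
	}
	args := []interface{}{start1, start2, start1, end1, end1, end2}
	return strings.Join(parts, " union all "), args
}

func main() {
	query, args := buildTwoColumnRangeUnion("db.tbl", "a", "b", 1, 10, 5, 20)
	fmt.Println(query)
	fmt.Println(args)
}
```

Each part pins or ranges the leading column, which is the property the description relies on. When both bounds share the same col1 value this shape is not valid, and a single col1 = ? AND col2 > ? AND col2 <= ? predicate is needed instead.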
We're introducing the new query pattern in BuildRangeInsertQuery, BuildUniqueKeyRangeEndPreparedQueryViaOffset, and BuildUniqueKeyRangeEndPreparedQueryViaTemptable.
I've tested the changes against real data on a test DB, and the migration took ~12h to complete. In comparison, I'm currently running the same migration again without these changes, and it has only reached 15.6% after 22h. The current ETA is 87h.
script/cibuild returns with no formatting errors, build errors or unit test errors.