tso: prevent dual writers when upgrading from PD service mode#10473
tso: prevent dual writers when upgrading from PD service mode#10473YuhaoZhang00 wants to merge 4 commits intotikv:masterfrom
Conversation
In PD service mode (isKeyspaceGroupEnabled=false), SaveTimestamp now atomically checks for an existing TSO microservice primary before writing the timestamp window. If a TSO primary is detected, the write is rejected, causing the allocator to reset and resign PD leadership. This allows a PD in API service mode to take over and proxy TSO requests, preventing dual writers during rolling upgrades. Also remove TestMixedTSODeployment which tested seamless coexistence of PD service mode and TSO microservice — a scenario not supported by the official upgrade procedure (PD restart is required). Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
|
Hi @YuhaoZhang00. Thanks for your PR. I'm waiting for a tikv member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a Changes
Sequence Diagram(s)sequenceDiagram
participant Server
participant Allocator
participant TimestampOracle
participant StorageEndpoint
participant Database
Server->>Allocator: NewAllocator(..., checkTSOPrimary)
Allocator->>TimestampOracle: initialize(checkTSOPrimary)
Note over TimestampOracle: prepare timestamp
TimestampOracle->>StorageEndpoint: SaveTimestamp(ts, checkTSOPrimary)
alt checkTSOPrimary == true
StorageEndpoint->>Database: TX read TSO primary election key
Database-->>StorageEndpoint: key value
alt key non-empty
StorageEndpoint-->>TimestampOracle: return error (reject write)
else key empty
StorageEndpoint->>Database: TX save timestamp
Database-->>StorageEndpoint: success
StorageEndpoint-->>TimestampOracle: success
end
else checkTSOPrimary == false
StorageEndpoint->>Database: TX save timestamp
Database-->>StorageEndpoint: success
StorageEndpoint-->>TimestampOracle: success
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/cc @JmPotato PTAL |
|
@YuhaoZhang00: GitHub didn't allow me to request PR reviews from the following users: PTAL. Note that only tikv members and repo collaborators can review this PR, and authors cannot review their own PRs. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/storage/endpoint/tso.go`:
- Around line 113-115: The error string returned in the conditional checking
tsoPrimary (the return in the if tsoPrimary != "" branch that calls
errors.Errorf) uses capitalized "PD" and a trailing punctuation; change the
message to be all lowercase with no trailing punctuation (e.g., use "pd
microservice primary exists, pd in pd service mode must yield" or a concise
lowercase variant) so it follows the repository's error-string convention.
In `@server/server.go`:
- Around line 511-514: The current approach lets the PD-service-mode etcd leader
re-acquire PD leadership after allocator yields because allocatorUpdater calls
Reset(true) (which only resigns PD leadership) while leaderLoop will re-campaign
from the current etcd leader; update the yield path so that when s.tsoAllocator
(created in s.tsoAllocator) detects an existing TSO primary and allocatorUpdater
calls Reset(true), you also either (a) prevent leaderLoop from re-campaigning
for that member by setting/clearing a "yielded" flag on the server (e.g., on s)
that leaderLoop checks before starting a new campaign, or (b) perform an etcd
leadership transfer of s.member to another peer before calling Reset(true) so
the yielding node cannot immediately win; implement the chosen approach in the
allocatorUpdater/Reset invocation and ensure leaderLoop observes the yielded
flag (or verify transfer completion) to stop immediate re-campaigns.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dcc0a32a-3ca6-4e86-b375-6cc94420454b
📒 Files selected for processing (8)
pkg/storage/endpoint/tso.gopkg/storage/storage_tso_test.gopkg/tso/allocator.gopkg/tso/keyspace_group_manager.gopkg/tso/tso.gopkg/utils/keypath/absolute_key_path.goserver/server.gotests/integrations/tso/client_test.go
💤 Files with no reviewable changes (1)
- tests/integrations/tso/client_test.go
| if tsoPrimary != "" { | ||
| return errors.Errorf("tso microservice primary exists, PD in PD service mode must yield") | ||
| } |
There was a problem hiding this comment.
Keep the new error string lowercase.
Line 114 introduces capitalized PD, which breaks the repo's error-string convention.
As per coding guidelines, "Error strings must be lowercase with no trailing punctuation".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/storage/endpoint/tso.go` around lines 113 - 115, The error string
returned in the conditional checking tsoPrimary (the return in the if tsoPrimary
!= "" branch that calls errors.Errorf) uses capitalized "PD" and a trailing
punctuation; change the message to be all lowercase with no trailing punctuation
(e.g., use "pd microservice primary exists, pd in pd service mode must yield" or
a concise lowercase variant) so it follows the repository's error-string
convention.
| } | ||
|
|
||
| // TSODefaultPrimaryPath returns the primary election path for the default keyspace group's TSO service. | ||
| func TSODefaultPrimaryPath() string { |
There was a problem hiding this comment.
Maybe resue ElectionPath rather than adding a new one?
| if err != nil { | ||
| return err | ||
| } | ||
| if tsoPrimary != "" { |
There was a problem hiding this comment.
Prefer using len cause it also checks the nil case.
| if tsoPrimary != "" { | |
| if len(tsoPrimary) == 0 { |
Reuse ElectionPath instead of adding a redundant TSODefaultPrimaryPath function, and use len() check for string emptiness per repo convention. Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
Add TestDualWriterPrevention to verify that a PD leader in PD service mode yields its TSO allocator when a TSO microservice primary is detected, replacing the removed TestMixedTSODeployment which asserted the old (unsafe) dual-writer coexistence behavior. Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
|
/ok-to-test |
|
/retest |
2 similar comments
|
/retest |
|
/retest |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #10473 +/- ##
==========================================
+ Coverage 78.86% 78.94% +0.08%
==========================================
Files 529 532 +3
Lines 71102 71871 +769
==========================================
+ Hits 56072 56740 +668
- Misses 11014 11106 +92
- Partials 4016 4025 +9
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
| groupID uint32, | ||
| ts time.Time, | ||
| leadership *election.Leadership, | ||
| checkTSOPrimary bool, |
There was a problem hiding this comment.
I noticed that, to pass this variable in, we've gone through many layers and modified several structures to carry additional fields.
Is there a more elegant or direct way for the storageEndpoint function to detect this variable, rather than passing it all the way through the call stack?
There was a problem hiding this comment.
Good point about the layering. I explored alternatives but think the current approach is the best trade-off here.
- The check must live inside
SaveTimestamp's etcd transaction for atomicity guarantee. Plus,SaveTimestampalready does the same kind of thing: leadership checking - The decision to enable it can only come from
server.go. That's the only place that knows whether we're in PD service mode (!isKeyspaceGroupEnabled) - Wiring it differently doesn't really help. For example, storing the flag on
StorageEndpointavoids threading it through the call stack, butStorageEndpointis a generic base shared by all storage backends, and putting a TSO-specific config there is a different kind of layering issue.
One alternative I considered is to a callback hook on StorageEndpoint: Add a preSaveTimestampCheck func(txn kv.Txn) error on StorageEndpoint. server.go installs a closure that checks for the TSO primary; SaveTimestamp calls it inside the transaction if non-nil.
- Pro: endpoint/tso.go has zero knowledge of TSO primaries.
- Con: Adds indirection. Also requires either a type assertion or changing NewStorageWithEtcdBackend's return type to expose the setter, which breaks encapsulation in a different way.
Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
| // writing to avoid dual writers during a rolling upgrade. | ||
| if checkTSOPrimary { | ||
| tsoPrimaryPath := keypath.ElectionPath(&keypath.MsParam{ServiceName: constant.TSOServiceName}) | ||
| tsoPrimary, err := txn.Load(tsoPrimaryPath) |
There was a problem hiding this comment.
What is the performance impact of this operation?
There was a problem hiding this comment.
For 1 single SaveTimestamp() method call, this query adds 1 extra single-key read in the same etcd transaction that already does the leadership check and timestamp-window persistence. So the incremental cost here is just one more key lookup before commit.
SaveTimestamp() only runs roughly once per TSOSaveInterval (5s) in steady state. This is not on the GetTS hot path.
There was a problem hiding this comment.
Only the first save time needs to check the TSO primary. Can the latter ignore this if the first save is successful?
|
@liyishuai: adding LGTM is restricted to approvers and reviewers in OWNERS files. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/hold |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bufferflies, JmPotato, liyishuai The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
|
/retest |
|
@YuhaoZhang00: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: ref #10329
When upgrading from PD service mode (
isKeyspaceGroupEnabled=false) to API service mode with a TSO microservice, there is a window where both the PD leader and the TSO microservice could write to the same TSO timestamp key, causing dual writers and potential timestamp regression.What is changed and how does it work?
In PD service mode,
SaveTimestampnow atomically checks for an existing TSO microservice primary before writing the timestamp window. If a TSO primary is detected, the write is rejected, causing the allocator to reset and resign PD leadership. This allows a PD in API service mode to take over and proxy TSO requests, preventing dual writers during rolling upgrades.Also remove
TestMixedTSODeploymentwhich tested seamless coexistence of PD service mode and TSO microservice — a scenario not supported by the official upgrade procedure (PD restart is required).Key changes:
checkTSOPrimaryparameter toSaveTimestampthat atomically loads the TSO primary election key (/ms/{cluster_id}/tso/00000/primary) within the same etcd transaction before writing the timestamp window. When a TSO primary is detected, the write is rejected.checkTSOPrimaryis only true in PD service mode (!isKeyspaceGroupEnabled). In API service mode, PD uses the existingcheckTSOServicestop/start mechanism instead.TestMixedTSODeployment- seamless coexistence of PD service mode and TSO microservice is not a supported upgrade path.Check List
Tests
Code changes
None.
Side effects
None.
Related changes
None.
Release note
Summary by CodeRabbit
Bug Fixes
Tests
Other