backup: teach download job to report progress when nodes are unavailable #170866
Draft
andrew-r-thomas wants to merge 1 commit into
This patch teaches the online restore download job to report progress correctly in the face of temporary node failures. Previously, when the `DownloadSpan` rpc returned per-node errors, the job's main context group was canceled, and a retry was triggered before the job could report a progress update based on the `SpanStats` rpc. This patch adjusts the goroutines in these context groups to save per-node errors rather than return them, and triggers a retry once the context group completes if any errors were saved. This allows the job to continue updating progress while some nodes error during an rpc fanout.

Epic: None

Release note: None