docker: add blob-level mirror fallback in dockerImageSource#845
docker: add blob-level mirror fallback in dockerImageSource#845QiWang19 wants to merge 1 commit into
Conversation
mtrmac
left a comment
There was a problem hiding this comment.
Thanks!
Doing this at the copy.Image level is very surprising, and harder than necessary — and less resilient, because it still assumes that a full image copy can proceed uninterrupted from a single mirror.
I’d expect dockerImageSource, instead, to do the fallbacks — making physicalRef and the dockerClient uses mutable (careful about HasThreadSafeGetBlob: true!), or perhaps maintaining an on-demand-extended set of dockerClients, in the worst case one per mirror; then every GetBlob/… on the dockerImageSource could fall back to another mirror (and we can have endless tuning discussions about the tuning logic there)…
Either way the generic copy code, AFAICS, does not need to know.
| // If not nil, DockerExcludedPullSources lists endpoint Location that NewImageSource skips when selecting a pull source. | ||
| DockerExcludedPullSources map[string]struct{} | ||
| // If DockerExcludedPullSources is not nil, the Location of the pull source selected by the most recent NewImageSource call. | ||
| DockerLastSelectedPullSource string |
There was a problem hiding this comment.
This is conceptually an internal state of dockerImageSource and I see no reason to make these public and store them so far from there. In the very worst case, we could use a new private interface for that (compare #531 adding NewImageSourceWithOptions) but even that seems suboptimal, see elsewhere.
| if err != nil { | ||
| return nil, fmt.Errorf("initializing source %s: %w", transports.ImageName(srcRef), err) | ||
| } | ||
| rawSource := imagesource.FromPublic(publicRawSource) |
There was a problem hiding this comment.
The idea here is that we convert from types.ImageSource to private.ImageSource at the earliest opportunity, and then we never need to worry about public-only implementations (so much).
| signersToClose []*signer.Signer // Signers that should be closed when this copier is destroyed. | ||
| } | ||
|
|
||
| type pinnedManifestSource struct { |
There was a problem hiding this comment.
This seems unnecessary in principle … dockerImageSource already has a cachedManifest (but it does require maintain that state carefully).
|
BTW the non-timeout test failures look real at a first glance. |
When a blob fetch fails with a transient error (5xx, timeout) or a blob-not-found (BLOB_UNKNOWN), try remaining configured mirrors before giving up. The fallback is transparent to all callers (CRI-O, Podman, Buildah, Skopeo). The mirror probe is serialized under mirrorMu: the first goroutine to hit a failure walks the mirror list while the rest block on getActiveSource(), then reuse the discovered working mirror. Signed-off-by: Qi Wang <qiwan@redhat.com>
|
@mtrmac I updated this PR blob-level mirror fallback. I'd appreciate an early review on the approach before I continue. And there are a few design questions I'd like your input on(in the PR description under "Questions.") |
mtrmac
left a comment
There was a problem hiding this comment.
Thanks!
I’m afraid there is way too much going on (e.g. we need to migrate of Cirrus) and I’ll be only able to pay proper attention to this, probably, weeks later.
For now just a very minimal look: Yes, I’d expect the state to be in dockerImageSource ~only. I didn’t carefully read the logic.
Are the right errors triggering fallback in
isMirrorTransientError()/isMirrorFallbackError()?
Looks plausible; I’m sure we will be turning the heuristics 10 years from now… Anyway, thanks for clearly splitting the logic into separate documented functions, that will help future maintenance.
Should any other error types be included or excluded?
I think bodyReader already handles another set of errors. [There is a question whether bodyReader should also trigger a fallback to a different mirror… I think that would better be a different PR, and it might not even be necessary — OTOH there it might be worth just thinking a bit about what the code structure to enable that would look like.]
The full-loop lock in getBlobWithMirrorFallback() serializes all blob goroutines during fallback — is this acceptable, or would you prefer a different concurrency strategy?
Without looking at the details, I think this is a great idea.
Should tryGetBlob() retry once on the same mirror before falling back, or should any failure immediately try the next source?
Hum… in general, it’s problematic to have retry loops in lower levels of code; if each level of a call stack has a retry loop, that makes retry behavior exponential in the depth of the call stack. So as a general principle, I’ve been told that retries should exist only at the top level if possible. (Which is why GetBlob/GetManifest has no retries itself; callers can use c/common/pkg/retry at a higher level. OTOH bodyReader exists because the retry behavior in that case is usefully different from retrying from scratch.)
So I think GetBlob should ideally not retry at all if no mirrors are involved. If we do have mirrors, I don’t have a strong opinion. Given how costly / disruptive / potentially unexpected / (likely to fail in many configuration?) a fallback to another mirror can be, I think one more retry before starting the mirror fallback can well make sense.
| return s.getBlob(ctx, info, cache) | ||
| } | ||
|
|
||
| func (s *dockerImageSource) getBlob(ctx context.Context, info types.BlobInfo, cache types.BlobInfoCache) (io.ReadCloser, int64, error) { |
There was a problem hiding this comment.
Is there any other caller than GetBlob itself?
| // newImageSourceAttempt calls ensureManifestIsLoaded(), which preserves | ||
| // the manifest-level filtering: only mirrors that can serve the manifest |
There was a problem hiding this comment.
Probably not?! If we got go the getBlob stage, cachedManifest is already set and ensure… does nothing.
Summary
In disconnected/restricted environments, a mirror can pass the manifest
check in
newImageSource()but then fail blob downloads (5xx, timeout,or partial mirror returning BLOB_UNKNOWN). When this happens, the copy
fails and CRI-O propagates the error to kubelet — which retries from
scratch, picks the same broken mirror, and deadlocks.
This PR adds blob-level mirror fallback inside
dockerImageSourcesothat when a blob fetch fails, remaining configured mirrors are tried
before giving up. The fallback is transparent to all callers (CRI-O,
Podman, Buildah, Skopeo).
Design choices
Fallback triggers: Only transient errors (5xx, network timeout) and
blob-not-found (
BLOB_UNKNOWN) trigger fallback. Other 4xx errors (authfailures, etc.) are fatal — they indicate a configuration problem, not a
mirror issue.
Serialized mirror probing:
getBlobWithMirrorFallback()holdsmirrorMufor the entire probe loop. This is intentional: the firstgoroutine to hit a failure walks the mirror list while the rest block on
getActiveSource(), then reuse the discovered working mirror viamirrorOverride. This avoids N goroutines independently exhausting themirror list.
No manifest re-fetch on fallback: Blob digests from the original
manifest are used against fallback sources. Safe because digests are
content-addressed.
newImageSourceAttempt()callsensureManifestIsLoaded()on each candidate, which filters out mirrorsthat can't serve the manifest at all.
remainingSourcesincludes mirrors AND the primary location: Thefallback loop tries whatever
pullSourcesentries come after theinitially-selected source, which may include the original upstream
registry.
Same-mirror retry on transient errors:
tryGetBlob()retries oncewith 1s+jitter delay before moving to the next source. This handles
brief 503 spikes without immediately triggering the full fallback
machinery.
Questions
isMirrorTransientError()/isMirrorFallbackError()? Should any other error types be included or excluded?getBlobWithMirrorFallback()serializes all blob goroutines during fallback — is this acceptable, or would you prefer a different concurrency strategy?tryGetBlob()retry once on the same mirror before falling back, or should any failure immediately try the next source?Fixes: https://issues.redhat.com/browse/OCPSTRAT-3170