-
Notifications
You must be signed in to change notification settings - Fork 60
[sha512] image/copy: allow users to force digest algorithm in CLI (e.g. skopeo copy --force-digest)
#552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
✅ A new PR has been created in buildah to vendor these changes: containers/buildah#6597 |
mtrmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is valuable.
On the process side, I’m not quite sure how to go about building this — I’d like to avoid exposing and API that plain doesn’t work for some transports. I think I’d prefer:
- Building this ~one complete feature at a time, e.g. if we add
PutBlobOptions.Digests, then actually implement it in all transports immediately. We should never be failing to enforceForceDigest…and reporting success. (In many cases, aSituation{CannotChangeDigestReason: "a sha256-only transport not updated yet", digest: sha256Value}in a transport would be fine, that’s what the tracker is for, as long as the enforcement of the rules is all there immediately.) - Not providing the public API until all pieces are in place. If we need something for testing, a temporary
BrokenSetForceDigestAlgorithm /* Unstable API, may be changed or removed any time */?
Without a deep review or revisiting earlier PRs, I think missing pieces here are
- multi-platform images: should not be too difficult, but is fairly orthogonal
TryReusingBlob: this +BlobInfoCachewill be significant, it has ~the same prerequisite question “what layer reuse is valid cross-digest” as c/storage — and that’s a major reason why I’d prefer to build this a feature at a time, so that we can get plainPutBlobor configs done without blocking on this- The
PutBlobPartialcode path (plausibly should be just a failure becausepull --force-digest=makes little sense, and would be difficult to implement, I’m not sure) - DiffID computation (maybe keep sha256 and fail on schema1??)
ZstdChunkedcompression using sha512 digests for all its data
image/copy/blob.go
Outdated
| } | ||
| if stream.info.Digest != "" && uploadedInfo.Digest != stream.info.Digest { | ||
| return types.BlobInfo{}, fmt.Errorf("Internal error writing blob %s, blob with digest %s saved with digest %s", srcInfo.Digest, stream.info.Digest, uploadedInfo.Digest) | ||
| expectedAlgo, err := ic.c.options.digestOptions.Choose(digests.Situation{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is redundant / conflicting with PutBlob… making its own choice. If we had the generic code choosing a specific algorithm and expecting precisely that one to be used, then it should do that before PutBlob and provide that to transports as a parameter.
I’m not sure where the decision should be, at this point [and it’s a private interface, so we can change our mind] — I guess the transport might have some information (e.g. about remote registry capabilities) that the generic code doesn’t — but, for now, we don’t have all transports updated yet, either way…
AFAICS this only exists as a sanity check (both config and layer copies later enforce equality if cannotModifyManifest…), so I think an “if algorithms match the whole values must match” check here would be sufficient.
image/copy/copy.go
Outdated
| } | ||
|
|
||
| // SetForceDigestAlgorithm forces the use of a specific digest algorithm for this copy operation. | ||
| func (o *Options) SetForceDigestAlgorithm(algo digest.Algorithm) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Yes, I’m generally in favor of adding more options via methods rather than fields. And we can always add something vaguely like SetForceDigestOptions later if we decided to make that fully public.)
I’m wondering about naming this ForceDestinationDigest… — we might eventually want a “pulls must consume sha512 or stronger digests and refuse to work on sha256”, and that’s a fairly unrelated feature.
| if err != nil { | ||
| return fmt.Errorf("failed to set force-digest algorithm: %w", err) | ||
| } | ||
| o.digestOptions = digestOpts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fail if this is already set? It might protect against oversights. OTOH it can also prevent some code patterns — if that turned out to be troublesome, we can always remove that enforcement later.
image/copy/copy.go
Outdated
| // only to allow gradually building the feature set. | ||
| // After c/image/copy consistently implements it, provide a public digest options API of some kind. | ||
| optionsCopy := *options | ||
| // Only set default if not already configured |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(We can avoid the whole optionsCopy if digestOptions != nil.)
image/copy/single.go
Outdated
| if destInfo.Digest != srcInfo.Digest { | ||
| return fmt.Errorf("Internal error: copying uncompressed config blob %s changed digest to %s", srcInfo.Digest, destInfo.Digest) | ||
| // Allow digest algorithm changes when forcing a specific digest algorithm | ||
| forcingDifferentAlgo := ic.c.options.digestOptions.MustUseSet() != "" && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- See the similar check
copyBlobFromStream - “MustUse” Is not the only reason a copy might choose a different algorithm (e.g. we might be pushing to a sha256-only-transport or registry), hard-coding a check only for
MustUsehere is overly specific - Compare the end of
copyLayers, where the equality check is immediately paired with manifest edits [and see elsewhere about manifest edits], so that we always set up edits if we know they are needed, and we minimize the risk of the two getting out of sync. I’m not sure whether the check+update should live incopyConfigorcopyUpdatedConfigAndManifest(guessing the latter), but they should be together.
|
|
||
| // updateManifestConfigDigest uses typed manifest structures instead of generic JSON manipulation. | ||
| // This leverages the existing manifest parsing and serialization infrastructure. | ||
| func (ic *imageCopier) updateManifestConfigDigest(manifestBlob []byte, src types.Image, newConfigDigest digest.Digest) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to go through the manifest abstraction in c/image/internal/image and not be hard-coded in c/image/copy.
(It can be a method on image.Sourced/genericManifest, like CanChangeLayerCompression, without making it a public API, at least for now.)
image/copy/single.go
Outdated
| } | ||
|
|
||
| ic.c.Printf("Writing manifest to image destination\n") | ||
| manifestDigest, err := manifest.Digest(man) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This is another part that will need updating, for multi-arch images.)
|
FWIW containers/buildah#6597 hits |
I think #512 has a fix, but I’m not going to stay around today to wait for containers/buildah#6552 to confirm. |
I can go ahead with this. Let me know if you'd much rather the other preference though. |
|
I didn’t mean the two bullet points as alternatives, but as two parts of the same plan. My main concern here is having some way of tracking what doesn’t work yet; all of that will need to either be implemented or at least be updated to cleanly fail. So I was thinking “one feature at a time, in all transports”. Doing it that way also means we immediately run into difficulties if some transports’ structures make a design difficult. There may be other ways to track things or to structure the implementation. And maybe that whole discussion is premature, and we need a prototype PR, or a few, sort of like this, testing the full workflow to just run into things and see what are the things that will need to be done. |
The goal is to allow building the digest-choice-dependent machinery from the bottom up without committing to an API while we don't understand the problem space; for now, we don't expose any way for users to actually make a choice. This code is never called at this point, so this should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
for now, purely to let us internally work on implementing digest choice / conversions; there is no way for a caller to set it. Should not change behavior, the field has no users. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
This is: - UNUSABLE (no external API) - INCOMPLETE (manifest digests, TryReusingBlob, ...) - BROKEN (various code assumes that digest returned from upload matches input) - INCONSISTENT (PutBlobOptions.Digests is only implemented in one transport, and silently ignored elsewhere) Signed-off-by: Miloslav Trmač <mitr@redhat.com>
8a75ccc to
f15717e
Compare
Add UpdateConfigDigest() method to the manifest interface to support updating config descriptor digests when forcing different digest algorithms. This foundational change allows callers to update config digests uniformly across different manifest formats (OCI, Docker schema2) while properly failing for unsupported formats (Docker schema1, which lacks a separate config descriptor). Implementations: - OCI and Docker schema2: Update config digest directly - Docker schema1: Return error (no separate config descriptor) This is needed to support forcing digest algorithms in image copy operations, where the config blob may be re-hashed with a different algorithm. Signed-off-by: Lokesh Mandvekar <lsm5@redhat.com>
Add public API to force a specific digest algorithm when copying images. This allows users to override the default sha256 digest algorithm. The API is marked as "Broken" (unstable) because: - Not all transports support it yet - Multi-arch images are not yet handled - Some edge cases still need work See containers#552 for status. Key behaviors: - Validates algorithm is available on the system - Prevents reconfiguration (fails if already set) - Respects existing digestOptions if already configured This is the user-facing entry point; actual implementation follows in subsequent commits. Signed-off-by: Lokesh Mandvekar <lsm5@redhat.com>
Implement the core logic to force digest algorithms when copying images: 1. Config digest handling: - Modify copyConfig() to return new digest when algorithm changes - Add updateManifestConfigDigest() helper to update config digest in manifest - Use manifest abstraction layer for uniform digest updates 2. Manifest digest selection: - Use digestOptions.Choose() to select manifest digest algorithm - Apply selected algorithm via manifest.DigestWithAlgorithm() 3. Digest mismatch handling: - Allow digest changes only when algorithms differ - Detect corruption when same algorithm produces different digest This builds on the manifest abstraction layer and API from previous commits to enable the actual digest forcing functionality. Note: Guards for unsupported scenarios (multi-arch, schema1, etc.) are added in the next commit to keep this focused on core logic. Signed-off-by: Lokesh Mandvekar <lsm5@redhat.com>
Add guards and limitations for cases where digest forcing is not yet supported or incompatible with certain features: 1. Multi-arch images (multiple.go): - Not yet implemented; needs per-instance config digest updates - Fail cleanly with clear error message - See containers#552 (comment) 2. Docker schema1 manifests (single.go): - Lack separate config descriptor, only support sha256 - Fail early when forcing non-sha256 algorithms 3. zstd:chunked compression (single.go): - Uses sha256 for internal TOC and chunk digests - Fall back to plain zstd when forcing non-sha256 - Fail if requireCompressionFormatMatch is set 4. PutBlobPartial optimization (single.go): - Incompatible with forcing different digest (needs full layer) - Disable partial pull when algorithms don't match 5. Blob reuse optimization (single.go): - Skip blob reuse when digest algorithms don't match - Future: could support cross-digest reuse via BlobInfoCache These guards ensure clean failures and fallbacks rather than silent corruption or confusing errors. Signed-off-by: Lokesh Mandvekar <lsm5@redhat.com>
f15717e to
eca922f
Compare
Depends on #512 . Only see HEAD commit here.