Remove images from TEC definition & run watchers independently of TECs#246
Remove images from TEC definition & run watchers independently of TECs#246Jakob-Naucke wants to merge 3 commits into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Jakob-Naucke The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
70daeb6 to
051b57c
Compare
|
Let's see if we can get past login without openshift/release#77892 's changes |
|
/retest |
|
/retest |
051b57c to
145e363
Compare
|
@sourcery-ai review |
Reviewer's GuideDecouples image configuration and reference value watchers from the TrustedExecutionCluster CR so that operator-wide images are sourced from RELATED_IMAGE_* env vars and ApprovedImage resources (and their PCR computation Jobs) are reconciled and owned independently of TECs, with TECs only adopting ApprovedImages for uninstallation and lifecycle, plus adds helper APIs, refactors context passing to use bare Client, and broadens test coverage and CI to run lib tests. Entity-relationship diagram for TrustedExecutionCluster, ApprovedImage, Jobs, and image-pcrs ConfigMaperDiagram
TrustedExecutionCluster {
string name
string uid
string publicAttestationKeyRegisterAddr
string publicTrusteeAddr
int trusteeKbsPort
}
ApprovedImage {
string name
string uid
string spec_image
string status_conditions
string ownerReferences
}
ComputePcrsJob {
string name
string labels_kind
string ownerReferences
string status_completionTime
}
ImagePcrsConfigMap {
string name
string data_image_pcrs
}
ReferenceValuesConfigMap {
string name
string data_reference_values
}
TrustedExecutionCluster ||--o{ ApprovedImage : adopts
ApprovedImage ||--o{ ComputePcrsJob : owns
ApprovedImage ||--|| ImagePcrsConfigMap : has_pcr_entry_in
ImagePcrsConfigMap ||--|| ReferenceValuesConfigMap : feeds
TrustedExecutionCluster ||--o{ ComputePcrsJob : indirectly_triggers
OperatorPod {
string env_RELATED_IMAGE_TRUSTEE
string env_RELATED_IMAGE_COMPUTE_PCRS
string env_RELATED_IMAGE_REGISTRATION_SERVER
string env_RELATED_IMAGE_ATTESTATION_KEY_REGISTER
}
OperatorPod ||--o{ TrustedExecutionCluster : reconciles
OperatorPod ||--o{ ApprovedImage : reconciles
OperatorPod ||--o{ ComputePcrsJob : watches
OperatorPod ||--|| ImagePcrsConfigMap : maintains
OperatorPod ||--|| ReferenceValuesConfigMap : maintains
File-Level Changes
Assessment against linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 4 issues, and left some high level feedback:
- In
adopt_approved_image/adopt_approved_images, the merge patch setsmetadata.ownerReferencesto a single-element array, which will drop any existing owners; consider appending the TEC owner reference to the existing list instead of overwriting it. - The adoption condition in
image_add_reconcileonly callsadopt_approved_imagewhencluster_owns(&cluster)is true, meaning images that are not yet owned by the TEC will never be adopted; this condition likely needs to be inverted.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `adopt_approved_image`/`adopt_approved_images`, the merge patch sets `metadata.ownerReferences` to a single-element array, which will drop any existing owners; consider appending the TEC owner reference to the existing list instead of overwriting it.
- The adoption condition in `image_add_reconcile` only calls `adopt_approved_image` when `cluster_owns(&cluster)` is true, meaning images that are not yet owned by the TEC will never be adopted; this condition likely needs to be inverted.
## Individual Comments
### Comment 1
<location path="operator/src/reference_values.rs" line_range="279" />
<code_context>
+ let uid = cluster.metadata.uid.as_ref();
+ uid.and_then(uid_owns).unwrap_or(false)
+ };
// Adopt the image by adding TEC as owner reference if not already owned
- let tec_uid = &ctx.owner_reference.uid;
- let already_owned = image
</code_context>
<issue_to_address>
**issue (bug_risk):** Adoption condition looks inverted: we call `adopt_approved_image` only when the cluster already owns the ApprovedImage.
`cluster_owns` returns `true` when the `TrustedExecutionCluster` UID is already in `owner_references`, but the code calls `adopt_approved_image` only in that case. This means we attempt adoption only when the TEC already owns the image, and skip it when ownership is missing, which inverts the intended behavior and diverges from the per-image reconcile logic. Flip the condition to `!cluster_owns(&cluster)` so adoption runs only when the cluster does not yet own the image.
</issue_to_address>
### Comment 2
<location path="tests/trusted_execution_cluster.rs" line_range="483-484" />
<code_context>
}
}
+
+named_test! {
+async fn test_approved_image_readoption() -> anyhow::Result<()> {
+ let test_ctx = setup!().await?;
+ let client = test_ctx.client();
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen `test_approved_image_readoption` by asserting that the re-created ApprovedImage is actually adopted by the new TEC (ownerReference).
Since this test verifies behavior when an `ApprovedImage` outlives its `TrustedExecutionCluster`, it would be helpful to also assert that the re-created `ApprovedImage` has an `ownerReference` pointing to the new TEC (matching UID/name), not just that its reference values are recomputed. That would more directly validate that the `adopt_approved_image` / `adopt_approved_images` logic behaves correctly from an end-user perspective.
Suggested implementation:
```rust
let namespace = test_ctx.namespace();
let tec_api: Api<TrustedExecutionCluster> = Api::namespaced(client.clone(), namespace);
let tec = tec_api.get(TEC_NAME).await?;
let owner_reference = generate_owner_reference(&tec)?;
// Verify that the re-created ApprovedImage is adopted by the new TEC via ownerReference.
let approved_image_api: Api<ApprovedImage> = Api::namespaced(client.clone(), namespace);
let approved_image = approved_image_api.get(APPROVED_IMAGE_NAME).await?;
let approved_image_owner_refs = approved_image
.metadata
.owner_references
.clone()
.unwrap_or_default();
assert!(
approved_image_owner_refs
.iter()
.any(|or| or.uid == owner_reference.uid && or.name == owner_reference.name),
"Re-created ApprovedImage should have an ownerReference pointing to the new TEC (name={}, uid={}), but ownerReferences were: {:?}",
owner_reference.name,
owner_reference.uid,
approved_image_owner_refs,
);
```
To make this compile and fit your existing codebase, you may need to:
1. Ensure `ApprovedImage` is imported at the top of `tests/trusted_execution_cluster.rs`, for example:
- `use crate::crd::ApprovedImage;` or the appropriate module path you use for this CRD.
2. Ensure `APPROVED_IMAGE_NAME` is in scope in this test:
- If you already have a constant for the ApprovedImage name used elsewhere in the test file, reuse it.
- Otherwise define a constant alongside `TEC_NAME`, e.g.:
`const APPROVED_IMAGE_NAME: &str = "some-approved-image-name";`
3. If `generate_owner_reference(&tec)?` returns a collection (e.g. `Vec<OwnerReference>`), adjust the assertion to iterate over that collection (or pick the one corresponding to the TEC) instead of using `owner_reference.uid` and `owner_reference.name` directly.
</issue_to_address>
### Comment 3
<location path="docs/design/reference-values.md" line_range="99" />
<code_context>
+
+## Ownership
+
+Unlike `reference-values`, `ApprovedImages` can live independent of a `TrustedExecutionCluster` object.
+They can be created without one existing, and reference values are written by jobs (that the `ApprovedImages` also own) to the `image-pcrs` ConfigMap, which is created by the operator and is also independent of `TrustedExecutionClusters`.
+
</code_context>
<issue_to_address>
**nitpick (typo):** Consider changing "live independent of" to "live independently of" for grammatical correctness.
Using "can live independently of a `TrustedExecutionCluster` object" would be more natural and grammatically standard.
```suggestion
Unlike `reference-values`, `ApprovedImages` can live independently of a `TrustedExecutionCluster` object.
```
</issue_to_address>
### Comment 4
<location path="docs/design/reference-values.md" line_range="103" />
<code_context>
+They can be created without one existing, and reference values are written by jobs (that the `ApprovedImages` also own) to the `image-pcrs` ConfigMap, which is created by the operator and is also independent of `TrustedExecutionClusters`.
+
+However, the `ApprovedImages` are adopted by the `TrustedExecutionCluster` object, both when created with a `TrustedExecutionCluster` existing and retroactively when created before `TrustedExecutionCluster` creation.
+This ensures that removal of a `TrustedExecutionCluster` acts as complete uninstallation.
+Finalizers on the `ApprovedImages` ensure the PCR values are removed back out of `image-pcrs` again.
+
</code_context>
<issue_to_address>
**nitpick (typo):** Minor grammar improvement: add "a" before "complete uninstallation".
Consider changing this to “acts as a complete uninstallation” for smoother grammar.
```suggestion
This ensures that removal of a `TrustedExecutionCluster` acts as a complete uninstallation.
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Move trusteeImage etc. out of the TEC CRD. Treat them as operator-wide, and always use RELATED_IMAGE_TRUSTEE etc. as reference instead of as fallback. Fixes: trusted-execution-clusters#215 Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
Jakob-Naucke
left a comment
There was a problem hiding this comment.
The issue identifies generate_owner_reference hardcoding block_owner_deletion: Some(true) as the root cause when adding an ownerReference from ApprovedImage to TrustedExecutionCluster while also using a finalizer on ApprovedImage. In the PR, generate_owner_reference’s implementation still sets block_owner_deletion: Some(true); only the return type is changed to a custom Result. ApprovedImages are still adopted by TrustedExecutionCluster using generate_owner_reference(cluster) in adopt_approved_image, so their ownerReferences continue to have blockOwnerDeletion=true in combination with finalizers. The PR changes the ownership flow and adds new controllers, but does not remove or alter blockOwnerDeletion on these ownerReferences, so the specific admission-control conflict described in the issue is not actually resolved.
That was not the root cause, conflicting watchers were
145e363 to
cdd29a1
Compare
Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
Watch ApprovedImages and compute reference values independently of a TEC existing. Adopt ApprovedImages from the TEC in order to include them in uninstallation. Have PCR computation jobs owned by the ApprovedImages. This avoids conflicting watchers on ApprovedImages while using a simple flow. Test that adoption of an ApprovedImage works also when created 5 seconds before and after the TEC. Fixes: trusted-execution-clusters#216 Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
cdd29a1 to
80851fa
Compare
Move trusteeImage etc. out of the TEC CRD. Treat them as
operator-wide, and always use RELATED_IMAGE_TRUSTEE etc. as reference
instead of as fallback.
Fixes: #215
Watch ApprovedImages and compute reference values independently of a
TEC existing. Adopt ApprovedImages from the TEC in order to include
them in uninstallation. Have PCR computation jobs owned by the
ApprovedImages. This avoids conflicting watchers on ApprovedImages
while using a simple flow.
Fixes: #216
Summary by Sourcery
Decouple reference value handling and image configuration from the TrustedExecutionCluster CRD, making image management operator-wide and enabling ApprovedImage and PCR computation controllers to run independently of cluster instances while preserving correct adoption and cleanup semantics.
New Features:
Bug Fixes:
Enhancements:
CI:
Documentation:
Tests: