Skip to content

Remove images from TEC definition & run watchers independently of TECs#246

Open
Jakob-Naucke wants to merge 3 commits into
trusted-execution-clusters:mainfrom
Jakob-Naucke:decouple-tec-pcr-compute
Open

Remove images from TEC definition & run watchers independently of TECs#246
Jakob-Naucke wants to merge 3 commits into
trusted-execution-clusters:mainfrom
Jakob-Naucke:decouple-tec-pcr-compute

Conversation

@Jakob-Naucke
Copy link
Copy Markdown
Contributor

@Jakob-Naucke Jakob-Naucke commented Apr 23, 2026

Move trusteeImage etc. out of the TEC CRD. Treat them as
operator-wide, and always use RELATED_IMAGE_TRUSTEE etc. as reference
instead of as fallback.

Fixes: #215

Watch ApprovedImages and compute reference values independently of a
TEC existing. Adopt ApprovedImages from the TEC in order to include
them in uninstallation. Have PCR computation jobs owned by the
ApprovedImages. This avoids conflicting watchers on ApprovedImages
while using a simple flow.

Fixes: #216

Summary by Sourcery

Decouple reference value handling and image configuration from the TrustedExecutionCluster CRD, making image management operator-wide and enabling ApprovedImage and PCR computation controllers to run independently of cluster instances while preserving correct adoption and cleanup semantics.

New Features:

  • Support operator-wide management of ApprovedImages and PCR computation jobs independent of any specific TrustedExecutionCluster, including retroactive adoption when a cluster is created later.

Bug Fixes:

  • Ensure ApprovedImages and their PCR data are properly cleaned up and disallowed on TrustedExecutionCluster deletion, even when images predate the cluster.

Enhancements:

  • Simplify controller wiring by passing the Kubernetes client directly instead of a custom RV context and centralising TrustedExecutionCluster lookup utilities.
  • Make all operator-managed component images (trustee, compute-pcrs, registration-server, attestation-key-register) configured solely via RELATED_IMAGE_* environment variables rather than TEC spec fields.
  • Improve reference value ownership and lifecycle by having ApprovedImages own PCR computation jobs and by using finalizers to remove PCR entries from the shared ConfigMap.
  • Expand unit and integration test coverage around reference value flows, ApprovedImage adoption, and TrustedExecutionCluster discovery.
  • Refine test utilities with reusable error helpers and constants, and update CI and Makefile test commands to run both binaries and libraries.

CI:

  • Adjust Rust workflow test invocations to run both binary and library tests across toolchains.

Documentation:

  • Update design and usage documentation to describe the new ownership model and lifecycle for ApprovedImages and reference values, including new ownership flow diagrams.

Tests:

  • Add and adjust unit and integration tests for reference value computation, ApprovedImage adoption and cleanup, TrustedExecutionCluster lookup helpers, and error-path behaviours in controllers and services.

@Jakob-Naucke Jakob-Naucke requested a review from alicefr April 23, 2026 17:34
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 23, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Jakob-Naucke

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Jakob-Naucke Jakob-Naucke force-pushed the decouple-tec-pcr-compute branch 2 times, most recently from 70daeb6 to 051b57c Compare April 24, 2026 08:47
@Jakob-Naucke
Copy link
Copy Markdown
Contributor Author

Let's see if we can get past login without openshift/release#77892 's changes
/retest

@Jakob-Naucke
Copy link
Copy Markdown
Contributor Author

/retest

@Jakob-Naucke
Copy link
Copy Markdown
Contributor Author

Jakob-Naucke commented Apr 24, 2026

/retest
e: apparently not

@alicefr
Copy link
Copy Markdown
Contributor

alicefr commented May 11, 2026

@sourcery-ai review

@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented May 11, 2026

Reviewer's Guide

Decouples image configuration and reference value watchers from the TrustedExecutionCluster CR so that operator-wide images are sourced from RELATED_IMAGE_* env vars and ApprovedImage resources (and their PCR computation Jobs) are reconciled and owned independently of TECs, with TECs only adopting ApprovedImages for uninstallation and lifecycle, plus adds helper APIs, refactors context passing to use bare Client, and broadens test coverage and CI to run lib tests.

Entity-relationship diagram for TrustedExecutionCluster, ApprovedImage, Jobs, and image-pcrs ConfigMap

erDiagram
    TrustedExecutionCluster {
        string name
        string uid
        string publicAttestationKeyRegisterAddr
        string publicTrusteeAddr
        int trusteeKbsPort
    }

    ApprovedImage {
        string name
        string uid
        string spec_image
        string status_conditions
        string ownerReferences
    }

    ComputePcrsJob {
        string name
        string labels_kind
        string ownerReferences
        string status_completionTime
    }

    ImagePcrsConfigMap {
        string name
        string data_image_pcrs
    }

    ReferenceValuesConfigMap {
        string name
        string data_reference_values
    }

    TrustedExecutionCluster ||--o{ ApprovedImage : adopts
    ApprovedImage ||--o{ ComputePcrsJob : owns

    ApprovedImage ||--|| ImagePcrsConfigMap : has_pcr_entry_in
    ImagePcrsConfigMap ||--|| ReferenceValuesConfigMap : feeds

    TrustedExecutionCluster ||--o{ ComputePcrsJob : indirectly_triggers

    OperatorPod {
        string env_RELATED_IMAGE_TRUSTEE
        string env_RELATED_IMAGE_COMPUTE_PCRS
        string env_RELATED_IMAGE_REGISTRATION_SERVER
        string env_RELATED_IMAGE_ATTESTATION_KEY_REGISTER
    }

    OperatorPod ||--o{ TrustedExecutionCluster : reconciles
    OperatorPod ||--o{ ApprovedImage : reconciles
    OperatorPod ||--o{ ComputePcrsJob : watches
    OperatorPod ||--|| ImagePcrsConfigMap : maintains
    OperatorPod ||--|| ReferenceValuesConfigMap : maintains
Loading

File-Level Changes

Change Details Files
Refactor reference value management to work without a TrustedExecutionCluster-specific context and to have Jobs owned by ApprovedImage resources.
  • Replace RvContextData usage with direct kube::Client passing in reference_values and trustee modules and simplify job/image controllers to carry only Client in reconciliation context.
  • Create PCR ConfigMap without setting a TEC ownerReference and adjust job_reconcile and launch_rv_job_controller to delete Jobs and refresh reference values using only the Client.
  • Change handle_new_image/compute_fresh_pcrs/disallow_image to operate on ApprovedImage objects and Client, resolving compute-pcrs image from RELATED_IMAGE_COMPUTE_PCRS with a COMPONENT_VERSION-based default.
  • Ensure PCR computation Jobs set ApprovedImage as ownerReference and add image_remove_reconcile to disallow images based on cluster presence and deletion state.
  • Add adoption helpers adopt_approved_image/adopt_approved_images so TECs adopt ApprovedImages as owners for uninstall, and update image_add_reconcile to use them and handle cluster lookup via get_opt_trusted_execution_cluster.
operator/src/reference_values.rs
operator/src/trustee.rs
operator/src/lib.rs
Make operator-wide images configured via RELATED_IMAGE_* env vars instead of TrustedExecutionClusterSpec fields and run reference-value controllers globally instead of per-TEC.
  • Remove image fields from TrustedExecutionClusterSpec and the get_image_or_env helper; adjust installer functions to read RELATED_IMAGE_TRUSTEE, RELATED_IMAGE_COMPUTE_PCRS, RELATED_IMAGE_REGISTRATION_SERVER, and RELATED_IMAGE_ATTESTATION_KEY_REGISTER directly with sensible defaults.
  • Start PCR ConfigMap creation and reference value image/job controllers once at operator startup in main, and simplify reconcile to no longer launch per-TEC watchers or maintain ClusterContext/UID.
  • Add adopt_approved_images call during reconcile after installing trustee/register-server/attestation-key-register so all existing ApprovedImages are owned by the TEC for lifecycle management.
  • Update trusted-cluster-gen to inject RELATED_IMAGE_* env vars into the operator Deployment instead of encoding image references in the TEC CRD spec.
operator/src/main.rs
api/v1alpha1/crds.go
api/trusted-cluster-gen.go
Introduce a reusable API to optionally fetch the single TrustedExecutionCluster and add tests around it.
  • Add get_opt_trusted_execution_cluster that returns Option and refactor get_trusted_execution_cluster to use it while preserving error semantics when none exist.
  • Update reference value image reconciliation to call get_opt_trusted_execution_cluster so ApprovedImages can be reconciled without requiring a TEC, and adjust logic when no TEC or multiple TECs exist.
  • Add unit tests for the new helper covering some/none/multiple/error cases and wire in extra dev-dependencies for lib tests.
lib/src/lib.rs
lib/Cargo.toml
operator/src/reference_values.rs
Strengthen and extend unit/integration tests and shared testing utilities to cover new flows and error handling.
  • Add tests for PCR ConfigMap creation without owners, PCR Job reconciliation with plain Client, compute_fresh_pcrs, image adoption/adopt_approved_images, and image_remove_reconcile.
  • Add new integration test test_approved_image_readoption to validate that ApprovedImages created before a TEC are re-adopted after TEC recreation and that reference values get regenerated.
  • Introduce TEST_UID constant, test_error_method macro, and make test_error public in test_utils to reduce duplication and standardize HTTP method-specific error tests; update existing tests in trustee, register_server, attestation_key_register, register-server binary, and reference_values modules to use it.
  • Adjust dummy_cluster and other helpers to reflect removal of image fields from TrustedExecutionClusterSpec and fix minor test issues (string formatting, error messages).
  • Update CI workflow and Makefile to run both bin and lib tests in all jobs.
operator/src/reference_values.rs
tests/trusted_execution_cluster.rs
test_utils/src/mock_client.rs
operator/src/trustee.rs
operator/src/attestation_key_register.rs
operator/src/register_server.rs
register-server/src/main.rs
.github/workflows/rust.yml
Makefile
test_utils/src/lib.rs
Update documentation to reflect new ApprovedImage/TEC ownership model and reference value flow.
  • Document that ApprovedImages and their PCR computation Jobs are independent of TECs but are adopted by TECs for uninstallation, and that finalizers remove PCR entries on image deletion.
  • Add an ownership flow diagram for images and clarify user docs about TEC adopting ApprovedImages to ensure clean uninstallation of images and reference values.
docs/design/reference-values.md
docs/usage/os-and-node-lifecycle.md

Assessment against linked issues

Issue Objective Addressed Explanation
#215 Remove image configuration fields from the TrustedExecutionCluster CRD/spec and stop using the CR to select component images.
#215 Configure operator-managed component images via operator inputs (e.g., environment variables) instead, and update code/tooling to use these inputs with appropriate defaults.
#216 Modify ownerReference/finalizer semantics so that creating an ApprovedImage CR no longer fails with HTTP 422 due to the blockOwnerDeletion + finalizer conflict (specifically, ensure that the generate_owner_reference-based ownerReferences used for ApprovedImage do not trigger the admission error). The issue identifies generate_owner_reference hardcoding block_owner_deletion: Some(true) as the root cause when adding an ownerReference from ApprovedImage to TrustedExecutionCluster while also using a finalizer on ApprovedImage. In the PR, generate_owner_reference’s implementation still sets block_owner_deletion: Some(true); only the return type is changed to a custom Result. ApprovedImages are still adopted by TrustedExecutionCluster using generate_owner_reference(cluster) in adopt_approved_image, so their ownerReferences continue to have blockOwnerDeletion=true in combination with finalizers. The PR changes the ownership flow and adds new controllers, but does not remove or alter blockOwnerDeletion on these ownerReferences, so the specific admission-control conflict described in the issue is not actually resolved.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 4 issues, and left some high level feedback:

  • In adopt_approved_image/adopt_approved_images, the merge patch sets metadata.ownerReferences to a single-element array, which will drop any existing owners; consider appending the TEC owner reference to the existing list instead of overwriting it.
  • The adoption condition in image_add_reconcile only calls adopt_approved_image when cluster_owns(&cluster) is true, meaning images that are not yet owned by the TEC will never be adopted; this condition likely needs to be inverted.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `adopt_approved_image`/`adopt_approved_images`, the merge patch sets `metadata.ownerReferences` to a single-element array, which will drop any existing owners; consider appending the TEC owner reference to the existing list instead of overwriting it.
- The adoption condition in `image_add_reconcile` only calls `adopt_approved_image` when `cluster_owns(&cluster)` is true, meaning images that are not yet owned by the TEC will never be adopted; this condition likely needs to be inverted.

## Individual Comments

### Comment 1
<location path="operator/src/reference_values.rs" line_range="279" />
<code_context>
+        let uid = cluster.metadata.uid.as_ref();
+        uid.and_then(uid_owns).unwrap_or(false)
+    };
     // Adopt the image by adding TEC as owner reference if not already owned
-    let tec_uid = &ctx.owner_reference.uid;
-    let already_owned = image
</code_context>
<issue_to_address>
**issue (bug_risk):** Adoption condition looks inverted: we call `adopt_approved_image` only when the cluster already owns the ApprovedImage.

`cluster_owns` returns `true` when the `TrustedExecutionCluster` UID is already in `owner_references`, but the code calls `adopt_approved_image` only in that case. This means we attempt adoption only when the TEC already owns the image, and skip it when ownership is missing, which inverts the intended behavior and diverges from the per-image reconcile logic. Flip the condition to `!cluster_owns(&cluster)` so adoption runs only when the cluster does not yet own the image.
</issue_to_address>

### Comment 2
<location path="tests/trusted_execution_cluster.rs" line_range="483-484" />
<code_context>
 }
 }
+
+named_test! {
+async fn test_approved_image_readoption() -> anyhow::Result<()> {
+    let test_ctx = setup!().await?;
+    let client = test_ctx.client();
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen `test_approved_image_readoption` by asserting that the re-created ApprovedImage is actually adopted by the new TEC (ownerReference).

Since this test verifies behavior when an `ApprovedImage` outlives its `TrustedExecutionCluster`, it would be helpful to also assert that the re-created `ApprovedImage` has an `ownerReference` pointing to the new TEC (matching UID/name), not just that its reference values are recomputed. That would more directly validate that the `adopt_approved_image` / `adopt_approved_images` logic behaves correctly from an end-user perspective.

Suggested implementation:

```rust
        let namespace = test_ctx.namespace();

        let tec_api: Api<TrustedExecutionCluster> = Api::namespaced(client.clone(), namespace);
        let tec = tec_api.get(TEC_NAME).await?;

        let owner_reference = generate_owner_reference(&tec)?;

        // Verify that the re-created ApprovedImage is adopted by the new TEC via ownerReference.
        let approved_image_api: Api<ApprovedImage> = Api::namespaced(client.clone(), namespace);
        let approved_image = approved_image_api.get(APPROVED_IMAGE_NAME).await?;

        let approved_image_owner_refs = approved_image
            .metadata
            .owner_references
            .clone()
            .unwrap_or_default();

        assert!(
            approved_image_owner_refs
                .iter()
                .any(|or| or.uid == owner_reference.uid && or.name == owner_reference.name),
            "Re-created ApprovedImage should have an ownerReference pointing to the new TEC (name={}, uid={}), but ownerReferences were: {:?}",
            owner_reference.name,
            owner_reference.uid,
            approved_image_owner_refs,
        );

```

To make this compile and fit your existing codebase, you may need to:

1. Ensure `ApprovedImage` is imported at the top of `tests/trusted_execution_cluster.rs`, for example:
   - `use crate::crd::ApprovedImage;` or the appropriate module path you use for this CRD.
2. Ensure `APPROVED_IMAGE_NAME` is in scope in this test:
   - If you already have a constant for the ApprovedImage name used elsewhere in the test file, reuse it.
   - Otherwise define a constant alongside `TEC_NAME`, e.g.:
     `const APPROVED_IMAGE_NAME: &str = "some-approved-image-name";`
3. If `generate_owner_reference(&tec)?` returns a collection (e.g. `Vec<OwnerReference>`), adjust the assertion to iterate over that collection (or pick the one corresponding to the TEC) instead of using `owner_reference.uid` and `owner_reference.name` directly.
</issue_to_address>

### Comment 3
<location path="docs/design/reference-values.md" line_range="99" />
<code_context>
+
+## Ownership
+
+Unlike `reference-values`, `ApprovedImages` can live independent of a `TrustedExecutionCluster` object.
+They can be created without one existing, and reference values are written by jobs (that the `ApprovedImages` also own) to the `image-pcrs` ConfigMap, which is created by the operator and is also independent of `TrustedExecutionClusters`.
+
</code_context>
<issue_to_address>
**nitpick (typo):** Consider changing "live independent of" to "live independently of" for grammatical correctness.

Using "can live independently of a `TrustedExecutionCluster` object" would be more natural and grammatically standard.

```suggestion
Unlike `reference-values`, `ApprovedImages` can live independently of a `TrustedExecutionCluster` object.
```
</issue_to_address>

### Comment 4
<location path="docs/design/reference-values.md" line_range="103" />
<code_context>
+They can be created without one existing, and reference values are written by jobs (that the `ApprovedImages` also own) to the `image-pcrs` ConfigMap, which is created by the operator and is also independent of `TrustedExecutionClusters`.
+
+However, the `ApprovedImages` are adopted by the `TrustedExecutionCluster` object, both when created with a `TrustedExecutionCluster` existing and retroactively when created before `TrustedExecutionCluster` creation.
+This ensures that removal of a `TrustedExecutionCluster` acts as complete uninstallation.
+Finalizers on the `ApprovedImages` ensure the PCR values are removed back out of `image-pcrs` again.
+
</code_context>
<issue_to_address>
**nitpick (typo):** Minor grammar improvement: add "a" before "complete uninstallation".

Consider changing this to “acts as a complete uninstallation” for smoother grammar.

```suggestion
This ensures that removal of a `TrustedExecutionCluster` acts as a complete uninstallation.
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread operator/src/reference_values.rs
Comment thread tests/trusted_execution_cluster.rs
Comment thread docs/design/reference-values.md Outdated
Comment thread docs/design/reference-values.md Outdated
Move trusteeImage etc. out of the TEC CRD. Treat them as
operator-wide, and always use RELATED_IMAGE_TRUSTEE etc. as reference
instead of as fallback.

Fixes: trusted-execution-clusters#215

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
Copy link
Copy Markdown
Contributor Author

@Jakob-Naucke Jakob-Naucke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue identifies generate_owner_reference hardcoding block_owner_deletion: Some(true) as the root cause when adding an ownerReference from ApprovedImage to TrustedExecutionCluster while also using a finalizer on ApprovedImage. In the PR, generate_owner_reference’s implementation still sets block_owner_deletion: Some(true); only the return type is changed to a custom Result. ApprovedImages are still adopted by TrustedExecutionCluster using generate_owner_reference(cluster) in adopt_approved_image, so their ownerReferences continue to have blockOwnerDeletion=true in combination with finalizers. The PR changes the ownership flow and adds new controllers, but does not remove or alter blockOwnerDeletion on these ownerReferences, so the specific admission-control conflict described in the issue is not actually resolved.

That was not the root cause, conflicting watchers were

Comment thread tests/trusted_execution_cluster.rs
@Jakob-Naucke Jakob-Naucke force-pushed the decouple-tec-pcr-compute branch from 145e363 to cdd29a1 Compare May 13, 2026 10:44
Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
Watch ApprovedImages and compute reference values independently of a
TEC existing. Adopt ApprovedImages from the TEC in order to include
them in uninstallation. Have PCR computation jobs owned by the
ApprovedImages. This avoids conflicting watchers on ApprovedImages
while using a simple flow.

Test that adoption of an ApprovedImage works also when created 5
seconds before and after the TEC.

Fixes: trusted-execution-clusters#216

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
@Jakob-Naucke Jakob-Naucke force-pushed the decouple-tec-pcr-compute branch from cdd29a1 to 80851fa Compare May 13, 2026 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants