Commit b4eaf36

butler54 and claude committed
feat: add bare metal support for Intel TDX and AMD SEV-SNP
Adds a new `baremetal` clusterGroup with:
- NFD-based auto-detection of TDX/SEV-SNP hardware
- RuntimeClass for kata-tdx and kata-snp
- MachineConfig for kernel params and vsock
- Intel DCAP chart (PCCS + QGS) for TDX attestation
- LVMS and HPP storage provider support
- PCCS secrets generation in gen-secrets.sh
- Bare metal documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent aa836f4 commit b4eaf36

45 files changed: +1475 -14 lines changed

README.md

Lines changed: 38 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -2,21 +2,23 @@
22

33
Validated pattern for deploying confidential containers on OpenShift using the [Validated Patterns](https://validatedpatterns.io/) framework.
44

5-
Confidential containers use hardware-backed Trusted Execution Environments (TEEs) to isolate workloads from cluster and hypervisor administrators. This pattern deploys and configures the Red Hat CoCo stack — including the sandboxed containers operator, Trustee (Key Broker Service), and peer-pod infrastructure — on Azure.
5+
Confidential containers use hardware-backed Trusted Execution Environments (TEEs) to isolate workloads from cluster and hypervisor administrators. This pattern deploys and configures the Red Hat CoCo stack — including the sandboxed containers operator, Trustee (Key Broker Service), and peer-pod infrastructure — on Azure and bare metal.
66

77
## Topologies
88

9-
The pattern provides two deployment topologies:
9+
The pattern provides three deployment topologies:
1010

11-
1. **Single cluster** (`simple` clusterGroup) — deploys all components (Trustee, Vault, ACM, sandboxed containers, workloads) in one cluster. This breaks the RACI separation expected in a remote attestation architecture but simplifies testing and demonstrations.
11+
1. **Single cluster** (`simple` clusterGroup) — deploys all components (Trustee, Vault, ACM, sandboxed containers, workloads) in one cluster on Azure. This breaks the RACI separation expected in a remote attestation architecture but simplifies testing and demonstrations.
1212

1313
2. **Multi-cluster** (`trusted-hub` + `spoke` clusterGroups) — separates the trusted zone from the untrusted workload zone:
1414
- **Hub** (`trusted-hub`): Runs Trustee (KBS + attestation service), HashiCorp Vault, ACM, and cert-manager. This cluster is the trust anchor.
1515
- **Spoke** (`spoke`): Runs the sandboxed containers operator and confidential workloads. The spoke is imported into ACM and managed from the hub.
1616

17+
3. **Bare metal** (`baremetal` clusterGroup) — deploys all components on bare metal hardware with Intel TDX or AMD SEV-SNP support. NFD (Node Feature Discovery) auto-detects the CPU architecture and configures the appropriate runtime. Supports SNO (Single Node OpenShift) and multi-node clusters.
18+
1719
The topology is controlled by the `main.clusterGroupName` field in `values-global.yaml`.
1820

19-
Currently supports Azure via peer-pods. Peer-pods provision confidential VMs (`Standard_DCas_v5` family) directly on the Azure hypervisor rather than nesting VMs inside worker nodes.
21+
Azure deployments use peer-pods, which provision confidential VMs (`Standard_DCas_v5` family) directly on the Azure hypervisor. Bare metal deployments use layered images and hardware TEE features directly.
2022

2123
## Current version (4.*)
2224

@@ -42,9 +44,18 @@ All previous versions used pre-GA (Technology Preview) releases of Trustee:
4244

4345
### Prerequisites
4446

47+
**Azure deployments:**
4548
- OpenShift 4.17+ cluster on Azure (self-managed via `openshift-install` or ARO)
4649
- Azure `Standard_DCas_v5` VM quota in your target region (these are confidential computing VMs and are not available in all regions). See the note below for more details.
4750
- Azure DNS hosting the cluster's DNS zone
51+
52+
**Bare metal deployments:**
53+
- OpenShift 4.17+ cluster on bare metal with Intel TDX or AMD SEV-SNP hardware
54+
- BIOS/firmware configured to enable TDX or SEV-SNP
55+
- Available block devices for LVMS storage (auto-discovered)
56+
- For Intel TDX: an Intel PCS API key from [api.portal.trustedservices.intel.com](https://api.portal.trustedservices.intel.com/)
57+
58+
**Common:**
4859
- Tools on your workstation: `podman`, `yq`, `jq`, `skopeo`
4960
- OpenShift pull secret saved at `~/pull-secret.json` (download from [console.redhat.com](https://console.redhat.com/openshift/downloads))
5061
- Fork the repository — ArgoCD reconciles cluster state against your fork, so changes must be pushed to your remote
@@ -53,20 +64,20 @@ All previous versions used pre-GA (Technology Preview) releases of Trustee:
5364

5465
These scripts generate the cryptographic material and attestation measurements needed by Trustee and the peer-pod VMs. Run them once before your first deployment.
5566

56-
1. `bash scripts/gen-secrets.sh` — generates KBS key pairs, attestation policy seeds, and copies `values-secret.yaml.template` to `~/values-secret-coco-pattern.yaml`
57-
2. `bash scripts/get-pcr.sh` — retrieves PCR measurements from the peer-pod VM image and stores them at `~/.coco-pattern/measurements.json` (requires `podman`, `skopeo`, and `~/pull-secret.json`)
58-
3. Review and customise `~/values-secret-coco-pattern.yaml` — this file is loaded into Vault and provides secrets to the pattern
67+
1. `bash scripts/gen-secrets.sh` — generates KBS key pairs, PCCS certificates/tokens (for bare metal), and copies `values-secret.yaml.template` to `~/values-secret-coco-pattern.yaml`
68+
2. `bash scripts/get-pcr.sh` — retrieves PCR measurements from the peer-pod VM image and stores them at `~/.coco-pattern/measurements.json` (requires `podman`, `skopeo`, and `~/pull-secret.json`). **Not required for bare metal deployments.**
69+
3. Review and customise `~/values-secret-coco-pattern.yaml` — this file is loaded into Vault and provides secrets to the pattern. For bare metal, uncomment the PCCS secrets section and provide your Intel PCS API key.
5970

6071
> **Note:** `gen-secrets.sh` will not overwrite existing secrets. Delete `~/.coco-pattern/` if you need to regenerate.
6172
62-
### Single cluster deployment
73+
### Single cluster deployment (Azure)
6374

6475
1. Set `main.clusterGroupName: simple` in `values-global.yaml`
6576
2. Ensure your Azure configuration is populated in `values-global.yaml` (see `global.azure.*` fields)
6677
3. `./pattern.sh make install`
6778
4. Wait for the cluster to reboot all nodes (the sandboxed containers operator triggers a MachineConfig update). Monitor progress in the ArgoCD UI.
6879

69-
### Multi-cluster deployment
80+
### Multi-cluster deployment (Azure)
7081

7182
1. Set `main.clusterGroupName: trusted-hub` in `values-global.yaml`
7283
2. Deploy the hub cluster: `./pattern.sh make install`
@@ -76,6 +87,24 @@ These scripts generate the cryptographic material and attestation measurements n
7687
(see [importing a cluster](https://validatedpatterns.io/learn/importing-a-cluster/))
7788
6. ACM will automatically deploy the `spoke` clusterGroup applications (sandboxed containers, workloads) to the imported cluster
7889

90+
### Bare metal deployment
91+
92+
1. Set `main.clusterGroupName: baremetal` in `values-global.yaml`
93+
2. Run `bash scripts/gen-secrets.sh` to generate KBS keys and PCCS secrets
94+
3. For Intel TDX: uncomment the PCCS secrets in `~/values-secret-coco-pattern.yaml` and provide your Intel PCS API key
95+
4. `./pattern.sh make install`
96+
5. Wait for the cluster to reboot nodes (MachineConfig updates for TDX kernel parameters and vsock)
97+
98+
The pattern detects the hardware and configures the cluster accordingly:
99+
- **NFD** discovers Intel TDX or AMD SEV-SNP capabilities and labels nodes
100+
- **LVMS** auto-discovers available block devices for storage
101+
- **RuntimeClass** `kata-cc` is created automatically and points to the handler that matches the detected hardware (`kata-tdx` or `kata-snp`)
102+
- Both `kata-tdx` and `kata-snp` RuntimeClasses are deployed; only the one matching your hardware has schedulable nodes
103+
- MachineConfigs are deployed for both `master` and `worker` roles (safe on SNO where only master exists)
104+
- PCCS and QGS services deploy unconditionally; DaemonSets only schedule on Intel nodes via NFD labels
105+
106+
Optional: pin PCCS to a specific node with `bash scripts/get-pccs-node.sh` and set `baremetal.pccs.nodeSelector` in the baremetal chart values.
107+
79108
## Sample applications
80109

81110
Two sample applications are deployed on the cluster running confidential workloads (the single cluster in `simple` mode, or the spoke in multi-cluster mode):

ansible/detect-runtime-class.yaml

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
1+
- name: Detect and configure runtime class
2+
hosts: localhost
3+
connection: local
4+
gather_facts: false
5+
tasks:
6+
- name: Check for Intel TDX nodes
7+
kubernetes.core.k8s_info:
8+
api_version: v1
9+
kind: Node
10+
label_selectors:
11+
- intel.feature.node.kubernetes.io/tdx=true
12+
register: tdx_nodes
13+
14+
- name: Check for AMD SEV-SNP nodes
15+
kubernetes.core.k8s_info:
16+
api_version: v1
17+
kind: Node
18+
label_selectors:
19+
- amd.feature.node.kubernetes.io/snp=true
20+
register: snp_nodes
21+
22+
- name: Set runtime handler for Intel TDX
23+
set_fact:
24+
kata_handler: "kata-tdx"
25+
kata_overhead:
26+
memory: "350Mi"
27+
cpu: "250m"
28+
tdx.intel.com/keys: "1"
29+
kata_node_selector:
30+
intel.feature.node.kubernetes.io/tdx: "true"
31+
when: tdx_nodes.resources | length > 0
32+
33+
- name: Set runtime handler for AMD SEV-SNP
34+
set_fact:
35+
kata_handler: "kata-snp"
36+
kata_overhead:
37+
memory: "350Mi"
38+
cpu: "250m"
39+
kata_node_selector:
40+
amd.feature.node.kubernetes.io/snp: "true"
41+
when: snp_nodes.resources | length > 0
42+
43+
- name: Create kata-cc RuntimeClass
44+
kubernetes.core.k8s:
45+
state: present
46+
definition:
47+
apiVersion: node.k8s.io/v1
48+
kind: RuntimeClass
49+
metadata:
50+
name: kata-cc
51+
handler: "{{ kata_handler }}"
52+
overhead:
53+
podFixed: "{{ kata_overhead }}"
54+
scheduling:
55+
nodeSelector: "{{ kata_node_selector }}"
56+
when: kata_handler is defined
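The selection logic in the playbook above can be sketched in plain Python. This is an illustration only, not part of the commit: `select_runtime` and its input shape (a list of per-node label dicts standing in for the `k8s_info` results) are hypothetical. The label keys, handlers and overhead values are copied from the playbook.

```python
from typing import Dict, List, Optional

# Label keys taken from the playbook's label selectors.
TDX_LABEL = "intel.feature.node.kubernetes.io/tdx"
SNP_LABEL = "amd.feature.node.kubernetes.io/snp"


def select_runtime(nodes: List[Dict[str, str]]) -> Optional[dict]:
    """Return settings for the kata-cc RuntimeClass, or None if no TEE node exists.

    `nodes` is a hypothetical stand-in for the k8s_info results:
    one dict of labels per node.
    """
    has_tdx = any(labels.get(TDX_LABEL) == "true" for labels in nodes)
    has_snp = any(labels.get(SNP_LABEL) == "true" for labels in nodes)

    result = None
    if has_tdx:
        result = {
            "handler": "kata-tdx",
            "overhead": {"memory": "350Mi", "cpu": "250m", "tdx.intel.com/keys": "1"},
            "node_selector": {TDX_LABEL: "true"},
        }
    # Checked second, like the second set_fact task, so SNP replaces TDX
    # when the cluster contains both kinds of node.
    if has_snp:
        result = {
            "handler": "kata-snp",
            "overhead": {"memory": "350Mi", "cpu": "250m"},
            "node_selector": {SNP_LABEL: "true"},
        }
    return result
```

One consequence the sketch makes visible: because both `set_fact` tasks run in order, a cluster with both TDX and SEV-SNP nodes ends up with `kata-cc` pointing at `kata-snp`. When neither label is present, no RuntimeClass is created.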

charts/all/baremetal/Chart.yaml

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
apiVersion: v2
2+
description: Bare metal platform configuration (NFD rules, MachineConfigs, RuntimeClasses, Intel device plugin).
3+
keywords:
4+
- pattern
5+
- upstream
6+
- sandbox
7+
- baremetal
8+
name: baremetal
9+
version: 0.0.1
Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
1+
[hypervisor.qemu]
2+
kernel_params="agent.aa_kbc_params=cc_kbc::http://kbs-trustee-operator-system.{{ .Values.global.hubClusterDomain }}"
Lines changed: 80 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,80 @@
1+
apiVersion: nfd.openshift.io/v1alpha1
2+
kind: NodeFeatureRule
3+
metadata:
4+
name: consolidated-hardware-features
5+
namespace: openshift-nfd
6+
spec:
7+
rules:
8+
- name: "runtime.kata"
9+
labels:
10+
feature.node.kubernetes.io/runtime.kata: "true"
11+
matchAny:
12+
- matchFeatures:
13+
- feature: cpu.cpuid
14+
matchExpressions:
15+
SSE42: { op: Exists }
16+
VMX: { op: Exists }
17+
- feature: kernel.loadedmodule
18+
matchExpressions:
19+
kvm: { op: Exists }
20+
kvm_intel: { op: Exists }
21+
- matchFeatures:
22+
- feature: cpu.cpuid
23+
matchExpressions:
24+
SSE42: { op: Exists }
25+
SVM: { op: Exists }
26+
- feature: kernel.loadedmodule
27+
matchExpressions:
28+
kvm: { op: Exists }
29+
kvm_amd: { op: Exists }
30+
31+
- name: "amd.sev-snp"
32+
labels:
33+
amd.feature.node.kubernetes.io/snp: "true"
34+
extendedResources:
35+
sev-snp.amd.com/esids: "@cpu.security.sev.encrypted_state_ids"
36+
matchFeatures:
37+
- feature: cpu.cpuid
38+
matchExpressions:
39+
SVM: { op: Exists }
40+
- feature: cpu.security
41+
matchExpressions:
42+
sev.snp.enabled: { op: Exists }
43+
44+
- name: "intel.sgx"
45+
labels:
46+
intel.feature.node.kubernetes.io/sgx: "true"
47+
extendedResources:
48+
sgx.intel.com/epc: "@cpu.security.sgx.epc"
49+
matchFeatures:
50+
- feature: cpu.cpuid
51+
matchExpressions:
52+
SGX: { op: Exists }
53+
SGXLC: { op: Exists }
54+
- feature: cpu.security
55+
matchExpressions:
56+
sgx.enabled: { op: IsTrue }
57+
- feature: kernel.config
58+
matchExpressions:
59+
X86_SGX: { op: Exists }
60+
61+
- name: "intel.tdx"
62+
labels:
63+
intel.feature.node.kubernetes.io/tdx: "true"
64+
extendedResources:
65+
tdx.intel.com/keys: "@cpu.security.tdx.total_keys"
66+
matchFeatures:
67+
- feature: cpu.cpuid
68+
matchExpressions:
69+
VMX: { op: Exists }
70+
- feature: cpu.security
71+
matchExpressions:
72+
tdx.enabled: { op: Exists }
73+
74+
- name: "ibm.se.enabled"
75+
labels:
76+
ibm.feature.node.kubernetes.io/se: "true"
77+
matchFeatures:
78+
- feature: cpu.security
79+
matchExpressions:
80+
se.enabled: { op: IsTrue }
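The way the `runtime.kata` rule combines its conditions can be modelled in a few lines of Python. This is a simplified sketch, not NFD's implementation. It handles only `op: Exists`, the function names are hypothetical, and node features are represented as a dict of sets. The assumed semantics are that every term inside one `matchFeatures` list must match, while `matchAny` succeeds if at least one of its alternatives matches.

```python
from typing import Dict, List, Set

Features = Dict[str, Set[str]]   # e.g. {"cpu.cpuid": {"SSE42", "VMX"}}
Terms = Dict[str, List[str]]     # feature name -> element names that must exist


def match_features(features: Features, terms: Terms) -> bool:
    """AND: every listed element must exist under every listed feature."""
    return all(
        name in features.get(feature, set())
        for feature, names in terms.items()
        for name in names
    )


def match_any(features: Features, alternatives: List[Terms]) -> bool:
    """OR: at least one alternative must match in full."""
    return any(match_features(features, terms) for terms in alternatives)


# The two alternatives of the "runtime.kata" rule (Intel, then AMD).
RUNTIME_KATA = [
    {"cpu.cpuid": ["SSE42", "VMX"], "kernel.loadedmodule": ["kvm", "kvm_intel"]},
    {"cpu.cpuid": ["SSE42", "SVM"], "kernel.loadedmodule": ["kvm", "kvm_amd"]},
]
```

Under this model, an AMD node that reports `SVM` but has not loaded `kvm_amd` would not receive the `runtime.kata` label, because the second alternative fails on the kernel module term.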
Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
{{- range list "master" "worker" }}
2+
---
3+
apiVersion: machineconfiguration.openshift.io/v1
4+
kind: MachineConfig
5+
metadata:
6+
labels:
7+
machineconfiguration.openshift.io/role: {{ . }}
8+
name: 96-kata-kernel-config-{{ . }}
9+
namespace: openshift-machine-config-operator
10+
spec:
11+
config:
12+
ignition:
13+
version: 3.2.0
14+
storage:
15+
files:
16+
- contents:
17+
source: 'data:text/plain;charset=utf-8;base64,{{ tpl ($.Files.Get "bm-kernel-params.yaml") $ | b64enc }}'
18+
mode: 420
19+
overwrite: true
20+
path: /etc/kata-containers/snp/config.d/96-kata-kernel-config
21+
{{- end }}
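The `source` field above is a base64 data URL built from a rendered template. A minimal Python sketch of the same transformation follows, assuming the two-line `[hypervisor.qemu]` fragment shown earlier in this diff is the file the template reads. `render_drop_in` is a hypothetical stand-in for Helm's `tpl`, and `apps.example.com` is a placeholder domain.

```python
import base64

PREFIX = "data:text/plain;charset=utf-8;base64,"


def render_drop_in(hub_cluster_domain: str) -> str:
    """Stand-in for Helm's tpl: fill in the one value the fragment uses."""
    return (
        "[hypervisor.qemu]\n"
        'kernel_params="agent.aa_kbc_params=cc_kbc::'
        "http://kbs-trustee-operator-system." + hub_cluster_domain + '"\n'
    )


def to_data_url(text: str) -> str:
    """Equivalent of `| b64enc` plus the data URL prefix from the template."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return PREFIX + encoded


url = to_data_url(render_drop_in("apps.example.com"))  # placeholder domain
```

Decoding the part after `base64,` recovers the rendered drop-in, which is a convenient way to check what a deployed MachineConfig will write to the node.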
Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
apiVersion: nfd.openshift.io/v1
2+
kind: NodeFeatureDiscovery
3+
metadata:
4+
name: nfd-instance
5+
namespace: openshift-nfd
6+
spec:
7+
operand:
8+
image: registry.redhat.io/openshift4/ose-node-feature-discovery-rhel9:v4.20
9+
imagePullPolicy: Always
10+
servicePort: 12000
11+
workerConfig:
12+
configData: |
Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
# apiVersion: node.k8s.io/v1
2+
# kind: RuntimeClass
3+
# metadata:
4+
# name: kata-snp
5+
# handler: kata-snp
6+
# overhead:
7+
# podFixed:
8+
# memory: "350Mi"
9+
# cpu: "250m"
10+
# scheduling:
11+
# nodeSelector:
12+
# amd.feature.node.kubernetes.io/snp: "true"
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
# apiVersion: node.k8s.io/v1
2+
# kind: RuntimeClass
3+
# metadata:
4+
# name: kata-tdx
5+
# handler: kata-tdx
6+
# overhead:
7+
# podFixed:
8+
# memory: "350Mi"
9+
# cpu: "250m"
10+
# tdx.intel.com/keys: 1
11+
# scheduling:
12+
# nodeSelector:
13+
# intel.feature.node.kubernetes.io/tdx: "true"
Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
{{- range list "master" "worker" }}
2+
---
3+
apiVersion: machineconfiguration.openshift.io/v1
4+
kind: MachineConfig
5+
metadata:
6+
labels:
7+
machineconfiguration.openshift.io/role: {{ . }}
8+
name: 99-enable-coco-{{ . }}
9+
spec:
10+
kernelArguments:
11+
- nohibernate
12+
config:
13+
ignition:
14+
version: 3.2.0
15+
storage:
16+
files:
17+
- path: /etc/modules-load.d/vsock.conf
18+
mode: 0644
19+
contents:
20+
source: data:text/plain;charset=utf-8;base64,dnNvY2stbG9vcGJhY2sK
21+
{{- end }}
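For reviewers who want to see what this MachineConfig writes to `/etc/modules-load.d/vsock.conf`, the payload can be decoded with a short Python snippet. The note on the file mode assumes a parser that follows YAML 1.1 integer rules, where a leading-zero literal such as `0644` is read as octal.

```python
import base64

source = "data:text/plain;charset=utf-8;base64,dnNvY2stbG9vcGJhY2sK"

# Everything after "base64," is the encoded file content.
payload = source.split("base64,", 1)[1]
module_list = base64.b64decode(payload).decode("utf-8")
print(repr(module_list))  # 'vsock-loopback\n'

# 0644 read as octal is 420 in decimal, the form used by the
# 96-kata-kernel-config MachineConfig in this commit (mode: 420).
print(0o644)  # 420
```

The file therefore contains a single module name, `vsock-loopback`, and both MachineConfigs in this commit request the same file permissions, written once in octal and once in decimal.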
