Commit cb00ff0

Ambient Code Bot and claude committed
feat(ci): ephemeral PR test instances on MPP dev cluster
- Build workflow pushes all 7 component images tagged pr-<N>-amd64 to quay on every PR
- provision.sh creates/destroys TenantNamespace CRs on ambient-code--config; capacity-gated at 5 concurrent instances
- install.sh deploys production manifests with PR image tags using ArgoCD SA token; handles MPP restricted environment constraints (Route labels, PVC annotations, ClusterRoleBinding subject patching)
- pr-e2e-openshift.yml workflow: provision → install → e2e → teardown on build completion
- pr-namespace-cleanup.yml: safety-net teardown on PR close
- Skills: ambient-pr-test (full PR test workflow) and ambient (install on any OpenShift namespace)
- Validates required secrets before install; documents MPP resource inventory and constraints
- Route admission webhook fix: add paas.redhat.com/appcode label via kustomize filter

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent de50276 commit cb00ff0

11 files changed

Lines changed: 1787 additions & 328 deletions

Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
---
name: ambient-pr-test
description: >-
  End-to-end workflow for testing a pull request against the MPP dev cluster.
  Builds and pushes images, provisions an ephemeral TenantNamespace, deploys
  Ambient, runs E2E tests, and tears down. Invoke with a PR URL.
---

# Ambient PR Test Skill

You are an expert in running ephemeral PR validation environments on the Ambient Code MPP dev cluster. This skill orchestrates the full lifecycle: build → namespace provisioning → Ambient deployment → E2E test → teardown.

**Invoke this skill with a PR URL:**
```
with .claude/skills/ambient-pr-test https://github.com/ambient-code/platform/pull/1005
```

> **Spec:** `components/pr-test/README.md` (TenantNamespace CR schema, naming rules, capacity parameters, RBAC, image tagging convention, provisioner contracts).
> **Deployment detail:** `.claude/skills/ambient/SKILL.md` (how to install Ambient into a namespace).

Scripts in `components/pr-test/` implement all steps below. Prefer them over inline commands.

---

## Cluster Context

- **Cluster:** `dev-spoke-aws-us-east-1`
- **Config namespace:** `ambient-code--config`
- **Namespace pattern:** `ambient-code--<instance-id>`
- **Instance ID pattern:** `pr-<PR_NUMBER>`
- **Image tag pattern:** `quay.io/ambient_code/vteam_*:pr-<PR_NUMBER>-amd64`

For naming rules and slug budget, see `components/pr-test/README.md` § Instance Naming Convention.

### Permissions

User tokens (`oc whoami -t`) do **not** have cluster-admin. For the kustomize apply, `install.sh` uses the ArgoCD SA token stored in the `tenantaccess-argocd-account-token` secret in `ambient-code--config`. That token has cluster-admin and can create ClusterRoleBindings, PVCs, and all namespace-scoped resources.

- `oc get crd` at cluster scope returns Forbidden for a user token (expected), so `install.sh` probes via `oc get agenticsessions -n $NAMESPACE` instead
- CRDs and ClusterRoles must already exist; a cluster-admin applies them once
- ClusterRoleBindings are patched by the filter script to point subjects at the PR namespace
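
The namespace-scoped CRD probe described above can be sketched as a small helper. This is an illustration only, not the code in `install.sh`; the function name `crds_present` is hypothetical. It relies on `oc get` exiting with status 0 when the resource type is known (even if no objects exist) and non-zero when the type is unknown.

```bash
# Hypothetical helper: succeeds when the agenticsessions resource type
# can be listed in the given namespace, fails otherwise.
# Note: a non-zero exit can also mean an auth or network problem,
# so a failure here is a hint that CRDs are missing, not proof.
crds_present() {
  oc get agenticsessions -n "$1" >/dev/null 2>&1
}

# Example use (commented out so the sketch has no side effects):
# crds_present "$NAMESPACE" || echo "CRDs missing: ask a cluster-admin to apply them" >&2
```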

### Namespace Type

PR test namespaces must be provisioned as `type: runtime` (not `build`). MPP `build` namespaces cannot create Routes: the route admission webhook panics on all Route creates in `build` namespaces.

---

## Full Workflow

```
0. Build and push images: bash components/pr-test/build.sh <pr-url>
1. Derive instance-id from PR number
2. Provision namespace: bash components/pr-test/provision.sh create <instance-id>
3. Deploy Ambient: bash components/pr-test/install.sh <namespace> <image-tag>
4. Run E2E tests
5. Teardown: bash components/pr-test/provision.sh destroy <instance-id>
```

---

## Step 0: Build and Push Images

```bash
bash components/pr-test/build.sh https://github.com/ambient-code/platform/pull/1005
```

Builds all 7 component images from the current checkout and pushes them to quay with the `pr-N-amd64` tag. Optional env vars:

| Variable | Default | Purpose |
|----------|---------|---------|
| `REGISTRY` | `quay.io/ambient_code` | Registry prefix |
| `PLATFORM` | `linux/amd64` | Build platform |
| `CONTAINER_ENGINE` | `docker` | `docker` or `podman` |

Skip this step if CI already pushed images (e.g. the PR's `Build and Push Component Docker Images` workflow completed successfully).

---

## Step 1: Derive Instance ID

```bash
PR_URL="https://github.com/ambient-code/platform/pull/1005"
PR_NUMBER=$(echo "$PR_URL" | grep -oE '[0-9]+$')

INSTANCE_ID="pr-${PR_NUMBER}"
NAMESPACE="ambient-code--${INSTANCE_ID}"
IMAGE_TAG="pr-${PR_NUMBER}-amd64"
```
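
The trailing-digits match above returns nothing if the URL carries a suffix such as `/files` or a trailing slash. A slightly more defensive variant might look like the sketch below. The function names `pr_number_from_url` and `derive_instance_vars` are hypothetical and are not part of `components/pr-test/`.

```bash
# Hypothetical helper: print the PR number from a GitHub pull request URL,
# or print nothing if the URL does not look like a pull request.
pr_number_from_url() {
  printf '%s\n' "$1" |
    sed -nE 's,^https://github\.com/[^/]+/[^/]+/pull/([0-9]+)([/?#].*)?$,\1,p'
}

# Hypothetical helper: set the same variables as the snippet above,
# failing early when the URL cannot be parsed.
derive_instance_vars() {
  PR_NUMBER=$(pr_number_from_url "$1")
  if [ -z "$PR_NUMBER" ]; then
    echo "not a pull request URL: $1" >&2
    return 1
  fi
  INSTANCE_ID="pr-${PR_NUMBER}"
  NAMESPACE="ambient-code--${INSTANCE_ID}"
  IMAGE_TAG="pr-${PR_NUMBER}-amd64"
}
```

Failing early on an unparseable URL avoids provisioning a namespace named `ambient-code--pr-` with an empty PR number.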

---

## Step 2: Provision Namespace

```bash
bash components/pr-test/provision.sh create "$INSTANCE_ID"
```

This applies the `TenantNamespace` CR to `ambient-code--config` and waits for the namespace to become Active (~10s). For the CR schema and capacity rules, see `components/pr-test/README.md` §§ TenantNamespace CR, Capacity Management.
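
`provision.sh` already performs the wait, so the following is only a sketch of what such a wait could look like, for readers debugging a slow provision by hand. The function name `wait_for_namespace_active` and its defaults are hypothetical, and the sketch assumes the caller's token is allowed to read the namespace object.

```bash
# Hypothetical helper: poll until the namespace reports phase "Active".
# Arguments: namespace, timeout in seconds (default 60),
# poll interval in seconds (default 2, must be positive).
wait_for_namespace_active() {
  ns="$1"
  timeout="${2:-60}"
  interval="${3:-2}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # An error (e.g. NotFound while the operator is still working) yields an empty phase.
    phase=$(oc get namespace "$ns" -o jsonpath='{.status.phase}' 2>/dev/null || true)
    if [ "$phase" = "Active" ]; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "namespace $ns not Active after ${timeout}s" >&2
  return 1
}
```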

---

## Step 3: Deploy Ambient

```bash
bash components/pr-test/install.sh "$NAMESPACE" "$IMAGE_TAG"
```

This copies secrets from `ambient-code--runtime-int`, deploys the production overlay with PR image tags, patches operator and agent-registry ConfigMaps, and waits for all rollouts. See `.claude/skills/ambient/SKILL.md` for detail on each step.

---

## Step 4: Run E2E Tests

```bash
FRONTEND_URL="https://$(oc get route frontend-route -n "$NAMESPACE" -o jsonpath='{.spec.host}')"

cd e2e
CYPRESS_BASE_URL="$FRONTEND_URL" \
CYPRESS_ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
npx cypress run --browser chrome
```
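
If the Route does not exist yet or has no host, the one-liner above produces the bare string `https://` and Cypress fails with a confusing error. A guarded lookup might look like this sketch; the function name `frontend_url` is hypothetical.

```bash
# Hypothetical helper: print the frontend URL for a namespace,
# or fail when the route is missing or has no host assigned yet.
frontend_url() {
  host=$(oc get route frontend-route -n "$1" -o jsonpath='{.spec.host}' 2>/dev/null) || return 1
  if [ -z "$host" ]; then
    echo "route frontend-route in $1 has no host yet" >&2
    return 1
  fi
  printf 'https://%s\n' "$host"
}

# Example use (commented out so the sketch has no side effects):
# FRONTEND_URL=$(frontend_url "$NAMESPACE") || exit 1
```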

---

## Step 5: Teardown

Always run teardown, even on failure.

```bash
bash components/pr-test/provision.sh destroy "$INSTANCE_ID"
```

Deletes the `TenantNamespace` CR and waits for the namespace to be gone. The tenant operator handles namespace deletion via finalizers, so do not run `oc delete namespace` directly.
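
When running the steps by hand, one way to make sure teardown runs even if an earlier step fails is to wrap the failing-prone command. The sketch below is an assumption about how this could be scripted; `teardown_instance` and `run_with_teardown` are hypothetical names, and only the `provision.sh destroy` call comes from this document.

```bash
# Thin wrapper around the documented teardown command.
teardown_instance() {
  bash components/pr-test/provision.sh destroy "$1"
}

# Hypothetical helper: run a command, always tear the instance down afterwards,
# and return the command's own exit status.
# Usage: run_with_teardown <instance-id> <command> [args...]
run_with_teardown() {
  instance_id="$1"
  shift
  rc=0
  "$@" || rc=$?
  teardown_instance "$instance_id" || echo "teardown failed for $instance_id" >&2
  return "$rc"
}
```

This does not cover interruption by a signal such as Ctrl-C; a `trap` on `EXIT` would be needed for that case.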

---

## GitHub Actions Integration

The workflow `.github/workflows/pr-e2e-openshift.yml` automates steps 1–5 (build is handled by `components-build-deploy.yml`):

```
PR push
  → components-build-deploy.yml builds + pushes all images :pr-N-amd64
  → pr-e2e-openshift.yml triggers on workflow_run completion
      job: provision → provision.sh create
      job: install → install.sh
      job: e2e → cypress
      job: teardown → always: provision.sh destroy

PR closed
  → pr-namespace-cleanup.yml → provision.sh destroy (safety net)
```

Required secrets:
- `TEST_OPENSHIFT_SERVER`: API URL of dev-spoke-aws-us-east-1
- `TEST_OPENSHIFT_TOKEN`: ServiceAccount token with tenant-admin on `ambient-code--config`
- `ANTHROPIC_API_KEY`: for runner pods in test instances
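
A pre-flight check for these secrets, when they are exposed to a job as environment variables, could be sketched as follows. The function name `require_env` is hypothetical, and this is not the validation code the workflows actually use.

```bash
# Hypothetical helper: fail if any of the named environment variables
# is unset or empty, listing every missing name in one message.
# Only pass trusted, literal variable names: the lookup uses eval.
require_env() {
  missing=""
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing required secrets:$missing" >&2
    return 1
  fi
}

# Example use (commented out so the sketch has no side effects):
# require_env TEST_OPENSHIFT_SERVER TEST_OPENSHIFT_TOKEN ANTHROPIC_API_KEY || exit 1
```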

---

## Listing Active Instances

```bash
oc get tenantnamespace -n ambient-code--config \
  -l ambient-code/instance-type=s0x \
  -o custom-columns='NAME:.metadata.name,AGE:.metadata.creationTimestamp'
```
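
The same label selector can be used to check capacity by hand before provisioning. The sketch below assumes the limit of 5 concurrent instances mentioned in the commit message; `provision.sh` already enforces its own gate, and the names `MAX_INSTANCES`, `count_active_instances`, and `has_capacity` are hypothetical.

```bash
# Assumed cap, taken from the commit message; override via the environment.
MAX_INSTANCES="${MAX_INSTANCES:-5}"

# Print the number of TenantNamespace objects carrying the instance label.
# If oc fails (e.g. not logged in), this prints 0, so treat 0 with care.
count_active_instances() {
  oc get tenantnamespace -n ambient-code--config \
    -l ambient-code/instance-type=s0x \
    -o name 2>/dev/null | grep -c . || true
}

# Succeeds while the active count is below the cap.
has_capacity() {
  active=$(count_active_instances)
  [ "$active" -lt "$MAX_INSTANCES" ]
}
```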

---

## Troubleshooting

### Kustomize "no such file or directory" for `../../base`
The production overlay uses relative paths (`../../base`). Copying only the overlay directory into a tmpdir breaks these references. `install.sh` copies the entire `components/manifests/` tree into the tmpdir and runs kustomize from `overlays/production/` within it.
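
The staging approach can be sketched as below. This is an illustration of the idea rather than the code in `install.sh`; the function name `stage_manifests` is hypothetical.

```bash
# Hypothetical helper: copy the whole manifests tree into a fresh temporary
# directory and print its path, so that ../../base still resolves from
# overlays/production inside the copy.
stage_manifests() {
  src="${1:-components/manifests}"
  tmpdir=$(mktemp -d) || return 1
  cp -R "$src/." "$tmpdir/" || return 1
  printf '%s\n' "$tmpdir"
}

# Example use (commented out; assumes a standalone kustomize binary,
# `oc kustomize .` is an alternative):
# staged=$(stage_manifests)
# (cd "$staged/overlays/production" && kustomize build .)
# rm -rf "$staged"
```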

### CRD apply fails with Forbidden
This is expected when running as a user token (not cluster-admin). `install.sh` probes CRD presence via `oc get agenticsessions -n $NAMESPACE`. If that returns an error (not "No resources found"), CRDs are missing, so ask a cluster-admin to apply them once.

### Route admission webhook: shard label
Routes require the `paas.redhat.com/appcode: AMBC-001` label (injected by the filter). Do **not** add `shard: internal`, because that requires a host on the internal domain. Without a shard label, OpenShift auto-assigns a host on the external domain. The previous nil-pointer panic in the route admission webhook was a cluster-side bug, now fixed.

### ClusterRoleBindings: using ArgoCD SA token
User tokens cannot create ClusterRoleBindings. `install.sh` fetches the `tenantaccess-argocd-account-token` secret from `ambient-code--config` and uses it for the full kustomize apply. This token has cluster-admin level access and can create ClusterRoleBindings. The Python filter script patches ClusterRoleBinding subjects from `ambient-code` to the PR namespace before applying.

### Build fails
Check that `docker` (or `podman`) is logged in to `quay.io/ambient_code` before running `build.sh`. Use `docker login quay.io`, or set `CONTAINER_ENGINE=podman` and log in with `podman login quay.io`.

### Images not found in quay
Either `build.sh` was not run, or the CI build workflow failed. Check Actions → `Build and Push Component Docker Images` for the PR.

### TenantNamespace not becoming Active
```bash
oc describe tenantnamespace "$INSTANCE_ID" -n ambient-code--config
oc get events -n ambient-code--config --sort-by='.lastTimestamp' | tail -20
```

### Namespace exists but pods won't schedule
```bash
oc get nodes
oc describe namespace "$NAMESPACE"
oc get resourcequota -n "$NAMESPACE"
```

MPP enforces resource quotas on `build` type namespaces.

### JWT errors in ambient-api-server
The production overlay configures JWT against Red Hat SSO. For ephemeral test instances, disable JWT validation:
```bash
oc set env deployment/ambient-api-server -n "$NAMESPACE" ENABLE_JWT=false
oc rollout restart deployment/ambient-api-server -n "$NAMESPACE"
```
