| 1 | +--- |
| 2 | +name: ambient-pr-test |
| 3 | +description: >- |
| 4 | + End-to-end workflow for testing a pull request against the MPP dev cluster. |
| 5 | + Builds and pushes images, provisions an ephemeral TenantNamespace, deploys |
| 6 | + Ambient, runs E2E tests, and tears down. Invoke with a PR URL. |
| 7 | +--- |
| 8 | + |
| 9 | +# Ambient PR Test Skill |
| 10 | + |
| 11 | +You are an expert in running ephemeral PR validation environments on the Ambient Code MPP dev cluster. This skill orchestrates the full lifecycle: build → namespace provisioning → Ambient deployment → E2E test → teardown. |
| 12 | + |
| 13 | +**Invoke this skill with a PR URL:** |
| 14 | +``` |
| 15 | +with .claude/skills/ambient-pr-test https://github.com/ambient-code/platform/pull/1005 |
| 16 | +``` |
| 17 | + |
| 18 | +> **Spec:** `components/pr-test/README.md` — TenantNamespace CR schema, naming rules, capacity parameters, RBAC, image tagging convention, provisioner contracts. |
| 19 | +> **Deployment detail:** `.claude/skills/ambient/SKILL.md` — how to install Ambient into a namespace. |
| 20 | + |
| 21 | +Scripts in `components/pr-test/` implement all steps below. Prefer them over inline commands. |
| 22 | + |
| 23 | +--- |
| 24 | + |
| 25 | +## Cluster Context |
| 26 | + |
| 27 | +- **Cluster:** `dev-spoke-aws-us-east-1` |
| 28 | +- **Config namespace:** `ambient-code--config` |
| 29 | +- **Namespace pattern:** `ambient-code--<instance-id>` |
| 30 | +- **Instance ID pattern:** `pr-<PR_NUMBER>` |
| 31 | +- **Image tag pattern:** `quay.io/ambient_code/vteam_*:pr-<PR_NUMBER>-amd64` |
| 32 | + |
| 33 | +For naming rules and slug budget, see `components/pr-test/README.md` § Instance Naming Convention. |
| 34 | + |
| 35 | +### Permissions |
| 36 | + |
| 37 | +User tokens (`oc whoami -t`) do **not** have cluster-admin. For the kustomize apply, `install.sh` uses the `tenantaccess-argocd-account-token` secret from `ambient-code--config` (the ArgoCD SA token), which has cluster-admin and can create ClusterRoleBindings, PVCs, and all namespace-scoped resources. |
| 38 | + |
| 39 | +- `oc get crd` at cluster scope → Forbidden for user token (expected) — `install.sh` probes via `oc get agenticsessions -n $NAMESPACE` instead |
| 40 | +- CRDs and ClusterRoles must already exist — applied once by cluster-admin |
| 41 | +- ClusterRoleBindings are patched by the filter script to point subjects at the PR namespace |
| 42 | + |
| 43 | +### Namespace Type |
| 44 | + |
| 45 | +PR test namespaces must be provisioned as `type: runtime` (not `build`). MPP `build` namespaces cannot create Routes — the route admission webhook panics on all Route creates in `build` namespaces. |
| 46 | + |
| 47 | +--- |
| 48 | + |
| 49 | +## Full Workflow |
| 50 | + |
| 51 | +``` |
| 52 | +0. Build and push images: bash components/pr-test/build.sh <pr-url> |
| 53 | +1. Derive instance-id from PR number |
| 54 | +2. Provision namespace: bash components/pr-test/provision.sh create <instance-id> |
| 55 | +3. Deploy Ambient: bash components/pr-test/install.sh <namespace> <image-tag> |
| 56 | +4. Run E2E tests |
| 57 | +5. Teardown: bash components/pr-test/provision.sh destroy <instance-id> |
| 58 | +``` |
| 59 | + |
| 60 | +--- |
| 61 | + |
| 62 | +## Step 0: Build and Push Images |
| 63 | + |
| 64 | +```bash |
| 65 | +bash components/pr-test/build.sh https://github.com/ambient-code/platform/pull/1005 |
| 66 | +``` |
| 67 | + |
| 68 | +Builds all 7 component images from the current checkout and pushes them to Quay with the `pr-N-amd64` tag, where `N` is the PR number. Optional environment variables: |
| 69 | + |
| 70 | +| Variable | Default | Purpose | |
| 71 | +|----------|---------|---------| |
| 72 | +| `REGISTRY` | `quay.io/ambient_code` | Registry prefix | |
| 73 | +| `PLATFORM` | `linux/amd64` | Build platform | |
| 74 | +| `CONTAINER_ENGINE` | `docker` | `docker` or `podman` | |
| 75 | + |
| 76 | +Skip this step if CI already pushed images (e.g. the PR's `Build and Push Component Docker Images` workflow completed successfully). |
| 77 | + |
| 78 | +--- |
| 79 | + |
| 80 | +## Step 1: Derive Instance ID |
| 81 | + |
| 82 | +```bash |
| 83 | +PR_URL="https://github.com/ambient-code/platform/pull/1005" |
| 84 | +PR_NUMBER=$(echo "$PR_URL" | grep -oE '[0-9]+$') |
| 85 | + |
| 86 | +INSTANCE_ID="pr-${PR_NUMBER}" |
| 87 | +NAMESPACE="ambient-code--${INSTANCE_ID}" |
| 88 | +IMAGE_TAG="pr-${PR_NUMBER}-amd64" |
| 89 | +``` |
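The `grep -oE '[0-9]+$'` form above only matches when the URL ends in the PR number, so a URL copied from the "Files changed" tab (ending in `/files`) yields an empty `PR_NUMBER`. A minimal sketch of a more tolerant parser follows. `derive_pr_vars` is a hypothetical helper name, not something in `components/pr-test/`, and it assumes bash for the `[[ =~ ]]` match.

```bash
# Sketch only: derive the PR variables from a GitHub pull request URL.
# derive_pr_vars is a hypothetical helper, not part of components/pr-test/.
derive_pr_vars() {
  local pr_url="$1"
  # Match .../pull/<number>, optionally followed by a path, query, or fragment.
  local re='/pull/([0-9]+)([/?#].*)?$'
  if [[ "$pr_url" =~ $re ]]; then
    PR_NUMBER="${BASH_REMATCH[1]}"
  else
    echo "error: cannot parse a PR number from '$pr_url'" >&2
    return 1
  fi
  # Same naming patterns as the Cluster Context section.
  INSTANCE_ID="pr-${PR_NUMBER}"
  NAMESPACE="ambient-code--${INSTANCE_ID}"
  IMAGE_TAG="pr-${PR_NUMBER}-amd64"
}
```

Calling `derive_pr_vars "$PR_URL" || exit 1` before Step 2 stops the run early instead of provisioning a namespace with an empty instance ID.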
| 90 | + |
| 91 | +--- |
| 92 | + |
| 93 | +## Step 2: Provision Namespace |
| 94 | + |
| 95 | +```bash |
| 96 | +bash components/pr-test/provision.sh create "$INSTANCE_ID" |
| 97 | +``` |
| 98 | + |
| 99 | +This applies the `TenantNamespace` CR to `ambient-code--config` and waits for the namespace to become Active (~10s). For the CR schema and capacity rules, see `components/pr-test/README.md` §§ TenantNamespace CR, Capacity Management. |
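For readers who want to see what "waits for the namespace to become Active" can look like, here is a rough polling sketch. It is not taken from `provision.sh`, which may wait differently. `wait_for_namespace_active`, its default timeout, and its polling interval are assumptions, and it presumes your token is allowed to read the namespace object.

```bash
# Sketch only: poll until a namespace reports phase "Active" or a timeout elapses.
# wait_for_namespace_active is a hypothetical helper; provision.sh may differ.
wait_for_namespace_active() {
  local namespace="$1" timeout="${2:-60}" interval="${3:-2}"
  local elapsed=0 phase
  while [ "$elapsed" -lt "$timeout" ]; do
    # A missing namespace makes oc fail; treat that as "not ready yet".
    phase=$(oc get namespace "$namespace" -o jsonpath='{.status.phase}' 2>/dev/null || true)
    if [ "$phase" = "Active" ]; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "error: namespace $namespace not Active after ${timeout}s" >&2
  return 1
}
```

A bounded wait like this turns a stuck provision into a clear failure, at which point the commands under "TenantNamespace not becoming Active" are the next thing to try.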
| 100 | + |
| 101 | +--- |
| 102 | + |
| 103 | +## Step 3: Deploy Ambient |
| 104 | + |
| 105 | +```bash |
| 106 | +bash components/pr-test/install.sh "$NAMESPACE" "$IMAGE_TAG" |
| 107 | +``` |
| 108 | + |
| 109 | +This copies secrets from `ambient-code--runtime-int`, deploys the production overlay with PR image tags, patches operator and agent-registry ConfigMaps, and waits for all rollouts. See `.claude/skills/ambient/SKILL.md` for detail on each step. |
| 110 | + |
| 111 | +--- |
| 112 | + |
| 113 | +## Step 4: Run E2E Tests |
| 114 | + |
| 115 | +```bash |
| 116 | +FRONTEND_URL="https://$(oc get route frontend-route -n "$NAMESPACE" -o jsonpath='{.spec.host}')" |
| 117 | + |
| 118 | +cd e2e |
| 119 | +CYPRESS_BASE_URL="$FRONTEND_URL" \ |
| 120 | +CYPRESS_ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \ |
| 121 | + npx cypress run --browser chrome |
| 122 | +``` |
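If the route does not exist yet, the lookup above prints nothing and `FRONTEND_URL` silently becomes `https://`, which Cypress then reports as a confusing connection error. A small guard, sketched below, fails fast instead. `frontend_url_from_host` is a hypothetical helper name, not part of the repository.

```bash
# Sketch only: build the frontend URL and refuse an empty route host.
# frontend_url_from_host is a hypothetical helper, not part of the repo.
frontend_url_from_host() {
  local host="$1"
  if [ -z "$host" ]; then
    echo "error: route has no host; is the frontend deployed?" >&2
    return 1
  fi
  printf 'https://%s\n' "$host"
}

# Possible usage, with the same route lookup as in the step above:
#   host=$(oc get route frontend-route -n "$NAMESPACE" -o jsonpath='{.spec.host}')
#   FRONTEND_URL=$(frontend_url_from_host "$host") || exit 1
```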
| 123 | + |
| 124 | +--- |
| 125 | + |
| 126 | +## Step 5: Teardown |
| 127 | + |
| 128 | +Always run teardown, even on failure. |
| 129 | + |
| 130 | +```bash |
| 131 | +bash components/pr-test/provision.sh destroy "$INSTANCE_ID" |
| 132 | +``` |
| 133 | + |
| 134 | +Deletes the `TenantNamespace` CR and waits for the namespace to be gone. The tenant operator handles namespace deletion via finalizers — do not `oc delete namespace` directly. |
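One way to honor "always run teardown, even on failure" in a local run is to wrap the test command so that the destroy call happens regardless of its exit status. The sketch below is an illustration under assumptions: `run_with_teardown` and the `PROVISION_SH` override are hypothetical names, and it does not handle interruption by signals (an `EXIT` trap would be the usual addition for that).

```bash
# Sketch only: run a command, then always tear down, preserving the command's status.
# run_with_teardown and PROVISION_SH are hypothetical names used for illustration.
PROVISION_SH="${PROVISION_SH:-components/pr-test/provision.sh}"

run_with_teardown() {
  local instance_id="$1"
  shift
  local status=0
  # Run the wrapped command (for example the Cypress invocation) and remember its status.
  "$@" || status=$?
  # Teardown runs whether the command passed or failed.
  bash "$PROVISION_SH" destroy "$instance_id" \
    || echo "warning: teardown failed for $instance_id" >&2
  return "$status"
}
```

For example, `run_with_teardown "$INSTANCE_ID" npx cypress run --browser chrome` would report the Cypress exit code while still destroying the instance.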
| 135 | + |
| 136 | +--- |
| 137 | + |
| 138 | +## GitHub Actions Integration |
| 139 | + |
| 140 | +The workflow `.github/workflows/pr-e2e-openshift.yml` automates steps 1–5 (build is handled by `components-build-deploy.yml`): |
| 141 | + |
| 142 | +``` |
| 143 | +PR push |
| 144 | + → components-build-deploy.yml builds + pushes all images :pr-N-amd64 |
| 145 | + → pr-e2e-openshift.yml triggers on workflow_run completion |
| 146 | + job: provision → provision.sh create |
| 147 | + job: install → install.sh |
| 148 | + job: e2e → cypress |
| 149 | + job: teardown → always: provision.sh destroy |
| 150 | + |
| 151 | +PR closed |
| 152 | + → pr-namespace-cleanup.yml → provision.sh destroy (safety net) |
| 153 | +``` |
| 154 | + |
| 155 | +Required secrets: |
| 156 | +- `TEST_OPENSHIFT_SERVER` — API URL of dev-spoke-aws-us-east-1 |
| 157 | +- `TEST_OPENSHIFT_TOKEN` — ServiceAccount token with tenant-admin on `ambient-code--config` |
| 158 | +- `ANTHROPIC_API_KEY` — for runner pods in test instances |
| 159 | + |
| 160 | +--- |
| 161 | + |
| 162 | +## Listing Active Instances |
| 163 | + |
| 164 | +```bash |
| 165 | +oc get tenantnamespace -n ambient-code--config \ |
| 166 | + -l ambient-code/instance-type=s0x \ |
| 167 | + -o custom-columns='NAME:.metadata.name,AGE:.metadata.creationTimestamp' |
| 168 | +``` |
| 169 | + |
| 170 | +--- |
| 171 | + |
| 172 | +## Troubleshooting |
| 173 | + |
| 174 | +### Kustomize "no such file or directory" for `../../base` |
| 175 | +The production overlay uses relative paths (`../../base`). Copying only the overlay directory into a tmpdir breaks these references. `install.sh` copies the entire `components/manifests/` tree into the tmpdir and runs kustomize from `overlays/production/` within it. |
| 176 | + |
| 177 | +### CRD apply fails with Forbidden |
| 178 | +This is expected when running as a user token (not cluster-admin). `install.sh` probes CRD presence via `oc get agenticsessions -n $NAMESPACE`. If that returns an error (not "No resources found"), CRDs are missing — ask a cluster-admin to apply them once. |
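The probe described above can be sketched as a small function that separates "CRD installed, no objects yet" from "resource type unknown". This is not the code in `install.sh`. `probe_crds` is a hypothetical name, and the matched error text ("doesn't have a resource type") is an assumption about the client's message that may vary between `oc` versions.

```bash
# Sketch only: classify the result of a namespaced CRD probe.
# probe_crds is a hypothetical helper; install.sh may implement this differently.
probe_crds() {
  local namespace="$1" out
  if out=$(oc get agenticsessions -n "$namespace" 2>&1); then
    # Exit status 0 covers both listed objects and "No resources found".
    echo "present"
    return 0
  fi
  case "$out" in
    *"doesn't have a resource type"*)
      # Assumed wording of the client error when the CRD is not installed.
      echo "missing"
      ;;
    *)
      # Anything else (for example Forbidden or a network error) needs a human look.
      echo "unknown"
      ;;
  esac
  return 1
}
```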
| 179 | + |
| 180 | +### Route admission webhook — shard label |
| 181 | +Routes require the `paas.redhat.com/appcode: AMBC-001` label (injected by the filter script). Do **not** add `shard: internal`: that label requires a host on the internal domain. Without a shard label, OpenShift auto-assigns a host on the external domain. The earlier nil-pointer panic in the route admission webhook was a cluster-side bug and is now fixed. |
| 182 | + |
| 183 | +### ClusterRoleBindings — using ArgoCD SA token |
| 184 | +User tokens cannot create ClusterRoleBindings. `install.sh` fetches the `tenantaccess-argocd-account-token` secret from `ambient-code--config` and uses it for the full kustomize apply. This token has cluster-admin level access and can create ClusterRoleBindings. The Python filter script patches ClusterRoleBinding subjects from `ambient-code` to the PR namespace before applying. |
| 185 | + |
| 186 | +### Build fails |
| 187 | +Check that your container engine is logged in to `quay.io/ambient_code` before running `build.sh`. With docker, run `docker login quay.io`. With podman, run `podman login quay.io` and set `CONTAINER_ENGINE=podman`. |
| 188 | + |
| 189 | +### Images not found in quay |
| 190 | +Either `build.sh` was not run, or the CI build workflow failed. Check Actions → `Build and Push Component Docker Images` for the PR. |
| 191 | + |
| 192 | +### TenantNamespace not becoming Active |
| 193 | +```bash |
| 194 | +oc describe tenantnamespace $INSTANCE_ID -n ambient-code--config |
| 195 | +oc get events -n ambient-code--config --sort-by='.lastTimestamp' | tail -20 |
| 196 | +``` |
| 197 | + |
| 198 | +### Namespace exists but pods won't schedule |
| 199 | +```bash |
| 200 | +oc get nodes |
| 201 | +oc describe namespace $NAMESPACE |
| 202 | +oc get resourcequota -n $NAMESPACE |
| 203 | +``` |
| 204 | + |
| 205 | +MPP enforces resource quotas on `build` type namespaces. |
| 206 | + |
| 207 | +### JWT errors in ambient-api-server |
| 208 | +The production overlay configures JWT against Red Hat SSO. For ephemeral test instances, disable JWT validation: |
| 209 | +```bash |
| 210 | +oc set env deployment/ambient-api-server -n $NAMESPACE ENABLE_JWT=false |
| 211 | +oc rollout restart deployment/ambient-api-server -n $NAMESPACE |
| 212 | +``` |