-
Notifications
You must be signed in to change notification settings - Fork 0
ci setup faq
Troubleshooting + edge cases for docs/ci-setup.md + .github/workflows/harness-ci.yml.
That's the designed behavior until TEST_OIDC_AWS_ROLE_ARN is set as a repo secret. The preflight job emits a ::warning:: reminder. Until the operator finishes the 7-step bring-up in docs/ci-setup.md, only rust-checks runs — and that's enough to catch most regressions (600+ tests).
AWS hasn't found the test broker's OIDC provider. Three checks:
- The OIDC provider ARN matches the broker's
BROKER_OIDC_ISSUERbyte-for-byte (including scheme and trailing slash). - The broker's
.well-known/openid-configurationis reachable from the public internet (curl from a random box, not just the runner). - The IAM trust policy on the test role lists the OIDC provider ARN under
Principal.Federated.
That's the test working — stage-3 step 5 / 8 / 9 are NEGATIVE tests that EXPECT AccessDenied. If they pass-as-success, the workflow exits 0. If they pass with AccessDenied, the harness script asserts that (the per-actor + per-data-class invariants from CLAUDE.md). A genuine failure is the script exiting non-zero, not the AWS API returning AccessDenied.
Per-run prefix isolation via CI_S3_PREFIX=ci/run-${GITHUB_RUN_ID} should prevent this. If you see it anyway:
- Confirm
CI_S3_PREFIXis being honored by every write site in the harness (currentlyharness/v2-stage3-demo.shhonors it; verify if you've added other harness steps). - Make sure
concurrency.cancel-in-progress: trueis set in the workflow (it is — but a previous-run-in-flight can briefly overlap).
Happens when the operator redeploys the test contracts (e.g. after a .sol source change) but forgets to update the TEST_*_HEIMA secrets. Symptoms: stage-1 step 8 (verify-contracts) fails with "no bytecode at $SCOPE_ADDR".
Fix: re-read addresses from scripts/operator-workstation.env post-redeploy, update the six TEST_*_HEIMA secrets via the GitHub UI. Use the GitHub CLI:
for addr in SCOPE_CONTRACT_ADDRESS_HEIMA SIDECAR_REGISTRY_ADDRESS_HEIMA K3_EPOCH_COUNTER_ADDRESS_HEIMA \
CREDENTIAL_AUDIT_ADDRESS_HEIMA P256_VERIFIER_ADDRESS_HEIMA K11_VERIFIER_ADDRESS_HEIMA; do
val=$(grep "^${addr}=" scripts/operator-workstation.env | cut -d= -f2)
gh secret set "TEST_${addr}" --body "$val"
doneCI doesn't redeploy on every run (it uses pinned addresses from secrets). The deployer wallet is only spent when the operator manually re-runs setup-heima.sh for the test instance. If it does run out:
# Check balance
cast balance "$(cast wallet address $(cat ~/.agentkeys/heima-deployer-test.key))" \
--rpc-url "$(agentkeys chain show heima | jq -r .rpc.http)"
# Top up from your personal wallet — small float (~1 HEI) is enoughworkflow_dispatch.inputs requires the workflow to be on the default branch (or your fork's default). If the workflow file landed on a feature branch, gh workflow run may fail. Either land it on main first, or push the feature branch and re-target:
gh workflow run harness-ci.yml --ref my-branch --field stage=3It already does — push + pull_request triggers are wired in on: at the top. The gate is TEST_OIDC_AWS_ROLE_ARN, not the trigger. Every PR's rust-checks job runs unconditionally; the harness-e2e job runs only if the secret is set.
GitHub doesn't pass secrets to fork PRs by default — that's a platform security feature. The harness-e2e job will preflight-skip on fork PRs even with the secret set. Reviewer needs to push the fork branch to the upstream repo or manually dispatch the workflow from the PR page.
Q. aws-actions/configure-aws-credentials succeeds but aws sts get-caller-identity says agentkeys-admin
You forgot to update the role ARN secret after rotating to OIDC. The default credential chain falls through to whatever AWS profile is on the runner image. Set TEST_OIDC_AWS_ROLE_ARN to the GitHub Actions OIDC role ARN (not the admin user ARN), and the OIDC web identity will assume the right role.
Per the existing @claude review workflow convention: broker integration tests mutate process-global $HOME + $AWS_* env, and the keyring tests serialize on a per-process accounts map. Concurrent threads see each other's mutations and flake. Single-threaded test execution is the conservative default; per-test isolation cleanup is a future improvement.
-
Swatinem/rust-cache@v2withshared-key: harness-ciis enabled — both jobs share a cache. -
concurrency.cancel-in-progress: truecancels stale runs on a re-push. - Foundry toolchain is the slowest install; pin to
version: stablefor cache hits. - The 60-minute timeout on
harness-e2eis generous; typical run is 20–30 min.
If runs still feel slow, profile with gh run view <run-id> --log-failed | head -50 to find the longest step.
Each harness script writes a temp dir under /tmp/agentkeys-*. The workflow uploads /tmp/agentkeys-ci-ephemeral-*/ as the ephemeral-stack-logs artifact on failure (for the harness-e2e job). Download via gh run download <run-id>.
- Operator runbook: docs/ci-setup.md
- Workflow file: .github/workflows/harness-ci.yml
- Cloud setup FAQ: cloud-setup-faq
- Heima setup FAQ: heima-setup-faq