-
Notifications
You must be signed in to change notification settings - Fork 0
cloud setup faq
Troubleshooting + edge cases for the two cloud-side operator docs:
-
docs/cloud-bootstrap.md— first-time provisioning (per account or per cloud provider). -
docs/cloud-bootstrap.md— ongoing OIDC federation + broker-host re-deploys.
Use ⌘F to find your error.
The script auto-detects an existing systemd unit and reads Environment= lines to decide bootstrap-vs-upgrade. If you ran with a different --issuer-url previously and the AWS OIDC provider was already registered for the old URL, the new run refuses.
Fix: decide which URL is canonical. AWS validates the OIDC issuer URL byte-for-byte against the JWT iss claim, so the issuer URL is effectively immutable once the IAM trust policy is built. Either:
- Re-run with the OLD
--issuer-url(the trust policy already matches). - Or delete the OIDC provider, redo §4 from cloud-bootstrap.md, and re-run with the NEW URL.
systemd may have started the broker before nginx finished its first systemctl reload. Two-step fix:
sudo systemctl status agentkeys-broker # → active (running)
sudo systemctl restart nginx # picks up the new vhost
curl -sf https://${BROKER_HOST}/healthz # → 200If the broker itself is failing to boot, journalctl -u agentkeys-broker -n 50 is authoritative.
The broker calls SES GetEmailIdentity on BROKER_EMAIL_FROM_ADDRESS at startup. If the SES domain identity isn't verified yet, boot refuses. Run scripts/ses-verify-sender.sh and wait for the DKIM tokens to propagate (5–30 min typical), then restart the broker.
The OIDC provider already exists. Verify with:
aws iam list-open-id-connect-providers \
| jq -r '.OpenIDConnectProviderList[].Arn' \
| grep "${BROKER_HOST}"If the ARN is correct, you're done — the trust policy and bucket policy from §4.3/§4.4 are the only steps that remain.
Three things almost always:
- The role's inline policy still has the broad-bucket grant from §3.5 — strip it via §4.4.1.
- The bucket policy's
s3:prefixcondition needs the${aws:PrincipalTag/agentkeys_actor_omni}interpolation to be lowercased — addresses are case-sensitive in policy string comparisons. -
s3:ListBucketneeds thes3:prefix=bots/${PrincipalTag}/<class>/*condition in a separate statement (the v3 split-statement bucket policy from codex P2). Listing the bucket root without that condition always returns AccessDenied.
CloudTrail's Decision field tells you which statement evaluated.
agentkeys-admin defaults to us-west-2; agentkeys-broker / agentkeys-daemon default to us-east-1. Every regional CLI call must pass --region "$REGION" explicitly. The CLAUDE.md "Per-profile default region is NOT uniform" section covers this in detail.
certbot renewals run on a 90-day cadence. If they fail (often: rate limit, DNS-01 hiccup, port 80 firewall block), AWS stops trusting the OIDC issuer (TLS chain breaks). Symptoms:
-
harness-e2eCI job fails on the firstcurl https://${BROKER_HOST}with a TLS error. -
journalctl -u certbot-renewshows the failure reason.
Recovery: rerun sudo certbot renew --force-renewal (works for transient rate-limit issues), or fix the DNS / firewall and re-run. The broker doesn't need to restart — nginx reloads automatically.
Same-account is fine — isolation comes from the -test suffix, not from the AWS account boundary. If you want hard account isolation, every reference to ${ACCOUNT_ID} in cloud-bootstrap.md becomes ${TEST_ACCOUNT_ID}, including the role ARN that the broker assumes via OIDC. The setup-broker-host.sh script accepts --account-id to point at a different account.
§2.2 of cloud-bootstrap.md sketches SimpleDM + COS as the swap-in at the §3+ boundary. The boundary is real — DNS + inbound mail are the only AWS-specific layers; everything from agentkeys-data-role onward is provider-agnostic in shape, with COS providing S3-compatible PutObject/GetObject and Tencent's IAM providing OIDC federation. Real port work is tracked separately.
Yes — setup-broker-host.sh --without-nginx --without-certbot skips both. You're then responsible for TLS termination upstream (CloudFront, ALB, custom reverse proxy). AWS still needs to fetch the OIDC discovery + JWKS over public TLS, so whatever fronts the broker must serve https://${BROKER_HOST}/.well-known/* with a valid leaf cert.
Per CLAUDE.md "Remote broker host (single entry point)" — don't hand-edit. To recover:
sudo systemctl stop agentkeys-broker
sudo rm /etc/systemd/system/agentkeys-broker.service
sudo systemctl daemon-reload
sudo bash scripts/setup-broker-host.sh --yesThe script rewrites the unit clean. If you had a legitimately custom field, add a --*-host or --cred-mode flag to the script and re-run — that's how all per-host overrides ship.
- Operator runbook: docs/cloud-bootstrap.md
- Single entry point: scripts/setup-broker-host.sh
- Heima chain FAQ: heima-setup-faq
- CI FAQ: ci-setup-faq