Skip to content

cloud setup faq

github-actions[bot] edited this page May 23, 2026 · 1 revision

Cloud setup — FAQ

Troubleshooting + edge cases for the two cloud-side operator docs:

Use ⌘F to find your error.

Q. setup-broker-host.sh says "BROKER_OIDC_ISSUER mismatch" on re-run

The script auto-detects an existing systemd unit and reads Environment= lines to decide bootstrap-vs-upgrade. If you ran with a different --issuer-url previously and the AWS OIDC provider was already registered for the old URL, the new run refuses.

Fix: decide which URL is canonical. AWS validates the OIDC issuer URL byte-for-byte against the JWT iss claim, so the issuer URL is effectively immutable once the IAM trust policy is built. Either:

  • Re-run with the OLD --issuer-url (the trust policy already matches).
  • Or delete the OIDC provider, redo §4 from cloud-bootstrap.md, and re-run with the NEW URL.

Q. nginx 502 after a fresh setup-broker-host.sh run

systemd may have started the broker before nginx finished its first systemctl reload. Two-step fix:

sudo systemctl status agentkeys-broker          # → active (running)
sudo systemctl restart nginx                    # picks up the new vhost
curl -sf https://${BROKER_HOST}/healthz         # → 200

If the broker itself is failing to boot, journalctl -u agentkeys-broker -n 50 is authoritative.

Q. verify_sender_ready precheck fails at broker boot

The broker calls SES GetEmailIdentity on BROKER_EMAIL_FROM_ADDRESS at startup. If the SES domain identity isn't verified yet, boot refuses. Run scripts/ses-verify-sender.sh and wait for the DKIM tokens to propagate (5–30 min typical), then restart the broker.

Q. aws iam create-open-id-connect-provider returns EntityAlreadyExistsException

The OIDC provider already exists. Verify with:

aws iam list-open-id-connect-providers \
  | jq -r '.OpenIDConnectProviderList[].Arn' \
  | grep "${BROKER_HOST}"

If the ARN is correct, you're done — the trust policy and bucket policy from §4.3/§4.4 are the only steps that remain.

Q. AccessDenied from S3 even though the role + bucket policy look right

Three things almost always:

  1. The role's inline policy still has the broad-bucket grant from §3.5 — strip it via §4.4.1.
  2. The bucket policy's s3:prefix condition needs the ${aws:PrincipalTag/agentkeys_actor_omni} interpolation to be lowercased — addresses are case-sensitive in policy string comparisons.
  3. s3:ListBucket needs the s3:prefix=bots/${PrincipalTag}/<class>/* condition in a separate statement (the v3 split-statement bucket policy from codex P2). Listing the bucket root without that condition always returns AccessDenied.

CloudTrail's Decision field tells you which statement evaluated.

Q. Per-profile default region trap (real 2026-05-12 incident)

agentkeys-admin defaults to us-west-2; agentkeys-broker / agentkeys-daemon default to us-east-1. Every regional CLI call must pass --region "$REGION" explicitly. The CLAUDE.md "Per-profile default region is NOT uniform" section covers this in detail.

Q. Cert renewal failed silently — workflow turned red overnight

certbot renewals run on a 90-day cadence. If they fail (often: rate limit, DNS-01 hiccup, port 80 firewall block), AWS stops trusting the OIDC issuer (TLS chain breaks). Symptoms:

  • harness-e2e CI job fails on the first curl https://${BROKER_HOST} with a TLS error.
  • journalctl -u certbot-renew shows the failure reason.

Recovery: rerun sudo certbot renew --force-renewal (works for transient rate-limit issues), or fix the DNS / firewall and re-run. The broker doesn't need to restart — nginx reloads automatically.

Q. Switching AWS accounts for the test instance

Same-account is fine — isolation comes from the -test suffix, not from the AWS account boundary. If you want hard account isolation, every reference to ${ACCOUNT_ID} in cloud-bootstrap.md becomes ${TEST_ACCOUNT_ID}, including the role ARN that the broker assumes via OIDC. The setup-broker-host.sh script accepts --account-id to point at a different account.

Q. Tencent Cloud port?

§2.2 of cloud-bootstrap.md sketches SimpleDM + COS as the swap-in at the §3+ boundary. The boundary is real — DNS + inbound mail are the only AWS-specific layers; everything from agentkeys-data-role onward is provider-agnostic in shape, with COS providing S3-compatible PutObject/GetObject and Tencent's IAM providing OIDC federation. Real port work is tracked separately.

Q. Can I run the broker without nginx?

Yes — setup-broker-host.sh --without-nginx --without-certbot skips both. You're then responsible for TLS termination upstream (CloudFront, ALB, custom reverse proxy). AWS still needs to fetch the OIDC discovery + JWKS over public TLS, so whatever fronts the broker must serve https://${BROKER_HOST}/.well-known/* with a valid leaf cert.

Q. The systemd unit was hand-edited and now setup-broker-host.sh refuses

Per CLAUDE.md "Remote broker host (single entry point)" — don't hand-edit. To recover:

sudo systemctl stop agentkeys-broker
sudo rm /etc/systemd/system/agentkeys-broker.service
sudo systemctl daemon-reload
sudo bash scripts/setup-broker-host.sh --yes

The script rewrites the unit clean. If you had a legitimately custom field, add a --*-host or --cred-mode flag to the script and re-run — that's how all per-host overrides ship.

Related

Clone this wiki locally