Audience: the operator standing up a brand-new cloud account to host AgentKeys for the first time, or porting the deployment to a new cloud provider (AliCloud, GCP, Tencent Cloud).
Scope: the per-account, run-once provisioning that has to happen before the broker host can come up (§§3–8 of this doc), followed by the per-broker OIDC federation activation (§9), broker host bring-up (§10), and tear-down (§11). Identifiers (DNS names, IAM principals, mail backend, object store, initial bucket policy) + runtime activation in one place.
FAQ + troubleshooting: wiki/cloud-setup-faq.md.
After this doc is run, the operator returns here ONLY when:
- Switching cloud providers (e.g. AWS → AliCloud)
- Adding a second AWS account (test instance, regional shard)
- Re-bootstrapping after a teardown
- Auditing the identity surface (the security-audit checklist in §7)
The day-to-day broker re-deploys live in §10 below (setup-broker-host.sh); they re-run that section without touching §§1–9.
Tight five-step flow. Explanation + per-step reasoning are in §1–§11 below; the same flow works for prod (no --test) or test (--test swaps in -test identifiers everywhere). The orchestrator scripts/setup-cloud.sh is idempotent — re-running is safe.
For each stack (prod and test) you stand up SEPARATELY:
- Launch an EC2: t3.small minimum (Ubuntu 22.04 LTS recommended). `t3.micro` runs the OS, but its 1 GB of RAM gets the build OOM-killed while compiling `aws-sdk-s3` during `setup-broker-host.sh`. If you already have a t3.micro you can resize: `aws ec2 stop-instances` → `modify-instance-attribute --instance-type t3.small` → `start-instances` (EIP stays attached, INSTANCE_ID unchanged).
- Allocate an EIP (or reuse one) and attach it to the EC2.
- Open SG ports 22 (SSH), 80 (certbot HTTP-01 challenge), 443 (TLS) to `0.0.0.0/0`. All three are required: port 80 is needed for Let's Encrypt to validate domain ownership during cert issuance (step 5b), even though steady-state traffic only flows over 443. Verify with `aws ec2 describe-security-groups --group-ids <sg-id> --query 'SecurityGroups[].IpPermissions[].[FromPort,IpRanges[].CidrIp]'`; you should see all three ports.
- Generate or import an SSH key pair (the `.pem` you'll keep as the fallback when EC2 Instance Connect is down). Confirm SSH works: `ssh -i your.pem ubuntu@<EIP>`.
- The default `ubuntu` user is enough for now. The `agentkey` SSH login user (used by EC2 Instance Connect later) is created automatically by `setup-broker-host.sh` in step 5, along with the `ec2-instance-connect` package.
- Note INSTANCE_ID + EIP; both go into the env files in step 2.
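The port requirement above lends itself to a quick scripted check. The following is a minimal sketch, assuming you feed it the open ports as plain numbers; `missing_ingress_ports` is a hypothetical helper name and is not part of the repo's scripts.

```shell
# Hypothetical helper: print which of the required ingress ports (22, 80, 443)
# are absent from the list of open ports passed as arguments.
missing_ingress_ports() {
  local open=" $* " port missing=""
  for port in 22 80 443; do
    case "$open" in
      *" $port "*) ;;                      # port is present, nothing to report
      *) missing="$missing $port" ;;       # remember the absent port
    esac
  done
  echo "${missing# }"                      # empty output means all three are open
}
```

One way to feed it is `missing_ingress_ports $(aws ec2 describe-security-groups --group-ids <sg-id> --query 'SecurityGroups[].IpPermissions[].FromPort' --output text)`. This only looks at the start of each port range, so treat an empty result as a hint rather than proof.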
The 2×2 matrix: {operator-workstation, broker} × {prod, test} = 4 files. The two operator-workstation files carry account-wide identifiers; the two broker files carry per-machine identifiers (INSTANCE_ID + EIP).
Both operator-workstation files are pre-populated with litentry.org / account 429071895007 defaults, and every derived value uses bash ${VAR} substitution off of ACCOUNT_ID / BROKER_HOST / ZONE. The script writes 2 values back automatically — operator never hand-edits them:
- `EIP=…` persisted to broker env file by step 4 (after allocate-or-adopt)
- `DATA_ROLE_ARN=…` persisted to operator env file by step 11 (after data role create)
| File | Operator edits | What to set |
|---|---|---|
| `scripts/operator-workstation.env` | None if your account is litentry.org / 429071895007. 5 keys if you're forking: ACCOUNT_ID, BROKER_HOST, ZONE, PARENT_ZONE_ID, MAIL_DOMAIN (the other ~20 keys all derive). | account-wide identifiers |
| `scripts/operator-workstation.test.env` | None in the same case. Same 5 keys (or just ZONE + PARENT_ZONE_ID) for a fork. | -test variants pre-derived |
| `scripts/broker.env` | `INSTANCE_ID=i-…` | EIP is written by the script |
| `scripts/broker.test.env` | `INSTANCE_ID=i-…` | EIP is written by the script |
In practice: paste INSTANCE_ID into the two broker env files. Done.
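The automatic write-back of `EIP=…` and `DATA_ROLE_ARN=…` can be pictured as an idempotent key upsert on a `KEY=VALUE` file. The sketch below is an illustration only: `upsert_env_key` is a hypothetical name, not a function taken from setup-cloud.sh, and it assumes plain unquoted `KEY=VALUE` lines.

```shell
# Hypothetical helper (not from setup-cloud.sh): set KEY=VALUE in an env file.
# Any existing line for KEY is dropped and the new assignment is appended,
# so running it twice with the same value leaves exactly one line.
upsert_env_key() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  # Copy every line that does not assign KEY (grep exits 1 if nothing is left).
  grep -v "^${key}=" "$file" > "$tmp" 2>/dev/null || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through the original path so file permissions are preserved.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, `upsert_env_key scripts/broker.env EIP 203.0.113.7` would leave a single `EIP=` line regardless of how often it runs (the address here is a documentation placeholder).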
awsp agentkeys-admin
# Prod stack:
bash scripts/setup-cloud.sh --yes
# Test stack — --test auto-selects scripts/operator-workstation.test.env
# + scripts/broker.test.env and suffixes IAM identifiers with -test:
bash scripts/setup-cloud.sh --test --yes
The orchestrator walks 15 idempotent steps (cloud-side AWS resources + IAM users + per-data-class roles + bucket policies + DNS UPSERTs). Steps 10 (agentkeys-daemon[-test]) and 12 (agentkeys-broker[-test]) print access keys to copy off; they're shown ONCE.
Append the two access-key blocks from step 3 to ~/.aws/credentials:
[agentkeys-daemon-test]
aws_access_key_id = AKIA…
aws_secret_access_key = …
region = us-east-1
[agentkeys-broker-test]
aws_access_key_id = AKIA…
aws_secret_access_key = …
region = us-east-1
(Drop the -test suffix for the prod variants. Account-owner agentkeys-admin is shared; there is no -test variant.)
Add to ~/.zshenv (works in zsh + bash):
export AGENTKEYS_REPO="$HOME/Projects/agentKeys"
alias ssh-agentkeys='bash $AGENTKEYS_REPO/scripts/ssh-broker.sh prod'
alias ssh-agentkeys-test='bash $AGENTKEYS_REPO/scripts/ssh-broker.sh test'
alias ssh-agentkeys-fallback='bash $AGENTKEYS_REPO/scripts/ssh-broker.sh prod --fallback'
alias ssh-agentkeys-test-fallback='bash $AGENTKEYS_REPO/scripts/ssh-broker.sh test --fallback'
Then run `source ~/.zshenv`. The fallback aliases use the .pem key + ubuntu user; the non-fallback ones use EC2 Instance Connect + the agentkey user (which comes online in step 5).
First-time SSH: use the fallback path (the agentkey user doesn't exist yet — setup-broker-host.sh creates it):
ssh-agentkeys-test-fallback # ssh -i ~/.ssh/your.pem ubuntu@<test EIP>
# On the EC2 (~10-15 min on t3.small):
git clone https://github.com/litentry/agentKeys.git
cd agentKeys
sudo bash scripts/setup-broker-host.sh --test --yes
Two flags. `--test` triggers the -test suffix on every derived hostname / bucket / email; `--issuer-url` + `--account-id` auto-derive from ZONE + ACCOUNT_ID in scripts/operator-workstation.env (which the repo clone ships with). Override any flag explicitly if you need a non-conventional name. For prod, drop `--test`:
sudo bash scripts/setup-broker-host.sh --yes
What `--test` derives automatically:
- `signer-test.${ZONE}`, `audit-test.${ZONE}`, `email-test.${ZONE}`, `cred-test.${ZONE}`, `memory-test.${ZONE}`
- `agentkeys-vault-test-${ACCOUNT_ID}`, `agentkeys-memory-test-${ACCOUNT_ID}`
- `noreply-test@bots-test.${ZONE}`
- `https://test-broker.${ZONE}` for the OIDC issuer URL
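To make the naming convention concrete, here is a small sketch that reproduces the derivation listed above from a zone, an account ID, and a stack name. It is an illustration under stated assumptions: `derive_names` is a hypothetical function, and the real logic in setup-broker-host.sh may differ in detail.

```shell
# Hypothetical sketch of the naming convention; not copied from the repo.
# Usage: derive_names <zone> <account-id> [prod|test]
derive_names() {
  local zone="$1" account_id="$2" stack="${3:-prod}"
  local sfx=""            # "-test" for the test stack, empty for prod
  local broker="broker"   # the broker host is prefixed (test-broker), not suffixed
  if [ "$stack" = "test" ]; then sfx="-test"; broker="test-broker"; fi
  local svc upper
  for svc in signer audit email cred memory; do
    upper="$(printf '%s' "$svc" | tr '[:lower:]' '[:upper:]')"
    printf '%s_HOST=%s%s.%s\n' "$upper" "$svc" "$sfx" "$zone"
  done
  printf 'VAULT_BUCKET=agentkeys-vault%s-%s\n' "$sfx" "$account_id"
  printf 'MEMORY_BUCKET=agentkeys-memory%s-%s\n' "$sfx" "$account_id"
  printf 'BROKER_EMAIL_FROM_ADDRESS=noreply%s@bots%s.%s\n' "$sfx" "$sfx" "$zone"
  printf 'BROKER_OIDC_ISSUER=https://%s.%s\n' "$broker" "$zone"
}
```

Calling `derive_names example.org 123456789012 test` prints one `KEY=value` line per derived identifier, which is handy for eyeballing a fork's names before running the real script.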
When the script finishes (~10-15 min on t3.small cold; ~30-60s on re-runs), it does three things at the end so steady-state operator work is one keystroke from your laptop:
- Creates the `agentkey` SSH login user (separate from the `agentkeys` daemon system user).
- Installs `ec2-instance-connect` + writes the sshd `AuthorizedKeysCommand` config so EC2 Instance Connect can push ephemeral keys to `agentkey`.
- Relocates the repo `/home/ubuntu/agentKeys` → `/home/agentkey/agentKeys` (chowned to `agentkey`) so re-runs + ongoing edits happen as the steady-state user.
Then exit the ubuntu session and reconnect as agentkey for everything from here on:
exit # leave the ubuntu fallback session
ssh-agentkeys-test # Instance Connect, no .pem needed
cd ~/agentKeys # → /home/agentkey/agentKeys, files visible
Subsequent re-runs (`git pull` + `sudo bash scripts/setup-broker-host.sh --test --yes`) happen from /home/agentkey/agentKeys; step 10's relocation is idempotent (existence check skips when already in place). The cargo build cache survives the move (it's inside target/). The Rust toolchain itself is deleted from /root/ at the end of the first run to save ~1.5 GB; future re-runs reinstall it as part of the toolchain step automatically. This keeps the box clean and ensures only one canonical Rust install on disk at a time.
For prod, the same flow applies — drop --test everywhere and the relocation moves the repo from whichever home dir you bootstrapped in to /home/agentkey/.
Optional: install rustup for the agentkey user (dev-loop cargo). If you want to run cargo clippy / cargo test interactively as agentkey (e.g., to mirror the CI Linux env locally and catch cfg(target_os = "linux") clippy lints that don't fire on macOS), install rustup under your own $HOME once after reconnecting as agentkey:
ssh-agentkeys-test
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs \
| sh -s -- -y --default-toolchain stable --profile minimal
source "$HOME/.cargo/env"
echo 'source "$HOME/.cargo/env"' >> ~/.bashrc # persist for future sessions
cargo --version # matches CI's stable channel
cd ~/agentKeys
cargo clippy --workspace --all-targets -- -D warnings # same lint set as CI
This is optional; the broker itself runs from compiled binaries, not from a live toolchain. Operators who only manage the deployed broker (no compile-in-place dev work) can skip this.
setup-broker-host.sh installs certbot but does NOT issue Let's Encrypt certs itself — issuance is DNS-dependent (the broker hostname must already resolve to this EIP on the public internet before Let's Encrypt's HTTP-01 challenger can validate it). Until you run the issuance below, nginx serves HTTP-only on :80 with a 503 "TLS cert not yet issued" placeholder on every non-ACME path — and the OIDC federation step in docs/ci-setup.md §1 can't succeed because there's no cert to extract a thumbprint from.
# Still on the broker host (as agentkey or ubuntu — both have sudo):
for h in ${BROKER_HOST} ${SIGNER_HOST} ${AUDIT_HOST} ${EMAIL_HOST} ${CRED_HOST} ${MEMORY_HOST}; do
sudo certbot certonly --webroot -w /var/www/certbot -d "$h" \
--agree-tos -m <your-ops-email> --non-interactive
done
# Flip nginx from Phase A (HTTP-only) → Phase B (HTTPS) — the renderer in
# setup-broker-host.sh picks Phase B automatically when /etc/letsencrypt/live/<host>/
# exists. Re-running the script is the trigger:
cd ~/agentKeys
sudo bash scripts/setup-broker-host.sh --test --yes # or drop --test for prod
The hostname env vars come from /etc/agentkeys/broker.env (which setup-broker-host.sh wrote at step 5). For test: BROKER_HOST=test-broker.${ZONE}, SIGNER_HOST=signer-test.${ZONE}, etc. For prod: drop the -test suffix.
Verify the cert is live (bypass laptop DNS, which may be rewritten by WARP / Zscaler / Tailscale to 198.18.x.y for ${ZONE}):
# DoH lookup — proves Route 53 has the right EIP, not your laptop's local resolver
curl -sS "https://dns.google/resolve?name=${BROKER_HOST}&type=A" | jq -r '.Answer[].data'
# → should be your EIP, not 198.18.x.y
# TLS handshake against the real EIP:
echo | openssl s_client -servername "${BROKER_HOST}" -connect "$(curl -sS "https://dns.google/resolve?name=${BROKER_HOST}&type=A" | jq -r '.Answer[0].data'):443" 2>&1 \
| grep -E "subject="
# → subject=/CN=<your-BROKER_HOST>
If `openssl s_client` returns `no peer certificate available`, certbot didn't finish or nginx isn't on Phase B yet. Check:
- `sudo ls /etc/letsencrypt/live/`: should list all 6 hostnames as subdirs.
- `sudo ss -tlnp | grep ':443'`: nginx should be on `0.0.0.0:443`.
- `sudo tail /var/log/letsencrypt/letsencrypt.log` for the actual certbot failure.
Common failures + fixes:
- `Connection timeout to … port 80`: the SG is missing port 80 ingress. Re-check step 1's SG requirements (you need 22, 80, and 443).
- `DNS problem: NXDOMAIN`: Route 53 doesn't have the A record yet, or DNS hasn't propagated. Wait 1-2 min, then retry. Quick check: `curl -sS "https://dns.google/resolve?name=<host>&type=A"` (do NOT rely on `dig`; the local resolver may be lying).
- `No such file or directory: /var/www/certbot`: Phase A nginx render didn't complete; re-run `sudo bash scripts/setup-broker-host.sh --test --yes` first.
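The DNS checks above boil down to comparing one A-record answer against the EIP you expect. A minimal sketch of that comparison follows; `classify_answer` is a hypothetical helper, and the labels it prints are illustrative rather than anything the repo's scripts emit. It relies on the fact that 198.18.0.0/15 is a reserved benchmarking range, which is why an answer in it indicates a rewriting local resolver.

```shell
# Hypothetical helper: classify a DNS answer before running certbot.
#   $1 = the A-record value you got back (may be empty or the string "null",
#        which is what `jq -r '.Answer[0].data'` prints when no answer exists)
#   $2 = the EIP the record is supposed to point at
classify_answer() {
  local answer="$1" expected_eip="$2"
  case "$answer" in
    ""|null)            echo "nxdomain" ;;   # no A record visible yet
    198.18.*|198.19.*)  echo "rewritten" ;;  # local VPN/proxy resolver substituted it
    "$expected_eip")    echo "ok" ;;         # record matches the EIP
    *)                  echo "mismatch" ;;   # record points somewhere else
  esac
}
```

Combined with the DoH lookup shown earlier, a call could look like `classify_answer "$(curl -sS "https://dns.google/resolve?name=${BROKER_HOST}&type=A" | jq -r '.Answer[0].data')" "$EIP"`; only `ok` means it is worth starting the certbot loop.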
The rest of this doc explains why each step exists and how to recover from failures. Operators following the quick start above can skip to docs/chain-setup.md once step 5b completes.
§1 Identities — four IAM principals; concept first, then provider commands
§2 Domain + DNS — subdomain ownership; parent-zone confirmation
§3 Email backend — SES domain identity + receipt rule + S3 inbound bucket
§4 IAM users + roles — agentkeys-{admin,broker,daemon} + agentkeys-data-role
§5 Bucket policy — static-IAM variant (pre-OIDC; replaced in §9 below)
§6 Instance profile — agentkeys-broker-host (optional, EC2-only)
§7 Security audit — strip legacy over-broad attached policies
§8 Cloud portability — AWS → AliCloud / GCP / Tencent Cloud mapping
§9 OIDC federation — per-broker security upgrade after broker is reachable
§10 Broker host — what setup-broker-host.sh does
§11 Cleanup — full account teardown
Surgical re-run of any single step: bash scripts/setup-cloud.sh --only-step N (with --test for test).
Four env files cover the 2×2 matrix of {operator, broker} × {prod, test}. The GitHub Actions runner doesn't get its own file — it materializes the operator-workstation env inline at job start from TEST_* secrets.
| File | Lives on | Scope | Sourced by |
|---|---|---|---|
| `scripts/operator-workstation.env` | operator laptop | prod | every helper script + setup-cloud.sh + setup-heima.sh + harness/run.sh |
| `scripts/operator-workstation.test.env` | operator laptop | test | same scripts, via `--env-file <path>` |
| `scripts/broker.env` | prod broker host at /etc/agentkeys/broker.env | prod | the broker process at boot (also setup-broker-host.sh writes equivalent systemd `Environment=` lines) |
| `scripts/broker.test.env` | test broker host at /etc/agentkeys/broker.env | test | same |
| GitHub Actions runner | ephemeral runner per job | test | harness-ci.yml writes scripts/operator-workstation.env inline from `TEST_*` secrets (see docs/ci-setup.md §7) |
| Variable | Prod | Test | Purpose |
|---|---|---|---|
| `ACCOUNT_ID` | 429071895007 | 429071895007 (same) | every cloud step |
| `REGION` | us-east-1 | us-east-1 | regional API calls |
| `ZONE` | litentry.org | litentry.org (same) | parent DNS zone |
| `PARENT_ZONE_ID` | Route 53 zone ID | same | DNS UPSERTs |
| `BROKER_HOST` | `broker.${ZONE}` | `test-broker.${ZONE}` | OIDC issuer hostname (byte-for-byte distinct → distinct IAM OIDC provider ARN) |
| `MAIL_DOMAIN` | `bots.${ZONE}` | `bots-test.${ZONE}` | SES inbound subdomain |
| `BUCKET` / `MAIL_BUCKET` | `agentkeys-mail-${ACCT}` | `agentkeys-mail-test-${ACCT}` | inbound mail bucket |
| `VAULT_BUCKET` | `agentkeys-vault-${ACCT}` | `agentkeys-vault-test-${ACCT}` | credentials bucket (arch.md §17) |
| `MEMORY_BUCKET` | `agentkeys-memory-${ACCT}` | `agentkeys-memory-test-${ACCT}` | memory bucket |
| `DATA_ROLE_ARN` | `…:role/agentkeys-data-role` | `…:role/agentkeys-data-role-test` | OIDC-federated data role |
| `VAULT_ROLE_ARN` | `…:role/agentkeys-vault-role` | `…:role/agentkeys-vault-role-test` | per-data-class vault role |
| `MEMORY_ROLE_ARN` | `…:role/agentkeys-memory-role` | `…:role/agentkeys-memory-role-test` | per-data-class memory role |
| `OIDC_PROVIDER_ARN` | `…:oidc-provider/${BROKER_HOST}` | `…:oidc-provider/test-broker.${ZONE}` | derived from BROKER_HOST |
| `SIGNER_HOST` + worker hosts | `signer.${ZONE}` etc. | `signer-test.${ZONE}` etc. | per-service public hostnames |
| `BROKER_EMAIL_FROM_ADDRESS` | `noreply@bots.${ZONE}` | `noreply-test@bots-test.${ZONE}` | SES verified sender |
| Heima contract `*_HEIMA` addresses | one set | a DIFFERENT set (same chain, different deployer key) | per-deploy pinned addresses |
| Variable | Prod | Test |
|---|---|---|
| `ACCOUNT_ID` | same | same |
| `BROKER_DATA_ROLE_ARN` | `…:role/agentkeys-data-role` | `…:role/agentkeys-data-role-test` |
| `BROKER_AWS_REGION` | us-east-1 | us-east-1 |
| `BROKER_OIDC_ISSUER` | `https://broker.${ZONE}` | `https://test-broker.${ZONE}` |
| `BROKER_OIDC_KEYPAIR_PATH` | /home/ubuntu/.agentkeys/broker/oidc-keypair.json | same |
| `BROKER_SESSION_KEYPAIR_PATH` | /home/ubuntu/.agentkeys/broker/session-keypair.json | same |
| `BROKER_AUTH_METHODS` | wallet_sig,email_link | same |
| `BROKER_AUDIT_ANCHORS` | sqlite | same |
| `BROKER_EMAIL_SENDER` | ses | ses |
| `BROKER_EMAIL_FROM_ADDRESS` | `noreply@bots.${ZONE}` | `noreply-test@bots-test.${ZONE}` |
The broker process never reads operator-workstation env vars directly — separation prevents a laptop value from silently shadowing the broker's own config (per scripts/broker.env header comment).
The runner doesn't ship with a checked-in env file. harness-ci.yml writes one inline at job start, mapping TEST_* repo secrets into scripts/operator-workstation.env:
| TEST secret | Maps to operator var |
|---|---|
| `TEST_ACCOUNT_ID` | `ACCOUNT_ID` |
| `TEST_AWS_REGION` | `REGION` |
| `TEST_BROKER_HOST` | `BROKER_HOST` |
| `TEST_VAULT_BUCKET` / `TEST_MEMORY_BUCKET` | `VAULT_BUCKET` / `MEMORY_BUCKET` |
| `TEST_DATA_ROLE_ARN` / `TEST_VAULT_ROLE_ARN` / `TEST_MEMORY_ROLE_ARN` | `DATA_ROLE_ARN` / `VAULT_ROLE_ARN` / `MEMORY_ROLE_ARN` |
| `TEST_HEIMA_DEPLOYER_KEY` | written to ~/.agentkeys/heima-deployer.key |
| `TEST_*_HEIMA` contract addresses | `*_HEIMA` |
| `TEST_OIDC_AWS_ROLE_ARN` | the GH Actions OIDC role (gate; not a runtime var) |
Full list + activation flow: docs/ci-setup.md §7. setup-cloud.sh validates required keys at step 2 and dies with a precise pointer if missing.
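The inline materialization described above can be sketched as a loop over secret-to-variable pairs. This is not the actual harness-ci.yml step: `write_operator_env` is a hypothetical function, it covers only a subset of the mapping table, and it assumes the secrets are already exposed as shell variables.

```shell
# Hypothetical sketch (not the real CI step): write an operator env file from
# TEST_* variables, failing with a pointer if any required secret is empty.
write_operator_env() {
  local out="$1" pair secret var value
  : > "$out"                                   # start from an empty file
  for pair in TEST_ACCOUNT_ID:ACCOUNT_ID TEST_AWS_REGION:REGION \
              TEST_BROKER_HOST:BROKER_HOST TEST_VAULT_BUCKET:VAULT_BUCKET \
              TEST_MEMORY_BUCKET:MEMORY_BUCKET TEST_DATA_ROLE_ARN:DATA_ROLE_ARN; do
    secret="${pair%%:*}"                       # text before the colon
    var="${pair#*:}"                           # text after the colon
    # Indirect lookup; the names come from the fixed list above, so eval is safe.
    eval "value=\${$secret:-}"
    if [ -z "$value" ]; then
      echo "missing secret: $secret" >&2
      return 1
    fi
    printf '%s=%s\n' "$var" "$value" >> "$out"
  done
}
```

Failing on the first empty secret mirrors the "dies with a precise pointer" behavior described for step 2, so a misconfigured repo surfaces before any cloud call is made.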
setup-cloud.sh consumes already-existing identifiers — it does NOT register your domain, create a Route 53 hosted zone, or launch the EC2. Those are operator decisions (instance type, region, key pair, DNS provider choice) and don't belong in an automated script. Three manual prereqs before the orchestrator works:
You own a domain (e.g. litentry.org). If not, register one with any registrar (Namecheap, GoDaddy, Route 53 Domains, etc.) — fully manual, out of scope here.
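Create a Route 53 hosted zone for the domain. Route 53 deduplicates only on the caller reference, and the command below generates a fresh one on every run, so re-running it creates a second zone with the same name; skip this step if the zone already exists: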
aws route53 create-hosted-zone \
--name "$ZONE" \
--caller-reference "agentkeys-$(date +%s)"
Look up the zone ID (strip the /hostedzone/ prefix):
aws route53 list-hosted-zones \
--query 'HostedZones[?Name==`'"$ZONE"'.`].Id' --output text \
| awk -F/ '{print $NF}'
# → Z09723983CFJOHAE3VC65
Paste it into operator-workstation.env as PARENT_ZONE_ID=Z….
Delegation: Route 53 outputs 4 NS records when you create the zone (visible via aws route53 get-hosted-zone --id $PARENT_ZONE_ID --query 'DelegationSet.NameServers'). Copy them into your registrar's DNS settings as the authoritative nameservers. Verify after propagation (usually <1h):
dig +short NS "$ZONE"
# Should return 4 ns-XX.awsdns-YY.{com,net,org,co.uk} entries.
If dig returns the registrar's default nameservers instead, delegation hasn't propagated. All downstream DNS UPSERTs in §6 will silently miss until it does.
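The delegation check above can be turned into a yes/no gate. The sketch below reads nameserver lines on stdin and succeeds only when there are exactly four and all follow the `ns-…awsdns-…` naming shown above; `ns_delegated_to_route53` is a hypothetical helper, not part of the repo.

```shell
# Hypothetical helper: read NS names from stdin (one per line) and succeed
# only if there are exactly four and each looks like a Route 53 nameserver.
ns_delegated_to_route53() {
  local line count=0
  while IFS= read -r line; do
    [ -n "$line" ] || continue               # ignore blank lines
    case "$line" in
      ns-*.awsdns-*) count=$((count + 1)) ;; # Route 53 delegation-set name
      *) return 1 ;;                         # anything else: not delegated yet
    esac
  done
  [ "$count" -eq 4 ]
}
```

A typical use would be `dig +short NS "$ZONE" | ns_delegated_to_route53 && echo "delegated"`, run before the orchestrator so DNS UPSERTs are not written into a zone nobody resolves.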
Non-Route 53 DNS providers: setup-cloud.sh step 6 hardcodes Route 53 API calls. To use Cloudflare / DigitalOcean / etc., skip step 6 (--to-step 5) and replicate the same 12 records manually — see §6 below for the canonical record set. Test isolation works identically: a test-broker.${ZONE} A record under any DNS provider is the same byte-for-byte trust scope as under Route 53.
setup-broker-host.sh runs on any Linux box with sudo, systemd, public-internet egress, ports 22/80/443 open inbound. The host is your choice:
| Setting | Prod | Test |
|---|---|---|
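| Instance type | t3.small minimum | t3.small minimum if the build runs on the host (t3.micro's 1 GB RAM gets OOM-killed compiling aws-sdk-s3 during setup-broker-host.sh) |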
| AMI | Ubuntu 22.04 LTS or Amazon Linux 2023 | same |
| Security group | 22 (SSH), 80 (certbot HTTP-01), 443 (broker + workers TLS), all from `0.0.0.0/0` | same (AWS validates OIDC JWKS over public TLS from AWS IPs that aren't pinnable) |
| Key pair | SSH key, EC2 Instance Connect, or SSM Session Manager | same |
Launch via AWS console, aws ec2 run-instances, or your IaC tool. The script doesn't care which.
Getting the IP — three workflows:
Both INSTANCE_ID and EIP live in the env file (scripts/operator-workstation.env or …test.env) — set them there once, not on the shell every run. The test stack is selected by --env-file <path> + the explicit --test flag (or auto-detected when the env-file name contains "test").
Workflow 0 (you already have EC2 + EIP attached): step 4 adopts the existing EIP
If the EC2 is already running with an EIP attached (whether allocated via the AWS Console, Terraform, or a previous setup-cloud.sh run), there's no need to allocate or re-associate. Step 4's precedence ladder detects it:
# 1. Find the existing EC2's instance id:
aws ec2 describe-instances --region "$REGION" \
--filters "Name=ip-address,Values=<YOUR-EXISTING-EIP>" \
--query 'Reservations[].Instances[].InstanceId' --output text
# 2. Paste it into the env file (one line edit):
echo 'INSTANCE_ID=i-0123…' >> scripts/operator-workstation.env
# 3. Run setup-cloud.sh — step 4 prints:
# "skip EIP <ip> already attached to <instance-id> (adopting; no allocation)"
# "ok tagged existing EIP as agentkeys-broker-eip (idempotency for re-runs)"
# No new EIP is allocated. No re-association. The existing EIP gets
# retroactively tagged so future re-runs find it via tag-lookup too.
AWS_PROFILE=agentkeys-admin bash scripts/setup-cloud.sh --yes
The precedence inside step 4 is: A adopt EIP attached to $INSTANCE_ID → B reuse tagged EIP → C use $EIP from env file → D allocate fresh. First match wins; no later branch fires if an earlier one resolves. Re-runs are fully idempotent even when the operator pre-provisioned EC2 + EIP outside the script.
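The first-match-wins ladder can be read as a plain if/elif chain. The sketch below takes the three candidate values as strings so the logic is visible without AWS calls; `resolve_eip` and its output labels are hypothetical, and the real step 4 resolves these inputs through the EC2 API.

```shell
# Hypothetical sketch of the A → B → C → D precedence; not the repo's code.
#   $1 = EIP already attached to $INSTANCE_ID (empty if none)
#   $2 = EIP found by tag lookup (empty if none)
#   $3 = EIP value from the env file (empty if unset)
resolve_eip() {
  local attached="$1" tagged="$2" from_env="$3"
  if   [ -n "$attached" ]; then echo "A adopt $attached"
  elif [ -n "$tagged" ];   then echo "B reuse $tagged"
  elif [ -n "$from_env" ]; then echo "C use $from_env"
  else                          echo "D allocate"
  fi
}
```

Because each branch is only reached when every earlier input is empty, a pre-attached EIP always wins over a tagged or env-file one, which is what makes re-runs safe.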
Workflow A (recommended): EC2-first, then attach via env-file edit + re-run
# 1. Launch EC2 → note INSTANCE_ID
aws ec2 run-instances --instance-type t3.small --image-id <ami> --key-name <key> ...
# 2. Paste INSTANCE_ID into the env file (one line edit):
echo 'INSTANCE_ID=<from-step-1>' >> scripts/operator-workstation.env
# (or for test: scripts/operator-workstation.test.env)
# 3. Bootstrap (allocates EIP + attaches to INSTANCE_ID + persists EIP back to env)
AWS_PROFILE=agentkeys-admin bash scripts/setup-cloud.sh --yes
# Test stack:
AWS_PROFILE=agentkeys-admin bash scripts/setup-cloud.sh \
--env-file scripts/operator-workstation.test.env --test --yes
# 4. SSH (EIP is now in the env file as EIP=…)
ssh ubuntu@$(grep ^EIP= scripts/operator-workstation.env | cut -d= -f2)
Workflow B: EIP-first, attach manually
# 1. Allocate EIP (printed at §14 summary; persisted to env file as EIP=…)
AWS_PROFILE=agentkeys-admin bash scripts/setup-cloud.sh --yes
# 2. Launch EC2
aws ec2 run-instances ...
# 3. Attach the EIP
aws ec2 associate-address --region "$REGION" \
--instance-id <new-instance-id> \
--public-ip $(grep ^EIP= scripts/operator-workstation.env | cut -d= -f2)
Workflow A is one fewer command; Workflow B is sometimes necessary when an existing EC2 needs to be repointed at the EIP later. For test, swap in `--env-file scripts/operator-workstation.test.env --test` everywhere; the EIP will be tagged agentkeys-broker-eip-test (the test env file has the test placeholders pre-populated).
Once the EC2 is launched + the EIP attached, SSH access goes through scripts/ssh-broker.sh — single entry point that reads INSTANCE_ID + EIP from scripts/broker.env or scripts/broker.test.env so it stays in lockstep with whatever setup-cloud.sh persisted.
# Prod broker via EC2 Instance Connect (no .pem needed):
bash scripts/ssh-broker.sh
# Test broker:
bash scripts/ssh-broker.sh test
# Fallback via .pem key (when EC2 Instance Connect is down):
bash scripts/ssh-broker.sh prod --fallback
bash scripts/ssh-broker.sh test --fallback
Default AWS profiles per stack (least-privilege, one-shot to provision):
| Stack | Default profile | Trust |
|---|---|---|
| prod | `agentkeys-broker` | `ec2-instance-connect:SendSSHPublicKey` on the prod instance ARN only |
| test | `agentkeys-broker-test` | same, scoped to the test instance ARN |
If agentkeys-broker or agentkeys-broker-test doesn't exist yet, setup-cloud.sh step 12 creates it idempotently (scoped to whatever INSTANCE_ID is set in the corresponding broker env file):
# Test stack — creates agentkeys-broker-test, scopes ec2-instance-connect
# to INSTANCE_ID from broker.test.env, mints an access key ONCE if none
# active. Re-run is a no-op once the user + policy + key already exist.
AWS_PROFILE=agentkeys-admin bash scripts/setup-cloud.sh \
--env-file scripts/operator-workstation.test.env --test --only-step 12
# Prod stack (the canonical `agentkeys-broker` user from CLAUDE.md):
AWS_PROFILE=agentkeys-admin bash scripts/setup-cloud.sh --only-step 12
The script prints the access key once (paste into ~/.aws/credentials as [agentkeys-broker] / [agentkeys-broker-test]). It never re-mints on subsequent runs because the operator already holds the secret. If INSTANCE_ID is unset in the broker env file, step 12 skips with a pointer to paste it first.
Shell wrappers (drop in ~/.zshrc) make the common case one keystroke:
AGENTKEYS_REPO="$HOME/Projects/agentKeys"
alias ssh-prod='bash $AGENTKEYS_REPO/scripts/ssh-broker.sh prod'
alias ssh-test='bash $AGENTKEYS_REPO/scripts/ssh-broker.sh test'
The admin profile (agentkeys-admin) is a long-lived IAM user with IAMFullAccess + AmazonS3FullAccess + AmazonSESFullAccess + AmazonRoute53FullAccess permissions. It is already provisioned per CLAUDE.md "AWS local-profile ↔ remote-IAM mapping". Switch to it before any bootstrap call:
awsp agentkeys-admin
aws sts get-caller-identity # → arn:aws:iam::…:user/agentkeys-admin
The bootstrap script intentionally doesn't auto-create the admin user: bootstrapping IAM root credentials onto disk is the kind of thing you only do once, by hand, with the IAM Console open.
Same AWS account is fine — isolation comes from the -test suffix on every identifier, not from the account boundary. Cross-trust is structurally impossible because the trust policy on every test role lists ONLY the test OIDC provider ARN (which is bound byte-for-byte to test-broker.${ZONE}, never broker.${ZONE}).
| Resource | Prod name | Test name | Created by |
|---|---|---|---|
| IAM user (daemon) | `agentkeys-daemon` | `agentkeys-daemon-test` | setup-cloud.sh step 10 (suffixed when `--test` flag is passed, or env-file path matches `*test*` as an ergonomic auto-detect) |
| IAM role (data) | `agentkeys-data-role` | `agentkeys-data-role-test` | setup-cloud.sh step 11 (same suffix logic) |
| IAM role (vault) | `agentkeys-vault-role` | `agentkeys-vault-role-test` | provision-vault-role.sh reads VAULT_ROLE_ARN from the active env file |
| IAM role (memory) | `agentkeys-memory-role` | `agentkeys-memory-role-test` | provision-memory-role.sh (same env-driven pattern) |
| IAM OIDC provider | `…oidc-provider/broker.${ZONE}` | `…oidc-provider/test-broker.${ZONE}` | manual `aws iam create-open-id-connect-provider` per §9.2 (one per broker URL; AWS validates byte-for-byte) |
| EC2 instance profile | `agentkeys-broker-host` | `agentkeys-broker-host-test` | §6 (optional) |
| EIP (tag) | `agentkeys-broker-eip` | `agentkeys-broker-eip-test` | setup-cloud.sh step 4 |
| Mail bucket | `agentkeys-mail-${ACCT}` | `agentkeys-mail-test-${ACCT}` | setup-cloud.sh step 7 (from BUCKET env var) |
| Vault bucket | `agentkeys-vault-${ACCT}` | `agentkeys-vault-test-${ACCT}` | provision-vault-bucket.sh (from VAULT_BUCKET env var) |
| Memory bucket | `agentkeys-memory-${ACCT}` | `agentkeys-memory-test-${ACCT}` | provision-memory-bucket.sh (from MEMORY_BUCKET env var) |
| SES sender | `noreply@bots.${ZONE}` | `noreply-test@bots-test.${ZONE}` | ses-verify-sender.sh (from BROKER_EMAIL_FROM_ADDRESS) |
| Heima contracts | one set of 6 addresses | a different set of 6 (same chain, different deployer key) | setup-heima.sh per deployer key |
Cross-trust isolation enforced by:
- OIDC provider URL is the trust scope. Each role's trust policy names exactly one provider ARN. The provider ARN derives from the broker URL. `broker.${ZONE}` and `test-broker.${ZONE}` produce distinct ARNs, so the test OIDC provider literally cannot mint JWTs that prod roles accept.
- PrincipalTag scoping (§9.4) layers on top. Even if a test JWT somehow reached a prod role, the bucket policy condition `s3:prefix=bots/${aws:PrincipalTag/agentkeys_actor_omni}/*` would still scope reads/writes by actor.
- Per-data-class bucket separation. Vault role's IAM grants reference vault bucket only; memory role references memory bucket only. Even within one stack, vault creds in the memory bucket → AccessDenied (defense-in-depth for the cap-mint layer).
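The first point above rests on the provider ARN embedding the issuer hostname. A small sketch makes that visible; `oidc_provider_arn` is a hypothetical helper, and the account ID and zone used with it below are placeholders rather than values from this deployment.

```shell
# Hypothetical helper: build the IAM OIDC provider ARN for a broker hostname.
# The hostname is part of the ARN, so two broker hosts never share a provider.
oidc_provider_arn() {
  local account_id="$1" broker_host="$2"
  printf 'arn:aws:iam::%s:oidc-provider/%s\n' "$account_id" "$broker_host"
}
```

Comparing `oidc_provider_arn 123456789012 broker.example.org` with `oidc_provider_arn 123456789012 test-broker.example.org` yields two different strings, and a trust policy that names only one of them cannot be satisfied by tokens federated through the other.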
setup-cloud.sh validates required env keys at step 2 and dies with a precise pointer if missing.
Why `jq -n --arg` and not `cat > file.json <<EOF`: `jq --arg` passes values outside shell parameter expansion, sidestepping the zsh modifier bug (`$VAR:r` etc.) that silently corrupts ARNs. JSON is validated on construction, command substitution feeds straight into `--policy-document`, and no file lands on disk. The orchestrator and every helper script apply this convention.
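As an illustration of that convention, the sketch below builds a small policy document with `jq -n --arg`. The function name `daemon_assume_policy` and the exact policy shape are assumptions made for the example (modeled on the daemon user's single permission, assuming the data role); they are not copied from setup-cloud.sh. It requires `jq` on the PATH.

```shell
# Hypothetical sketch: emit a policy allowing sts:AssumeRole on one role.
# The ARN travels as data via --arg, so the shell never re-expands it and
# jq rejects the document if the JSON is malformed.
daemon_assume_policy() {
  local role_arn="$1"
  jq -n --arg role "$role_arn" '{
    Version: "2012-10-17",
    Statement: [{Effect: "Allow", Action: "sts:AssumeRole", Resource: $role}]
  }'
}
```

The output can be passed inline, for example `--policy-document "$(daemon_assume_policy "$DATA_ROLE_ARN")"`, so no temporary JSON file is written to disk.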
Cloud-agnostic. The four principals exist in every cloud the broker runs on; the cloud changes only which API creates them.
| Identity | Type | Holds | Purpose |
|---|---|---|---|
| `agentkeys-admin` | privileged user | Long-lived access key | One-shot provisioning. Runs every command in this doc. IAM-admin scope. |
| `agentkeys-broker` | scoped user | Long-lived access key | Operator's SSH-into-EC2 path via EC2 Instance Connect (AWS) / SSH key (other clouds). No data-plane access. |
| `agentkeys-daemon` | runtime user | Long-lived access key | The broker process uses this at runtime. Only permission: assume the data role. |
| `agentkeys-data-role` | assumed role | (none; assumed) | Holds the actual storage + email permissions. Trusted by the runtime user (Stage 6) or by the OIDC provider (Stage 7). |
| `agentkeys-broker-host` | instance profile (optional) | (none; bound to a VM) | If the broker runs on a managed VM, attach this so the daemon never sees a static key. Runtime creds come from IMDS / metadata server. |
Why "data role" and not "agent role": the project word "agent" already means three things (the AI agent, the AgentKeys product, an IAM role). The role holds data-plane permissions. The broker still accepts the legacy `BROKER_AGENT_ROLE_ARN` env var for backwards compatibility.
Seven subdomains under the operator's parent zone (substitute ${ZONE} everywhere):
| Host | Purpose | Provisioned in |
|---|---|---|
| `${MAIL_DOMAIN}` (e.g. `bots.${ZONE}`) | SES / email backend inbound | §3 |
| `${BROKER_HOST}` (e.g. `broker.${ZONE}`) | Broker public reverse proxy | §10.1 below |
| `signer.${ZONE}` | Signer service (issue #74 step 1b) | §10.1 below |
| `audit.${ZONE}` / `email.${ZONE}` / `cred.${ZONE}` / `memory.${ZONE}` | Service workers (issue #90) | §10.1 below (dev co-location on broker EIP today) |
Confirm the parent zone is reachable before any record changes (AWS Route 53 example; the same get-hosted-zone shape exists on AliCloud DNS + Cloud DNS):
aws route53 get-hosted-zone --id "$PARENT_ZONE_ID" \
--query 'HostedZone.{name:Name, private:Config.PrivateZone}'
# → {"name": "${ZONE}.", "private": false}
The bulk service-worker A-record creation is automated by scripts/dns-upsert-workers.sh (AWS Route 53 today). For other providers, replicate the same shape; the hostnames are the migration seam.
aws sesv2 create-email-identity \
--region "$REGION" --email-identity "$MAIL_DOMAIN" \
--dkim-signing-attributes NextSigningKeyLength=RSA_2048_BIT
Then publish DKIM + SPF + DMARC + MX records in one DNS change. AWS Route 53:
read -r T1 T2 T3 <<<"$(aws sesv2 get-email-identity --region "$REGION" \
--email-identity "$MAIL_DOMAIN" --query 'DkimAttributes.Tokens' --output text)"
aws route53 change-resource-record-sets --hosted-zone-id "$PARENT_ZONE_ID" \
--change-batch "$(jq -n \
--arg domain "$MAIL_DOMAIN" --arg region "$REGION" \
--arg t1 "$T1" --arg t2 "$T2" --arg t3 "$T3" '{
Comment: "AgentKeys email infra for \($domain)",
Changes: [
{Action:"UPSERT", ResourceRecordSet:{Name:"\($t1)._domainkey.\($domain)", Type:"CNAME", TTL:300, ResourceRecords:[{Value:"\($t1).dkim.amazonses.com"}]}},
{Action:"UPSERT", ResourceRecordSet:{Name:"\($t2)._domainkey.\($domain)", Type:"CNAME", TTL:300, ResourceRecords:[{Value:"\($t2).dkim.amazonses.com"}]}},
{Action:"UPSERT", ResourceRecordSet:{Name:"\($t3)._domainkey.\($domain)", Type:"CNAME", TTL:300, ResourceRecords:[{Value:"\($t3).dkim.amazonses.com"}]}},
{Action:"UPSERT", ResourceRecordSet:{Name:$domain, Type:"MX", TTL:300, ResourceRecords:[{Value:"10 inbound-smtp.\($region).amazonaws.com"}]}},
{Action:"UPSERT", ResourceRecordSet:{Name:$domain, Type:"TXT", TTL:300, ResourceRecords:[{Value:"\"v=spf1 include:amazonses.com -all\""}]}},
{Action:"UPSERT", ResourceRecordSet:{Name:"_dmarc.\($domain)", Type:"TXT", TTL:300, ResourceRecords:[{Value:"\"v=DMARC1; p=quarantine; rua=mailto:dmarc@\($domain)\""}]}}
]
}')"
Wait ~5 min for DKIM propagation, then verify:
aws sesv2 get-email-identity --region "$REGION" --email-identity "$MAIL_DOMAIN" \
--query '{verified: VerifiedForSendingStatus, dkim: DkimAttributes.Status}'
# → {"verified": true, "dkim": "SUCCESS"}

DKIM key custody: in this interim setup, the email service holds the private DKIM key (AWS-internal on SES, AliCloud-internal on DirectMail, etc.). Trust surface: the provider could forge mail signed as us, so the blast radius is bounded (reputation, not user-data custody). The migration target is TEE-held BYODKIM; track it in docs/spec/heima-gaps-vs-desired-architecture.md §4. Do not take an intermediate step to "BYODKIM with file-stored key" (strictly worse than provider-managed).
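The "wait, then verify" step for DKIM can also be scripted as a bounded retry loop instead of a fixed sleep. This is a minimal sketch: `poll_until` and `dkim_ready` are hypothetical helper names, and the attempt count and delay in the usage line are arbitrary assumptions, not values taken from the repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a check command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
poll_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # No point sleeping after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Check used with the helper: succeeds once SES reports DKIM status SUCCESS.
dkim_ready() {
  local status
  status=$(aws sesv2 get-email-identity --region "$REGION" \
    --email-identity "$MAIL_DOMAIN" \
    --query 'DkimAttributes.Status' --output text) || return 1
  [ "$status" = "SUCCESS" ]
}

# Usage (assumed values: up to 20 attempts, 30 s apart):
#   poll_until 20 30 dkim_ready && echo "DKIM verified"
```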
aws s3api create-bucket \
--region "$REGION" --bucket "$BUCKET" \
$([ "$REGION" != "us-east-1" ] && echo "--create-bucket-configuration LocationConstraint=$REGION")
aws s3api put-public-access-block --region "$REGION" --bucket "$BUCKET" \
--public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
# 30-day TTL on inbound objects (throwaway-inbox model)
aws s3api put-bucket-lifecycle-configuration --region "$REGION" --bucket "$BUCKET" \
--lifecycle-configuration "$(jq -n '{
Rules: [{ID:"inbound-30d-ttl", Status:"Enabled", Filter:{Prefix:"inbound/"}, Expiration:{Days:30}}]
}')"

aws ses create-receipt-rule-set --rule-set-name agentkeys --region "$REGION" 2>/dev/null || true
aws ses create-receipt-rule --region "$REGION" --rule-set-name agentkeys \
--rule "$(jq -n --arg domain "$MAIL_DOMAIN" --arg bucket "$BUCKET" '{
Name: "agentkeys-inbound", Enabled: true, ScanEnabled: true, TlsPolicy: "Optional",
Recipients: [$domain],
Actions: [{S3Action: {BucketName: $bucket, ObjectKeyPrefix: "inbound/"}}]
}')"
aws ses set-active-receipt-rule-set --rule-set-name agentkeys --region "$REGION"

Inbound MIME lands at s3://$BUCKET/inbound/<msg_id>. The first object is AMAZON_SES_SETUP_NOTIFICATION (the provider's "I successfully wrote to your bucket" marker). Real mail follows.
Sandbox vs production sending: inbound is unaffected by SES sandbox; outbound to arbitrary addresses needs Console → Support → "SES Sending Limits" → "Request Production Access".
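Whether the account is still in the sandbox can be checked from the CLI before requesting production access. A minimal sketch, assuming the SES v2 GetAccount response field ProductionAccessEnabled is the signal you care about; `ses_sending_mode` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "production" or "sandbox" for the SES account in
# the given region, based on the ProductionAccessEnabled field.
ses_sending_mode() {
  local region="$1" enabled
  enabled=$(aws sesv2 get-account --region "$region" \
    --query 'ProductionAccessEnabled' --output text) || return 1
  # Text output may capitalise the boolean, so compare case-insensitively.
  case "$(printf '%s' "$enabled" | tr 'A-Z' 'a-z')" in
    true) echo "production" ;;
    *)    echo "sandbox" ;;
  esac
}

# Usage:
#   ses_sending_mode "$REGION"
```

The status is per region, so run it against the same $REGION the identity was created in.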
aws iam create-user --user-name agentkeys-daemon
aws iam create-access-key --user-name agentkeys-daemon
# → save AccessKeyId + SecretAccessKey to your secret manager. NEVER to git.
aws iam put-user-policy --user-name agentkeys-daemon \
--policy-name agentkeys-daemon-assume-role \
--policy-document "$(jq -n --arg acct "$ACCOUNT_ID" '{
Version:"2012-10-17",
Statement:[{
Effect:"Allow", Action:"sts:AssumeRole",
Resource:"arn:aws:iam::\($acct):role/agentkeys-data-role"
}]
}')"

The daemon user can do exactly one thing: assume agentkeys-data-role. Any storage / email action goes through the role's permissions, never the user's.
The role's trust policy starts with the static-IAM-user variant. After the broker is publicly reachable, docs/cloud-bootstrap.md §4 swaps it for the OIDC-federated variant.
aws iam create-role --role-name agentkeys-data-role \
--assume-role-policy-document "$(jq -n --arg acct "$ACCOUNT_ID" '{
Version:"2012-10-17",
Statement:[{
Effect:"Allow",
Principal:{AWS:"arn:aws:iam::\($acct):user/agentkeys-daemon"},
Action:"sts:AssumeRole"
}]
}')"
aws iam put-role-policy --role-name agentkeys-data-role \
--policy-name agentkeys-data-role-inline \
--policy-document "$(jq -n \
--arg bucket "$BUCKET" --arg region "$REGION" \
--arg acct "$ACCOUNT_ID" --arg domain "$MAIL_DOMAIN" '{
Version:"2012-10-17",
Statement:[
{Effect:"Allow", Action:"s3:ListBucket", Resource:"arn:aws:s3:::\($bucket)"},
{Effect:"Allow", Action:"s3:GetObject", Resource:"arn:aws:s3:::\($bucket)/*"},
{Effect:"Allow", Action:["ses:SendEmail","ses:GetEmailIdentity"],
Resource:["arn:aws:ses:\($region):\($acct):identity/\($domain)",
"arn:aws:ses:\($region):\($acct):identity/*@\($domain)"]}
]
}')"
export ROLE_ARN=$(aws iam get-role --role-name agentkeys-data-role --query 'Role.Arn' --output text)
echo "ROLE_ARN=$ROLE_ARN"

Per arch.md §17.2: separate roles for credentials + memory data classes. Same trust shape as §4.2, distinct inline policies + PrincipalTag scoping. Provisioned by per-data-class helpers (idempotent):
bash scripts/provision-vault-bucket.sh # agentkeys-vault-${ACCOUNT_ID}
bash scripts/provision-vault-role.sh # agentkeys-vault-role
bash scripts/apply-vault-bucket-policy.sh # v3 split-statement PrincipalTag policy
bash scripts/provision-memory-bucket.sh
bash scripts/provision-memory-role.sh
bash scripts/apply-memory-bucket-policy.sh
bash scripts/cleanup-mail-bucket-policy.sh # restore email-only grants on $BUCKET

These scripts are the source of truth for the policy shape: read them, don't transcribe.
If you reached this section, agentkeys-admin exists (you're using it). agentkeys-broker is whatever IAM user you SSH into the broker host with — its perms are out of scope (ec2-instance-connect:SendSSHPublicKey on the host's instance ID is sufficient for AWS Instance Connect).
aws s3api put-bucket-policy --region "$REGION" --bucket "$BUCKET" \
--policy "$(jq -n --arg bucket "$BUCKET" --arg acct "$ACCOUNT_ID" '{
Version:"2012-10-17",
Statement:[
{
Sid:"AllowSESWriteInbound", Effect:"Allow",
Principal:{Service:"ses.amazonaws.com"},
Action:"s3:PutObject",
Resource:"arn:aws:s3:::\($bucket)/*",
Condition:{StringEquals:{"aws:Referer":$acct}}
},
{
Sid:"AllowDaemonRead", Effect:"Allow",
Principal:{AWS:"arn:aws:iam::\($acct):role/agentkeys-data-role"},
Action:["s3:GetObject","s3:ListBucket"],
Resource:["arn:aws:s3:::\($bucket)","arn:aws:s3:::\($bucket)/*"]
}
]
}')"

The PrincipalTag-scoped federated variant (which replaces this once OIDC federation is up) lives in docs/cloud-bootstrap.md §4.4.
If the broker runs on AWS EC2, attach this so the daemon never holds a static key. Runtime creds come from IMDS.
ROLE=agentkeys-broker-host
aws iam create-role --role-name "$ROLE" \
--assume-role-policy-document "$(jq -n '{
Version:"2012-10-17",
Statement:[{Effect:"Allow", Principal:{Service:"ec2.amazonaws.com"}, Action:"sts:AssumeRole"}]
}')"
aws iam put-role-policy --role-name "$ROLE" --policy-name BrokerAssumeData \
--policy-document "$(jq -n --arg acct "$ACCOUNT_ID" '{
Version:"2012-10-17",
Statement:[{Effect:"Allow", Action:"sts:AssumeRole",
Resource:"arn:aws:iam::\($acct):role/agentkeys-data-role"}]
}')"
aws iam create-instance-profile --instance-profile-name "$ROLE"
aws iam add-role-to-instance-profile --instance-profile-name "$ROLE" --role-name "$ROLE"
aws ec2 associate-iam-instance-profile --region "$REGION" \
--instance-id "$INSTANCE_ID" \
--iam-instance-profile Name="$ROLE"

Caller-region trap: the agentkeys-admin profile defaults to us-west-2; the broker EC2 usually lives in us-east-1. Without --region "$REGION", describe-instances silently returns empty and downstream put-role-policy runs with --role-name "". Pass --region explicitly on every regional call. See CLAUDE.md "AWS local-profile ↔ remote-IAM mapping".
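One way to keep an empty lookup from flowing into a later call is to fail fast on empty variables. The guard below is a minimal sketch: `require_set` is a hypothetical helper, not something the repo scripts are known to provide.

```shell
#!/usr/bin/env bash
# Hypothetical guard: return 1 (and name the culprit) if any of the listed
# variables is unset or empty. Pass variable NAMES, not values.
require_set() {
  local name value missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name. Only pass trusted,
    # literal names here, since eval executes what it is given.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "required variable $name is empty" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: resolve the instance, then refuse to continue on an empty result.
#   INSTANCE_ID=$(aws ec2 describe-instances --region "$REGION" \
#     --filters "Name=ip-address,Values=$EIP" \
#     --query 'Reservations[].Instances[].InstanceId' --output text)
#   require_set REGION INSTANCE_ID || exit 1
```

Placing the guard directly after each lookup keeps the failure next to its cause instead of surfacing later as a confusing IAM error.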
The broker calls SES v2 SendEmail with its own runtime credentials (instance profile), not via the assumed agentkeys-data-role. Without ses:SendEmail on the broker's role, the operator hits:
broker rejected /v1/auth/email/request: status=502 body=
{"error":"backend_unreachable","message":"… ses SendEmail:
unhandled error (AccessDeniedException)"}
The IAM action is ses:SendEmail (sesv2), NOT ses:SendRawEmail (v1; different code path the broker doesn't use). The grant lives on the broker's runtime role (agentkeys-broker-host on EC2; the user agentkeys-daemon otherwise) — see docs/cloud-bootstrap.md §3.3 for the exact statement.
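Before debugging a 502 from the broker, the grant can be checked with IAM's policy simulator. This is a sketch under assumptions: `can_send_email` is a hypothetical helper, the caller needs permission to run the simulator, and the simulator only evaluates identity-based policies on the principal, so treat the result as a hint rather than proof.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask the IAM policy simulator whether a principal may
# call ses:SendEmail on a given SES identity ARN. Prints the decision
# (allowed, implicitDeny or explicitDeny); returns 0 only for "allowed".
can_send_email() {
  local principal_arn="$1" identity_arn="$2" decision
  decision=$(aws iam simulate-principal-policy \
    --policy-source-arn "$principal_arn" \
    --action-names ses:SendEmail \
    --resource-arns "$identity_arn" \
    --query 'EvaluationResults[0].EvalDecision' --output text) || return 1
  echo "$decision"
  [ "$decision" = "allowed" ]
}

# Usage:
#   can_send_email \
#     "arn:aws:iam::${ACCOUNT_ID}:role/agentkeys-broker-host" \
#     "arn:aws:ses:${REGION}:${ACCOUNT_ID}:identity/${MAIL_DOMAIN}"
```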
Some early deploys ship with AmazonS3FullAccess (or similar wide permissions) attached to the broker's runtime role. The broker at runtime ONLY uses aws-sdk-sts (the GetCallerIdentity startup probe) + aws-sdk-sesv2 (the §6.1 grant) — it never accesses S3 with its own creds. Per-user S3 is via JWT-assumed agentkeys-{data,vault,memory}-role, not the broker's runtime role.
A broker compromise with AmazonS3FullAccess would expose every inbound email in the SES bucket (verification tokens, magic links). Strip it:
# Discover the actual role attached to the broker host (canonical name:
# agentkeys-broker-host; some early deploys landed on different names):
INSTANCE_PROFILE_ARN=$(aws ec2 describe-instances --region "$REGION" \
--filters "Name=ip-address,Values=$EIP" \
--query 'Reservations[].Instances[].IamInstanceProfile.Arn' --output text)
ROLE=$(aws iam get-instance-profile \
--instance-profile-name "${INSTANCE_PROFILE_ARN##*/}" \
--query 'InstanceProfile.Roles[0].RoleName' --output text)
echo "broker runtime role: $ROLE"
# Audit attached policies:
aws iam list-attached-role-policies --role-name "$ROLE"
# Detach AmazonS3FullAccess if present:
aws iam detach-role-policy --role-name "$ROLE" \
--policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
# Verify only the narrow inline policy (BrokerSendEmail + AssumeDataRole) remains:
aws iam list-role-policies --role-name "$ROLE"
aws iam list-attached-role-policies --role-name "$ROLE"

Every layer in §3–§5 has a 1:1 analog on the major providers. The provisioning shape carries; only the API endpoints + JSON dialects differ.
| Layer | AWS (current) | AliCloud (in progress) | GCP | Tencent Cloud |
|---|---|---|---|---|
| Privileged user | IAM user with IAMFullAccess | RAM user with AliyunRAMFullAccess | IAM service account with roles/iam.securityAdmin | CAM user with AdministratorAccess |
| Runtime user | IAM user + access key | RAM user + AK/SK | Service account + key file (or Workload Identity) | CAM user + SecretId/SecretKey |
| Data role | IAM role + assume policy | RAM role + assume policy | Service account + IAM bindings | CAM role + assume policy |
| Federation | IAM OIDC provider | RAM IDaaS / OIDC provider | Workload Identity Pool | CAM OIDC provider |
| Object store | S3 + bucket policy | OSS + bucket policy | Cloud Storage + IAM bindings | COS + bucket policy |
| Email backend | SES + S3 receipt rule | DirectMail / SimpleDM + OSS event notification | SendGrid / Mailgun (no GCP-native) | SimpleDM + COS |
| TLS termination | nginx + Let's Encrypt | nginx + Let's Encrypt | nginx + Let's Encrypt | nginx + Let's Encrypt |
| Compute (broker host) | EC2 + EIP | ECS + EIP | Compute Engine + external IP | CVM + EIP |
| DNS | Route 53 | AliCloud DNS | Cloud DNS | DNSPod / Cloud DNS |
| Secrets storage | Secrets Manager / SSM Parameter Store | KMS Secrets Manager | Secret Manager | KMS |
Migration playbook (cloud → cloud):
- Re-bind operator-workstation.env to the new provider's identifiers (account ID, region, role ARNs, bucket name).
- Re-run this doc top-to-bottom against the new provider.
- Re-run §9 (OIDC federation activation) — substitute the provider's OIDC API.
- Re-run scripts/setup-broker-host.sh on the new host (the script doesn't care which cloud; it consumes already-provisioned identifiers).
- Re-run scripts/setup-heima.sh; the chain side is cloud-agnostic.
- Re-run the harness scripts to validate end-to-end.
The boundary is sharp: the broker process itself contains zero cloud-specific code — it talks STS-compatible OIDC + S3-compatible PutObject/GetObject + SMTP-compatible SendEmail. Every cloud above offers all three primitives. The provisioner-scripts/email-backends/ directory documents the email-backend trait; a new backend slots in as tencent-simpledm-cos (or similar) with the same upstream API as ses-s3.
The broker mints OIDC JWTs that AWS STS validates via the broker's public JWKS endpoint. Three one-shot steps per account, run AFTER setup-broker-host.sh finishes and the broker is reachable at https://${BROKER_HOST} over public TLS.
- https://${BROKER_HOST}/.well-known/openid-configuration returns 200 with the expected issuer + jwks_uri.
- https://${BROKER_HOST}/.well-known/jwks.json returns at least one ES256 key.
- curl -sf "https://${BROKER_HOST}/healthz" returns 200.
# DoH-resolved EIP (immune to local DNS interception; see §5b verify steps):
broker_ip=$(curl -sS "https://dns.google/resolve?name=${BROKER_HOST}&type=A" | jq -r '.Answer[0].data')
# -sha1 is REQUIRED. macOS LibreSSL 3.3 + OpenSSL 3.x default to SHA256
# (64 hex chars) but AWS IAM CreateOpenIDConnectProvider rejects anything
# that isn't exactly 40 hex chars (SHA1).
thumb=$(echo | openssl s_client -servername "$BROKER_HOST" \
-connect "${broker_ip}:443" 2>/dev/null \
| openssl x509 -fingerprint -sha1 -noout \
| awk -F'=' '{print $2}' | tr -d ':' | tr 'A-Z' 'a-z')
[ ${#thumb} -eq 40 ] || { echo "thumb length ${#thumb} != 40 — check -sha1 flag" >&2; return 1; }
aws iam create-open-id-connect-provider \
--url "https://${BROKER_HOST}" \
--client-id-list "sts.amazonaws.com" \
--thumbprint-list "$thumb"

AWS validates the issuer URL byte-for-byte against the JWT iss claim. Once registered, the URL is effectively immutable: switching means a new provider ARN + new trust policy + new federated grants.
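Because the match is exact, it is worth comparing the URL you are about to register with the issuer the broker actually advertises. A minimal sketch: `issuer_matches` is a hypothetical helper, and the two hints it prints (trailing slash, letter case) are only assumed to be the likeliest near-misses.

```shell
#!/usr/bin/env bash
# Hypothetical pre-registration check: succeed only when the two issuer
# strings are identical; otherwise explain the most likely difference.
issuer_matches() {
  local registered="$1" advertised="$2" lower_r lower_a
  if [ "$registered" = "$advertised" ]; then
    return 0
  fi
  lower_r=$(printf '%s' "$registered" | tr 'A-Z' 'a-z')
  lower_a=$(printf '%s' "$advertised" | tr 'A-Z' 'a-z')
  if [ "${registered%/}" = "${advertised%/}" ]; then
    echo "mismatch: trailing slash differs" >&2
  elif [ "$lower_r" = "$lower_a" ]; then
    echo "mismatch: letter case differs" >&2
  else
    echo "mismatch: '$registered' vs '$advertised'" >&2
  fi
  return 1
}

# Usage:
#   advertised=$(curl -sf "https://${BROKER_HOST}/.well-known/openid-configuration" \
#     | jq -r '.issuer')
#   issuer_matches "https://${BROKER_HOST}" "$advertised" || exit 1
```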
Apply to each of the three data roles. Use $ROLE ∈ {agentkeys-data-role, agentkeys-vault-role, agentkeys-memory-role} (or the -test variants when bootstrapping the CI test instance).
aws iam update-assume-role-policy --role-name "$ROLE" --policy-document "$(jq -n \
--arg acct "$ACCOUNT_ID" --arg host "$BROKER_HOST" '{
Version:"2012-10-17",
Statement:[{
Effect:"Allow",
Principal:{Federated:"arn:aws:iam::\($acct):oidc-provider/\($host)"},
Action:"sts:AssumeRoleWithWebIdentity",
Condition:{StringEquals:{"\($host):aud":"sts.amazonaws.com"}}
}]
}')"

Per CLAUDE.md "Per-actor + per-data-class isolation invariants": every S3 read/write is scoped to bots/${aws:PrincipalTag/agentkeys_actor_omni}/{credentials,memory}/*. The split-statement v3 bucket policy is applied by scripts/apply-{vault,memory}-bucket-policy.sh; those scripts are the source of truth for the policy shape.
After §9.3 + §9.4, strip the broad-bucket inline grant from the role's policy (the bucket-side policy enforces; defense in depth means no app-side grant):
aws iam delete-role-policy --role-name "$ROLE" --policy-name "${ROLE}-inline"

Run harness/v2-stage3-demo.sh (or bash harness/run.sh --stage 3). It mints session JWT → OIDC JWT → STS creds, then proves both POSITIVE (own prefix) and NEGATIVE (cross-actor prefix → AccessDenied) writes for both data classes plus the cross-role isolation matrix. Walks the full §17.2 isolation table from CLAUDE.md.
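The positive/negative pair the harness proves can be illustrated by hand for a single data class. This is a sketch, not the harness itself: `probe_isolation` is a hypothetical helper, the probe object name is made up, and it assumes the shell already holds STS credentials minted for the first actor.

```shell
#!/usr/bin/env bash
# Hypothetical probe, run with STS credentials minted for actor "$2":
# a write under the actor's own prefix must succeed, and a write under
# another actor's prefix ("$3") must be rejected.
probe_isolation() {
  local bucket="$1" own="$2" other="$3" class="$4" body rc=0
  body=$(mktemp) || return 1
  echo "probe" > "$body"

  # POSITIVE: own prefix.
  if ! aws s3api put-object --bucket "$bucket" \
      --key "bots/${own}/${class}/probe.txt" --body "$body" >/dev/null 2>&1; then
    echo "FAIL: own-prefix write was rejected" >&2
    rc=1
  fi
  # NEGATIVE: cross-actor prefix.
  if aws s3api put-object --bucket "$bucket" \
      --key "bots/${other}/${class}/probe.txt" --body "$body" >/dev/null 2>&1; then
    echo "FAIL: cross-actor write was accepted" >&2
    rc=1
  fi

  rm -f "$body"
  return "$rc"
}

# Usage (actor identifiers are placeholders):
#   probe_isolation "agentkeys-memory-${ACCOUNT_ID}" "<own_actor>" "<other_actor>" memory
```

A failed negative check is the more serious result: it means the bucket policy is not scoping writes by PrincipalTag.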
§§3–8 set up identifiers. This step stands up the actual processes — broker + mock-server + signer + 4 service workers — on the EC2 host (or any Linux box with public-internet egress + the broker's hostname).
- Fresh Linux host with sudo, systemd, public-internet egress, ports 80 + 443 open inbound (for certbot + nginx).
- DNS A records for ${BROKER_HOST} + signer.${ZONE} + audit.${ZONE} + email.${ZONE} + cred.${ZONE} + memory.${ZONE}, all pointing at the host's public IP (provisioned by setup-cloud.sh step 6).
- AWS credentials in /etc/agentkeys/broker.env (the script writes the template; operator pastes the agentkeys-daemon access key from §4.1).
# Bootstrap a fresh host:
sudo bash scripts/setup-broker-host.sh \
--issuer-url "https://${BROKER_HOST}" \
--account-id "${ACCOUNT_ID}" \
--signer-host "signer.${ZONE}" \
--audit-host "audit.${ZONE}" \
--email-host "email.${ZONE}" \
--cred-host "cred.${ZONE}" \
--memory-host "memory.${ZONE}" \
--yes
# After a `git pull`, the same command re-deploys:
sudo bash scripts/setup-broker-host.sh --yes

The script:
- Builds agentkeys-broker-server (+ auth-email-link feature), agentkeys-mock-server, the 4 service workers, and the signer.
- Creates the agentkeys system user + state dir /var/lib/agentkeys/.
- Writes the dev_key_service master secret (one-shot at first boot, never rotated; rotation invalidates every previously-derived wallet).
- Writes per-worker env files at /etc/agentkeys/worker-{audit,email,creds,memory}.env.
- Writes systemd units for broker + signer + each worker, enables + starts.
- Configures nginx vhosts for ${BROKER_HOST} + signer.${ZONE} + 4 worker hosts (skip via --without-nginx). Vhost is rendered in two phases: Phase A (HTTP-only on :80, with the ACME challenge path under /.well-known/acme-challenge/ and a 503 placeholder on /) when no cert is on disk; Phase B (HTTPS on :443, broker proxy on /) when /etc/letsencrypt/live/<host>/fullchain.pem exists. Re-running the script after certbot issuance flips A → B automatically.
- Installs certbot but does NOT run it. Cert issuance is DNS-dependent; see quick-start §5b for the per-vhost certbot certonly --webroot recipe operators run manually once DNS is in place.
- Mints broker keypairs (oidc + session) under /var/lib/agentkeys/keys/.
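The two-phase vhost rule reduces to a single file-existence test per host. The sketch below illustrates that rule only and is not taken from setup-broker-host.sh: `vhost_phase` is a hypothetical helper, and LE_ROOT is an assumed override so the function can be tried without touching /etc/letsencrypt.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the phase rule: Phase B once the certificate
# chain exists on disk for the host, Phase A otherwise.
vhost_phase() {
  local host="$1" root="${LE_ROOT:-/etc/letsencrypt}"
  if [ -f "${root}/live/${host}/fullchain.pem" ]; then
    echo "B"   # HTTPS on :443, proxy on /
  else
    echo "A"   # HTTP-only on :80, ACME challenge path + 503 placeholder
  fi
}

# Usage: see which vhosts would still render as HTTP-only.
#   for h in "$BROKER_HOST" "signer.${ZONE}" "audit.${ZONE}"; do
#     echo "$h -> phase $(vhost_phase "$h")"
#   done
```

Run it with sudo on the host, since the live/ directory may not be readable by unprivileged users.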
Auto-detects bootstrap vs upgrade by reading the existing systemd unit's Environment= lines. Pass --ref <branch> to opt into an in-script git fetch + pull.
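To see what that detection has to work with, you can read the Environment= lines out of a unit file yourself. This is a sketch of the idea, not the script's actual logic: `unit_env_value` and `deploy_mode` are hypothetical helpers, and the unit path and key name in the usage line are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one Environment= key from a systemd
# unit file. Returns 1 if the file or the key is missing.
unit_env_value() {
  local unit_file="$1" key="$2" line prefix
  [ -f "$unit_file" ] || return 1
  # Matches Environment=KEY=value as well as Environment="KEY=value".
  line=$(grep -E "^Environment=\"?${key}=" "$unit_file" | tail -n 1)
  [ -n "$line" ] || return 1
  line="${line#Environment=}"
  line="${line#\"}"
  line="${line%\"}"
  prefix="${key}="
  printf '%s\n' "${line#"$prefix"}"
}

# "upgrade" when the unit already records a value for the key, else "bootstrap".
deploy_mode() {
  if unit_env_value "$1" "$2" >/dev/null; then
    echo "upgrade"
  else
    echo "bootstrap"
  fi
}

# Usage (unit path and key name are assumed, not taken from the repo):
#   deploy_mode /etc/systemd/system/agentkeys-broker.service ISSUER_URL
```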
curl -sf "https://${BROKER_HOST}/healthz" # → 200
curl -sf "https://${BROKER_HOST}/.well-known/openid-configuration" | jq .
curl -sf "https://${BROKER_HOST}/.well-known/jwks.json" | jq '.keys | length'
curl -sf "https://audit.${ZONE}/healthz" # → 200 (and friends)

For full E2E (broker + workers + chain + AWS), run bash harness/run.sh. See docs/chain-setup.md for the chain side and docs/ci-setup.md for the automated path.
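The per-host health checks can be folded into one loop that reports every failing host instead of stopping at the first. A minimal sketch: `check_healthz` is a hypothetical helper, the 10-second timeout is arbitrary, and it assumes each worker host exposes /healthz the way audit does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: probe /healthz on every host passed in and report the
# ones that do not answer successfully. Returns 1 if any host fails.
check_healthz() {
  local host failed=0
  for host in "$@"; do
    if curl -sf --max-time 10 -o /dev/null "https://${host}/healthz"; then
      echo "ok   ${host}"
    else
      echo "FAIL ${host}"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   check_healthz "$BROKER_HOST" "audit.${ZONE}" "email.${ZONE}" \
#     "cred.${ZONE}" "memory.${ZONE}"
```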
Tear down the whole AgentKeys footprint in one account. Use only when retiring the deployment.
# Drain the buckets
for b in "$BUCKET" "agentkeys-vault-${ACCOUNT_ID}" "agentkeys-memory-${ACCOUNT_ID}"; do
aws s3 rm "s3://$b" --recursive 2>/dev/null || true
aws s3api delete-bucket --bucket "$b" --region "$REGION" 2>/dev/null || true
done
# Roles
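# The broker-host role must leave its instance profile first; otherwise
# delete-role fails with DeleteConflict and the "|| true" below hides it.
aws iam remove-role-from-instance-profile --instance-profile-name agentkeys-broker-host \
  --role-name agentkeys-broker-host 2>/dev/null || true
aws iam delete-instance-profile --instance-profile-name agentkeys-broker-host 2>/dev/null || true
for r in agentkeys-data-role agentkeys-vault-role agentkeys-memory-role agentkeys-broker-host; do
  for p in $(aws iam list-role-policies --role-name "$r" --query 'PolicyNames[]' --output text 2>/dev/null); do
    aws iam delete-role-policy --role-name "$r" --policy-name "$p"
  done
  aws iam delete-role --role-name "$r" 2>/dev/null || true
done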
# OIDC provider
aws iam delete-open-id-connect-provider \
--open-id-connect-provider-arn "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${BROKER_HOST}"
# Daemon user
for k in $(aws iam list-access-keys --user-name agentkeys-daemon --query 'AccessKeyMetadata[].AccessKeyId' --output text); do
aws iam delete-access-key --user-name agentkeys-daemon --access-key-id "$k"
done
aws iam delete-user-policy --user-name agentkeys-daemon --policy-name agentkeys-daemon-assume-role 2>/dev/null || true
aws iam delete-user --user-name agentkeys-daemon
# SES + DNS
# Omitting --rule-set-name deactivates the active receipt rule set.
aws ses set-active-receipt-rule-set --region "$REGION" 2>/dev/null || true
aws sesv2 delete-email-identity --email-identity "$MAIL_DOMAIN" --region "$REGION" 2>/dev/null || true
# DNS records: operator-managed (Route 53 / your DNS provider) — delete by hand.
# EC2 + EIP: manual via console or aws ec2 CLI

For the test instance, substitute -test on every identifier above.
- Operator workstation setup: docs/dev-setup.md
- Chain bring-up: docs/chain-setup.md
- CI activation: docs/ci-setup.md
- Broker host script (single entry point): scripts/setup-broker-host.sh
- Cloud bootstrap script (single entry point): scripts/setup-cloud.sh
- Architecture (per-data-class buckets + isolation invariants): docs/spec/architecture.md §17, §17.2
- Future Tencent / TEE DKIM: docs/spec/heima-gaps-vs-desired-architecture.md §4
- FAQ + troubleshooting: wiki/cloud-setup-faq.md