2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -123,6 +123,8 @@ curl -X POST -H "Host: my-counter-1.actors.resources.substrate.ate.dev" -i http:

### GKE Quickstart (Development)

> For a declarative, Terraform-based setup that starts from a vanilla Google Cloud project, see the [GKE Quickstart (Production)](hack/gcp/iac/README.md).

1. Create and configure your environment file:
```bash
cp hack/ate-dev-env.sh.example .ate-dev-env.sh
81 changes: 81 additions & 0 deletions hack/gcp/ate-dev-env.sh.gcp
@@ -0,0 +1,81 @@
# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Environment variables for Substrate development on GCP when the underlying
# resources are provisioned with Terraform (see hack/gcp/iac/).
#
# Copy this file to .ate-dev-env.sh and customize it for your environment.
# The TF_VAR_* exports at the bottom of this file supply Terraform's input
# variables, so no separate terraform.tfvars is needed.
#
# Unlike hack/ate-dev-env.sh.example (which is consumed by the
# `go run ./tools/setup-gcp` provisioner), the resources referenced below are
# created by Terraform. Source this file, run `terraform apply` in
# hack/gcp/iac/, then deploy with hack/install-ate.sh.

export PROJECT_ID=${USER}-gke-dev
export PROJECT_NUMBER=$(gcloud projects describe ${PROJECT_ID} --format="value(projectNumber)")

export GCE_REGION=us-central1
export CLUSTER_LOCATION=us-central1-c

# VPC and subnet are created by Terraform (hack/gcp/iac/network.tf) and are
# both named "substrate".
export NETWORK=substrate
export SUBNETWORK=substrate

export CLUSTER_NAME=substrate-poc
export CLUSTER_VERSION=1.35.0-gke.2398000

# The gVisor sandbox runtime runs on the "worker" node pool created by
# Terraform (hack/gcp/iac/cluster.tf). The default pool keeps a single node
# for system and non-gVisor workloads.
export NODE_POOL_NAME=worker
export NODE_POOL_VERSION=1.35.0-gke.2398000
export DEFAULT_NODE_MACHINE_TYPE=e2-standard-2
export GVISOR_NODE_MACHINE_TYPE=c3-standard-4

# Set this if you are using an existing cluster with a different context name.
export KUBECTL_CONTEXT=

export BUCKET_NAME=snapshot-substrate-test-${PROJECT_ID}

# Artifact Registry repository created by Terraform
# (hack/gcp/iac/artifactregistry.tf).
# Cloud Build pushes images here; the cluster pulls from it.
export AR_REPOSITORY_ID=ate-images
export KO_DOCKER_REPO="${GCE_REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPOSITORY_ID}"

# Set this if you want to override the default build platforms
export KO_DEFAULTPLATFORMS=linux/amd64

# ── Terraform inputs ──────────────────────────────────────────────────────────
# Terraform automatically picks up any variable from a TF_VAR_<name> environment
# variable, so we derive its inputs from the values above instead of maintaining
# a separate terraform.tfvars. Source this file before running terraform in
# hack/gcp/iac/ and the variables will be populated automatically.
export TF_VAR_project_id=${PROJECT_ID}
export TF_VAR_gce_region=${GCE_REGION}
export TF_VAR_cluster_location=${CLUSTER_LOCATION}
export TF_VAR_cluster_name=${CLUSTER_NAME}
export TF_VAR_cluster_version=${CLUSTER_VERSION}
export TF_VAR_default_node_machine_type=${DEFAULT_NODE_MACHINE_TYPE}
export TF_VAR_worker_node_machine_type=${GVISOR_NODE_MACHINE_TYPE}
export TF_VAR_bucket_name=${BUCKET_NAME}
export TF_VAR_ar_repository_id=${AR_REPOSITORY_ID}
export TF_VAR_filestore=false

# Networking CIDRs use the defaults in hack/gcp/iac/variables.tf. Uncomment
# and set these to override them.
# export TF_VAR_subnet_cidr=10.0.0.0/20
# export TF_VAR_pods_cidr=10.1.0.0/16
# export TF_VAR_services_cidr=10.2.0.0/20
7 changes: 7 additions & 0 deletions hack/gcp/iac/.gitignore
@@ -0,0 +1,7 @@
# Local .terraform directories
*.terraform/
*.terraform.lock.*

# .tfstate files
*.tfstate
*.tfstate.*
117 changes: 117 additions & 0 deletions hack/gcp/iac/README.md
@@ -0,0 +1,117 @@
### GKE Quickstart (Production)

This is the Terraform equivalent of the `go run ./tools/setup-gcp --all`
provisioner described in the [GKE Quickstart (Development)](../../../README.md)
section. It provisions the same GCP resources — a GKE cluster, snapshot bucket,
Artifact Registry repository, and IAM bindings — but does so declaratively and
starting from a **vanilla Google Cloud project**: no APIs enabled, no VPC, and
no subnets are assumed to exist beforehand.

The configuration lives in [`hack/gcp/iac/`](.) and uses resources from the
[Terraform Google Cloud provider](https://registry.terraform.io/providers/hashicorp/google/latest/docs)
directly (no modules), so every resource is visible and easy to adapt.

What gets created:

- A dedicated `substrate` VPC and subnet (VPC-native, with secondary ranges for pods and services).
- A GKE cluster with Workload Identity and the required Kubernetes beta APIs enabled.
- A single-node default pool (for system and non-gVisor workloads) and a `worker` pool running the gVisor sandbox runtime.
- A GCS bucket for sandbox snapshots.
- An Artifact Registry repository, with Cloud Build granted write access and the cluster granted read access.
- All required Google Cloud APIs.

#### Prerequisites

1. Install [Terraform](https://developer.hashicorp.com/terraform/install) (>= 1.5) and the [`gcloud` CLI](https://cloud.google.com/sdk/docs/install).

2. Create and source your environment file. This single file drives both
Terraform and the deployment scripts: it exports `TF_VAR_*` variables that
Terraform picks up automatically, so there is no separate `terraform.tfvars`
to keep in sync. Source it now, before any of the steps below — everything
that follows relies on the variables it exports:
```bash
cp hack/gcp/ate-dev-env.sh.gcp .ate-dev-env.sh

# Edit .ate-dev-env.sh to match your project and preferences, then source it:
source .ate-dev-env.sh
```

3. Authenticate with application-default credentials:
```bash
gcloud auth application-default login --project=${PROJECT_ID}
```

4. Bootstrap the two APIs that Terraform itself depends on. Although this
configuration enables all required APIs via `google_project_service`, that
resource needs the **Service Usage API** to function, and the
`google_project` data source read during `terraform plan` needs the **Cloud
Resource Manager API**. Neither can be enabled by Terraform on a truly
vanilla project (chicken-and-egg), so enable them once up front with
`gcloud`:
```bash
gcloud services enable \
  serviceusage.googleapis.com \
  cloudresourcemanager.googleapis.com \
  --project=${PROJECT_ID}
```
Terraform manages the remaining APIs from there.
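
Because every later step depends on the variables exported by `.ate-dev-env.sh`, a quick check before running Terraform can catch an unsourced or half-edited file. The following is a minimal sketch, not part of this repository: `require_vars` is a hypothetical helper name, and the variables you pass to it are up to you (the usage comment lists a few of the exports from `hack/gcp/ate-dev-env.sh.gcp` as an illustration).

```bash
# Hypothetical helper (not part of this repository): report every named
# variable that is unset or empty, and fail if any are missing.
# The body is wrapped in ( ... ) so it runs in a subshell and its helper
# variables do not leak into the calling shell.
require_vars() (
  status=0
  for name in "$@"; do
    # Expand the variable whose name is stored in $name, defaulting to empty.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing: ${name}" >&2
      status=1
    fi
  done
  exit "${status}"
)

# Example usage, after `source .ate-dev-env.sh`:
#   require_vars PROJECT_ID GCE_REGION CLUSTER_NAME CLUSTER_LOCATION \
#     AR_REPOSITORY_ID TF_VAR_project_id TF_VAR_cluster_name \
#     && echo "environment looks complete"
```

The helper only checks that values are present; it does not validate that the project or region actually exists.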

#### Provisioning

1. Initialize Terraform and review the plan. Terraform reads its inputs from the
`TF_VAR_*` variables exported by the environment file you sourced in the
prerequisites, so make sure `.ate-dev-env.sh` is sourced in your current
shell:
```bash
cd hack/gcp/iac
terraform init
terraform plan
```

2. Apply to provision all resources:
```bash
terraform apply
```

3. Configure `kubectl` to talk to the new cluster. Terraform prints the exact
command as the `get_credentials_command` output:
```bash
gcloud container clusters get-credentials ${CLUSTER_NAME} --location ${CLUSTER_LOCATION} --project ${PROJECT_ID}
```

4. Configure Docker authentication for Artifact Registry. The deployment scripts
build and push images locally (via `ko`/Docker) to `KO_DOCKER_REPO`, so the
human or CI principal running the deployment pushes directly — it does **not**
go through Cloud Build. Authenticate your local Docker client against the
registry host:
```bash
gcloud auth configure-docker ${GCE_REGION}-docker.pkg.dev
```
This Terraform configuration only grants `roles/artifactregistry.writer` to
the Cloud Build service account (see [`iam.tf`](iam.tf)). If you deploy with
the local-push path, make sure the principal running it also has Artifact
Registry writer access on the repository, for example:
```bash
gcloud artifacts repositories add-iam-policy-binding ${AR_REPOSITORY_ID} \
  --location=${GCE_REGION} \
  --project=${PROJECT_ID} \
  --member="user:$(gcloud config get-value account)" \
  --role="roles/artifactregistry.writer"
```

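5. Return to the repository root (step 1 left you in `hack/gcp/iac`, and the
script path below is relative to the root), then deploy the Agent Substrate
system and demos exactly as in the development quickstart: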
```bash
./hack/install-ate.sh --deploy-ate-system
```
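
Step 4 authenticates Docker against the registry host, which is the first path component of `KO_DOCKER_REPO`. As a small sketch (the function name `registry_host` is hypothetical and not part of this repository), the host can be derived from that variable instead of being retyped, so the two stay aligned if you change `GCE_REGION`:

```bash
# Hypothetical helper: print the registry host portion of a repository path,
# that is, everything before the first "/".
registry_host() {
  printf '%s\n' "${1%%/*}"
}

# Example usage, after `source .ate-dev-env.sh`:
#   gcloud auth configure-docker "$(registry_host "${KO_DOCKER_REPO}")"
```

This relies only on shell parameter expansion, so it makes no network calls and does not check that the registry exists.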

#### Tearing down resources

To delete everything Terraform created:
```bash
cd hack/gcp/iac
terraform destroy
```

The GKE cluster sets `deletion_protection = false`, so `terraform destroy`
removes it without any manual intervention.
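
Since `terraform destroy` acts on whichever project the sourced `TF_VAR_project_id` points at, it can be worth confirming the target before tearing down. A minimal sketch, assuming a hypothetical helper named `confirm_project` (not part of this repository) that succeeds only when the project ID typed on standard input matches the expected one:

```bash
# Hypothetical helper: read one line from stdin and succeed only if it equals
# the expected project ID passed as the first argument.
# The body runs in a subshell so its variables do not leak to the caller.
confirm_project() (
  expected="$1"
  printf 'Type the project ID to confirm teardown: ' >&2
  IFS= read -r answer || exit 1
  [ "${answer}" = "${expected}" ]
)

# Example usage, from hack/gcp/iac after `source .ate-dev-env.sh`:
#   confirm_project "${TF_VAR_project_id}" && terraform destroy
```

Terraform still asks for its own confirmation afterwards; this guard only protects against running the command with the wrong environment file sourced.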