32 commits
25b344d
substrate wip
peterj May 23, 2026
76cae6c
fix up the optional/non-optional types in the crd/values file
peterj May 26, 2026
85b7b56
clean up ui/gateway stuff (use base path from openclaw)
peterj May 26, 2026
ebcaf63
split substrate and openshell, use the secrets in substrate fork for …
peterj May 27, 2026
25bf681
move startup script to a template
peterj May 27, 2026
5c470ce
pr feedback
peterj May 27, 2026
a687988
Merge branch 'main' into peterj/substrate
EItanya May 29, 2026
6b682d5
make linter happy
peterj May 29, 2026
768a115
move write ops to writer-role
peterj May 29, 2026
cb9a31b
commenting out the substrate section in values
peterj May 29, 2026
57eae49
Merge branch 'main' into peterj/substrate
peterj May 29, 2026
6cf6d58
fix failing helm unit tests
peterj Jun 1, 2026
40f6209
fix remaining pr feedback
peterj Jun 1, 2026
a161e7e
go mod tidy
peterj Jun 1, 2026
520cd9b
substrate wip
peterj May 23, 2026
8fcd1f0
fix up the optional/non-optional types in the crd/values file
peterj May 26, 2026
69087df
clean up ui/gateway stuff (use base path from openclaw)
peterj May 26, 2026
fd338bd
split substrate and openshell, use the secrets in substrate fork for …
peterj May 27, 2026
959972a
move startup script to a template
peterj May 27, 2026
80ddbaf
pr feedback
peterj May 27, 2026
198505b
make linter happy
peterj May 29, 2026
a23e899
move write ops to writer-role
peterj May 29, 2026
459a04d
commenting out the substrate section in values
peterj May 29, 2026
c5ccb68
fix failing helm unit tests
peterj Jun 1, 2026
6a82057
Refine substrate AgentHarness lifecycle
EItanya Jun 1, 2026
efc122d
Simplify substrate actor cleanup wiring
EItanya Jun 1, 2026
9664936
Split AgentHarness controllers by runtime
EItanya Jun 1, 2026
4d379f1
Fix AgentHarness CI failures
EItanya Jun 1, 2026
fa30b78
fix minor issues
peterj Jun 1, 2026
a9dae07
Merge branch 'peterj/substrate' of github.com:kagent-dev/kagent into …
peterj Jun 1, 2026
600d717
update image and rbac for secrets
peterj Jun 1, 2026
8c90bd4
fix the readme
peterj Jun 2, 2026
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -81,7 +81,7 @@ jobs:
- name: Install agent-sandbox
run: |
kubectl apply -f "https://github.com/kubernetes-sigs/agent-sandbox/releases/download/${AGENT_SANDBOX_VERSION}/manifest.yaml"
- kubectl wait --for=condition=Established crd/sandboxes.agents.x-k8s.io --timeout=90s
+ timeout 90s bash -c 'until [ "$(kubectl get crd sandboxes.agents.x-k8s.io -o jsonpath="{.status.conditions[?(@.type==\"Established\")].status}" 2>/dev/null)" = "True" ]; do sleep 1; done'
kubectl rollout status deployment/agent-sandbox-controller -n agent-sandbox-system --timeout=120s
kubectl wait --for=condition=Ready pod -l app=agent-sandbox-controller -n agent-sandbox-system --timeout=120s
60 changes: 60 additions & 0 deletions docs/substrate-agentharness-lifecycle.md
@@ -0,0 +1,60 @@
# Substrate AgentHarness Lifecycle

This branch should use a single ownership model for `runtime: substrate` harnesses.

## Ownership

- Platform/Helm owns `WorkerPool` capacity.
- kagent owns the generated per-harness `ActorTemplate`.
- kagent owns the per-harness actor lifecycle through `ate-api`.
- Substrate owns the `WorkerPool` deployment and the `ActorTemplate` golden snapshot process.

kagent should not create or delete `WorkerPool` resources from the `AgentHarness` reconciler. A chart may optionally install a default `WorkerPool`, and the controller may use that default when `spec.substrate.workerPoolRef` is unset.

## Spec Shape

`AgentHarness.spec.substrate` should contain only harness-level inputs:

- `workerPoolRef`, optional; falls back to the configured controller default.
- `snapshotsConfig`, optional; defaults to `gs://ate-snapshots/<namespace>/<name>`.
- `workloadImage`, optional.
- exactly one of `gatewayToken` or `gatewayTokenSecretRef`.

There is no `actorTemplateRef`. kagent always generates the `ActorTemplate`, so adopting an external template is not part of the workflow.
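
The defaulting and the one-of rule above can be sketched in Go. This is a minimal illustration, not the controller's code: `SubstrateSpec`, its flattened string fields, and `resolve` are hypothetical names, and the real API type uses object references rather than plain strings.

```go
package main

import (
	"errors"
	"fmt"
)

// SubstrateSpec is a hypothetical, flattened stand-in for
// AgentHarness.spec.substrate; the real API type differs.
type SubstrateSpec struct {
	WorkerPoolRef         string // empty means "use the controller default"
	SnapshotsLocation     string // empty means "derive the default location"
	GatewayToken          string
	GatewayTokenSecretRef string
}

// resolve enforces "exactly one of gatewayToken or gatewayTokenSecretRef"
// and then applies the two fallbacks described in the list above.
func resolve(namespace, name, defaultPool string, in SubstrateSpec) (SubstrateSpec, error) {
	out := in
	// Both empty or both set are equally invalid.
	if (in.GatewayToken == "") == (in.GatewayTokenSecretRef == "") {
		return out, errors.New("exactly one of gatewayToken or gatewayTokenSecretRef must be specified")
	}
	if out.WorkerPoolRef == "" {
		if defaultPool == "" {
			return out, errors.New("workerPoolRef is unset and no controller default is configured")
		}
		out.WorkerPoolRef = defaultPool
	}
	if out.SnapshotsLocation == "" {
		out.SnapshotsLocation = fmt.Sprintf("gs://ate-snapshots/%s/%s", namespace, name)
	}
	return out, nil
}

func main() {
	spec, err := resolve("kagent", "peterj-claw", "kagent-default",
		SubstrateSpec{GatewayToken: "test-token"})
	if err != nil {
		panic(err)
	}
	fmt.Println(spec.WorkerPoolRef, spec.SnapshotsLocation)
}
```

Validating before defaulting keeps a rejected spec untouched, which makes the error easier to report on the `Accepted` condition.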

## Status

Use top-level Kubernetes conditions for progress:

- `Accepted`
- `ActorTemplateReady`
- `ActorReady`
- `Ready`

`Ready` is the aggregate condition. Specific blockers should be reflected in `reason` and `message`.
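
One way to compute the aggregate is sketched below. It assumes a simplified `Condition` struct in place of the Kubernetes `metav1.Condition` type, and the reason strings (`AllConditionsMet`, `<Type>Unknown`) are invented for the example.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type    string
	Status  bool
	Reason  string
	Message string
}

// prerequisites lists the conditions that must all be true for Ready.
var prerequisites = []string{"Accepted", "ActorTemplateReady", "ActorReady"}

// aggregateReady returns the Ready condition. It is true only when every
// prerequisite is present and true; otherwise it copies the reason and
// message of the first blocker so the cause is visible on Ready itself.
func aggregateReady(conds []Condition) Condition {
	byType := make(map[string]Condition, len(conds))
	for _, c := range conds {
		byType[c.Type] = c
	}
	for _, t := range prerequisites {
		c, ok := byType[t]
		if !ok {
			return Condition{Type: "Ready", Reason: t + "Unknown", Message: t + " has not been reported yet"}
		}
		if !c.Status {
			return Condition{Type: "Ready", Reason: c.Reason, Message: c.Message}
		}
	}
	return Condition{Type: "Ready", Status: true, Reason: "AllConditionsMet", Message: "harness is ready"}
}

func main() {
	ready := aggregateReady([]Condition{
		{Type: "Accepted", Status: true},
		{Type: "ActorTemplateReady", Status: false, Reason: "SnapshotPending", Message: "golden snapshot in progress"},
	})
	fmt.Printf("%+v\n", ready)
}
```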

Do not store ownership booleans or cleanup markers in annotations or status. Ownership is deterministic:

- `WorkerPool` is external.
- generated `ActorTemplate` is owned by the `AgentHarness` through an owner reference.

## Reconcile

The substrate reconcile path should:

1. Resolve `workerPoolRef` from spec or controller default.
2. Verify the `WorkerPool` exists.
3. Create or update the generated `ActorTemplate` with an owner reference to the `AgentHarness`.
4. Wait for `ActorTemplate.status.phase == Ready`.
5. Create or resume the actor through `ate-api`.
6. Mark `ActorReady` and aggregate `Ready`.
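
The ordering of these steps can be sketched as follows. The `Backend` interface and every method on it are hypothetical; they stand in for the Kubernetes client and the `ate-api` client and do not reflect the actual kagent or Substrate APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// Backend is a hypothetical facade over the Kubernetes API and ate-api;
// none of these method names come from kagent or Substrate.
type Backend interface {
	WorkerPoolExists(name string) bool
	ApplyActorTemplate(harness, pool string) error
	ActorTemplatePhase(harness string) string
	CreateOrResumeActor(harness string) (string, error)
}

// errNotReady signals that the caller should requeue and try again later.
var errNotReady = errors.New("ActorTemplate is not Ready yet")

// reconcile walks the steps in order and returns the actor ID once the
// actor exists. The caller would then set ActorReady and the aggregate Ready.
func reconcile(b Backend, harness, specPool, defaultPool string) (string, error) {
	pool := specPool // step 1: spec wins, controller default is the fallback
	if pool == "" {
		pool = defaultPool
	}
	if pool == "" || !b.WorkerPoolExists(pool) { // step 2
		return "", fmt.Errorf("WorkerPool %q not found", pool)
	}
	if err := b.ApplyActorTemplate(harness, pool); err != nil { // step 3
		return "", err
	}
	if b.ActorTemplatePhase(harness) != "Ready" { // step 4
		return "", errNotReady
	}
	return b.CreateOrResumeActor(harness) // step 5; step 6 is the status update
}

// fakeBackend is an in-memory Backend used only to exercise the sketch.
type fakeBackend struct {
	pools map[string]bool
	phase string
}

func (f *fakeBackend) WorkerPoolExists(name string) bool    { return f.pools[name] }
func (f *fakeBackend) ApplyActorTemplate(_, _ string) error { return nil }
func (f *fakeBackend) ActorTemplatePhase(string) string     { return f.phase }
func (f *fakeBackend) CreateOrResumeActor(h string) (string, error) {
	return "actor-" + h, nil
}

func main() {
	b := &fakeBackend{pools: map[string]bool{"kagent-default": true}, phase: "Ready"}
	id, err := reconcile(b, "peterj-claw", "", "kagent-default")
	fmt.Println(id, err)
}
```

Returning a sentinel error for the "not Ready yet" case lets the caller distinguish a requeue from a real failure when it sets condition reasons.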

## Delete

The finalizer should:

1. Delete the harness actor recorded in `status.backendRef.id`.
2. Read the generated `ActorTemplate` and delete `status.goldenActorID`, if present.
3. Remove the finalizer.

Kubernetes garbage collection deletes the generated `ActorTemplate` through the owner reference. kagent does not delete `WorkerPool`.
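
A minimal sketch of that finalizer flow, assuming a hypothetical `ActorDeleter` interface in place of the `ate-api` client and an invented finalizer name. It deliberately never touches the `ActorTemplate` or the `WorkerPool`.

```go
package main

import "fmt"

// ActorDeleter is a hypothetical stand-in for the ate-api client.
type ActorDeleter interface {
	DeleteActor(id string) error
}

// cleanupFinalizer is an illustrative name, not the one kagent registers.
const cleanupFinalizer = "example.kagent.dev/substrate-cleanup"

// finalize deletes the harness actor and the golden actor (when set) and
// returns the finalizer list with cleanupFinalizer removed. On a delete
// error it returns the list unchanged so the next reconcile retries.
func finalize(d ActorDeleter, finalizers []string, backendRefID, goldenActorID string) ([]string, error) {
	for _, id := range []string{backendRefID, goldenActorID} {
		if id == "" {
			continue // nothing recorded, nothing to delete
		}
		if err := d.DeleteActor(id); err != nil {
			return finalizers, fmt.Errorf("delete actor %s: %w", id, err)
		}
	}
	kept := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != cleanupFinalizer {
			kept = append(kept, f)
		}
	}
	return kept, nil
}

// recordingDeleter remembers which actor IDs were deleted.
type recordingDeleter struct{ deleted []string }

func (r *recordingDeleter) DeleteActor(id string) error {
	r.deleted = append(r.deleted, id)
	return nil
}

func main() {
	d := &recordingDeleter{}
	left, err := finalize(d, []string{cleanupFinalizer}, "actor-1", "golden-1")
	fmt.Println(d.deleted, left, err)
}
```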
142 changes: 142 additions & 0 deletions examples/substrate-openclaw/README.md
@@ -0,0 +1,142 @@
# OpenClaw on Agent Substrate

## 1. Install Substrate on your Kind cluster

Clone the kagent fork of Substrate from [kagent-dev/substrate](https://github.com/kagent-dev/substrate).

These instructions use a Kind cluster called `kind` (`KIND_CLUSTER_NAME=kind`).

```bash
cd substrate

./hack/create-kind-cluster.sh
./hack/install-ate-kind.sh --deploy-ate-system
```

`--deploy-ate-system` installs the **control plane only** (ate-api, ate-controller, atelet, atenet, …). Your registry catalog will show `ateapi-*`, `atelet-*`, etc., but **not** ateom until you build it.

Build and push **ateom-gvisor** (required for the WorkerPool `ateomImage`):

```bash
# build the ateom-gvisor image from the substrate repo root
export KO_DOCKER_REPO=localhost:5001
export KO_DEFAULTPLATFORMS=linux/$(go env GOARCH)
./hack/run-tool.sh ko build -B ./cmd/ateom-gvisor
```

## 2. kagent AgentHarness with substrate runtime

kagent generates a per-harness `ActorTemplate` and uses an existing `WorkerPool`.

Install kagent (Substrate must already be running in the cluster):

```bash
export KIND_CLUSTER_NAME=kind
make helm-install KAGENT_HELM_EXTRA_ARGS="\
--set controller.substrate.enabled=true \
--set controller.substrate.ateApiEndpoint=dns:///api.ate-system.svc:443 \
--set controller.substrate.ateApiInsecure=true \
--set substrateWorkerPool.create=true \
--set substrateWorkerPool.ateomImage=localhost:5001/ateom-gvisor:latest"
```

The generated `ActorTemplate` uses `controller.substrate.pauseImage`, `controller.substrate.runscAMD64URL`, `controller.substrate.runscAMD64SHA256`, `controller.substrate.runscARM64URL`, and `controller.substrate.runscARM64SHA256` from the Helm values. Override them with `--set` or a values file when you need to pin a different gVisor build.

Create a harness. If `snapshotsConfig` is omitted, kagent defaults it to `gs://ate-snapshots/<namespace>/<agentharnessname>`. Each harness also needs:

- **Worker pool**: reference an existing pool (`workerPoolRef`) or configure a controller default WorkerPool.
- **Gateway token**: required per harness, set with either `gatewayToken` or `gatewayTokenSecretRef`.

```yaml
apiVersion: kagent.dev/v1alpha2
kind: AgentHarness
metadata:
  name: peterj-claw
  namespace: kagent
spec:
  runtime: substrate
  backend: openclaw
  description: OpenClaw on Agent Substrate
  modelConfigRef: default-model-config
  substrate:
    # Optional: defaults to gs://ate-snapshots/kagent/peterj-claw
    # snapshotsConfig:
    #   location: gs://ate-snapshots/kagent/peterj-claw

    # Required unless the controller has a default WorkerPool configured.
    workerPoolRef:
      name: kagent-default

    # Required: configure the OpenClaw gateway token for this harness.
    # Use either gatewayToken or gatewayTokenSecretRef. The Secret must contain key "token".
    gatewayToken: test-token

    # gatewayTokenSecretRef:
    #   name: openclaw-gateway-token

    # Optional: override the sandbox image used in the ActorTemplate (must be digest-pinned).
    # workloadImage: ghcr.io/kagent-dev/nemoclaw/sandbox-base@sha256:d52bee415dc4c0dba7164f9eabe727574c056d4f211781f20af249707883a3b4
```

kagent creates an `ActorTemplate` that looks roughly like this:

```yaml
apiVersion: ate.dev/v1alpha1
kind: ActorTemplate
metadata:
  name: peterj-claw
  namespace: kagent
  labels:
    app.kubernetes.io/managed-by: kagent
    kagent.dev/agent-harness: peterj-claw
spec:
  pauseImage: gcr.io/gke-release/pause@sha256:bcbd57ba5653580ec647b16d8163cdd1112df3609129b01f912a8032e48265da
  runsc:
    amd64:
      url: gs://gvisor/releases/nightly/2026-05-19/x86_64/runsc
      sha256Hash: a397be1abc2420d26bce6c70e6e2ff96c73aaaab929756c56f5e2089ea842b63
    arm64:
      url: gs://gvisor/releases/nightly/2026-05-19/aarch64/runsc
      sha256Hash: 1ba2366ae2efceba166046f51a4104f9261c9cb72c6db8f5b3fe2dc57dea86b9
  workerPoolRef:
    name: peterj-claw-wp
    namespace: kagent
  snapshotsConfig:
    location: gs://ate-snapshots/kagent/peterj-claw
  containers:
    - name: openclaw
      image: ghcr.io/kagent-dev/nemoclaw/sandbox-base@sha256:d52bee415dc4c0dba7164f9eabe727574c056d4f211781f20af249707883a3b4
      ports:
        - containerPort: 80
      command:
        - /bin/sh
        - -c
        - |
          # Generated by kagent:
          # 1. writes ~/.openclaw/openclaw.json from modelConfigRef/channels/gateway token
          # 2. configures gateway.controlUi.basePath for the kagent proxy path
          # 3. starts `openclaw gateway run --port 80 --allow-unconfigured`
          # 4. waits for the gateway and tails the log
      env:
        - name: HOME
          value: /root
```

The generated `command` contains a base64-encoded `openclaw.json`, so the live object will be more verbose than the abbreviated example above. `pauseImage`, runsc URLs and hashes, and the default workload image come from controller/Helm configuration unless overridden on the `AgentHarness`; the gateway token comes from `spec.substrate.gatewayToken` or `gatewayTokenSecretRef`. kagent also sets `gateway.controlUi.basePath` to `/api/agentharnesses/<namespace>/<name>/gateway` so OpenClaw serves the Control UI under the same path kagent proxies.

When `modelConfigRef` or `spec.channels` is set, credentials are **not** copied into the ActorTemplate or `openclaw.json` as plaintext. kagent writes `valueFrom.secretKeyRef` (or inline `value` for harness inline tokens) on the ActorTemplate container env; Substrate `ate-api` resolves those refs at actor resume. In `openclaw.json`, kagent uses OpenClaw [env SecretRefs](https://docs.openclaw.ai/gateway/secrets) (`{source:"env",provider:"default",id:"<VAR>"}`) for `models.providers.*.apiKey`, `channels.telegram.accounts.*.botToken`, and `channels.slack.accounts.*.botToken` / `appToken`. When keys change, rotate the Secret and recreate the ActorTemplate golden snapshot.
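
To make the env SecretRef shape concrete, the sketch below renders one as JSON. The struct and helper names are hypothetical and `OPENAI_API_KEY` is only an example variable name; the three field names and the `env`/`default` values are taken from the form quoted above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envSecretRef mirrors the {source, provider, id} shape quoted above.
// The type name is invented for this example.
type envSecretRef struct {
	Source   string `json:"source"`
	Provider string `json:"provider"`
	ID       string `json:"id"`
}

// renderEnvSecretRef returns the JSON that would replace a plaintext
// credential: a pointer to an environment variable, never the value itself.
func renderEnvSecretRef(envVar string) (string, error) {
	b, err := json.Marshal(envSecretRef{Source: "env", Provider: "default", ID: envVar})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	ref, err := renderEnvSecretRef("OPENAI_API_KEY")
	if err != nil {
		panic(err)
	}
	fmt.Println(ref) // {"source":"env","provider":"default","id":"OPENAI_API_KEY"}
}
```

Because only the variable name lands in `openclaw.json`, the snapshot of the config carries no secret material; the value arrives through the container env at resume time.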

With `controller.substrate.enabled=true`, the kagent Helm chart installs a namespace-scoped Role and RoleBinding so `ate-api-server` (in `ate-system` by default) can `get` Secrets and ConfigMaps referenced by generated ActorTemplates. Harnesses in other namespaces need that namespace listed in `rbac.namespaces` (or a matching RoleBinding applied manually).

Port-forward the UI:

```bash
kubectl port-forward -n kagent svc/kagent-ui 8001:8080
```

Navigate to the deployed agent harness. If the OpenClaw Control UI asks for a gateway connection, use:

- Gateway URL: `http://localhost:8001/api/agentharnesses/kagent/peterj-claw/gateway/`
- Gateway token: `test-token`

The gateway URL must include the trailing slash. The token is the value configured in `spec.substrate.gatewayToken`, or the Secret value referenced by `spec.substrate.gatewayTokenSecretRef`; enter it in the token/credentials field rather than relying on a `token` query parameter.
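
The path and trailing-slash rules can be captured in a small helper. This is an illustrative sketch with hypothetical function names; it only assumes the `/api/agentharnesses/<namespace>/<name>/gateway` layout described in this README.

```go
package main

import (
	"fmt"
	"strings"
)

// gatewayBasePath returns the path kagent proxies for a harness, which is
// also the value described for gateway.controlUi.basePath.
func gatewayBasePath(namespace, name string) string {
	return "/api/agentharnesses/" + namespace + "/" + name + "/gateway"
}

// gatewayURL joins the UI origin and the base path and always ends with
// exactly one trailing slash, as the Control UI requires.
func gatewayURL(uiOrigin, namespace, name string) string {
	return strings.TrimRight(uiOrigin, "/") + gatewayBasePath(namespace, name) + "/"
}

func main() {
	fmt.Println(gatewayURL("http://localhost:8001", "kagent", "peterj-claw"))
	// http://localhost:8001/api/agentharnesses/kagent/peterj-claw/gateway/
}
```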
76 changes: 76 additions & 0 deletions go/api/config/crd/bases/kagent.dev_agentharnesses.yaml
@@ -19,6 +19,9 @@ spec:
scope: Namespaced
versions:
- additionalPrinterColumns:
- jsonPath: .spec.runtime
name: Runtime
type: string
- jsonPath: .spec.backend
name: Backend
type: string
@@ -511,6 +514,75 @@ spec:
type: string
type: array
type: object
runtime:
default: openshell
description: Runtime selects the harness provisioning stack. Defaults
to openshell when unset.
enum:
- openshell
- substrate
type: string
substrate:
description: Substrate is required when runtime is substrate.
properties:
gatewayToken:
description: |-
GatewayToken is the OpenClaw gateway Bearer token for this harness.
Prefer gatewayTokenSecretRef for production secrets.
minLength: 1
type: string
gatewayTokenSecretRef:
description: |-
GatewayTokenSecretRef references a Secret key holding the OpenClaw gateway Bearer token.
The Secret must contain a "token" key.
properties:
apiGroup:
type: string
kind:
type: string
name:
type: string
required:
- name
type: object
snapshotsConfig:
description: |-
SnapshotsConfig configures actor memory snapshots. Defaults to
gs://ate-snapshots/<namespace>/<agentharnessname> when unset.
properties:
location:
description: |-
Location is the GCS URI prefix for golden and incremental snapshots.
Example: gs://ate-snapshots/kagent/my-namespace/my-harness/
pattern: ^gs://
type: string
required:
- location
type: object
workerPoolRef:
description: |-
WorkerPoolRef references an existing ate.dev WorkerPool in the harness namespace.
When unset, the controller uses its configured default WorkerPool.
properties:
apiGroup:
type: string
kind:
type: string
name:
type: string
required:
- name
type: object
workloadImage:
description: WorkloadImage overrides the default nemoclaw/openclaw
sandbox image in the ActorTemplate.
type: string
type: object
x-kubernetes-validations:
- message: Exactly one of gatewayToken or gatewayTokenSecretRef must
be specified
rule: (has(self.gatewayToken) && !has(self.gatewayTokenSecretRef))
|| (!has(self.gatewayToken) && has(self.gatewayTokenSecretRef))
required:
- backend
type: object
@@ -520,6 +592,10 @@ spec:
|| (has(c.slack) && ((self.backend == ''hermes'' && has(c.slack.hermes)
&& !has(c.slack.openclaw)) || ((self.backend == ''openclaw'' || self.backend
== ''nemoclaw'') && has(c.slack.openclaw) && !has(c.slack.hermes)))))'
- message: spec.substrate may only be set when runtime is substrate
rule: '!has(self.substrate) || self.runtime == ''substrate'''
- message: spec.substrate is required when runtime is substrate
rule: self.runtime != 'substrate' || has(self.substrate)
status:
description: AgentHarnessStatus is the observed state of an AgentHarness.
properties:
12 changes: 12 additions & 0 deletions go/api/httpapi/types.go
@@ -144,6 +144,17 @@ type OpenshellAgentHarnessListEntry struct {
Endpoint string `json:"endpoint,omitempty"`
}

// SubstrateAgentHarnessListEntry is set when runtime is substrate.
type SubstrateAgentHarnessListEntry struct {
Backend v1alpha2.AgentHarnessBackendType `json:"backend"`
Runtime v1alpha2.AgentHarnessRuntime `json:"runtime"`
ActorID string `json:"actorId,omitempty"`
GatewayUIPath string `json:"gatewayUIPath,omitempty"`
ModelConfigRef string `json:"modelConfigRef,omitempty"`
BackendRefID string `json:"backendRefId,omitempty"`
Endpoint string `json:"endpoint,omitempty"`
}

type AgentResponse struct {
ID string `json:"id"`
Agent *AgentResource `json:"agent"`
@@ -157,6 +168,7 @@ type AgentResponse struct {
Accepted bool `json:"accepted"`
WorkloadMode v1alpha2.WorkloadMode `json:"workloadMode,omitempty"`
OpenshellAgentHarness *OpenshellAgentHarnessListEntry `json:"openshellAgentHarness,omitempty"`
SubstrateAgentHarness *SubstrateAgentHarnessListEntry `json:"substrateAgentHarness,omitempty"`
}

// Session types