Merged
19 commits
e81a821
chore(deps): update dependency kubernetes-sigs/gateway-api to v1.5.1
github-actions[bot] May 24, 2026
db4f33a
chore(deps): update obolup.sh dependency updates
github-actions[bot] May 25, 2026
9a84f30
chore(deps): update cloudflare/cloudflared docker tag to v2026.5.0
github-actions[bot] May 25, 2026
c108b19
fix(monetize): drop redundant available field, use drainEndsAt as sol…
bussyjd May 25, 2026
ae25b41
docs: add obol stack release train playbook
bussyjd May 25, 2026
0f5eea8
feat(x402): add last-payment-success-seconds gauge + recording rule f…
bussyjd May 25, 2026
1ad607d
Merge remote-tracking branch 'refs/remotes/pr/544' into smoke/prs-544…
bussyjd May 25, 2026
75700da
Merge remote-tracking branch 'refs/remotes/pr/546' into smoke/prs-544…
bussyjd May 25, 2026
c93e2c9
Merge remote-tracking branch 'refs/remotes/pr/547' into smoke/prs-544…
bussyjd May 25, 2026
b2dcb27
Merge remote-tracking branch 'refs/remotes/pr/548' into smoke/prs-544…
bussyjd May 25, 2026
65fd3b2
Merge remote-tracking branch 'refs/remotes/pr/549' into smoke/prs-544…
bussyjd May 25, 2026
51aa86d
Merge remote-tracking branch 'refs/remotes/pr/550' into smoke/prs-544…
bussyjd May 25, 2026
69ca96b
fix: stabilize post-544 smoke train
bussyjd May 25, 2026
d86a10e
fix: bump LiteLLM fork image
bussyjd May 25, 2026
a3a4073
fix: raise LiteLLM memory limit
bussyjd May 25, 2026
061e356
fix: persist no-thinking option for custom LLM routes
bussyjd May 25, 2026
28838d0
fix: make tool-call smoke deterministic
bussyjd May 25, 2026
df96d9f
fix: stabilize OBOL Permit2 smoke
bussyjd May 25, 2026
6b95e77
Update cmd/obol/model.go
OisinKyne May 25, 2026
3 changes: 2 additions & 1 deletion .agents/skills/obol-stack-dev/SKILL.md
@@ -2,7 +2,7 @@
name: obol-stack-dev
description: Obol Stack development and QA runbook. Use when working on obol-stack flows, x402 seller/buyer tests, live Base Sepolia OBOL smoke, Anvil fork regressions, ERC-8004 registration, LiteLLM paid routing, release-smoke, cloudflared, Renovate image bumps, or remote QA worktrees.
metadata:
version: "3.0.0"
version: "3.1.0"
domain: infrastructure
role: specialist
scope: development-and-testing
@@ -17,6 +17,7 @@ Operational router. Load only the reference for the task. **Do not delegate unde
| Need | Read |
|---|---|
| Local build, env vars, force-rebuild, CLI surface | `references/dev.md` |
| PR trains, ordered merge/collapse, release candidate gate | `references/release-train.md` |
| Release-smoke broken — what to check first | `references/release-smoke-debugging.md` |
| Live OBOL smoke, flow choice, Bob derivation, success criteria | `references/paid-flows.md` |
| LiteLLM model setup, paid/* route, port-forward | `references/llm-routing.md` |
156 changes: 156 additions & 0 deletions .agents/skills/obol-stack-dev/references/release-train.md
@@ -0,0 +1,156 @@
# PR And Release Train

Use this when asked to review or merge a set of obol-stack PRs, pin a frontend RC, handle GHAS/Renovate comments, or cut a release candidate. This is the orchestration layer; load the other references for the specific smoke, LLM, paid-flow, or remote-QA details.

## Inputs To Nail Down

- PR range and exclusions, for example "all PRs greater than #509 except #542".
- Target base branch and whether the work should merge existing PRs, collapse them, or open fix PRs.
- Release tag, frontend image tag, and whether the release is draft, prerelease, or ready.
- Validation target: local unit tests, running cluster upgrade, live OBOL smoke, fork smoke, or full `flows/release-smoke.sh`.
- Any required OpenAI-compatible QA LLM endpoint and model. Keep endpoint details in the shell environment or private notes, not in skill files, commit messages, PR text, or release text.
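
The PR range and exclusions can be pinned down mechanically once they are agreed. A minimal sketch, assuming PR numbers arrive one per line on stdin (for example from `gh pr list --json number --jq '.[].number'`); `select_prs` is a hypothetical helper name, not something the repo ships:

```bash
# select_prs MIN [EXCLUDED...]
# Reads PR numbers (one per line) on stdin and prints those strictly greater
# than MIN that are not in the exclusion list, in ascending order.
# Hypothetical helper for illustration; not part of obol-stack.
select_prs() {
  local min="$1" n ex skip
  shift
  while read -r n || [ -n "$n" ]; do
    # Skip blank or non-numeric lines.
    case "$n" in
      '' | *[!0-9]*) continue ;;
    esac
    [ "$n" -gt "$min" ] || continue
    skip=0
    for ex in "$@"; do
      if [ "$n" = "$ex" ]; then
        skip=1
      fi
    done
    if [ "$skip" -eq 0 ]; then
      printf '%s\n' "$n"
    fi
  done | sort -n
}
```

For the example range above, `select_prs 509 542` keeps every PR after #509 and drops #542.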

## Train Shape

```mermaid
flowchart LR
A["Inventory PRs and checks"] --> B["Architectural review"]
B --> C{"Incorrect or risky?"}
C -- "yes" --> D["Open fix/<topic> PR"]
C -- "no" --> E["Mark ready / merge in order"]
D --> F["Parallel targeted validation"]
F --> E
E --> G["Upgrade running cluster"]
G --> H["Release-smoke gate"]
H --> I{"Green enough to release?"}
I -- "no" --> J["Record blockers, do not claim green"]
I -- "yes" --> K["Template-based non-draft RC release"]
```

## Inventory

Start with source-of-truth state, not memory:

```bash
gh pr list --state open --limit 100 --json number,title,headRefName,baseRefName,isDraft,mergeStateStatus,statusCheckRollup,updatedAt
gh pr view <number> --json number,title,body,headRefName,baseRefName,isDraft,mergeStateStatus,statusCheckRollup,reviewDecision,commits,files,comments,reviews
```

Build a table with number, topic, branch, draft status, checks, review status, dependency order, and whether it changes runtime behavior, release artifacts, CI, chart manifests, or docs only.
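
One way to start that table is to render the machine-readable fields first and fill in the judgement columns by hand. A rough sketch, assuming tab-separated rows of number, title, branch, and draft flag on stdin (for instance produced with `gh pr list --json number,title,headRefName,isDraft --jq '.[] | [.number, .title, .headRefName, .isDraft] | @tsv'`); `pr_table` is a hypothetical name:

```bash
# pr_table: read tab-separated rows (number, title, branch, draft) on stdin
# and print a Markdown table. Rows with fewer than four fields are ignored.
# Hypothetical helper for illustration; titles containing "|" would need
# escaping before they are pasted into a real table.
pr_table() {
  printf '| PR | Title | Branch | Draft |\n'
  printf '|---|---|---|---|\n'
  awk -F '\t' 'NF >= 4 { printf "| #%s | %s | %s | %s |\n", $1, $2, $3, $4 }'
}
```

Checks, review status, dependency order, and the runtime/CI/docs classification still need a human pass; this only covers the columns `gh` can report directly.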

## Architectural Review

For each PR, review the diff in dependency order. The decision is not "does it compile"; it is whether the change preserves the stack contracts:

- No regression in public/private route boundaries. Frontend, eRPC, storefront, `/.well-known/agent-registration.json`, and `/skill.md` stay intentionally exposed; agent internals do not become public.
- No loss of x402 semantics: `PurchaseRequest Ready=True`, paid HTTP 200, exact balance deltas, on-chain transfer, and buyer route hot-add remain required evidence.
- No dev/prod image confusion. Under `OBOL_DEVELOPMENT=true`, running pods must use the local images intended by the branch.
- No release-only migration or wrapper unless the repo already has a durable helper. Prefer release-note warnings and operator directions while the product is not yet production-released.
- No narrowing of supported chain names, model endpoint forms, or URL forms unless the caller and tests prove the old form is dead.
- No broad cleanup. Delete only clusters, worktrees, containers, or ports whose ownership is recorded by the current worktree or explicitly confirmed.

Subagents are useful for sidecar trace work, but the main agent owns the final architectural judgement. Give subagents bounded questions such as "trace all callers of this field" or "verify this PR cannot expose a private route"; do not delegate the whole train.

## Fix PRs

When a PR is architecturally wrong, open a minimal fix branch:

```bash
git switch -c fix/<short-topic>
```

PR descriptions should be self-contained and should not mention Codex or local host details. Include:

- What invariant was violated.
- Why the fix is the smallest correct change.
- A Mermaid diagram when the behavior crosses controllers, charts, tunnels, buyers, or releases.
- Exact validation run and result.
- Remaining risk or follow-up, if any.

Diagram template:

```mermaid
sequenceDiagram
participant User
participant CLI as obol CLI
participant K8s as Kubernetes
participant Controller
participant Service as Runtime service
User->>CLI: command / upgrade / smoke
CLI->>K8s: apply intended manifests
K8s->>Controller: reconcile desired state
Controller->>Service: publish route or config
Service-->>User: validated behavior
```

## GHAS, Renovate, And Image Pins

Treat bot comments as review input, not noise:

- Read the exact comment and affected line before changing anything.
- For GitHub Actions and third-party images, prefer current versions pinned by immutable SHA or digest when the repo pattern expects it.
- Check whether Renovate has a matching manager/rule for frontend RC images and digest updates. If it failed to open a bump, fix the rule and validate it with the narrowest available Renovate config check.
- For frontend RCs, verify both the repo pin and the running pod image/digest after cluster upgrade.
- Do not mark the train done until PR checks and security comments are either fixed or explicitly documented as non-actionable with evidence.
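
The digest-pin check in the list above is easy to script. A minimal sketch, assuming image references are supplied one per line and that "pinned" means the reference ends in a `sha256` digest; both function names are hypothetical:

```bash
# is_digest_pinned IMAGE_REF
# Succeeds when the reference ends in an immutable sha256 digest
# (name[:tag]@sha256:<64 hex chars>). Tag-only references fail.
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# unpinned_images: read image references on stdin and print the ones that are
# not digest-pinned. Returns non-zero if any were found.
# Hypothetical helpers for illustration; not part of obol-stack.
unpinned_images() {
  local ref found=0
  while read -r ref || [ -n "$ref" ]; do
    [ -n "$ref" ] || continue
    if ! is_digest_pinned "$ref"; then
      printf '%s\n' "$ref"
      found=1
    fi
  done
  [ "$found" -eq 0 ]
}
```

The same filter can be fed from the repo pins and from the images reported by the running pods, so that both halves of the frontend RC check use one definition of "pinned".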

## Merge And Collapse Order

Merge from the oldest/base dependency forward. After each merge or collapse step:

```bash
git fetch origin
git log --oneline --decorate --graph --max-count=30 origin/main
gh pr view <number> --json state,mergedAt,mergeCommit,isDraft,mergeStateStatus,statusCheckRollup
```

Before merging the next PR, confirm the previous merge did not regress anything:

- Branch head contains the expected commits and did not drop earlier fixes.
- Required CI checks are complete or the reason for bypass is recorded.
- Any running-cluster upgrade still points at the expected backend and frontend images.
- Release notes and PR descriptions still match the final merged code, not an earlier draft.
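
When the dependency order is not obvious, it can be derived rather than remembered. A small sketch using the standard `tsort` utility, assuming the dependencies are written down as "prerequisite dependent" pairs; `merge_order` is a hypothetical wrapper name:

```bash
# merge_order: read "<prerequisite PR> <dependent PR>" pairs on stdin and
# print one valid merge order, prerequisites first.
# A PR with no dependencies can be listed as a pair with itself ("547 547").
# Hypothetical wrapper for illustration; not part of obol-stack.
merge_order() {
  tsort
}
```

For example, the pairs `544 546` and `546 548` yield the order 544, 546, 548. If two PRs depend on each other, GNU `tsort` reports the loop on stderr, which is a signal to collapse them instead of merging in sequence.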

## Release Candidate Gate

A release candidate is not ready just because the GitHub release exists. Gate it in this order:

1. Start the body from `.github/release-template.md`.
2. Keep generated `What's Changed`, `New Contributors`, and `Full Changelog` at the bottom.
3. Include warnings and operator directions for known upgrade issues only after validating the upgrade path or explicitly labeling the warning as unverified.
4. Run the smoke set required by the release. For full RCs, use `flows/release-smoke.sh` with live and fork flags when credentials and RPC capacity are available.
5. Fill the release body with the actual smoke report: command, artifact path, pass/fail table, failed flow names, and current blockers.
6. Only make the RC non-draft when the release body and validation evidence are complete.

If any smoke flow fails, say exactly what failed. Do not present a release as green when the report is red or partially blocked.
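
The "say exactly what failed" rule can be enforced by generating the pass/fail table from the results instead of writing it by hand. A minimal sketch, assuming one `<flow> <PASS|FAIL>` line per flow on stdin; that input format is an assumption for illustration and is not the format written by `flows/release-smoke.sh`:

```bash
# smoke_summary: read "<flow> <PASS|FAIL>" lines on stdin, print a Markdown
# pass/fail table followed by a one-line verdict, and return non-zero unless
# at least one flow ran and every flow passed.
# Hypothetical helper for illustration; not part of obol-stack.
smoke_summary() {
  awk '
    BEGIN { print "| Flow | Result |"; print "|---|---|" }
    NF >= 2 {
      printf "| %s | %s |\n", $1, $2
      total++
      if ($2 != "PASS") failed = failed " " $1
    }
    END {
      if (total == 0) { print "Blocked: no smoke results recorded"; exit 1 }
      if (failed != "") { print "Failed flows:" failed; exit 1 }
      print "All " total " flows passed"
    }
  '
}
```

Treating an empty result set as a failure keeps a skipped or blocked smoke run from being reported as green.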

## Running-Cluster Upgrade Check

Before testing an upgrade against a live local cluster:

```bash
k3d cluster list
kubectl get pods -A
kubectl get deploy -A -o wide
```

Identify the active stack ID, frontend image, backend component images, ports, and any parallel obol-stack clusters. Use tmux for long-running commands or shared sudo prompts. Clean up only stale stacks that are not the target and whose ownership is clear.

After the upgrade:

```bash
kubectl get deploy -A -o wide
kubectl get pods -A
```

Then run the targeted flow or full release smoke. Archive the log and artifact directory path in the PR or release description.
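
Comparing the before and after deployment listings by eye is error-prone, so a snapshot diff can help. A rough sketch, assuming each snapshot file holds one `<namespace>/<deployment> <image>` line per deployment; `image_changes` is a hypothetical helper name:

```bash
# Snapshot format assumed here: one "<namespace>/<deployment> <image>" line per
# deployment, for example captured before and after the upgrade with:
#   kubectl get deploy -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{" "}{.spec.template.spec.containers[*].image}{"\n"}{end}'
# image_changes BEFORE_FILE AFTER_FILE
# Prints only the lines that differ: "<" existed before, ">" exists after.
# Hypothetical helper for illustration; not part of obol-stack.
image_changes() {
  local before_sorted after_sorted
  before_sorted="$(mktemp)"
  after_sorted="$(mktemp)"
  sort "$1" >"$before_sorted"
  sort "$2" >"$after_sorted"
  # diff exits non-zero when the files differ and grep exits non-zero when
  # nothing matches; neither case is an error here.
  diff "$before_sorted" "$after_sorted" | grep '^[<>]' || true
  rm -f "$before_sorted" "$after_sorted"
}
```

An empty result means no deployment changed image, which is itself worth recording when the upgrade was supposed to roll a new frontend or backend pin.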

## Final Report

End with a short, auditable status:

- PRs reviewed, fixed, merged, skipped, or left blocked.
- Bot comments resolved or remaining.
- Image pins and Renovate rules checked.
- Smoke command, report path, and pass/fail summary.
- Release URL and draft/prerelease status.
- Cleanup performed and any cluster/worktree intentionally left running.
2 changes: 1 addition & 1 deletion .github/workflows/helm-template-smoke.yml
@@ -26,7 +26,7 @@ jobs:
- name: Set up Helm
uses: azure/setup-helm@dda3372f752e03dde6b3237bc9431cdc2f7a02a2 # v5.0.0
with:
version: v3.20.1 # match obolup.sh pinned version
version: v3.21.0 # match obolup.sh pinned version

- name: helm template ./base
run: |
6 changes: 5 additions & 1 deletion cmd/obol/model.go
@@ -267,6 +267,7 @@ func modelSetupCustomCommand(cfg *config.Config) *cli.Command {
&cli.StringFlag{Name: "endpoint", Usage: "Full base URL (e.g. http://host:8000/v1)", Required: true},
&cli.StringFlag{Name: "model", Usage: "Model identifier at the endpoint — this is also the LiteLLM model_name the agent will call", Required: true},
&cli.StringFlag{Name: "api-key", Usage: "API key (optional, some endpoints don't require it)"},
&cli.BoolFlag{Name: "disable-thinking", Usage: "Ask the model not to use its extended thinking mode when reasoning about a turn"},
&cli.BoolFlag{Name: "no-sync", Usage: "Skip the agent model sync (batch with other model commands, then run `obol model sync` once)"},
},
Action: func(ctx context.Context, cmd *cli.Command) error {
@@ -275,7 +276,10 @@ func modelSetupCustomCommand(cfg *config.Config) *cli.Command {
modelName := cmd.String("model")
apiKey := cmd.String("api-key")

if err := model.AddCustomEndpoint(cfg, u, endpoint, modelName, apiKey); err != nil {
options := model.CustomEndpointOptions{
DisableThinking: cmd.Bool("disable-thinking"),
}
if err := model.AddCustomEndpointWithOptions(cfg, u, endpoint, modelName, apiKey, options); err != nil {
return err
}

8 changes: 6 additions & 2 deletions flows/flow-01-prerequisites.sh
@@ -9,8 +9,12 @@ run_step "Docker daemon running" docker info
# LLM endpoint must be serving. Full QA uses an OpenAI-compatible
# vLLM/llama.cpp endpoint; local development can still use Ollama.
if [ -n "${OBOL_LLM_ENDPOINT:-}" ]; then
run_step_grep "OpenAI-compatible LLM endpoint serving models" "data|id" \
curl -sf "${OBOL_LLM_ENDPOINT%/}/models"
step "OpenAI-compatible LLM endpoint returns final chat content"
if preflight_openai_llm_endpoint; then
pass "LLM endpoint usable for model ${OBOL_LLM_MODEL:-qwen36-deep}"
else
fail "LLM endpoint did not pass OpenAI-compatible chat preflight"
fi
else
run_step_grep "Ollama serving models" "models" curl -sf http://localhost:11434/api/tags
fi
50 changes: 45 additions & 5 deletions flows/flow-03-inference.sh
@@ -60,22 +60,62 @@ else
fi

# §3d: Tool-call passthrough
tool_call_name() {
  python3 -c '
import json
import sys

try:
    data = json.load(sys.stdin)
except Exception:
    sys.exit(1)

choices = data.get("choices") or []
if not choices:
    sys.exit(1)

message = choices[0].get("message") or {}
for call in message.get("tool_calls") or []:
    function = call.get("function") or {}
    if function.get("name") == "get_weather":
        print("get_weather")
        sys.exit(0)

sys.exit(1)
'
}

step "Tool-call passthrough"
tool_out=$(curl -sf --max-time 120 -X POST http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_KEY" \
-d '{
"model":"'"$LITELLM_MODEL"'",
"messages":[{"role":"user","content":"What is the weather in London?"}],
"messages":[{"role":"user","content":"Call the get_weather tool for London. Do not answer in text."}],
"tools":[{"type":"function","function":{"name":"get_weather","description":"Get current weather","parameters":{"type":"object","properties":{"location":{"type":"string"}},"required":["location"]}}}],
"max_tokens":100,"stream":false
"tool_choice":{"type":"function","function":{"name":"get_weather"}},
"temperature":0,"max_tokens":100,"stream":false
}' 2>&1) || true

if echo "$tool_out" | grep -q "tool_calls\|get_weather"; then
if echo "$tool_out" | tool_call_name >/dev/null 2>&1; then
pass "Tool-call passthrough works"
else
# Small/local models may not reliably support tool calls — soft fail
fail "Tool-call not returned (model may not support it) — ${tool_out:0:200}"
# Some OpenAI-compatible endpoints accept tools but reject forced tool_choice.
tool_out=$(curl -sf --max-time 120 -X POST http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_KEY" \
-d '{
"model":"'"$LITELLM_MODEL"'",
"messages":[{"role":"user","content":"Call the get_weather tool with location London. Do not answer in text."}],
"tools":[{"type":"function","function":{"name":"get_weather","description":"Get current weather","parameters":{"type":"object","properties":{"location":{"type":"string"}},"required":["location"]}}}],
"temperature":0,"max_tokens":100,"stream":false
}' 2>&1) || true

if echo "$tool_out" | tool_call_name >/dev/null 2>&1; then
pass "Tool-call passthrough works"
else
fail "Tool-call not returned (model may not support it) — ${tool_out:0:200}"
fi
fi

cleanup_pid "$PF_PID"
5 changes: 3 additions & 2 deletions flows/flow-04-agent.sh
@@ -117,10 +117,11 @@ if [ -n "${OBOL_LLM_ENDPOINT:-}" ] && [ "$model_name" != "${OBOL_LLM_MODEL:-qwen
exit 0
fi

llm_payload_suffix="$(llm_disable_thinking_payload_suffix)"
out=$(curl -sf --max-time 120 -X POST "http://localhost:${AGENT_PF_PORT}/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d "{\"model\":\"$model_name\",\"messages\":[{\"role\":\"user\",\"content\":\"What is 2+2?\"}],\"max_tokens\":50,\"stream\":false}" 2>&1) || true
-d "{\"model\":\"$model_name\",\"messages\":[{\"role\":\"user\",\"content\":\"What is 2+2?\"}],\"max_tokens\":50,\"stream\":false${llm_payload_suffix}}" 2>&1) || true

if echo "$out" | grep -q "choices"; then
pass "Agent inference returned response"
@@ -138,7 +139,7 @@ step "Agent answers 'hello' without parroting tool catalogue (model rank regress
hello_out=$(curl -sf --max-time 120 -X POST "http://localhost:${AGENT_PF_PORT}/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d "{\"model\":\"$model_name\",\"messages\":[{\"role\":\"user\",\"content\":\"hello\"}],\"max_tokens\":150,\"stream\":false}" 2>&1) || true
-d "{\"model\":\"$model_name\",\"messages\":[{\"role\":\"user\",\"content\":\"hello\"}],\"max_tokens\":150,\"stream\":false${llm_payload_suffix}}" 2>&1) || true
hello_content=$(echo "$hello_out" | python3 -c "
import json, sys
try:
12 changes: 7 additions & 5 deletions flows/flow-06-sell-setup.sh
@@ -20,14 +20,16 @@ else
fail "CRD API group/version unexpected: group=$crd_group, version=$crd_version"
fi
run_step_grep "x402 verifier running" "Running" "$OBOL" kubectl get pods -n x402 --no-headers
# x402-verifier has 2 replicas for high availability (CLAUDE.md: "2 replicas")
step "x402-verifier has 2 replicas (high availability)"
# The embedded x402 manifest intentionally runs one verifier replica in local
# stacks. Keep the smoke assertion aligned with the shipped manifest; HA belongs
# to production sizing, not the single-node release-smoke cluster.
step "x402-verifier has 1 replica (local stack sizing)"
verifier_replicas=$("$OBOL" kubectl get deployment x402-verifier -n x402 \
-o jsonpath='{.spec.replicas}' 2>&1) || true
if [ "$verifier_replicas" = "2" ]; then
pass "x402-verifier: 2 replicas (HA payment gate)"
if [ "$verifier_replicas" = "1" ]; then
pass "x402-verifier: 1 replica (local payment gate)"
else
fail "x402-verifier replica count: $verifier_replicas (expected 2)"
fail "x402-verifier replica count: $verifier_replicas (expected 1)"
fi
# x402-verifier service must be on port 8080 (matches ForwardAuth address :8080/verify)
step "x402-verifier service on port 8080"
18 changes: 13 additions & 5 deletions flows/flow-11-dual-stack.sh
@@ -608,6 +608,8 @@ except Exception as e:
wait_for_paid_inference() {
local attempts="${1:-24}"
local delay="${2:-5}"
local transient_retries="${PAID_INFERENCE_TRANSIENT_RETRIES:-1}"
local transient_seen=0
local out=""
local i

@@ -617,9 +619,14 @@ wait_for_paid_inference() {
printf '%s\n' "$out"
return 0
fi
if echo "$out" | grep -q "Payment verification failed" || \
echo "$out" | grep -q "ERROR=503" || \
echo "$out" | grep -q "ServiceUnavailableError"; then
if echo "$out" | paid_inference_pending_error; then
sleep "$delay"
continue
fi
if echo "$out" | paid_inference_transient_error && [ "$transient_seen" -lt "$transient_retries" ]; then
transient_seen=$((transient_seen + 1))
echo "RETRY_TRANSIENT=${transient_seen}/${transient_retries}: paid inference hit transient timeout/error" >&2
printf '%s\n' "$out" >&2
sleep "$delay"
continue
fi
@@ -1271,6 +1278,7 @@ else
fi

step "Bob's agent: discover Alice via ERC-8004 registry"
llm_payload_suffix="$(llm_disable_thinking_payload_suffix)"
discover_response=$(curl -sf --max-time 300 \
-X POST "http://localhost:${BOB_AGENT_PORT}/v1/chat/completions" \
-H "Authorization: Bearer $BOB_TOKEN" \
Expand All @@ -1282,7 +1290,7 @@ discover_response=$(curl -sf --max-time 300 \
\"content\": \"Search the ERC-8004 agent identity registry on Base Sepolia for recently registered AI inference services that support x402 payments. Use the discovery skill to scan for agents. Look for one named 'Dual-Stack Test Inference' or similar with natural_language_processing skills. Report what you find — the agent ID, name, endpoint URL, and whether it supports x402.\"
}],
\"max_tokens\": 4000,
\"stream\": false
\"stream\": false${llm_payload_suffix}
}" 2>&1 || true)

discover_content=$(extract_assistant_content "$discover_response" 2>/dev/null || true)
@@ -1341,7 +1349,7 @@ else
\"content\": \"Use the buy-x402 skill and your terminal tool. Run exactly once: ERPC_URL=http://erpc.erpc.svc.cluster.local/rpc ERPC_NETWORK=base-sepolia python3 $BOB_OBOL_SKILLS_DIR/buy-x402/scripts/buy.py buy alice-inference --endpoint $TUNNEL_URL/services/alice-inference/v1/chat/completions --model $OBOL_LLM_MODEL --count $FLOW11_BUY_COUNT\"
}],
\"max_tokens\": 4000,
\"stream\": false
\"stream\": false${llm_payload_suffix}
}" 2>&1 || true)

buy_content=$(extract_assistant_content "$buy_response" 2>/dev/null || true)