ObolNetwork · OisinKyne · May 25, 2026 · May 24, 2026 · May 25, 2026 · May 25, 2026
diff --git a/.agents/skills/obol-stack-dev/SKILL.md b/.agents/skills/obol-stack-dev/SKILL.md
@@ -2,7 +2,7 @@
 name: obol-stack-dev
 description: Obol Stack development and QA runbook. Use when working on obol-stack flows, x402 seller/buyer tests, live Base Sepolia OBOL smoke, Anvil fork regressions, ERC-8004 registration, LiteLLM paid routing, release-smoke, cloudflared, Renovate image bumps, or remote QA worktrees.
 metadata:
-  version: "3.0.0"
+  version: "3.1.0"
   domain: infrastructure
   role: specialist
   scope: development-and-testing
@@ -17,6 +17,7 @@ Operational router. Load only the reference for the task. **Do not delegate unde
 | Need | Read |
 |---|---|
 | Local build, env vars, force-rebuild, CLI surface | `references/dev.md` |
+| PR trains, ordered merge/collapse, release candidate gate | `references/release-train.md` |
 | Release-smoke broken — what to check first | `references/release-smoke-debugging.md` |
 | Live OBOL smoke, flow choice, Bob derivation, success criteria | `references/paid-flows.md` |
 | LiteLLM model setup, paid/* route, port-forward | `references/llm-routing.md` |

diff --git a/.agents/skills/obol-stack-dev/references/release-train.md b/.agents/skills/obol-stack-dev/references/release-train.md
@@ -0,0 +1,156 @@
+# PR And Release Train
+
+Use this when asked to review or merge a set of obol-stack PRs, pin a frontend RC, handle GHAS/Renovate comments, or cut a release candidate. This is the orchestration layer; load the other references for the specific smoke, LLM, paid-flow, or remote-QA details.
+
+## Inputs To Nail Down
+
+- PR range and exclusions, for example "all PRs greater than #509 except #542".
+- Target base branch and whether the work should merge existing PRs, collapse them, or open fix PRs.
+- Release tag, frontend image tag, and whether the release is draft, prerelease, or ready.
+- Validation target: local unit tests, running cluster upgrade, live OBOL smoke, fork smoke, or full `flows/release-smoke.sh`.
+- Any required OpenAI-compatible QA LLM endpoint and model. Keep endpoint details in the shell environment or private notes, not in skill files, commit messages, PR text, or release text.
+
+## Train Shape
+
+```mermaid
+flowchart LR
+    A["Inventory PRs and checks"] --> B["Architectural review"]
+    B --> C{"Incorrect or risky?"}
+    C -- "yes" --> D["Open fix/<topic> PR"]
+    C -- "no" --> E["Mark ready / merge in order"]
+    D --> F["Parallel targeted validation"]
+    F --> E
+    E --> G["Upgrade running cluster"]
+    G --> H["Release-smoke gate"]
+    H --> I{"Green enough to release?"}
+    I -- "no" --> J["Record blockers, do not claim green"]
+    I -- "yes" --> K["Template-based non-draft RC release"]
+```
+
+## Inventory
+
+Start with source-of-truth state, not memory:
+
+```bash
+gh pr list --state open --limit 100 --json number,title,headRefName,baseRefName,isDraft,mergeStateStatus,statusCheckRollup,updatedAt
+gh pr view <number> --json number,title,body,headRefName,baseRefName,isDraft,mergeStateStatus,statusCheckRollup,reviewDecision,commits,files,comments,reviews
+```
+
+Build a table with number, topic, branch, draft status, checks, review status, dependency order, and whether it changes runtime behavior, release artifacts, CI, chart manifests, or docs only.
+
+## Architectural Review
+
+For each PR, review the diff in dependency order. The decision is not "does it compile"; it is whether the change preserves the stack contracts:
+
+- No regression in public/private route boundaries. Frontend, eRPC, storefront, `/.well-known/agent-registration.json`, and `/skill.md` stay intentionally exposed; agent internals do not become public.
+- No loss of x402 semantics: `PurchaseRequest Ready=True`, paid HTTP 200, exact balance deltas, on-chain transfer, and buyer route hot-add remain required evidence.
+- No dev/prod image confusion. Under `OBOL_DEVELOPMENT=true`, running pods must use the local images intended by the branch.
+- No release-only migration or wrapper unless the repo already has a durable helper. Prefer release-note warnings and operator directions while the product is not yet production-released.
+- No narrowing of supported chain names, model endpoint forms, or URL forms unless the caller and tests prove the old form is dead.
+- No broad cleanup. Delete only clusters, worktrees, containers, or ports whose ownership is recorded by the current worktree or explicitly confirmed.
+
+Subagents are useful for sidecar trace work, but the main agent owns the final architectural judgement. Give subagents bounded questions such as "trace all callers of this field" or "verify this PR cannot expose a private route"; do not delegate the whole train.
+
+## Fix PRs
+
+When a PR is architecturally wrong, open a minimal fix branch:
+
+```bash
+git switch -c fix/<short-topic>
+```
+
+PR descriptions should be self-contained and should not mention Codex or local host details. Include:
+
+- What invariant was violated.
+- Why the fix is the smallest correct change.
+- A Mermaid diagram when the behavior crosses controllers, charts, tunnels, buyers, or releases.
+- Exact validation run and result.
+- Remaining risk or follow-up, if any.
+
+Diagram template:
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant CLI as obol CLI
+    participant K8s as Kubernetes
+    participant Controller
+    participant Service as Runtime service
+    User->>CLI: command / upgrade / smoke
+    CLI->>K8s: apply intended manifests
+    K8s->>Controller: reconcile desired state
+    Controller->>Service: publish route or config
+    Service-->>User: validated behavior
+```
+
+## GHAS, Renovate, And Image Pins
+
+Treat bot comments as review input, not noise:
+
+- Read the exact comment and affected line before changing anything.
+- For GitHub Actions and third-party images, prefer current versions pinned by immutable SHA or digest when the repo pattern expects it.
+- Check whether Renovate has a matching manager/rule for frontend RC images and digest updates. If it failed to open a bump, fix the rule and validate it with the narrowest available Renovate config check.
+- For frontend RCs, verify both the repo pin and the running pod image/digest after cluster upgrade.
+- Do not mark the train done until PR checks and security comments are either fixed or explicitly documented as non-actionable with evidence.
+
+## Merge And Collapse Order
+
+Merge from the oldest/base dependency forward. After each merge or collapse step:
+
+```bash
+git fetch origin
+git log --oneline --decorate --graph --max-count=30 origin/main
+gh pr view <number> --json state,mergedAt,mergeCommit,isDraft,mergeStateStatus,statusCheckRollup
+```
+
+Before merging the next PR, confirm the previous behavior did not regress:
+
+- Branch head contains the expected commits and did not drop earlier fixes.
+- Required CI checks are complete or the reason for bypass is recorded.
+- Any running-cluster upgrade still points at the expected backend and frontend images.
+- Release notes and PR descriptions still match the final merged code, not an earlier draft.
+
+## Release Candidate Gate
+
+A release candidate is not ready just because the GitHub release exists. Gate it in this order:
+
+1. Start the body from `.github/release-template.md`.
+2. Keep generated `What's Changed`, `New Contributors`, and `Full Changelog` at the bottom.
+3. Include warnings and operator directions for known upgrade issues only after validating the upgrade path or explicitly labeling the warning as unverified.
+4. Run the smoke set required by the release. For full RCs, use `flows/release-smoke.sh` with live and fork flags when credentials and RPC capacity are available.
+5. Fill the release body with the actual smoke report: command, artifact path, pass/fail table, failed flow names, and current blockers.
+6. Only make the RC non-draft when the release body and validation evidence are complete.
+
+If any smoke flow fails, say exactly what failed. Do not present a release as green when the report is red or partially blocked.
+
+## Running-Cluster Upgrade Check
+
+Before testing an upgrade against a live local cluster:
+
+```bash
+k3d cluster list
+kubectl get pods -A
+kubectl get deploy -A -o wide
+```
+
+Identify the active stack ID, frontend image, backend component images, ports, and any parallel obol-stack clusters. Use tmux for long-running commands or shared sudo prompts. Clean up only stale stacks that are not the target and whose ownership is clear.
+
+After the upgrade:
+
+```bash
+kubectl get deploy -A -o wide
+kubectl get pods -A
+```
+
+Then run the targeted flow or full release smoke. Archive the log and artifact directory path in the PR or release description.
+
+## Final Report
+
+End with a short, auditable status:
+
+- PRs reviewed, fixed, merged, skipped, or left blocked.
+- Bot comments resolved or remaining.
+- Image pins and Renovate rules checked.
+- Smoke command, report path, and pass/fail summary.
+- Release URL and draft/prerelease status.
+- Cleanup performed and any cluster/worktree intentionally left running.
diff --git a/.github/workflows/helm-template-smoke.yml b/.github/workflows/helm-template-smoke.yml
@@ -26,7 +26,7 @@ jobs:
       - name: Set up Helm
         uses: azure/setup-helm@dda3372f752e03dde6b3237bc9431cdc2f7a02a2 # v5.0.0
         with:
-          version: v3.20.1   # match obolup.sh pinned version
+          version: v3.21.0   # match obolup.sh pinned version
 
       - name: helm template ./base
         run: |

diff --git a/cmd/obol/model.go b/cmd/obol/model.go
@@ -267,6 +267,7 @@ func modelSetupCustomCommand(cfg *config.Config) *cli.Command {
 			&cli.StringFlag{Name: "endpoint", Usage: "Full base URL (e.g. http://host:8000/v1)", Required: true},
 			&cli.StringFlag{Name: "model", Usage: "Model identifier at the endpoint — this is also the LiteLLM model_name the agent will call", Required: true},
 			&cli.StringFlag{Name: "api-key", Usage: "API key (optional, some endpoints don't require it)"},
+			&cli.BoolFlag{Name: "disable-thinking", Usage: "Ask the model not to use its extended thinking (reasoning) mode"},
 			&cli.BoolFlag{Name: "no-sync", Usage: "Skip the agent model sync (batch with other model commands, then run `obol model sync` once)"},
 		},
 		Action: func(ctx context.Context, cmd *cli.Command) error {
@@ -275,7 +276,10 @@ func modelSetupCustomCommand(cfg *config.Config) *cli.Command {
 			modelName := cmd.String("model")
 			apiKey := cmd.String("api-key")
 
-			if err := model.AddCustomEndpoint(cfg, u, endpoint, modelName, apiKey); err != nil {
+			options := model.CustomEndpointOptions{
+				DisableThinking: cmd.Bool("disable-thinking"),
+			}
+			if err := model.AddCustomEndpointWithOptions(cfg, u, endpoint, modelName, apiKey, options); err != nil {
 				return err
 			}
 

diff --git a/flows/flow-01-prerequisites.sh b/flows/flow-01-prerequisites.sh
@@ -9,8 +9,12 @@ run_step "Docker daemon running" docker info
 # LLM endpoint must be serving. Full QA uses an OpenAI-compatible
 # vLLM/llama.cpp endpoint; local development can still use Ollama.
 if [ -n "${OBOL_LLM_ENDPOINT:-}" ]; then
-    run_step_grep "OpenAI-compatible LLM endpoint serving models" "data|id" \
-        curl -sf "${OBOL_LLM_ENDPOINT%/}/models"
+    step "OpenAI-compatible LLM endpoint returns final chat content"
+    if preflight_openai_llm_endpoint; then
+        pass "LLM endpoint usable for model ${OBOL_LLM_MODEL:-qwen36-deep}"
+    else
+        fail "LLM endpoint did not pass OpenAI-compatible chat preflight"
+    fi
 else
     run_step_grep "Ollama serving models" "models" curl -sf http://localhost:11434/api/tags
 fi
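
The body of `preflight_openai_llm_endpoint` is not part of this diff. Judging by the step name, it presumably sends a small chat completion and checks that the reply carries final assistant content rather than only reasoning output. Below is a minimal Python sketch of that check, in the style of the inline `python3` helpers the flows already use. The function name `has_final_content` and the `reasoning_content` field in the example are assumptions, not taken from the repo.

```python
import json


def has_final_content(raw_response: str) -> bool:
    """Return True when an OpenAI-style chat completion has non-empty final text."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False

    choices = data.get("choices") or []
    if not choices or not isinstance(choices[0], dict):
        return False

    message = choices[0].get("message") or {}
    content = message.get("content")
    # A thinking model can exhaust max_tokens on reasoning and leave
    # "content" empty; that case should count as a preflight failure.
    return isinstance(content, str) and content.strip() != ""


ok = '{"choices":[{"message":{"role":"assistant","content":"4"}}]}'
thinking_only = '{"choices":[{"message":{"content":"","reasoning_content":"..."}}]}'
print(has_final_content(ok))             # True
print(has_final_content(thinking_only))  # False
```

Checking the content, and not just that `/models` answers, catches endpoints that are reachable but would still fail every later chat step.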

diff --git a/flows/flow-03-inference.sh b/flows/flow-03-inference.sh
@@ -60,22 +60,62 @@ else
 fi
 
 # §3d: Tool-call passthrough
+tool_call_name() {
+    python3 -c '
+import json
+import sys
+
+try:
+    data = json.load(sys.stdin)
+except Exception:
+    sys.exit(1)
+
+choices = data.get("choices") or []
+if not choices:
+    sys.exit(1)
+
+message = choices[0].get("message") or {}
+for call in message.get("tool_calls") or []:
+    function = call.get("function") or {}
+    if function.get("name") == "get_weather":
+        print("get_weather")
+        sys.exit(0)
+
+sys.exit(1)
+'
+}
+
 step "Tool-call passthrough"
 tool_out=$(curl -sf --max-time 120 -X POST http://localhost:8001/v1/chat/completions \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $LITELLM_KEY" \
     -d '{
         "model":"'"$LITELLM_MODEL"'",
-        "messages":[{"role":"user","content":"What is the weather in London?"}],
+        "messages":[{"role":"user","content":"Call the get_weather tool for London. Do not answer in text."}],
         "tools":[{"type":"function","function":{"name":"get_weather","description":"Get current weather","parameters":{"type":"object","properties":{"location":{"type":"string"}},"required":["location"]}}}],
-        "max_tokens":100,"stream":false
+        "tool_choice":{"type":"function","function":{"name":"get_weather"}},
+        "temperature":0,"max_tokens":100,"stream":false
     }' 2>&1) || true
 
-if echo "$tool_out" | grep -q "tool_calls\|get_weather"; then
+if echo "$tool_out" | tool_call_name >/dev/null 2>&1; then
     pass "Tool-call passthrough works"
 else
-    # Small/local models may not reliably support tool calls — soft fail
-    fail "Tool-call not returned (model may not support it) — ${tool_out:0:200}"
+    # Some OpenAI-compatible endpoints accept tools but reject forced tool_choice.
+    tool_out=$(curl -sf --max-time 120 -X POST http://localhost:8001/v1/chat/completions \
+        -H "Content-Type: application/json" \
+        -H "Authorization: Bearer $LITELLM_KEY" \
+        -d '{
+            "model":"'"$LITELLM_MODEL"'",
+            "messages":[{"role":"user","content":"Call the get_weather tool with location London. Do not answer in text."}],
+            "tools":[{"type":"function","function":{"name":"get_weather","description":"Get current weather","parameters":{"type":"object","properties":{"location":{"type":"string"}},"required":["location"]}}}],
+            "temperature":0,"max_tokens":100,"stream":false
+        }' 2>&1) || true
+
+    if echo "$tool_out" | tool_call_name >/dev/null 2>&1; then
+        pass "Tool-call passthrough works"
+    else
+        fail "Tool-call not returned (model may not support it) — ${tool_out:0:200}"
+    fi
 fi
 
 cleanup_pid "$PF_PID"
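
The two-attempt logic above (forced `tool_choice` first, then a plain `tools` request) can be sketched without the duplicated request body. This Python version is illustrative only: `send` is a hypothetical stand-in for the `curl` call, and `request_tool_call` is not a function in the repo.

```python
import json
from typing import Callable, Optional


def tool_call_name(raw: str, expected: str = "get_weather") -> Optional[str]:
    """Return the expected tool name if the response contains a call to it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    choices = data.get("choices") or []
    if not choices or not isinstance(choices[0], dict):
        return None
    message = choices[0].get("message") or {}
    for call in message.get("tool_calls") or []:
        function = call.get("function") or {}
        if function.get("name") == expected:
            return expected
    return None


def request_tool_call(send: Callable[[dict], str], base_payload: dict, tool: str) -> bool:
    """Try a forced tool_choice first, then fall back to the plain request."""
    forced = dict(base_payload)
    forced["tool_choice"] = {"type": "function", "function": {"name": tool}}
    for payload in (forced, base_payload):
        if tool_call_name(send(payload), tool) is not None:
            return True
    return False


def fake_send(payload: dict) -> str:
    # Simulates an endpoint that accepts tools but rejects forced tool_choice.
    if "tool_choice" in payload:
        return '{"error": "tool_choice not supported"}'
    return json.dumps(
        {"choices": [{"message": {"tool_calls": [{"function": {"name": "get_weather"}}]}}]}
    )


print(request_tool_call(fake_send, {"model": "example-model"}, "get_weather"))  # True
```

Parsing the JSON, rather than grepping for `get_weather`, avoids a false pass when the model merely echoes the tool name in plain text.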

diff --git a/flows/flow-04-agent.sh b/flows/flow-04-agent.sh
@@ -117,10 +117,11 @@ if [ -n "${OBOL_LLM_ENDPOINT:-}" ] && [ "$model_name" != "${OBOL_LLM_MODEL:-qwen
     exit 0
 fi
 
+llm_payload_suffix="$(llm_disable_thinking_payload_suffix)"
 out=$(curl -sf --max-time 120 -X POST "http://localhost:${AGENT_PF_PORT}/v1/chat/completions" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $TOKEN" \
-    -d "{\"model\":\"$model_name\",\"messages\":[{\"role\":\"user\",\"content\":\"What is 2+2?\"}],\"max_tokens\":50,\"stream\":false}" 2>&1) || true
+    -d "{\"model\":\"$model_name\",\"messages\":[{\"role\":\"user\",\"content\":\"What is 2+2?\"}],\"max_tokens\":50,\"stream\":false${llm_payload_suffix}}" 2>&1) || true
 
 if echo "$out" | grep -q "choices"; then
     pass "Agent inference returned response"
@@ -138,7 +139,7 @@ step "Agent answers 'hello' without parroting tool catalogue (model rank regress
 hello_out=$(curl -sf --max-time 120 -X POST "http://localhost:${AGENT_PF_PORT}/v1/chat/completions" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $TOKEN" \
-    -d "{\"model\":\"$model_name\",\"messages\":[{\"role\":\"user\",\"content\":\"hello\"}],\"max_tokens\":150,\"stream\":false}" 2>&1) || true
+    -d "{\"model\":\"$model_name\",\"messages\":[{\"role\":\"user\",\"content\":\"hello\"}],\"max_tokens\":150,\"stream\":false${llm_payload_suffix}}" 2>&1) || true
 hello_content=$(echo "$hello_out" | python3 -c "
 import json, sys
 try:
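
`llm_disable_thinking_payload_suffix` is defined outside this diff. Because its output is spliced in directly after `"stream":false`, it has to be either empty or a JSON fragment that starts with a comma. The Python sketch below illustrates that contract. The `chat_template_kwargs` / `enable_thinking` key is an assumption based on how vLLM-served Qwen models commonly expose this switch; the real helper may emit something different. `build_payload` and the model name are hypothetical as well.

```python
import json


def disable_thinking_suffix(disable: bool) -> str:
    """Return a fragment that can be appended inside a JSON object, or ''."""
    if not disable:
        return ""
    # Assumed key; starts with a comma so it can follow the last field.
    return ',"chat_template_kwargs":{"enable_thinking":false}'


def build_payload(model: str, prompt: str, suffix: str) -> str:
    """Build a chat payload the way the flow does: base object plus suffix."""
    head = json.dumps(
        {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 50,
            "stream": False,
        }
    )
    # Drop the closing brace, splice the suffix, and close the object again.
    return head[:-1] + suffix + "}"


payload = build_payload("example-model", "What is 2+2?", disable_thinking_suffix(True))
# json.loads raises if the spliced string is not valid JSON.
print(json.loads(payload)["chat_template_kwargs"])
```

An empty suffix leaves the request byte-for-byte equivalent to the pre-change payload, which keeps the default path unchanged for endpoints that do not understand the extra key.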

diff --git a/flows/flow-06-sell-setup.sh b/flows/flow-06-sell-setup.sh
@@ -20,14 +20,16 @@ else
     fail "CRD API group/version unexpected: group=$crd_group, version=$crd_version"
 fi
 run_step_grep "x402 verifier running" "Running" "$OBOL" kubectl get pods -n x402 --no-headers
-# x402-verifier has 2 replicas for high availability (CLAUDE.md: "2 replicas")
-step "x402-verifier has 2 replicas (high availability)"
+# The embedded x402 manifest intentionally runs one verifier replica in local
+# stacks. Keep the smoke assertion aligned with the shipped manifest; HA belongs
+# to production sizing, not the single-node release-smoke cluster.
+step "x402-verifier has 1 replica (local stack sizing)"
 verifier_replicas=$("$OBOL" kubectl get deployment x402-verifier -n x402 \
     -o jsonpath='{.spec.replicas}' 2>&1) || true
-if [ "$verifier_replicas" = "2" ]; then
-    pass "x402-verifier: 2 replicas (HA payment gate)"
+if [ "$verifier_replicas" = "1" ]; then
+    pass "x402-verifier: 1 replica (local payment gate)"
 else
-    fail "x402-verifier replica count: $verifier_replicas (expected 2)"
+    fail "x402-verifier replica count: $verifier_replicas (expected 1)"
 fi
 # x402-verifier service must be on port 8080 (matches ForwardAuth address :8080/verify)
 step "x402-verifier service on port 8080"

diff --git a/flows/flow-11-dual-stack.sh b/flows/flow-11-dual-stack.sh
@@ -608,6 +608,8 @@ except Exception as e:
 wait_for_paid_inference() {
     local attempts="${1:-24}"
     local delay="${2:-5}"
+    local transient_retries="${PAID_INFERENCE_TRANSIENT_RETRIES:-1}"
+    local transient_seen=0
     local out=""
     local i
 
@@ -617,9 +619,14 @@ wait_for_paid_inference() {
             printf '%s\n' "$out"
             return 0
         fi
-        if echo "$out" | grep -q "Payment verification failed" || \
-           echo "$out" | grep -q "ERROR=503" || \
-           echo "$out" | grep -q "ServiceUnavailableError"; then
+        if echo "$out" | paid_inference_pending_error; then
+            sleep "$delay"
+            continue
+        fi
+        if echo "$out" | paid_inference_transient_error && [ "$transient_seen" -lt "$transient_retries" ]; then
+            transient_seen=$((transient_seen + 1))
+            echo "RETRY_TRANSIENT=${transient_seen}/${transient_retries}: paid inference hit transient timeout/error" >&2
+            printf '%s\n' "$out" >&2
             sleep "$delay"
             continue
         fi
@@ -1271,6 +1278,7 @@ else
 fi
 
 step "Bob's agent: discover Alice via ERC-8004 registry"
+llm_payload_suffix="$(llm_disable_thinking_payload_suffix)"
 discover_response=$(curl -sf --max-time 300 \
     -X POST "http://localhost:${BOB_AGENT_PORT}/v1/chat/completions" \
     -H "Authorization: Bearer $BOB_TOKEN" \
@@ -1282,7 +1290,7 @@ discover_response=$(curl -sf --max-time 300 \
             \"content\": \"Search the ERC-8004 agent identity registry on Base Sepolia for recently registered AI inference services that support x402 payments. Use the discovery skill to scan for agents. Look for one named 'Dual-Stack Test Inference' or similar with natural_language_processing skills. Report what you find — the agent ID, name, endpoint URL, and whether it supports x402.\"
         }],
         \"max_tokens\": 4000,
-	        \"stream\": false
+	        \"stream\": false${llm_payload_suffix}
 	    }" 2>&1 || true)
 
 discover_content=$(extract_assistant_content "$discover_response" 2>/dev/null || true)
@@ -1341,7 +1349,7 @@ else
                 \"content\": \"Use the buy-x402 skill and your terminal tool. Run exactly once: ERPC_URL=http://erpc.erpc.svc.cluster.local/rpc ERPC_NETWORK=base-sepolia python3 $BOB_OBOL_SKILLS_DIR/buy-x402/scripts/buy.py buy alice-inference --endpoint $TUNNEL_URL/services/alice-inference/v1/chat/completions --model $OBOL_LLM_MODEL --count $FLOW11_BUY_COUNT\"
             }],
             \"max_tokens\": 4000,
-	            \"stream\": false
+	            \"stream\": false${llm_payload_suffix}
 	        }" 2>&1 || true)
 
     buy_content=$(extract_assistant_content "$buy_response" 2>/dev/null || true)
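
The `wait_for_paid_inference` hunk earlier in this file separates two kinds of failure: "pending" errors, which are retried for as long as attempts remain, and "transient" errors, which are retried only up to `PAID_INFERENCE_TRANSIENT_RETRIES` times (default 1). The sketch below models that control flow in Python. The classifier predicates are passed in because `paid_inference_pending_error` and `paid_inference_transient_error` are not shown in the diff; returning `None` immediately on any other error is an assumption about the lines after the hunk.

```python
import time
from typing import Callable, Optional


def wait_for_result(
    fetch: Callable[[], str],
    is_success: Callable[[str], bool],
    is_pending: Callable[[str], bool],
    is_transient: Callable[[str], bool],
    attempts: int = 24,
    delay: float = 5.0,
    transient_retries: int = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll fetch() until success, with a separate budget for transient errors."""
    transient_seen = 0
    for _ in range(attempts):
        out = fetch()
        if is_success(out):
            return out
        if is_pending(out):
            # Expected while the payment route is still being set up.
            sleep(delay)
            continue
        if is_transient(out) and transient_seen < transient_retries:
            transient_seen += 1
            sleep(delay)
            continue
        # Assumption: anything else, or an exhausted transient budget, fails fast.
        return None
    return None


# Hypothetical marker strings, used only to drive the example.
responses = iter(["PENDING", "TIMEOUT", "OK"])
result = wait_for_result(
    fetch=lambda: next(responses),
    is_success=lambda s: s == "OK",
    is_pending=lambda s: s == "PENDING",
    is_transient=lambda s: s == "TIMEOUT",
    sleep=lambda _seconds: None,  # skip real sleeping in the example
)
print(result)  # OK
```

Keeping the transient budget small means a genuinely broken upstream still fails the smoke quickly, while a single slow response no longer turns the whole flow red.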