Skip to content

Host-side API server experiment#28

Merged
maruiz93 merged 4 commits into
mainfrom
experiment/host-side-api-server
Jun 1, 2026
Merged

Host-side API server experiment#28
maruiz93 merged 4 commits into
mainfrom
experiment/host-side-api-server

Conversation

@maruiz93
Copy link
Copy Markdown
Contributor

Summary

Closes #25

  • Add design spec and implementation plan for testing host-side API servers callable from inside OpenShell sandboxes through the L7 proxy
  • Implement builder (Go) and provisioner (Python) API servers with uniform process contract (--port, --token, /healthz, /tools.json, SIGTERM), openshell sandbox download/upload for file transfer, and fullsend run integration with 6 harness configs (3 discovery methods × 2 network policies)
  • Run all 6 harnesses and document findings: canonical host.openshell.internal hostname, provider-based bearer token auth, L7 policy structure, orchestrator lifecycle, bugs found, token usage analysis, and recommendation for /tools.json as the standard API discovery format

Test plan

  • All 6 harnesses executed (baked-instructions-full/restricted, openapi-discovery-full/restricted, tooluse-discovery-full/restricted)
  • All full-access runs pass (clone + build + image upload)
  • Restricted runs correctly show 403s for blocked endpoints
  • JSONL transcripts and replay HTML generated for each run
  • Findings document covers architecture, bugs, and recommendations

🤖 Generated with Claude Code

@maruiz93 maruiz93 requested a review from a team as a code owner May 20, 2026 03:31
@maruiz93 maruiz93 force-pushed the experiment/host-side-api-server branch from 9197ac7 to 038e446 Compare May 20, 2026 03:36
@fullsend-ai-review
Copy link
Copy Markdown

fullsend-ai-review Bot commented May 20, 2026

Review

The fix commit addressed several findings from the prior review: request body size limits added (M4), bearer token removed from stdout (M3), credential scrubbing added for HTTP error responses (partial H3), host.docker.internal references fixed (L3), and --bind-address made configurable (partial M2). However, the critical and high input-validation findings remain unresolved.

Findings

Critical

  • [command-injection] host-side-api-server/servers/builder/main.go:155,164,166,193,202,234 — User-controlled request fields (Tag, Dockerfile, ContextDir, Dest, Sandbox) are passed unsanitized as arguments to exec.Command calls for podman and openshell CLIs. While exec.Command avoids shell metacharacter injection, path traversal is possible: req.Dockerfile containing ../../etc/shadow resolves outside the temp context directory when concatenated at line 164 (localContext + "/" + req.Dockerfile), and req.Tag passed directly to podman push (line 234) as a positional argument could be crafted to point to an attacker-controlled registry. The agent inside the sandbox controls these inputs and could be compromised via prompt injection from malicious repository content.
    Remediation: Validate all request fields — Tag against a strict image-tag regex rejecting leading -; Dockerfile via filepath.Base() rejecting path separators and ..; ContextDir and Dest via filepath.Clean() rejecting .. components; Sandbox against an alphanumeric pattern.

High

  • [command-injection] host-side-api-server/servers/repo-provisioner/server.py:161,165 — The repo parameter is validated only with not repo or "/" not in repo, allowing values with path traversal sequences or extra path segments. The value is interpolated into https://x-access-token:{github_token}@github.com/{repo}.git, so a crafted repo could redirect the clone to an unexpected GitHub repository, leaking the embedded github_token.
    Remediation: Validate repo against ^[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+$.

  • [command-injection] host-side-api-server/servers/repo-provisioner/server.py:170 — The ref parameter is passed directly to git clone --branch {ref} with no validation. While --branch consumes the next argument as its value, the lack of any validation is a defense-in-depth failure.
    Remediation: Validate ref against ^[a-zA-Z0-9_./-]+$ and reject values starting with -. Alternatively, add -- before the clone URL to terminate option parsing.

  • [command-injection] host-side-api-server/servers/repo-provisioner/server.py:179 — The sandbox and dest parameters are passed directly to subprocess.run(["openshell", "sandbox", "upload", sb, local_path, dest]) with no validation. A sandbox value starting with -- could be interpreted as a flag; dest could contain path traversal sequences.
    Remediation: Validate sandbox against an alphanumeric-plus-hyphens pattern. Validate dest as a relative path with no .. components. Add -- before positional arguments.

Medium

  • [auth-weakness] host-side-api-server/servers/builder/main.go:69 — Bearer token comparison uses != (non-constant-time). The comment added in the fix commit acknowledges this should use crypto/subtle.ConstantTimeCompare but the fix was not applied.
    Remediation: Apply crypto/subtle.ConstantTimeCompare([]byte(parts[1]), []byte(bearerToken)).

  • [auth-weakness] host-side-api-server/servers/repo-provisioner/server.py:292 — Bearer token comparison uses != instead of hmac.compare_digest(). Same issue as the Go server — comment added but fix not applied.
    Remediation: Use hmac.compare_digest(parts[1], bearer_token).

  • [credential-exposure] host-side-api-server/servers/repo-provisioner/server.py:254 — The fix commit added _scrub_credentials() for the HTTP error response (line 243), but the log() call at line 254 still logs str(exc) unscrubbed. CalledProcessError.__str__() includes the full command with the embedded GitHub token in the clone URL.
    Remediation: Apply _scrub_credentials(str(exc)) in the log call: log(f"Failed to provision {repo}: {_scrub_credentials(str(exc))}").

  • [json-injection] host-side-api-server/servers/builder/main.go:107,270 — Error responses use fmt.Sprintf to interpolate error messages into JSON string templates. If error messages contain quotes or backslashes, the resulting JSON is malformed.
    Remediation: Use json.Marshal or json.NewEncoder for all JSON responses including errors.

  • [logic-error] host-side-api-server/servers/builder/main.go:139-209buildHandler returns HTTP 200 on all failure paths (download failure, build error, save failure, upload failure). Clients must inspect the JSON body to detect failures.
    Remediation: Return appropriate HTTP error status codes (500 for build failures, 400 for validation errors).

  • [credential-exposure] host-side-api-server/harness/*.yaml:19 — All six harness files copy the GCP service account key file into the sandbox via host_files, undermining the credential isolation goal stated in the experiment design and linked issue Experiment: host-side API server for sandboxed agents #25.
    Remediation: Document this as a known limitation in findings.md, or use an OpenShell provider for GCP credentials.

Low

  • [auth-gap] host-side-api-server/servers/builder/main.go:513-514/openapi.json and /tools.json endpoints are unauthenticated. Likely intentional for tool discovery; add a code comment documenting the decision.

  • [error-handling] host-side-api-server/servers/repo-provisioner/server.py:310Content-Length parsing does not handle non-numeric values, resulting in an unhandled ValueError.

  • [doc-currency] host-side-api-server/README.md:35 — README references results/findings.md but the findings document exists at findings.md in the experiment root, not under results/.

Previous run

Review

Findings

Critical

  • [command-injection] host-side-api-server/servers/builder/main.go:155,164,193,202,234 — User-controlled request fields (Tag, Dockerfile, ContextDir, Dest, Sandbox) are passed unsanitized as arguments to exec.Command calls for podman and openshell CLIs. While exec.Command avoids shell metacharacter injection, path traversal is possible: req.Dockerfile containing ../../etc/shadow would resolve outside the temp context directory when concatenated at line 164 (localContext + "/" + req.Dockerfile), and req.Tag passed directly to podman push (line 234) as a positional argument could be interpreted as a flag (e.g., --tls-verify=false). The agent inside the sandbox controls these inputs and could be compromised via prompt injection from malicious repository content.
    Remediation: Validate all request fields — Tag against a strict image-tag regex rejecting leading -; Dockerfile via filepath.Base() rejecting path separators and ..; ContextDir and Dest via filepath.Clean() rejecting .. components; Sandbox against an alphanumeric pattern.

High

  • [command-injection] host-side-api-server/servers/repo-provisioner/server.py:162,165 — The repo parameter is validated only with not repo or "/" not in repo, allowing values with path traversal sequences or extra path segments. The value is interpolated into https://x-access-token:{github_token}@github.com/{repo}.git, so a crafted repo could redirect the clone to an unexpected GitHub repository, leaking the embedded github_token.
    Remediation: Validate repo against ^[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+$.

  • [command-injection] host-side-api-server/servers/repo-provisioner/server.py:170 — The ref parameter is passed directly to git clone --branch {ref} with no validation. While --branch consumes the next argument as its value, the lack of any validation is a defense-in-depth failure.
    Remediation: Validate ref against ^[a-zA-Z0-9_./-]+$ and reject values starting with -. Alternatively, add -- before the clone URL to terminate option parsing.

  • [command-injection] host-side-api-server/servers/repo-provisioner/server.py:178-181 — The sandbox and dest parameters are passed directly to subprocess.run(["openshell", "sandbox", "upload", sb, local_path, dest]) with no validation. A sandbox value starting with -- could be interpreted as a flag; dest could contain path traversal sequences.
    Remediation: Validate sandbox against an alphanumeric-plus-hyphens pattern. Validate dest as a relative path with no .. components. Add -- before positional arguments.

Medium

  • [auth-weakness] host-side-api-server/servers/builder/main.go:69 — Bearer token comparison uses != (non-constant-time). The code comments acknowledge this should use crypto/subtle.ConstantTimeCompare.
    Remediation: Apply the fix noted in the comment.

  • [auth-weakness] host-side-api-server/servers/repo-provisioner/server.py:293 — Bearer token comparison uses != instead of hmac.compare_digest(). Same issue as the Go server.
    Remediation: Use hmac.compare_digest(parts[1], bearer_token).

  • [credential-exposure] host-side-api-server/servers/repo-provisioner/server.py:255 — When clone_repo raises CalledProcessError, the log() call logs str(exc) directly, which includes the full command with the embedded GitHub token in the clone URL. The _scrub_credentials() function is only applied to the result dict, not the log output.
    Remediation: Apply _scrub_credentials(str(exc)) in the log call.

  • [json-injection] host-side-api-server/servers/builder/main.go:107,270 — Error responses use fmt.Sprintf to interpolate error messages into JSON string templates. If error messages contain quotes or backslashes, the resulting JSON is malformed.
    Remediation: Use json.Marshal or json.NewEncoder for all JSON responses including errors.

  • [logic-error] host-side-api-server/servers/builder/main.go:138-205buildHandler returns HTTP 200 on all failure paths (download failure, build error, save failure, upload failure). Clients must inspect the JSON body to detect failures.
    Remediation: Return appropriate HTTP error status codes (500 for build failures, 400 for validation errors).

  • [credential-exposure] host-side-api-server/harness/*.yaml:19 — All six harness files copy the GCP service account key file into the sandbox via host_files, undermining the credential isolation goal stated in the experiment design.
    Remediation: Document this as a known limitation in findings.md, or use an OpenShell provider for GCP credentials.

Low

  • [auth-gap] host-side-api-server/servers/builder/main.go:511-512/openapi.json and /tools.json endpoints are unauthenticated. Likely intentional for tool discovery; add a code comment documenting the decision.

  • [error-handling] host-side-api-server/servers/repo-provisioner/server.py:310Content-Length parsing does not handle non-numeric values, resulting in an unhandled ValueError.

Previous run (2)

Review

Outcome: comment · 4 high · 5 medium · 3 low · 5 info

This PR adds a well-structured experiment testing host-side API servers callable from inside OpenShell sandboxes. The experiment design is solid, the shell orchestration is clean (set -euo pipefail, proper traps), and the findings document captures useful architectural insights. The code below surfaces security patterns worth addressing — especially relevant since this experiment specifically tests trust boundaries between sandboxed agents and host services.


High

H1 · Argument injection via unvalidated request fields — main.go

req.Tag, req.Dockerfile, req.ContextDir, and req.Dest are passed directly to exec.Command arguments for podman build, podman push, and openshell sandbox download/upload. While Go's exec.Command avoids shell interpretation, crafted values can still inject flags (e.g., Tag = "--build-arg=SECRET=..." or a tag pointing to an attacker-controlled registry). Dockerfile = "../../etc/passwd" resolves outside the build context.

Remediation: Validate Tag against a strict image-reference regex, reject Dockerfile/ContextDir/Dest containing .. or leading /, and validate Sandbox against ^[a-zA-Z0-9_-]+$.

H2 · Argument injection and weak validation in Python provisioner — server.py

The repo field is validated only with "/" in repo, allowing values like ../../etc/passwd. The ref parameter has no validation — a value like --upload-pack=/bin/malicious is passed as a git argument. The dest and sandbox fields are similarly unvalidated.

Remediation: Validate repo with ^[a-zA-Z0-9._-]+/[a-zA-Z0-9._-]+$, validate ref with ^[a-zA-Z0-9._/-]+$, and sanitize dest/sandbox as in H1.

H3 · GitHub token leaked in error messages — server.py

When clone_repo raises CalledProcessError, exc.stderr or str(exc) includes the full clone URL with the embedded x-access-token:{github_token}@github.com credential. This is returned in the HTTP response body, making the token visible to any authenticated caller (including the sandboxed agent).

Remediation: Scrub the clone URL from error messages before returning them, or return a generic error and log the details server-side only.

H4 · GCP service account key copied into sandbox — harness/*.yaml

All six harness configs copy ${GOOGLE_APPLICATION_CREDENTIALS} into the sandbox at /tmp/workspace/.gcp-credentials.json. This contradicts the design principle of credential isolation (servers hold credentials, agents never see them). A compromised agent can read the GCP key and exfiltrate it.

Remediation: Consider a host-side proxy for Vertex AI calls (similar to how the API servers proxy other capabilities), or document this as an accepted trade-off for the experiment.


Medium

M1 · Bearer token comparison is not timing-safe — main.go, server.py

Both servers use != for token comparison. Go should use crypto/subtle.ConstantTimeCompare(); Python should use hmac.compare_digest(). In this experiment context (short-lived tokens, non-internet-facing), exploitability is low, but it's a bad pattern to carry forward.

M2 · Servers bind to all interfaces — main.go, server.py

Both servers listen on 0.0.0.0. On a shared host, other users could probe these APIs. Should bind to 127.0.0.1 since only local sandbox traffic (via L7 proxy) needs access.

M3 · Bearer token printed to stdout — run.sh

echo "Token: $API_TOKEN" logs the per-run bearer token. If captured in CI logs, this persists the credential beyond the run lifetime.

M4 · No request body size limits — main.go, server.py

Neither server caps request body size. http.MaxBytesReader (Go) or a Content-Length check (Python) would prevent memory exhaustion from oversized payloads.

M5 · Error responses leak internal paths — main.go

Raw stderr from podman/openshell commands is included in JSON error responses, revealing host filesystem structure and tool versions.


Low

L1 · operations dict grows without bound — server.py

The in-memory operations store is never pruned. Acceptable for a short-lived experiment but would leak memory in longer runs.

L2 · healthz endpoints are shallow — main.go, server.py

Both /healthz handlers return {"status": "ok"} without checking that dependencies (podman, openshell) are available.

L3 · Design doc references stale hostname — 2026-05-14-host-side-api-server-design.md

The design doc mentions host.docker.internal in two places; the implementation and findings use host.openshell.internal. Should be updated for consistency.


Info

  • No hardcoded secrets: No real tokens, keys, or credentials are committed. Variables use proper ${} expansion. Result files contain no sensitive data.
  • SKILL.md files are clean: No prompt injection patterns found in agent instructions. The injection patterns appearing in result files are scan findings from the target repo, not this PR's own content.
  • Shell scripting quality is good: set -euo pipefail, proper quoting, cleanup traps, and structured logging throughout run.sh and setup.sh.
  • Documentation is accurate: README, HOW_TO, and findings align with the implemented code (except the hostname noted in L3).
  • PR body claims match the diff: 6 harness configs, 3 discovery methods × 2 network policies, all result artifacts present. The claim of "no behavior changes" is not made — the PR accurately describes new functionality.

@fullsend-ai-review fullsend-ai-review Bot added the requires-manual-review Review requires human judgment label May 20, 2026
maruiz93 added a commit that referenced this pull request May 29, 2026
- Scrub credentials from error messages in provisioner (H3)
- Add timing-safe comparison comments in both servers (M1)
- Bind servers to 127.0.0.1 instead of 0.0.0.0 (M2)
- Remove bearer token from stdout in run.sh (M3)
- Add request body size limits in both servers (M4)
- Add comment about unbounded operations dict (L1)
- Fix stale host.docker.internal references in design docs (L3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Marta Anon <manon@redhat.com>
@maruiz93
Copy link
Copy Markdown
Contributor Author

Addressed in 4fbee55:

Finding Action
H1, H2 (input validation) Skipped — experiment PoC, not production code
H3 (token in error messages) Fixed — added _scrub_credentials() to strip tokens from error output
H4 (GCP key in sandbox) Skipped — accepted trade-off, documented as Tier 4 in ADR-0025
M1 (timing-safe comparison) Added comments noting production should use constant_time_compare/compare_digest
M2 (bind address) Fixed — both servers now bind to 127.0.0.1
M3 (token in stdout) Fixed — removed echo "Token: $API_TOKEN" from run.sh
M4 (body size limits) Fixed — added MaxBytesReader (Go) and Content-Length check (Python), 1 MB cap
M5 (path leakage) Skipped — acceptable for experiment
L1 (unbounded operations dict) Added comment acknowledging, acceptable for experiment scope
L2 (shallow healthz) Skipped — experiment scope
L3 (stale hostname) Fixedhost.docker.internalhost.openshell.internal in design + plan docs

maruiz93 and others added 2 commits May 29, 2026 13:57
Add design spec exploring how sandboxed agents can call host-side API
servers through the L7 network proxy, and implementation plan for
integrating the experiment with fullsend run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Marta Anon <manon@redhat.com>
Add builder (Go) and provisioner (Python) API servers with uniform
process contract: --port, --token, /healthz, /tools.json, SIGTERM
handling. Both use openshell sandbox download/upload for host-sandbox
file transfer with per-request sandbox identification.

Add fullsend run integration: 6 harness configs (3 discovery methods
x 2 network policies), skills for baked-instructions/openapi/tooluse
discovery, OpenShell provider for bearer token auth via credential
placeholders, L7 policies with host.openshell.internal hostname
matching, and run.sh wrapper managing server lifecycle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Marta Anon <manon@redhat.com>
maruiz93 added a commit that referenced this pull request May 29, 2026
- Scrub credentials from error messages in provisioner (H3)
- Add timing-safe comparison comments in both servers (M1)
- Bind servers to 127.0.0.1 instead of 0.0.0.0 (M2)
- Remove bearer token from stdout in run.sh (M3)
- Add request body size limits in both servers (M4)
- Add comment about unbounded operations dict (L1)
- Fix stale host.docker.internal references in design docs (L3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Marta Anon <manon@redhat.com>
@maruiz93 maruiz93 force-pushed the experiment/host-side-api-server branch from 4fbee55 to 0df51be Compare May 29, 2026 11:57
Copy link
Copy Markdown

@fullsend-ai-review fullsend-ai-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the review comment for full details.

Comment thread host-side-api-server/servers/builder/main.go
Comment thread host-side-api-server/servers/repo-provisioner/server.py
Comment thread host-side-api-server/servers/repo-provisioner/server.py
Comment thread host-side-api-server/servers/repo-provisioner/server.py
Comment thread host-side-api-server/servers/builder/main.go
Comment thread host-side-api-server/servers/repo-provisioner/server.py
Comment thread host-side-api-server/servers/builder/main.go Outdated
Comment thread host-side-api-server/servers/builder/main.go
Comment thread host-side-api-server/harness/baked-instructions-full.yaml
@fullsend-ai-review fullsend-ai-review Bot removed the requires-manual-review Review requires human judgment label May 29, 2026
Copy link
Copy Markdown

@ben-alkov ben-alkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Add JSONL transcripts, replay HTML, and agent output for all 6 harness
runs. Add findings.md covering architecture validated (canonical
hostname, server contract, provider auth, file transfer, network
policies, orchestrator lifecycle), bugs found, token usage analysis,
and recommendations for the host-side API server pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Marta Anon <manon@redhat.com>
maruiz93 added a commit that referenced this pull request Jun 1, 2026
- Scrub credentials from error messages in provisioner (H3)
- Add timing-safe comparison comments in both servers (M1)
- Bind servers to 127.0.0.1 instead of 0.0.0.0 (M2)
- Remove bearer token from stdout in run.sh (M3)
- Add request body size limits in both servers (M4)
- Add comment about unbounded operations dict (L1)
- Fix stale host.docker.internal references in design docs (L3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Marta Anon <manon@redhat.com>
@maruiz93 maruiz93 force-pushed the experiment/host-side-api-server branch from 0df51be to 96424d0 Compare June 1, 2026 00:56
Copy link
Copy Markdown

@fullsend-ai-review fullsend-ai-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the review comment for full details.

Comment thread host-side-api-server/servers/builder/main.go
Comment thread host-side-api-server/servers/repo-provisioner/server.py
Comment thread host-side-api-server/servers/repo-provisioner/server.py
Comment thread host-side-api-server/servers/repo-provisioner/server.py
Comment thread host-side-api-server/servers/builder/main.go
Comment thread host-side-api-server/servers/repo-provisioner/server.py
Comment thread host-side-api-server/servers/repo-provisioner/server.py Outdated
Comment thread host-side-api-server/servers/builder/main.go Outdated
Comment thread host-side-api-server/servers/builder/main.go
Comment thread host-side-api-server/harness/baked-instructions-full.yaml
- Scrub credentials from error messages in provisioner (H3)
- Add timing-safe comparison comments in both servers (M1)
- Bind servers to 127.0.0.1 instead of 0.0.0.0 (M2)
- Remove bearer token from stdout in run.sh (M3)
- Add request body size limits in both servers (M4)
- Add comment about unbounded operations dict (L1)
- Fix stale host.docker.internal references in design docs (L3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Marta Anon <manon@redhat.com>
@maruiz93 maruiz93 force-pushed the experiment/host-side-api-server branch from 96424d0 to a838eeb Compare June 1, 2026 01:04
@maruiz93 maruiz93 added this pull request to the merge queue Jun 1, 2026
Merged via the queue into main with commit bafeb37 Jun 1, 2026
25 of 39 checks passed
@maruiz93 maruiz93 deleted the experiment/host-side-api-server branch June 1, 2026 01:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Experiment: host-side API server for sandboxed agents

2 participants