Skip to content

feat(agent): three-axis env inheritance via [agent.clear_env]#722

Open
dogzzdogzz wants to merge 4 commits intoopenabdev:mainfrom
dogzzdogzz:feat/agent-clear-env
Open

feat(agent): three-axis env inheritance via [agent.clear_env]#722
dogzzdogzz wants to merge 4 commits intoopenabdev:mainfrom
dogzzdogzz:feat/agent-clear-env

Conversation

@dogzzdogzz
Copy link
Copy Markdown
Contributor

@dogzzdogzz dogzzdogzz commented May 4, 2026

Closes #723.
Discussion: https://discord.com/channels/1491295327620169908/1500049405984772096/1500055699261231267

Summary

Replaces the flat [agent].inherit_env allow-list with a structured [agent.clear_env] table that supports allow-list, deny-list, and a pure escape-hatch toggle.

Decision tree:

if enabled (default true):
    if allow_list non-empty:    pass only those keys from process env
    elif deny_list non-empty:   pass all process env EXCEPT deny_list
    else:                       pass nothing (pure secure default)
else:
    pass all process env (escape hatch — both lists ignored)

allow_list takes priority over deny_list when both are set under enabled = true (the deny-list branch is only reached when allow-list is empty). [agent].env always wins on key conflict.

Why

The existing inherit_env allow-list is brittle for AWS-IRSA / web-identity workloads where Kubernetes auto-injects many env vars (AWS_ROLE_ARN, AWS_WEB_IDENTITY_TOKEN_FILE, AWS_REGION, ...) that the user did NOT add manually but DO need to pass through to the agent. Listing every benign key in an allow-list misses new ones as platforms evolve.

With deny_list, users can run in "inherit all minus known-secret keys" mode — the natural shape of AWS-IRSA + sidecar deployments where most injected env is benign but a few keys (DISCORD_BOT_TOKEN, SLACK_BOT_TOKEN) must stay contained.

See #723 for the full problem statement.

Helm values

agents:
  myagent:
    clearEnv:
      enabled: true     # default; set false to disable filtering entirely
      allowList: []     # only-these mode
      denyList: []      # all-except-these mode (only when allowList empty)

TOML config

[agent.clear_env]
enabled = true
allow_list = ["API_BASE_URL"]
deny_list = ["DISCORD_BOT_TOKEN"]

BREAKING CHANGE (beta)

agents.<name>.inheritEnv is removed. Migration:

- inheritEnv: ["A", "B"]
+ clearEnv:
+   allowList: ["A", "B"]

The Helm template hard-fails with a migration message if the legacy key is encountered, so misconfigured upgrades fail loudly rather than silently regressing.

TOML config users: rename inherit_env = [...] to [agent.clear_env] table with allow_list = [...].

Files

  • src/config.rs — new ClearEnvConfig { enabled, allow_list, deny_list }
  • src/acp/connection.rs — rewrites build_agent_env with the decision tree + 4 new tests
  • src/acp/pool.rs — passes the struct through
  • charts/openab/templates/configmap.yaml — emits [agent.clear_env] block + hard-fails on legacy inheritEnv
  • charts/openab/tests/configmap_test.yaml — covers all four shapes incl. migration failure
  • config.toml.example — documents the decision tree with three example shapes
  • docs/config-reference.md — replaces inherit_env row with clear_env table + sub-table, decision tree, AWS-IRSA use case
  • AGENTS.md — arch-tree comment + Security section updated with decision tree

Test plan

  • cargo clippy -- -D warnings clean
  • cargo test --bin openab — 201 tests pass, including 7 new build_agent_env tests covering: explicit-env precedence, allow-list-only mode, deny-list-only mode (when allow-list empty), allow-list takes priority when both set, enabled=false ignores both lists, enabled=true empty-lists inherits nothing, missing-var skip
  • Helm unittest cases added for clearEnv.allowList, clearEnv.enabled=false, clearEnv.denyList, and the legacy-inheritEnv migration failure
  • Manual: deploy with enabled=false + denyList and confirm AWS_* env vars reach the agent while DISCORD_BOT_TOKEN stays out

🤖 Generated with Claude Code

Replaces the flat [agent].inherit_env allow-list with a structured
[agent.clear_env] table that supports both allow-list and deny-list
filtering plus an enabled toggle.

Decision tree:
  if enabled (default true):
    if allow_list non-empty   -> only those keys pass through
    elif deny_list non-empty  -> all process env passes EXCEPT deny_list
    else                       -> nothing inherited (pure secure default)
  else:
    full process env inherited; both lists ignored (escape hatch)

Use case: AWS-IRSA / web-identity workloads where k8s auto-injects many
AWS_* env vars. Listing every benign one in an allow-list is brittle;
deny_list = ["DISCORD_BOT_TOKEN", ...] is a much better fit.

Helm values structure:
  agents.<name>.clearEnv:
    enabled: true
    allowList: []
    denyList: []

BREAKING CHANGE (beta): agents.<name>.inheritEnv is removed. Migration:
  inheritEnv: ["A", "B"]  ->  clearEnv.allowList: ["A", "B"]

The helm template hard-fails with a migration message if the legacy key
is encountered. TOML config users: rename `inherit_env = [...]` to
`[agent.clear_env]\nallow_list = [...]`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dogzzdogzz dogzzdogzz requested a review from thepagent as a code owner May 4, 2026 03:33
@github-actions github-actions Bot added closing-soon PR missing Discord Discussion URL — will auto-close in 3 days pending-screening PR awaiting automated screening labels May 4, 2026
config-reference.md: replace inherit_env row with clear_env table +
sub-table for enabled/allow_list/deny_list; add decision tree, baseline
note, and AWS-IRSA use case example.

AGENTS.md: update arch-tree comment from "env_clear whitelist" to
"[agent.clear_env] policy"; expand Security section with the decision
tree and guidance on when to reach for allow_list vs deny_list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot removed the closing-soon PR missing Discord Discussion URL — will auto-close in 3 days label May 4, 2026
Make it explicit that HOME/PATH/USER (Unix) and USERPROFILE/USERNAME/
PATH/SystemRoot/SystemDrive (Windows) are always added unconditionally,
so:

- allow_list = ["FOO"] does NOT mean "only FOO" — the agent still
  receives the baseline + [agent].env + FOO.
- deny_list = ["PATH"] does NOT remove PATH — baseline keys cannot
  be denied.

Adds the clarification to the config-reference.md table rows for
allow_list and deny_list, plus the matching ClearEnvConfig field
docstrings in src/config.rs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the three-knob clearEnv block to each commented-out agent example
(claude, opencode, cursor) so operators discover it via `helm show
values` rather than only reading templates/configmap.yaml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shaun-agent
Copy link
Copy Markdown
Contributor

OpenAB PR Screening

This is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Click 👍 if you find this useful. Human review will be done within 24 hours. We appreciate your support and contribution 🙏

Screening report ## Intent

PR #722 is trying to replace OpenAB’s current flat agent environment allow-list with a more expressive environment inheritance policy.

The operator-visible problem is that [agent].inherit_env works poorly for Kubernetes and AWS IRSA deployments. Those platforms inject environment variables such as AWS_ROLE_ARN, AWS_WEB_IDENTITY_TOKEN_FILE, and AWS_REGION automatically, and requiring operators to enumerate every safe variable is brittle. At the same time, sensitive gateway-owned secrets such as DISCORD_BOT_TOKEN or SLACK_BOT_TOKEN should not leak into spawned agent processes.

Feat

Feature, with a beta breaking config change.

The PR introduces [agent.clear_env] with three controls:

  • enabled = true: environment filtering is active by default.
  • allow_list: pass only these process environment keys.
  • deny_list: pass all process environment keys except these, only when allow_list is empty.

It also keeps [agent].env as the final override on key conflict, removes Helm inheritEnv, updates docs/examples, and adds Rust and Helm tests.

Who It Serves

Primary beneficiaries: deployers and agent runtime operators.

Secondary beneficiaries: maintainers and reviewers, because the new model makes environment propagation behavior explicit and testable instead of relying on one overloaded allow-list.

Rewritten Prompt

Implement structured agent environment inheritance for spawned agents.

Replace the legacy [agent].inherit_env allow-list with [agent.clear_env]:

[agent.clear_env]
enabled = true
allow_list = []
deny_list = []

Behavior must be:

  1. If enabled = false, inherit the full parent process environment and ignore both lists.
  2. If enabled = true and allow_list is non-empty, inherit only keys in allow_list.
  3. If enabled = true, allow_list is empty, and deny_list is non-empty, inherit all parent environment keys except those in deny_list.
  4. If enabled = true and both lists are empty, inherit no parent environment keys.
  5. Values from [agent].env must override inherited values on conflict.

Update Rust config parsing, agent process environment construction, Helm values/templates/tests, example config, and config documentation. Add tests covering allow-list mode, deny-list mode, empty secure default, disabled filtering, precedence of explicit agent env, missing variables, and migration behavior for removed Helm inheritEnv.

Merge Pitch

This is worth advancing because it addresses a real deployment gap for AWS IRSA and sidecar-style Kubernetes environments while preserving a secure default. The current allow-list-only model pushes operational fragility onto deployers and can break silently when platforms add or rename injected variables.

Risk profile is moderate. The main reviewer concern will be the breaking removal of inheritEnv, especially whether beta status is enough to justify hard failure instead of backward-compatible migration. The other concern is semantic clarity: clear_env.enabled = false means “inherit everything,” which is powerful but potentially dangerous if misunderstood.

Best-Practice Comparison

OpenClaw principles that apply:

  • Explicit delivery routing: relevant by analogy. Environment delivery to agents should be explicit and inspectable, not accidental.
  • Isolated executions: highly relevant. Filtering inherited environment variables is part of maintaining isolation between the gateway process and agent subprocesses.
  • Run logs: partially relevant. This PR does not appear to add runtime logging of which inheritance mode was selected, but such logging could help operators diagnose env-related failures.
  • Retry/backoff, durable job persistence, gateway-owned scheduling: not directly relevant to this config change.

Hermes Agent principles that apply:

  • Fresh session per scheduled run: relevant by analogy. Each agent execution should receive a deliberate environment snapshot rather than ambient gateway state.
  • Self-contained prompts for scheduled tasks: partially relevant. For agents, runtime inputs should be self-contained; environment inheritance should be minimized or explicitly controlled.
  • Atomic writes for persisted state and file locking: not relevant unless this config is later persisted or hot-reloaded.
  • Gateway daemon tick model: not directly relevant.

Overall, the PR moves OpenAB closer to isolated execution best practices by making inherited process environment a structured policy instead of a single allow-list.

Implementation Options

Option 1: Conservative compatibility layer

Keep supporting legacy inherit_env as a deprecated alias for clear_env.allow_list. Emit warnings in docs/logs and Helm NOTES, but do not hard-fail yet. Add the new clear_env model while preserving existing deployments.

Option 2: Balanced beta breaking change

Accept the PR’s proposed shape: remove inheritEnv, hard-fail Helm templates when the legacy key is present, document TOML migration clearly, and keep tests around the new decision tree. This treats beta as permission to clean up the contract before broader adoption.

Option 3: Ambitious policy model

Introduce a named environment policy object with modes such as none, allow, deny, and all, plus optional preset profiles like aws_irsa. This would make behavior more explicit than enabled + lists, but it would require a larger config design and more migration surface.

Option 4: Runtime observability add-on

Keep the PR’s config model, but add startup or agent-spawn debug logs indicating the selected inheritance mode and counts of inherited keys, without logging values. This improves operator confidence without changing semantics.

Comparison Table

Option Speed to ship Complexity Reliability Maintainability User impact Fit for OpenAB right now
Conservative compatibility layer Medium Medium High for upgrades Medium Low disruption Good if avoiding beta breakage matters
Balanced beta breaking change High Low-Medium High if migration failure is clear High Moderate migration cost Strong fit
Ambitious policy model Low High Potentially high later Medium Better long-term clarity, slower delivery Too large for this PR
Runtime observability add-on Medium Low Medium-High High Helps operators debug Good follow-up or small addition

Recommendation

Advance the balanced beta breaking-change path, with one reviewer focus: confirm that OpenAB is comfortable removing inheritEnv now rather than keeping a deprecated alias for one release.

The proposed [agent.clear_env] model is specific, testable, and solves the AWS IRSA deployment problem without defaulting to unsafe inheritance. For merge discussion, ask reviewers to validate the naming and migration stance, then consider splitting runtime observability into a follow-up: log the selected inheritance mode and inherited key count at debug level without exposing secret values.

@chaodu-agent
Copy link
Copy Markdown
Collaborator

Review Summary

LGTM on the core feature ✅ — replacing inherit_env with structured [agent.clear_env] is a clear improvement. One blocking change required: remove deny_list.

🔴 BLOCKING — Remove deny_list

We are not planning to add deny_list to OAB. The deny-list model is fail-open by design: any secret NOT listed leaks to the agent subprocess. In Kubernetes, the set of env vars injected into a pod is not fully under the operator's control (mutation webhooks, sidecars, platform controllers), so the operator cannot guarantee completeness.

For K8s/EKS use cases with dynamically injected env vars (e.g. AWS-IRSA), please consider using an init container to generate a dynamic allow-list at pod startup:

initContainers:
  - name: env-allowlist
    image: busybox
    command: ["sh", "-c", "env | grep -v 'TOKEN\|SECRET\|PASSWORD\|API_KEY' | cut -d= -f1 > /shared/agent-env-allowlist.txt"]
    volumeMounts:
      - name: shared
        mountPath: /shared

A follow-up allow_list_file feature can be opened as a separate issue to read from this file. This keeps the security model fail-closed while solving the dynamic-injection use case.

Requested changes:

  1. Remove deny_list from ClearEnvConfig, Helm values, configmap template, tests, docs, and AGENTS.md
  2. Remove the deny_list branch from the decision tree in build_agent_env()
  3. Keep allow_list and enabled = false (escape hatch) as-is — both are well-designed
Initial Review (四問框架)

1. What problem does this solve?

The existing inherit_env is allow-list-only, which is brittle for AWS-IRSA / web-identity workloads where Kubernetes auto-injects many AWS_* env vars. Operators need a structured config to control env inheritance.

2. How does it solve it?

Replaces inherit_env: Vec<String> with ClearEnvConfig { enabled, allow_list }:

  • allow_list takes priority; [agent].env always wins on key conflict
  • Baseline vars (HOME/PATH/USER) always injected regardless of policy
  • Helm template hard-fails on legacy inheritEnv with migration message

3. What was considered?

  • Breaking change is acceptable in beta — hard-fail ensures no silent regression
  • Decision tree is well-documented in code, config example, docs, and AGENTS.md
  • Tests cover: explicit precedence, allow-list mode, enabled=false escape hatch, empty-lists-inherits-nothing, missing-var skip

4. Is this the best approach?

Yes, once deny_list is removed. The two-axis model (enabled + allow_list) is clean and fail-closed.

Traffic Light

🟢 INFO — Decision tree is clear and consistently documented across all 4 locations (code comment, config.toml.example, docs/config-reference.md, AGENTS.md).

🟢 INFO — Helm hard-fail on legacy inheritEnv is excellent — breaking changes should fail loudly, not silently regress.

🟢 INFO — Test coverage is thorough for the allow-list and escape-hatch branches.

🟡 NIT — Naming: clear_env.enabled = false reads as a double negative. Consider whether inherit_all_env or env_filter would be clearer. Not blocking.

🟡 NIT — PR description is missing the Prior Art section required by the RFC: PR Contribution Guidelines.

🟡 NIT — Missing CHANGELOG entry for the breaking change (inherit_envclear_env).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(agent): allow-list + deny-list filter for env inheritance ([agent.clear_env])

4 participants