RLMEnv: Simplify constructor and internals #966

Open
snimu wants to merge 4 commits into main from sebastian/rlm-args-reduction-2026-02-27

snimu commented Feb 27, 2026

Description

  • Remove 14 unused/niche constructor args that were silently swallowed via **kwargs or had no
    remaining use case (interception_host, interception_port, interception_url, execution_backend,
    context_key, sandbox_start_command, sandbox_client_max_workers, root_tool_serialization,
    stagger/jitter params, etc.)
  • Remove _InterceptionPool singleton and all shared-pool branching: each RLMEnv instance now owns
    its own interception server and tunnel (this undoes a recent change of mine that was poorly motivated and not thought through)
  • Add explicit max_turns: int = 50 constructor param (previously inherited a default of 10 from
    StatefulToolEnv, easily lost via **kwargs)
  • Rename sub_tool_max_turns → sub_llm_max_turns for consistency with max_sub_llm_parallelism and the
    sub_llm_* metric names
  • Hardcode interception_port=0 (OS-assigned) and bind_host="127.0.0.1" — the old configurability only
    mattered for the now-removed pool
  • Update docs and docstring to remove outdated claims

Note: requires small changes to the -rlm environments.
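
For downstream call sites, the rename and removals above could be applied mechanically. The following is a minimal sketch, not part of this PR: `migrate_rlm_kwargs`, `RENAMED`, and `REMOVED` are hypothetical names, and the removed set only contains the argument names spelled out on this page (the stagger/jitter parameter names are not listed here, so they are omitted).

```python
# Hypothetical migration helper for call sites; not part of the PR itself.
# Only argument names explicitly mentioned in this PR are listed.
RENAMED = {"sub_tool_max_turns": "sub_llm_max_turns"}
REMOVED = {
    "interception_host",
    "interception_port",
    "interception_url",
    "execution_backend",
    "context_key",
    "sandbox_start_command",
    "sandbox_client_max_workers",
    "root_tool_serialization",
    "max_iterations",
    "filesystem_copy_max_bytes",
}


def migrate_rlm_kwargs(old: dict) -> tuple[dict, list[str]]:
    """Return (new_kwargs, dropped_names) for a dict of old constructor kwargs."""
    new: dict = {}
    dropped: list[str] = []
    for key, value in old.items():
        if key in REMOVED:
            # Record the drop so the caller can log it instead of losing it silently.
            dropped.append(key)
            continue
        # Apply the rename if there is one, otherwise keep the key unchanged.
        new[RENAMED.get(key, key)] = value
    return new, dropped
```

Returning the dropped names rather than discarding them keeps the migration visible, which is the same concern that motivated removing silently swallowed **kwargs.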

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Note

High Risk
Breaking API changes remove/rename multiple RLMEnv constructor args and change interception/tunnel lifecycle to be per-instance (no shared pool), which can affect rollout networking and resource usage. Touches sandbox execution, proxy routing, and sub-LLM timeouts, so regressions could impact core execution paths.

Overview
Simplifies RLMEnv configuration and removes shared interception infrastructure. The constructor is pared down (adds explicit max_turns, renames sub_tool_max_turns to sub_llm_max_turns, removes max_iterations and many other knobs), and context ingestion is now fixed to info["context_dir"]/info["context"] (drops configurable keys).

Interception/tunnel logic is simplified by deleting _InterceptionPool and related branching: each RLMEnv now owns its own aiohttp interception server and Prime Tunnel, with interception_port always ephemeral and bind host fixed to 127.0.0.1 (tests use a private _interception_url_override to skip tunneling).
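
To illustrate what an always-ephemeral port on a fixed 127.0.0.1 bind host means in practice, here is a minimal standard-library sketch. It uses a plain socket rather than the aiohttp server this PR refers to, and `bind_ephemeral_loopback` is a hypothetical name, not something from the codebase.

```python
import socket


def bind_ephemeral_loopback() -> tuple[socket.socket, int]:
    """Bind a TCP listener on 127.0.0.1 and let the OS choose the port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 asks the OS to assign any free port.
    sock.bind(("127.0.0.1", 0))
    sock.listen()
    # getsockname() reports the port that was actually assigned.
    _host, port = sock.getsockname()
    return sock, port


if __name__ == "__main__":
    first, first_port = bind_ephemeral_loopback()
    second, second_port = bind_ephemeral_loopback()
    print(first_port, second_port)  # two distinct ports while both sockets are open
    first.close()
    second.close()
```

Because each listener gets its own OS-assigned port, several instances can run side by side without coordinating port numbers, which is what makes a shared pool unnecessary for avoiding collisions.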

Sub-LLM execution is streamlined: removes stagger/jitter delays, collapses sub-LLM timeouts to a single sub_llm_timeout derived from code_execution_timeout, and hardcodes root-tool serialization to pickle (removes serialization handling). Context directory copy now enforces a fixed 1GB limit internally (removes configurable filesystem_copy_max_bytes). Docs and tests are updated accordingly, including removal of _InterceptionPool tests and updated fixtures/expectations.
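
As a rough illustration of a fixed copy limit, the sketch below sums file sizes under a directory and rejects it when the total exceeds the limit. This is an assumption about how such a check could look, not the PR's implementation: `check_copy_limit` and `directory_size_bytes` are hypothetical names, and the summary does not say whether 1GB means 10**9 or 2**30 bytes, so the decimal value is used here as a placeholder.

```python
import os

# Assumption: decimal gigabyte. The PR summary does not specify 10**9 vs 2**30.
ONE_GB = 1_000_000_000


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def check_copy_limit(path: str, limit_bytes: int = ONE_GB) -> int:
    """Return the directory size, or raise ValueError if it exceeds the limit."""
    size = directory_size_bytes(path)
    if size > limit_bytes:
        raise ValueError(
            f"context directory is {size} bytes, over the {limit_bytes}-byte limit"
        )
    return size
```

The `limit_bytes` parameter exists only so the sketch can be exercised with small directories; the point of the change described above is that callers no longer configure this value.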

Written by Cursor Bugbot for commit 9e27e18. This will update automatically on new commits.

snimu and others added 4 commits February 27, 2026 13:53
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

  sandbox_id,
  cmd,
- timeout=self.env._compute_install_wait_seconds(),
+ timeout=self.env.max_startup_wait_seconds,

Pip install timeout no longer scales with packages

Low Severity

The removed _compute_install_wait_seconds() scaled the pip install timeout based on the number of packages (30s per package, minimum max_startup_wait_seconds). Now using the flat max_startup_wait_seconds (default 120s) means environments with many pip_install_packages (5+) may time out during installation where they previously succeeded.

snimu (Contributor, Author) replied:

If somebody installs that many packages, they know what they're doing and can simply increase max_startup_wait_seconds.
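
For reference, the trade-off discussed in this thread can be written out as arithmetic. This is a sketch reconstructed from the review comment's description (30s per package, floored at max_startup_wait_seconds, default 120s), not the removed code itself; both function names are hypothetical.

```python
def scaled_install_timeout(
    num_packages: int,
    max_startup_wait_seconds: int = 120,
    per_package_seconds: int = 30,
) -> int:
    """Old behavior as described in the review: scale per package, with a floor."""
    return max(max_startup_wait_seconds, per_package_seconds * num_packages)


def flat_install_timeout(max_startup_wait_seconds: int = 120) -> int:
    """New behavior: the configured value is used as-is."""
    return max_startup_wait_seconds


if __name__ == "__main__":
    for n in (1, 4, 5, 8):
        print(n, scaled_install_timeout(n), flat_install_timeout())
```

With the defaults, the two timeouts only diverge from five packages upward (150s versus 120s), which matches the "5+" threshold in the review comment. Passing a larger max_startup_wait_seconds, for example 240 for eight packages, gives the same timeout the scaled version would have produced.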
