Conversation
Cursor Bugbot has reviewed your changes and found 3 potential issues.
```python
                total = len(self._tunnels)
                logger.debug(f"Health monitor: {alive}/{total} tunnels alive")
        except asyncio.CancelledError:
            return
```
Health monitor silently dies on tunnel restart failure
Medium Severity
`_tunnel_health_monitor` only catches `asyncio.CancelledError`. If `tunnel.sync_stop()` or `new_tunnel.start()` raises any other exception (e.g., network failure, permission error), the exception propagates out of the loop and the monitor task dies silently with no logging. Since the task is fire-and-forget, the exception is swallowed. Tunnel monitoring stops entirely until the next `get_gateway_tunnel_url` call happens to restart it, leaving a potentially long window with no proactive dead-tunnel detection.
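A minimal sketch of a monitor loop that survives per-iteration failures — the function name and `check_tunnels` callback are stand-ins for illustration, not the project's actual API:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def tunnel_health_monitor(check_tunnels, interval: float = 30.0) -> None:
    """Periodically run `check_tunnels`; log failures instead of dying.

    `check_tunnels` is a hypothetical async callable standing in for the
    real stop/restart logic.
    """
    while True:
        try:
            await check_tunnels()
        except asyncio.CancelledError:
            return  # normal teardown path
        except Exception:
            # Broad catch: a single failed restart must not silently kill
            # a fire-and-forget monitor task.
            logger.exception("Tunnel health check failed; retrying next cycle")
        await asyncio.sleep(interval)
```

The key point is the broad `except Exception` with `logger.exception(...)`, so a transient restart failure is visible in logs and the next cycle still runs.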
```diff
 async def get_gateway_tunnel_url(self, local_addr: str | None = None) -> str:
-    """Get gateway tunnel URL, starting the tunnel if needed."""
+    """Get gateway tunnel URL, starting the tunnel if needed. Restarts dead tunnels."""
```
Dead tunnel not detected when local_addr is None
Low Severity
`get_gateway_tunnel_url` with `local_addr=None` and exactly one tunnel returns that tunnel's URL without checking `is_running`. The dead-tunnel detection and restart logic was only added to the `local_addr is not None` code path, so this convenience path can return a stale URL from a dead tunnel, contradicting the docstring's claim that it "Restarts dead tunnels."
```python
class TunnelError(InfraError):
    """Raised when a tunnel process dies or becomes unreachable."""

    pass
```
New TunnelError not added to documented error hierarchy
Low Severity
The new `vf.TunnelError` class (a subclass of `vf.InfraError`) is a user-facing error that can be raised during rollouts in both `CliAgentEnv` and `RolloutGatewayMixin`, but the documented error hierarchy in `docs/environments.md` only lists `vf.SandboxError` as an example under `vf.InfraError`. Since `TunnelError` is exported from the package and users may want to handle tunnel failures distinctly (e.g., for retry logic), it warrants mention alongside `vf.SandboxError`.
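For the docs, the hierarchy and the retry pattern it enables might look like this (class bodies and the `should_retry` helper are stand-ins mirroring the names in this PR, not code from the repository):

```python
class InfraError(Exception):
    """Base class for infrastructure failures during rollouts."""

class SandboxError(InfraError):
    """Raised when a sandbox fails (already documented)."""

class TunnelError(InfraError):
    """Raised when a tunnel process dies or becomes unreachable."""

def should_retry(exc: Exception) -> bool:
    # Users can target tunnel failures for retry while letting other
    # infrastructure errors surface normally.
    return isinstance(exc, TunnelError)
```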
Triggered by project rule: BugBot Instructions


Description
Monitors and restarts tunnels that have died in `CliAgentEnv` and `RolloutGatewayEnv`. Depends on PrimeIntellect-ai/prime#403.
Type of Change
Testing
Ran `uv run pytest` locally.
Checklist
Additional Notes
Note
Medium Risk
Changes add automatic tunnel restarts and new failure paths (`TunnelError`) during rollouts, which can alter runtime behavior and error handling in production environments. Risk is moderate because it touches long-running background tasks and sandbox rollout completion loops.
Overview
Improves tunnel robustness by detecting dead `prime_tunnel.Tunnel` processes and recreating/restarting them in both `CliAgentEnv.get_tunnel_url()` and `RolloutGatewayMixin.get_gateway_tunnel_url()` (including richer logging with `tunnel_id` and recent `frpc` output).

In gateway mode, adds a background `_tunnel_health_monitor` task that periodically restarts dead tunnels, cancels it during `teardown_gateway()`, and raises a new `vf.TunnelError` from `poll_job_completion()` when a tunnel dies mid-rollout; the interception path now also raises `TunnelError` when the tunnel dies while waiting for agent requests.

Adds integration tests covering dead-tunnel recreation, health-monitor restarts, teardown cancellation, and the new error behavior, plus introduces `TunnelError` under `verifiers/errors.py`.

Written by Cursor Bugbot for commit 3eb0fc5. This will update automatically on new commits.