asyncio.Lock in messaging.py not released on client disconnect → cascading timeouts
Describe the bug
When a client disconnects from a code execution request (e.g., due to SDK timeout), the per-context asyncio.Lock() in template/server/messaging.py remains held until the Jupyter kernel finishes the execution. All subsequent code executions on the same context block behind this orphaned lock, causing a cascade of timeouts.
The only recovery is calling POST /contexts/{id}/restart, which creates a new ContextWebSocket with a fresh lock. But this clears all kernel state (variables, imports), which is a heavy penalty for what should be a recoverable situation.
Root Cause
In template/server/messaging.py, the ContextWebSocket.execute() method holds an asyncio.Lock() for the entire duration of code execution, including streaming results back:
```python
class ContextWebSocket:
    def __init__(self, context_id, session_id, language, cwd):
        self._lock = asyncio.Lock()

    async def execute(self, code, env_vars, access_token):
        async with self._lock:  # Lock acquired here
            await self._ws.send(request)  # Send to Jupyter kernel
            async for item in self._wait_for_result(message_id):
                yield item  # Stream results while holding lock
```

The HTTP endpoint in `template/server/main.py` wraps this generator in a streaming response:
```python
@app.post("/execute")
async def post_execute(request, exec_request):
    return StreamingListJsonResponse(
        ws.execute(code, env_vars, access_token)
    )
```

When the SDK client times out (default 300s) and closes the HTTP connection, FastAPI/Starlette abandons the streaming generator. However, the `asyncio.Lock` is still held inside the generator's frame — it only releases when:
- The kernel finishes execution and `_wait_for_result()` hits `EndOfExecution`, OR
- The generator is garbage collected (non-deterministic), OR
- The context is restarted via `POST /contexts/{id}/restart`
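The failure mode is reproducible outside the e2b codebase. A minimal sketch (the generator and names below are illustrative, not from `messaging.py`) shows that abandoning a suspended async generator leaves its lock held — release only happens later, when the garbage collector finalizes the frame:

```python
import asyncio

async def main() -> bool:
    lock = asyncio.Lock()

    async def execute():
        # Mirrors ContextWebSocket.execute(): the lock is held across yields.
        async with lock:
            for chunk in ("partial", "results"):
                yield chunk

    agen = execute()
    await agen.__anext__()  # consumer pulls one item; lock is now acquired
    del agen                # "client disconnect": generator abandoned mid-stream
    # aclose() is only *scheduled* by the GC finalizer hook; at this point
    # the suspended generator frame still owns the lock.
    return lock.locked()

print(asyncio.run(main()))  # True: the lock outlives its consumer
```

Any coroutine that tries to acquire the same lock in this window blocks, which is exactly the cascade described below.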
The Cascade
```
T=0:00   Client sends long-running code → lock acquired → kernel executing
T=5:00   SDK timeout (300s) → client HTTP disconnect → lock STILL held
T=5:01   Client retries with new code → blocked on lock
T=10:01  Retry also times out → next retry blocked behind both
...      each retry adds another timeout duration of queue time ...
```
envd Log Evidence
We observed this directly via `journalctl -u envd` logs inside a sandbox. Normal executions show sequential lock acquire/release:

```
06:14:05 Execution abc... finished [LOCK RELEASED]
06:14:05 Input accepted for def... [LOCK ACQUIRED]
06:14:15 Execution def... finished [LOCK RELEASED]
```
During a deadlock incident, we captured:

```
05:43:45 d8a56844 → Sending code (SessionView.from_id)
05:43:45 d8a56844 → Input accepted [LOCK ACQUIRED]
...       kernel running for ~5 min (large HTTP request) ...
≈05:48:45 SDK timeout fires (300s) → client disconnects
***       4.6 minutes of TOTAL SILENCE — all executions blocked ***
05:53:23 Next execution finally gets lock [after kernel finished internally]
```
Execution d8a56844 has "Sending code" + "Input accepted" but no "finished execution" event — a confirmed orphaned lock holder. Total lock hold: 578s (9.6 min), of which 278s (4.6 min) was orphaned after the client disconnected.
Suggested Fix
Instead of holding the lock for the entire generator lifetime, consider one of:
- Release the lock after sending to the kernel — The lock's purpose is to serialize sends to the Jupyter kernel WebSocket. Once the message is sent and accepted, the lock could be released. Streaming results doesn't require the lock since `_wait_for_result` reads from a per-execution queue.
- Add a `finally` clause to release on generator close — Wrap the lock acquisition so that when the generator is closed (by FastAPI on client disconnect), the lock is explicitly released:

  ```python
  async def execute(self, code, env_vars, access_token):
      await self._lock.acquire()
      try:
          await self._ws.send(request)
          async for item in self._wait_for_result(message_id):
              yield item
      finally:
          self._lock.release()
  ```

  Note: `async with self._lock` inside an async generator may not trigger `__aexit__` on generator close in all Python versions.
- Add a lock timeout — Use `asyncio.wait_for(self._lock.acquire(), timeout=N)` so blocked executions fail fast instead of cascading.
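Options 2 and 3 compose naturally. Below is a self-contained sketch of the combined approach; `guarded_execute` and `LOCK_ACQUIRE_TIMEOUT` are invented names for illustration and not part of the e2b codebase:

```python
import asyncio

LOCK_ACQUIRE_TIMEOUT = 0.1  # hypothetical value; tune to expected execution time

async def guarded_execute(lock: asyncio.Lock, work):
    """Stream items from work() while holding lock, releasing the lock even
    when the consumer (e.g. Starlette on client disconnect) closes the
    generator early."""
    try:
        await asyncio.wait_for(lock.acquire(), timeout=LOCK_ACQUIRE_TIMEOUT)
    except asyncio.TimeoutError:
        # Fail fast instead of queueing behind an orphaned lock holder.
        raise RuntimeError("context busy: lock not acquired in time")
    try:
        async for item in work():
            yield item
    finally:
        # Runs on normal exhaustion *and* on GeneratorExit from aclose().
        lock.release()

async def demo():
    lock = asyncio.Lock()

    async def work():
        yield "ok"

    # Normal path: results stream through and the lock is released afterwards.
    items = [i async for i in guarded_execute(lock, work)]
    print(items, lock.locked())  # ['ok'] False

    # Contended path: a stuck holder makes the next caller fail fast.
    await lock.acquire()
    try:
        async for _ in guarded_execute(lock, work):
            pass
    except RuntimeError as exc:
        print(exc)  # context busy: lock not acquired in time
    finally:
        lock.release()

asyncio.run(demo())
```

With this shape, a disconnected client's retry either succeeds immediately (option 2 released the lock on generator close) or fails within `LOCK_ACQUIRE_TIMEOUT` with a clear error, instead of stacking timeouts.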
Impact
This affects any SDK user whose code execution exceeds the SDK timeout. In our production environment:
- One session had 8 consecutive timeouts — even `print('hello')` was blocked
- Another had 16 consecutive timeouts before full sandbox destruction recovered it
- Our workaround: call `restartCodeContext()` on every timeout, but this clears kernel state
Related Issues
- Unstable service E2B#1017 — "Unstable service" / sandbox hangs (closed with a fix in Dec 2025, but this specific lock issue persists)
- Executing AsyncCommandHandle.kill() will cause blocking and cannot kill the process. E2B#1034 — `kill()` blocks, process keeps running after disconnect
- [Bug]: commands.run hangs indefinitely when sandbox becomes unreachable — request_timeout does not set read timeout on streaming calls E2B#1128 — `commands.run` hangs indefinitely, no read timeout on streams
Environment
- E2B SDK: `@e2b/code-interpreter` v1.x (JS/TS)
- Python version inside sandbox: 3.12
- Affected file: `template/server/messaging.py` (deployed to sandbox as `/root/.server/messaging.py`)