Skip to content

fix: raise ResourceNotExistError in delete_sandbox when data plane returns "sandbox not found"#89

Draft
Copilot wants to merge 3 commits intomainfrom
copilot/fix-terminated-sandbox-issue
Draft

fix: raise ResourceNotExistError in delete_sandbox when data plane returns "sandbox not found"#89
Copilot wants to merge 3 commits intomainfrom
copilot/fix-terminated-sandbox-issue

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 20, 2026

Control plane (list_sandboxes) and data plane (delete_sandbox) can diverge: a TERMINATED sandbox remains visible in list results, but the data plane has already removed it, causing delete_sandbox to raise ClientError('Failed to stop sandbox: sandbox not found') instead of a recognisable "not found" signal.

Changes

  • agentrun/sandbox/client.py — In both delete_sandbox and delete_sandbox_async, when the data plane returns a non-SUCCESS response with a message containing "sandbox not found", raise ResourceNotExistError("Sandbox", sandbox_id) instead of ClientError. This is consistent with the existing HTTP 404 path and avoids returning an incomplete Sandbox object. Updated docstrings document that ResourceNotExistError covers both HTTP 404 and data-plane business-level not-found. A code comment tracks the long-term goal of switching to a stable server-side error_code (e.g. SandboxNotFound).

  • tests/unittests/sandbox/test_client.py — Updated tests to assert ResourceNotExistError is raised for the data-plane not-found case; added boundary and negative tests: case-insensitive message variants, unrelated failure messages still raise ClientError, and empty message raises ClientError.

Before / After

# Before — raises generic ClientError on already-gone TERMINATED sandbox
client.delete_sandbox("91d970881b8d4dcaa5636df64577545b--18ee62652b")
# ClientError: Failed to stop sandbox: sandbox not found

# After — raises ResourceNotExistError (consistent with HTTP 404 path)
# Callers can implement idempotent deletion with a simple except clause:
try:
    client.delete_sandbox("91d970881b8d4dcaa5636df64577545b--18ee62652b")
except ResourceNotExistError:
    pass  # already gone — idempotent

Copilot AI changed the title [WIP] Fix list_sandboxes returning TERMINATED instances when deleting fix: make delete_sandbox idempotent when data plane returns "sandbox not found" Apr 20, 2026
Copilot AI requested a review from OhYee April 20, 2026 16:08
@OhYee
Copy link
Copy Markdown
Member

OhYee commented May 6, 2026

@copilot 修复方向正确(解决 issue #83 的 TERMINATED 残留场景),但建议对实现做一个关键调整。

问题:当前实现把 not-found 静默吞为成功

if "sandbox not found" in message.lower():
    return Sandbox.model_validate(
        {"sandboxId": sandbox_id}, by_alias=True
    )

这有三个隐患:

  1. 吞掉真错误:用户传错 sandbox_id(笔误、跨账号、复制错)时也会"删除成功",bug 隐藏
  2. 返回的 Sandbox 残缺:只有 sandbox_id,其他字段(status/template_name 等)拿默认值,与正常成功响应 shape 不一致
  3. 行为分裂:HTTP 404 路径已经抛 ResourceNotExistError("Sandbox", id)(见 client.pyexcept ClientError as e: if e.status_code == 404),业务码 not-found 静默成功,同语义两种返回

建议:改抛 ResourceNotExistError,与 HTTP 404 路径统一

代码改动(sync + async 两处)

                 message = result.get("message", "")
                 if "sandbox not found" in message.lower():
-                    return Sandbox.model_validate(
-                        {"sandboxId": sandbox_id}, by_alias=True
-                    )
+                    raise ResourceNotExistError("Sandbox", sandbox_id)
                 raise ClientError(
                     status_code=0,
                     message=(
                         "Failed to stop sandbox:"
                         f" {message or 'Unknown error'}"
                     ),
                 )

调用方写法(依然能实现幂等)

try:
    client.delete_sandbox(sid)
except ResourceNotExistError:
    pass  # 已经不在了,幂等达成

优点

还需要做的事

1. 更新 docstring(sync + async)

Raises: 段补充说明:

Raises:
    ResourceNotExistError: Sandbox 不存在(包括 HTTP 404 与数据面业务层 not-found
        两种情形)。调用方可 catch 此异常实现幂等删除。
    ClientError: 客户端错误
    ServerError: 服务器错误

2. 改造现有测试

test_delete_sandbox_not_found_in_response_is_idempotent 改为断言抛 ResourceNotExistError

def test_delete_sandbox_not_found_in_response_raises_resource_not_exist(...):
    mock_data_api.delete_sandbox.return_value = {
        "code": "FAILED",
        "message": "sandbox not found",
    }
    client = SandboxClient()
    with pytest.raises(ResourceNotExistError):
        client.delete_sandbox("sandbox-123")

async 版同理。

3. 补负样本与边界测试

  • 负样本:message 含 "sandbox not running""sandbox is busy" 等其他文本,应仍抛 ClientError(不该被误吞)
  • 大小写:message 为 "Sandbox NOT FOUND""SANDBOX Not Found" 等变体,验证 .lower() 工作正常
  • 边界:message 为空字符串或字段缺失时,应抛 ClientError(不该匹配)

4. 转 Ready for review

base 已能 cleanly merge 到当前 main(含 #95),无需 rebase;merge 后 71 个 sandbox client 测试本地全过。


关于服务端

字符串匹配 "sandbox not found" in message.lower() 是技术债——服务端文案变更即失效。建议在 PR 描述或 issue #83显式跟踪:"长期方案是服务端返回稳定 error_code(如 SandboxNotFound),SDK 切换到按 code 判断"。目前 SDK 字符串补丁是合理的短期容错。

修复以上 4 项后即可合入。

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 6, 2026

@copilot 修复方向正确(解决 issue #83 的 TERMINATED 残留场景),但建议对实现做一个关键调整。

问题:当前实现把 not-found 静默吞为成功...

Done in eec1443. Changes made:

  1. client.py (sync + async) — replaced the stub-return with raise ResourceNotExistError("Sandbox", sandbox_id), aligning the data-plane not-found path with the existing HTTP 404 path. Updated docstrings to document both cases under Raises:, and added a comment noting the long-term plan of switching to a stable server-side error_code.

  2. Tests — renamed/rewrote the idempotent tests to assert ResourceNotExistError; added:

    • case-insensitive variant ("Sandbox NOT FOUND"ResourceNotExistError)
    • negative sample ("sandbox is busy"ClientError)
    • empty message edge case (""ClientError)
    • async equivalents for failure/not-found paths

All 11 TestSandboxClientDeleteSandbox tests pass; type check clean (326 files, 0 errors).

Copilot AI changed the title fix: make delete_sandbox idempotent when data plane returns "sandbox not found" fix: raise ResourceNotExistError in delete_sandbox when data plane returns "sandbox not found" May 6, 2026
Copilot finished work on behalf of OhYee May 6, 2026 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

【bug】AgentRun Python SDK 中 list_sandboxes 仍能查到 TERMINATED 实例,但 delete_sandbox 删除同一 sandbox_id 时返回 sandbox not found

2 participants