@w31r4 w31r4 commented Jan 27, 2026

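Overview

This refactor implements three core features, plus a skill-library enhancement, to give AI agents a complete sandboxed execution environment:

1. Session-First architecture refactor

  • 1:1 Session-Ship binding: each session owns exactly one container, which simplifies state management
  • Warm Pool: pre-warmed containers offset cold-start latency and improve the user experience
  • Execution history: supports building an agent skill library (inspired by the VOYAGER paper)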
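The 1:1 binding and warm pool described above can be sketched roughly as follows. This is a minimal illustration, not the Bay implementation: `WarmPool`, `acquire`, `refill`, `release`, and the `ship-N` identifiers are all hypothetical names, and the slow container start is replaced by a placeholder.

```python
from collections import deque
from itertools import count


class WarmPool:
    """Keeps a few pre-started containers ready and binds each to one session."""

    def __init__(self, target_size=2):
        self._ids = count(1)
        self._target_size = target_size
        self._idle = deque()   # pre-warmed containers not yet assigned
        self._bound = {}       # session_id -> container id (1:1 binding)
        self.refill()

    def _start_container(self):
        # Placeholder for the slow cold-start step (hypothetical).
        return f"ship-{next(self._ids)}"

    def refill(self):
        # Top the idle pool back up to its target size.
        while len(self._idle) < self._target_size:
            self._idle.append(self._start_container())

    def acquire(self, session_id):
        # A session that already owns a container always gets the same one.
        if session_id in self._bound:
            return self._bound[session_id]
        # Prefer a warm container; fall back to a cold start if the pool is empty.
        ship = self._idle.popleft() if self._idle else self._start_container()
        self._bound[session_id] = ship
        self.refill()
        return ship

    def release(self, session_id):
        # In this sketch a released container is simply forgotten, never reused.
        self._bound.pop(session_id, None)


pool = WarmPool(target_size=2)
first = pool.acquire("session-a")
second = pool.acquire("session-b")
```

Because each session maps to exactly one container, no state has to be shared or reconciled across sessions; the pool only hides the start-up cost.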

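2. MCP Server integration

  • Standard MCP protocol: compatible with clients such as Claude Desktop, ChatGPT, Cursor, and VS Code
  • Two transport modes: stdio (desktop applications) and HTTP (remote deployment)
  • Session isolation in HTTP mode: each MCP client gets its own Sandbox
  • npm package: shipyard-mcp is published for quick installation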
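The HTTP-mode isolation above, together with the per-session locking, TTL renewal, and re-creation on expiry mentioned in the commit notes of this PR, might look something like the sketch below. It assumes a stand-in `FakeSandbox` class and a hypothetical `SessionSandboxes` registry; the real server stores Sandbox instances in FastMCP session state, which is not reproduced here. The 1800-second default mirrors the documented 30-minute `SHIPYARD_SANDBOX_TTL` default.

```python
import asyncio
import time


class FakeSandbox:
    """Stand-in for a real sandbox; tracks only identity and expiry."""

    _counter = 0

    def __init__(self, ttl_seconds):
        FakeSandbox._counter += 1
        self.sandbox_id = f"sandbox-{FakeSandbox._counter}"
        self.ttl_seconds = ttl_seconds
        self.expires_at = time.monotonic() + ttl_seconds

    def expired(self):
        return time.monotonic() >= self.expires_at

    def renew(self):
        self.expires_at = time.monotonic() + self.ttl_seconds


class SessionSandboxes:
    """Hypothetical registry: one sandbox per MCP session, guarded by a per-session lock."""

    def __init__(self, ttl_seconds=1800.0):
        self._ttl = ttl_seconds
        self._sandboxes = {}
        self._locks = {}

    async def get_or_create(self, session_id):
        # One lock per session, so concurrent tool calls from the same client
        # cannot create two sandboxes for that session.
        lock = self._locks.setdefault(session_id, asyncio.Lock())
        async with lock:
            sandbox = self._sandboxes.get(session_id)
            if sandbox is None or sandbox.expired():
                sandbox = FakeSandbox(self._ttl)
                await asyncio.sleep(0)  # placeholder for asynchronous creation work
                self._sandboxes[session_id] = sandbox
            else:
                sandbox.renew()  # tool activity extends the TTL
            return sandbox


async def demo():
    registry = SessionSandboxes()
    a1, a2, b = await asyncio.gather(
        registry.get_or_create("client-a"),
        registry.get_or_create("client-a"),
        registry.get_or_create("client-b"),
    )
    return a1.sandbox_id, a2.sandbox_id, b.sandbox_id


ids = asyncio.run(demo())
```

In the demo, the two concurrent calls for `client-a` resolve to the same sandbox, while `client-b` gets a separate one.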

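3. SDK architecture improvements

  • New Sandbox class: a single, simple entry point
  • The MCP Server reuses the SDK internally: avoids duplicated code
  • ShipyardClient is retained: a low-level API for advanced users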

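4. Skill library enhancements

To meet the requirements described in the VOYAGER, Reflexion, and LearnAct papers, this PR adds execution-record management:

New MCP tools:

  • get_execution(execution_id) - look up an execution record by ID
  • get_last_execution(exec_type) - get the most recent execution
  • annotate_execution(execution_id, description, tags, notes) - annotate an execution record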
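As a rough illustration of what these three tools do, here is an in-memory sketch. The method names follow the tool names above, but `ExecutionStore`, `ExecutionRecord`, the `record` helper, and the `"python"`/`"shell"` values for `exec_type` are assumptions made for the example; the actual implementation is backed by the Bay database.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExecutionRecord:
    execution_id: str
    exec_type: str                      # assumed values: "python" or "shell"
    code: str
    description: Optional[str] = None
    tags: list = field(default_factory=list)
    notes: Optional[str] = None


class ExecutionStore:
    """Hypothetical in-memory store; insertion order is execution order."""

    def __init__(self):
        self._records = []

    def record(self, exec_type, code):
        entry = ExecutionRecord(uuid.uuid4().hex, exec_type, code)
        self._records.append(entry)
        return entry.execution_id

    def get_execution(self, execution_id):
        # Returns None when no record has this ID.
        return next((r for r in self._records if r.execution_id == execution_id), None)

    def get_last_execution(self, exec_type=None):
        # Walk backwards so the most recent matching record wins.
        for entry in reversed(self._records):
            if exec_type is None or entry.exec_type == exec_type:
                return entry
        return None

    def annotate_execution(self, execution_id, description=None, tags=None, notes=None):
        entry = self.get_execution(execution_id)
        if entry is None:
            raise KeyError(execution_id)
        # Only overwrite the fields the caller actually supplied.
        if description is not None:
            entry.description = description
        if tags is not None:
            entry.tags = list(tags)
        if notes is not None:
            entry.notes = notes
        return entry


store = ExecutionStore()
first_id = store.record("python", "print(1 + 1)")
store.record("shell", "ls -la")
store.annotate_execution(first_id, description="Add two numbers", tags=["math"])
```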

Enhanced tool parameters:

  • execute_python/execute_shell: new include_code, description, and tags parameters
  • get_execution_history: new tags, has_notes, and has_description filters

Data model extensions:

  • ExecutionHistory gains description, tags, and notes fields
  • ExecResult gains an execution_id field
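The history filters listed above can be approximated with the sketch below. The parameter names match the tool parameters, and the commit notes say tags are passed comma-separated, but the matching rule is an assumption: this example keeps an entry if it carries any of the requested tags. `filter_history` and the dict-based entries are hypothetical.

```python
def filter_history(entries, tags=None, has_notes=None, has_description=None):
    """Filter execution-history entries (plain dicts in this sketch)."""
    # Parse "a, b,c" into {"a", "b", "c"}; an empty or missing value disables the tag filter.
    wanted = {t.strip() for t in tags.split(",") if t.strip()} if tags else set()
    result = []
    for entry in entries:
        # Assumption: an entry matches if it has at least one requested tag.
        if wanted and not wanted & set(entry.get("tags") or []):
            continue
        # None means "do not filter on this field".
        if has_notes is not None and bool(entry.get("notes")) != has_notes:
            continue
        if has_description is not None and bool(entry.get("description")) != has_description:
            continue
        result.append(entry)
    return result


history = [
    {"execution_id": "e1", "tags": ["math"], "notes": "works", "description": "Add numbers"},
    {"execution_id": "e2", "tags": ["io", "files"], "notes": None, "description": None},
    {"execution_id": "e3", "tags": [], "notes": None, "description": "Scratch run"},
]

tagged = filter_history(history, tags="math, files")
annotated = filter_history(history, has_notes=True)
undescribed = filter_history(history, has_description=False)
```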

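File changes

Directory              Changes
pkgs/bay/              Bay server core logic, MCP Server, database models
pkgs/mcp-server/       Standalone npm package for the MCP Server
pkgs/ship/             Support for recording execution time
shipyard_python_sdk/   New Sandbox class, Session-First API
docs/                  CHANGELOG documentation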

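Testing

  • All 13 MCP HTTP session-isolation integration tests pass
  • Unit tests updated to match the new architecture

References

  • VOYAGER (2023) - automatic skill-library construction
  • Reflexion (2023) - execution feedback for agent self-improvement
  • LearnAct (2024) - learning reusable skills from execution history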

🤖 Generated with Claude Code

w31r4 added 17 commits January 27, 2026 16:58
Refactor the MCP server to use the official FastMCP SDK, providing a
robust integration with comprehensive tools for sandbox interaction.

Key changes:
- Replace custom adapter with FastMCP implementation
- Implement tools for Python/Shell execution and filesystem operations
- Add `sandbox://info` resource
- Support both stdio and HTTP transports via CLI arguments
- Add `shipyard-mcp` CLI entry point
- Add detailed MCP documentation and session-first architecture doc

Introduces a new package `@anthropic/shipyard-mcp` containing a Python-based Model Context Protocol (MCP) server that acts as a bridge to the Shipyard Bay API.

- Implement core server logic in Python with tools for code execution and file operations
- Create Node.js CLI wrapper (`shipyard-mcp`) for seamless distribution and launching
- Support stdio transport protocol for integration with AI clients
- Add comprehensive documentation including configuration guides for Claude Desktop, Cursor, and VS Code

Introduces the `Sandbox` class in the Python SDK for simplified container management and code execution. Refactors both the internal Bay MCP server and the standalone Python MCP server to use this new interface.

Changes include:
- Added `shipyard.Sandbox` with Python, Shell, and Filesystem components
- Updated MCP servers to delegate operations to the SDK
- Added `get_execution_history` tool for retrieving session history
- Included fallback SDK implementation for standalone MCP server

Update session changelog to document recent architectural changes:
- Detail the new `Sandbox` class as the primary SDK entry point.
- Update MCP server architecture, component responsibilities, and tool
  definitions.
- Correct the npm package name to `shipyard-mcp`.
- Add usage examples for the simplified SDK interface.

Enable multi-client support by creating isolated Sandbox instances for
each MCP session when running in HTTP transport mode. Previously, all
clients shared a single Sandbox instance, leading to potential state
pollution.

Key changes:
- Implement `get_or_create_sandbox` with session-based locking
- Store Sandbox instances in FastMCP session state
- Add automatic TTL renewal on tool activity and re-creation on expiry
- Add `SHIPYARD_SANDBOX_TTL` environment variable (default: 30m)
- Update standalone server to prefer FastMCP implementation when available
- Document isolation architecture in changelog

- Add comprehensive test suite to verify session isolation, shared state within sessions, and variable isolation between clients in HTTP mode
- Update pyproject.toml to correctly structure mcp optional dependencies and refresh uv.lock
- Update README with detailed documentation on transport modes (stdio vs http), session isolation architecture, and CLI options

Add SHIPYARD_SANDBOX_TTL env var, document stdio vs HTTP behavior with
per-session sandbox isolation, and include sandbox expiry troubleshooting.
Update tool list with get_execution_history and adjust test command to
use uv run pytest.

Add description/tags/notes to execution history (with Alembic migration) and
propagate metadata through python/shell execution calls.

Expose execution_id in exec responses, add get_execution/get_last_execution and
annotate_execution via HTTP routes and MCP tools, and extend the SDK to support
retrieval and annotation of execution records.

Extend execution history APIs and MCP tools to support filtering by
comma-separated tags and by presence of notes/description, and include
basic metadata in formatted history output.

Update MCP docs and session-first changelog with skill library support,
including new execution lookup/annotation tools, execute_* metadata
parameters, and enhanced history filtering options.
@sourcery-ai sourcery-ai bot left a comment

Sorry @w31r4, your pull request is larger than the review limit of 150000 diff characters

w31r4 commented Jan 27, 2026

Code review

Found 1 issue:

  1. FastAPI Route Ordering Bug - /sessions/{session_id}/history/last is defined after /sessions/{session_id}/history/{execution_id}. Since FastAPI matches routes in definition order, requests to /history/last will be matched by the parameterized route first (treating "last" as an execution_id), so the /history/last endpoint will never be reached.

@router.get("/sessions/{session_id}/history/{execution_id}", response_model=ExecutionHistoryEntry)
async def get_execution_by_id(
    session_id: str,
    execution_id: str,
    token: str = Depends(verify_token),
):
    """Get a specific execution record by ID.

    Args:
        session_id: The session ID
        execution_id: The execution history ID
    """
    entry = await db_service.get_execution_by_id(session_id, execution_id)
    if not entry:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Execution not found"
        )
    return ExecutionHistoryEntry.model_validate(entry)


@router.get("/sessions/{session_id}/history/last", response_model=ExecutionHistoryEntry)
async def get_last_execution(
    session_id: str,
    exec_type: Optional[str] = None,

Fix: Move the /history/last route definition before /history/{execution_id} so specific routes are matched before parameterized ones.
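To make the shadowing concrete without depending on FastAPI, here is a small standard-library sketch of first-match routing. `FirstMatchRouter` is a toy stand-in written for this example, not FastAPI's router; it only mimics the behavior at issue, where routes are tried in registration order and the first match wins.

```python
import re


class FirstMatchRouter:
    """Toy router: patterns are tried in registration order; first match wins."""

    def __init__(self):
        self._routes = []  # list of (compiled pattern, handler)

    def get(self, template):
        # Turn "/history/{execution_id}" into a regex with one named group per parameter.
        regex = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
        pattern = re.compile(f"^{regex}$")

        def register(handler):
            self._routes.append((pattern, handler))
            return handler

        return register

    def dispatch(self, path):
        for pattern, handler in self._routes:
            match = pattern.match(path)
            if match:
                return handler(**match.groupdict())
        return None


def by_id(session_id, execution_id):
    return f"by_id:{execution_id}"


def last(session_id):
    return "last"


# Original order: the parameterized route is registered first and shadows /last.
buggy = FirstMatchRouter()
buggy.get("/sessions/{session_id}/history/{execution_id}")(by_id)
buggy.get("/sessions/{session_id}/history/last")(last)

# Fixed order: the literal route is registered first.
fixed = FirstMatchRouter()
fixed.get("/sessions/{session_id}/history/last")(last)
fixed.get("/sessions/{session_id}/history/{execution_id}")(by_id)

print(buggy.dispatch("/sessions/s1/history/last"))    # by_id:last
print(fixed.dispatch("/sessions/s1/history/last"))    # last
print(fixed.dispatch("/sessions/s1/history/abc123"))  # by_id:abc123
```

With the literal route first, `/history/last` reaches its own handler and every other ID still falls through to the parameterized route.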

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

Move the /history/last route definition before /history/{execution_id}
to ensure literal paths are matched before parameterized ones.

Previously, requests to /history/last were incorrectly matched by
the parameterized route, treating "last" as an execution_id.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@RC-CHN RC-CHN changed the base branch from main to dev January 28, 2026 01:09
@RC-CHN RC-CHN merged commit 356727b into AstrBotDevs:dev Jan 28, 2026
3 of 4 checks passed