feat: filesystem grep, read, edit file #7402

Draft
Soulter wants to merge 1 commit into master from feat/fs-grep-read-edit

Soulter commented Apr 6, 2026

Modifications

  • This is NOT a breaking change.

Screenshots or Test Results


Checklist

  • 😊 If there are new features added in the PR, I have discussed them with the authors through issues/emails, etc.

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Add filesystem tools for searching, reading, and editing files across local, sandbox, and Shipyard runtimes, with user-aware access restrictions and pagination support.

New Features:

  • Introduce a read-file tool that supports offsets and limits for partial file reads.
  • Introduce a file-edit tool that performs string replacements in files with optional replace-all behavior.
  • Introduce a grep-style search tool for querying file contents using ripgrep or grep with context and result limiting.
  • Add workspace path handling to scope filesystem operations per user/session.

Enhancements:

  • Extend filesystem abstraction and booters (local, shipyard, shipyard_neo, boxlite) to support search and edit operations in addition to basic CRUD.
  • Wire the new filesystem tools into local and sandbox toolsets so agents can use them alongside existing shell and Python tools.
  • Relax local booter path restrictions while enforcing read-only directory restrictions at the tool level for non-admin users.

Build:

  • Add python-ripgrep as a project dependency for content searching.
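
For readers unfamiliar with workspace scoping, here is a minimal sketch of how a tool-level check might confine a path to a per-user workspace. The function name `resolve_in_workspace` and its parameters are hypothetical and are not taken from this PR; the actual implementation may differ.

```python
from pathlib import Path


def resolve_in_workspace(workspace_root: str, user_path: str) -> Path:
    """Resolve user_path under workspace_root and reject paths that escape it.

    Hypothetical helper for illustration; not the PR's actual code.
    """
    root = Path(workspace_root).resolve()
    # Joining with an absolute user_path discards root, so the containment
    # check below also covers absolute paths outside the workspace.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise PermissionError(f"path escapes workspace: {user_path}")
    return candidate
```

Resolving before comparing means `..` segments and symlinks are collapsed first, so a path such as `../outside.txt` is rejected rather than silently followed.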

gemini-code-assist bot left a comment

Code Review

This pull request introduces new filesystem tools—ReadFileTool, FileEditTool, and GrepTool—and updates the local and sandbox booters to support these operations. It also implements a security layer to restrict file access for non-admin users in local environments. The review feedback highlights potential memory issues when reading or editing large files in their entirety and suggests applying the documented default limit for file reads to prevent excessive memory consumption.

Comment on lines 193 to 194
with open(abs_path, "rb") as f:
    raw_content = f.read()

high

Reading the entire file into memory with f.read() is inefficient and risky for large files and can lead to out-of-memory (OOM) errors. Since the tool supports offset and limit, consider reading the file in chunks, or using f.seek() if the encoding allows, so that only the required portion is loaded.
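
One possible shape for a windowed read is sketched below. It assumes that `offset` and `limit` count lines, which this review does not confirm, and the helper name `read_window` is hypothetical.

```python
from itertools import islice


def read_window(
    path: str,
    offset: int = 0,
    limit: int = 4000,
    encoding: str = "utf-8",
) -> str:
    """Return up to `limit` lines starting at zero-based line `offset`.

    Assumption: offset/limit are line-based; the PR may define them differently.
    """
    with open(path, encoding=encoding, errors="replace") as f:
        # islice consumes the file lazily: lines before the window are read
        # and discarded one at a time, and reading stops after the window.
        return "".join(islice(f, offset, offset + limit))
```

Skipped lines are still read from disk, but only the requested window is held in memory at once.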

    limit: int | None = None,
) -> dict[str, Any]:
    _ = encoding
    content = await self._sandbox.filesystem.read_file(path)

high

This implementation reads the full content of the file from the sandbox into the bot's memory before slicing. This is inefficient for large files. If the Shipyard Neo SDK supports range-based reads, they should be used here to fetch only the requested slice.

Comment on lines +241 to +242
with open(abs_path, encoding=encoding) as f:
    content = f.read()

medium

Similar to read_file, reading the entire file into memory for string replacement can be problematic for large files. For better scalability, consider processing the file line-by-line or in chunks and writing to a temporary file.
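
A rough sketch of the line-by-line approach follows. It assumes the search string never spans a line break, which may not hold for the actual FileEditTool; `replace_in_file` and its parameters are hypothetical names.

```python
import os
import shutil
import tempfile


def replace_in_file(
    path: str,
    old: str,
    new: str,
    replace_all: bool = False,
    encoding: str = "utf-8",
) -> int:
    """Stream `path` into a temp file, replacing `old` with `new`.

    Returns the number of replacements. Assumption: `old` contains no newline.
    """
    if not old:
        raise ValueError("`old` must not be empty.")
    replaced = 0
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding=encoding, newline="") as dst:
            with open(path, encoding=encoding, newline="") as src:
                for line in src:
                    if old in line and (replace_all or replaced == 0):
                        replaced += line.count(old) if replace_all else 1
                        line = line.replace(old, new, -1 if replace_all else 1)
                    dst.write(line)
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)  # swap the edited copy into place
    except BaseException:
        os.unlink(tmp_path)
        raise
    return replaced
```

Only one line is held in memory at a time, and the original file is left untouched if anything fails before the final swap. Multi-line search strings would need a different strategy, such as a sliding buffer.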

    before_context=before_context,
    line_number=True,
)
return {"success": True, "content": "".join(results)}

medium

Joining all search results into a single string without any limit can consume significant memory if there are many matches. It would be safer to pass a result limit to the search function or truncate the results before joining.
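
As an illustration of truncating before joining, the sketch below caps the number of result entries. The helper `join_limited`, the `max_results` parameter, and the `truncated` key are assumptions for this example and are not part of the PR.

```python
from itertools import islice
from typing import Any, Iterable

_END = object()  # sentinel marking an exhausted iterator


def join_limited(results: Iterable[str], max_results: int = 100) -> dict[str, Any]:
    """Join at most `max_results` entries and flag whether output was cut."""
    it = iter(results)
    kept = list(islice(it, max_results))
    # Peek at one more entry to learn whether anything was dropped.
    truncated = next(it, _END) is not _END
    return {"success": True, "content": "".join(kept), "truncated": truncated}
```

For example, `join_limited(["a\n", "b\n", "c\n"], max_results=2)` keeps the first two entries and sets `truncated` to `True`, so the caller can tell the agent to narrow its search.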

Comment on lines +203 to +211
def _validate_read_window(
    offset: int | None,
    limit: int | None,
) -> tuple[int | None, int | None]:
    if offset is not None and offset < 0:
        raise ValueError("`offset` must be greater than or equal to 0.")
    if limit is not None and limit < 1:
        raise ValueError("`limit` must be greater than or equal to 1.")
    return offset, limit

medium

The limit parameter in ReadFileTool is documented to default to 4000, but this default is not applied in the validation logic. If the LLM does not provide a limit, the booter will receive None and attempt to read the entire file. Applying the default here ensures consistent behavior and protects against large reads.

Suggested change
-def _validate_read_window(
-    offset: int | None,
-    limit: int | None,
-) -> tuple[int | None, int | None]:
-    if offset is not None and offset < 0:
-        raise ValueError("`offset` must be greater than or equal to 0.")
-    if limit is not None and limit < 1:
-        raise ValueError("`limit` must be greater than or equal to 1.")
-    return offset, limit
+def _validate_read_window(
+    offset: int | None,
+    limit: int | None,
+) -> tuple[int | None, int | None]:
+    if offset is not None and offset < 0:
+        raise ValueError("offset must be greater than or equal to 0.")
+    if limit is not None and limit < 1:
+        raise ValueError("limit must be greater than or equal to 1.")
+    return offset, limit if limit is not None else 4000
