### Please read this first
- Have you read the docs? Yes.
- Have you searched for related issues? Yes. I searched upstream issues and PRs for "archive extraction size limit zip tar" and did not find a direct duplicate.
### Describe the bug
Sandbox archive extraction validates paths, links, duplicate paths, and non-directory parents, but it does not cap the archive input size, the member count, or the declared extracted-byte total before writing extracted payloads.
This leaves `extract_archive()` exposed to untrusted zip/tar archives that contain an excessive number of members or expand to an excessive payload size.
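For context, zip metadata already carries each member's declared uncompressed size, so both totals can be computed before any payload is written. A minimal standard-library illustration (the helper name here is mine, not part of the SDK):

```python
import io
import zipfile


def declared_zip_stats(data: io.BytesIO) -> tuple[int, int]:
    """Return (member_count, declared_extracted_bytes) from zip metadata only."""
    with zipfile.ZipFile(data) as archive:
        infos = archive.infolist()
        # ZipInfo.file_size is the declared uncompressed size; no payload is read.
        return len(infos), sum(info.file_size for info in infos)


# Build a small archive and inspect it without extracting anything.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w") as archive:
    for index in range(3):
        archive.writestr(f"files/{index}.txt", b"xxxx")
buf.seek(0)
count, total = declared_zip_stats(buf)
# count == 3, total == 12
```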
### Debug information
- Agents SDK version: `main@683b6e7`
- Python version: Python 3.12.1
### Repro steps
Run this from the repository root:
```python
import asyncio
import io
import tempfile
import zipfile
from pathlib import Path

from agents.sandbox.manifest import Manifest
from agents.sandbox.sandboxes.unix_local import (
    UnixLocalSandboxSession,
    UnixLocalSandboxSessionState,
)
from agents.sandbox.snapshot import NoopSnapshot


def zip_with_many_members(count: int) -> io.BytesIO:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w") as archive:
        for index in range(count):
            archive.writestr(f"files/{index}.txt", b"x")
    buf.seek(0)
    return buf


async def main():
    root = Path(tempfile.mkdtemp()) / "workspace"
    session = UnixLocalSandboxSession.from_state(
        UnixLocalSandboxSessionState(
            manifest=Manifest(root=str(root)),
            snapshot=NoopSnapshot(id="noop"),
        )
    )
    await session.start()
    try:
        await session.extract("bundle.zip", zip_with_many_members(10_001))
    finally:
        await session.shutdown()
    print("extracted members:", len(list((root / "files").iterdir())))


asyncio.run(main())
```
Actual result on current `main`: extraction proceeds and writes all members without a resource-limit error.
### Expected behavior
Archive extraction should enforce finite limits before writing extracted payloads, including:
- maximum archive input bytes
- maximum archive member count
- maximum declared extracted bytes
When any limit is exceeded, extraction should fail before writing member payloads.
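A pre-extraction guard along these lines could reject the archive from metadata alone. The sketch below is illustrative only: the limit constants and the exception type are hypothetical, not an existing SDK API (zip case shown; tar would inspect `TarInfo.size` similarly):

```python
import io
import zipfile

# Hypothetical limits -- real values would presumably be configurable in the SDK.
MAX_ARCHIVE_BYTES = 50 * 1024 * 1024      # maximum archive input bytes
MAX_MEMBER_COUNT = 10_000                 # maximum archive member count
MAX_EXTRACTED_BYTES = 500 * 1024 * 1024   # maximum declared extracted bytes


class ArchiveLimitError(ValueError):
    """Raised before any member payload is written."""


def check_zip_limits(data: io.BytesIO) -> None:
    # 1. Cap the archive input size before parsing anything.
    if data.getbuffer().nbytes > MAX_ARCHIVE_BYTES:
        raise ArchiveLimitError("archive input exceeds byte limit")
    with zipfile.ZipFile(data) as archive:
        infos = archive.infolist()
        # 2. Cap the member count.
        if len(infos) > MAX_MEMBER_COUNT:
            raise ArchiveLimitError("archive exceeds member-count limit")
        # 3. Cap the declared (pre-decompression) extracted-byte total.
        if sum(info.file_size for info in infos) > MAX_EXTRACTED_BYTES:
            raise ArchiveLimitError("archive declares too many extracted bytes")
```

With checks like these in place, the 10,001-member archive from the repro would be rejected by the member-count limit before any file is written.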