Commit 68f6d3c

sjarmak and claude committed
docs: document sweap-images Daytona incompatibility + add rehost script
21 tasks using jefzda/sweap-images from Docker Hub fail on Daytona because the remote builder returns "unauthorized" for Docker Hub pulls. Added rehost_sweap_images.py to re-host to GHCR when a token with write:packages scope is available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 9918c03 commit 68f6d3c

4 files changed: +133 -3 lines changed

docs/DAYTONA.md

Lines changed: 5 additions & 2 deletions
@@ -249,14 +249,15 @@ runs/daytona/{run_id}/
 
 ## Task Readiness
 
-All 294 tasks across 20 suites are Daytona-ready. Base images are sourced from:
+273 of 294 tasks are Daytona-ready. Base images are sourced from:
 
 - **Standard public images** (197 tasks): `python:*`, `golang:*`, `gcc:*`, `ubuntu:*`, etc.
 - **Pre-built repo images** (70 tasks): `ghcr.io/sourcegraph/ccb-repo-*` on GHCR
-- **SWEAP images** (21 tasks): `jefzda/sweap-images:*` on Docker Hub
 - **TAC images** (4 tasks): `ghcr.io/theagentcompany/*` on GHCR
 - **Linux kernel tasks** (5 tasks): Build from `gcc:13` with inline kernel source clone
 
+**Not Daytona-compatible** (21 tasks): Tasks using `jefzda/sweap-images:*` from Docker Hub (9 ccb_debug + 12 ccb_fix) fail because Daytona's remote builder cannot pull from Docker Hub (returns `unauthorized` error even for public images). Run these tasks locally with Docker instead. To re-host them to GHCR, use `scripts/rehost_sweap_images.py` with a GITHUB_TOKEN that has `write:packages` scope.
+
 Regenerate the task registry after adding new tasks:
 ```bash
 python3 scripts/build_daytona_registry.py
@@ -287,4 +288,6 @@ python3 scripts/build_daytona_registry.py
 
 **Harbor + Daytona: sandbox not found**: Ensure `daytona-sdk` is installed in the same Python environment as Harbor. The `DaytonaEnvironment` imports from `daytona`.
 
+**Docker Hub images fail with "unauthorized"**: Daytona's remote builder cannot pull `jefzda/sweap-images:*` from Docker Hub. This affects 21 SWE-bench Pro tasks (9 ccb_debug + 12 ccb_fix). Run these locally or re-host images to GHCR using `scripts/rehost_sweap_images.py`.
+
 **Sandbox creation fails for tasks with `storage = "20G"` in task.toml**: Daytona has a hard 10GB per-sandbox storage limit. 39 tasks specify `storage = "20G"` and 1 specifies `"15G"`, exceeding this limit. Set `export DAYTONA_OVERRIDE_STORAGE=10240` before launching runs. This passes `--override-storage-mb 10240` to all `harbor run` commands, capping storage at 10GB. The actual Docker images are 1.5-5GB so 10GB is sufficient.
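
The storage cap in the last troubleshooting entry comes down to unit arithmetic: `10240` is 10GB expressed in megabytes, so a task declaring `"20G"` (20480 MB) or `"15G"` (15360 MB) exceeds it. The sketch below only illustrates that arithmetic. It is not how Harbor or Daytona implement the override, and the helper names `storage_to_mb` and `capped_storage_mb`, along with the assumption that storage strings use only a `G` or `M` suffix, are hypothetical.

```python
import re

# 10GB in megabytes, matching DAYTONA_OVERRIDE_STORAGE=10240 from the docs.
DAYTONA_MAX_STORAGE_MB = 10240


def storage_to_mb(value: str) -> int:
    """Convert a storage string such as "20G" to megabytes (1G = 1024 MB).

    Assumption: only integer values with a G or M suffix appear.
    """
    match = re.fullmatch(r"(\d+)([GM])", value.strip())
    if match is None:
        raise ValueError(f"unrecognized storage value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * 1024 if unit == "G" else amount


def capped_storage_mb(value: str, limit_mb: int = DAYTONA_MAX_STORAGE_MB) -> int:
    """Return the requested storage in MB, capped at the sandbox limit."""
    return min(storage_to_mb(value), limit_mb)


if __name__ == "__main__":
    for requested in ("20G", "15G", "5G"):
        print(requested, "->", capped_storage_mb(requested), "MB")
```

A helper like this could be used to list which tasks actually exceed the limit before deciding whether the override is needed for a given run.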

docs/ops/SCRIPT_INDEX.md

Lines changed: 1 addition & 0 deletions
@@ -188,6 +188,7 @@ Generated from `scripts/registry.json` by `scripts/generate_script_index.py`.
 - `scripts/plan_variance_runs.py` - Utility script for plan variance runs.
 - `scripts/push_base_images_ghcr.sh` - Utility script for push base images ghcr.
 - `scripts/regenerate_artifact_dockerfiles.py` - Utility script for regenerate artifact dockerfiles.
+- `scripts/rehost_sweap_images.py` - Utility script for rehost sweap images.
 - `scripts/remirror_mcp_unique_repos.sh` - Utility script for remirror mcp unique repos.
 - `scripts/repair_h3_trajectories.py` [one_off] - Historical one-off script: repair h3 trajectories.
 - `scripts/rerun_crossrepo_2tasks.sh` [one_off] - Historical one-off script: rerun crossrepo 2tasks.

scripts/registry.json

Lines changed: 9 additions & 1 deletion
@@ -906,6 +906,14 @@
       "language": "python",
       "summary": "Task creation/selection script for register new mcp tasks."
     },
+    {
+      "name": "rehost_sweap_images.py",
+      "path": "scripts/rehost_sweap_images.py",
+      "category": "misc",
+      "status": "maintained",
+      "language": "python",
+      "summary": "Utility script for rehost sweap images."
+    },
     {
       "name": "reliability_analysis.py",
       "path": "scripts/reliability_analysis.py",
@@ -1243,7 +1251,7 @@
     "infra_mirrors": 18,
     "library_helpers": 7,
     "migration": 4,
-    "misc": 46,
+    "misc": 47,
     "qa_quality": 10,
     "submission_reporting": 7,
     "task_creation_selection": 12,
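
The two hunks in `scripts/registry.json` are linked: adding one entry with `"category": "misc"` is what moves the `"misc"` total from 46 to 47. A minimal sketch of that tally is shown below. The function name `category_counts` and the sample entries are hypothetical, and the sketch makes no claim about how the repository's own tooling maintains these totals.

```python
from collections import Counter


def category_counts(entries: list[dict]) -> dict[str, int]:
    """Count registry entries per category (hypothetical helper)."""
    return dict(Counter(entry["category"] for entry in entries))


if __name__ == "__main__":
    # Illustrative entries only; the real registry holds many more scripts.
    entries = [
        {"name": "example_a.py", "category": "misc"},
        {"name": "example_b.py", "category": "migration"},
    ]
    before = category_counts(entries)

    # Registering a new "misc" script bumps that category's total by one.
    entries.append({"name": "rehost_sweap_images.py", "category": "misc"})
    after = category_counts(entries)

    print(before["misc"], "->", after["misc"])
```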

scripts/rehost_sweap_images.py

Lines changed: 118 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,118 @@
1+
#!/usr/bin/env python3
2+
"""Re-host jefzda/sweap-images from Docker Hub to GHCR for Daytona compatibility.
3+
4+
Daytona's remote builder cannot pull from Docker Hub (auth error on public images).
5+
This script re-hosts all sweap-images used by benchmark tasks to ghcr.io/sg-evals/sweap-images.
6+
7+
Usage:
8+
python3 scripts/rehost_sweap_images.py --dry-run # list what would be done
9+
python3 scripts/rehost_sweap_images.py --pull-push # pull+tag+push to GHCR
10+
python3 scripts/rehost_sweap_images.py --update-dockerfiles # update FROM lines
11+
python3 scripts/rehost_sweap_images.py --all # do everything
12+
"""
13+
14+
import re
15+
import subprocess
16+
import sys
17+
from pathlib import Path
18+
19+
SOURCE_REGISTRY = "jefzda/sweap-images"
20+
TARGET_REGISTRY = "ghcr.io/sg-evals/sweap-images"
21+
22+
BENCHMARKS = Path("benchmarks")
23+
24+
25+
def find_sweap_references() -> dict[str, list[Path]]:
26+
"""Find all Dockerfiles referencing sweap-images, grouped by tag."""
27+
tags: dict[str, list[Path]] = {}
28+
for df in sorted(BENCHMARKS.glob("*/*/environment/Dockerfile*")):
29+
content = df.read_text()
30+
for m in re.finditer(r"FROM jefzda/sweap-images:(\S+)", content):
31+
tag = m.group(1)
32+
tags.setdefault(tag, []).append(df)
33+
return tags
34+
35+
36+
def pull_and_push(tags: dict[str, list[Path]]) -> list[str]:
37+
"""Pull from Docker Hub, tag for GHCR, push to GHCR."""
38+
failed = []
39+
for i, tag in enumerate(sorted(tags.keys()), 1):
40+
source = f"{SOURCE_REGISTRY}:{tag}"
41+
target = f"{TARGET_REGISTRY}:{tag}"
42+
print(f"[{i}/{len(tags)}] {tag}")
43+
44+
# Pull
45+
r = subprocess.run(["docker", "pull", source], capture_output=True, text=True)
46+
if r.returncode != 0:
47+
print(f" PULL FAILED: {r.stderr.strip()}")
48+
failed.append(tag)
49+
continue
50+
print(f" Pulled {source}")
51+
52+
# Tag
53+
subprocess.run(["docker", "tag", source, target], check=True)
54+
print(f" Tagged -> {target}")
55+
56+
# Push
57+
r = subprocess.run(["docker", "push", target], capture_output=True, text=True)
58+
if r.returncode != 0:
59+
print(f" PUSH FAILED: {r.stderr.strip()}")
60+
failed.append(tag)
61+
continue
62+
print(f" Pushed {target}")
63+
64+
return failed
65+
66+
67+
def update_dockerfiles(tags: dict[str, list[Path]]) -> int:
68+
"""Update FROM lines in all affected Dockerfiles."""
69+
count = 0
70+
for tag, files in sorted(tags.items()):
71+
old = f"FROM {SOURCE_REGISTRY}:{tag}"
72+
new = f"FROM {TARGET_REGISTRY}:{tag}"
73+
for df in files:
74+
content = df.read_text()
75+
if old in content:
76+
df.write_text(content.replace(old, new))
77+
count += 1
78+
return count
79+
80+
81+
def main():
82+
dry_run = "--dry-run" in sys.argv
83+
pull_push = "--pull-push" in sys.argv
84+
update = "--update-dockerfiles" in sys.argv
85+
do_all = "--all" in sys.argv
86+
87+
if not any([dry_run, pull_push, update, do_all]):
88+
print("Usage: --dry-run | --pull-push | --update-dockerfiles | --all")
89+
sys.exit(1)
90+
91+
tags = find_sweap_references()
92+
total_files = sum(len(f) for f in tags.values())
93+
print(f"Found {len(tags)} unique sweap-images tags across {total_files} Dockerfiles\n")
94+
95+
if dry_run:
96+
for tag, files in sorted(tags.items()):
97+
print(f" {SOURCE_REGISTRY}:{tag}")
98+
print(f" -> {TARGET_REGISTRY}:{tag}")
99+
for f in files:
100+
print(f" {f}")
101+
return
102+
103+
if pull_push or do_all:
104+
print("=== Pull + Push to GHCR ===")
105+
failed = pull_and_push(tags)
106+
if failed:
107+
print(f"\nFailed tags: {failed}")
108+
else:
109+
print(f"\nAll {len(tags)} images re-hosted successfully")
110+
111+
if update or do_all:
112+
print("\n=== Updating Dockerfiles ===")
113+
count = update_dockerfiles(tags)
114+
print(f"Updated {count} Dockerfiles")
115+
116+
117+
if __name__ == "__main__":
118+
main()
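
One behavior worth noting in the script as committed: with `--all`, `update_dockerfiles` receives every discovered tag, including any that `pull_and_push` reported as failed, so a Dockerfile could end up pointing at a GHCR image that was never pushed. The sketch below shows one possible refinement, not part of this commit, that rewrites `FROM` lines only for tags that were pushed. It works on in-memory strings so it can be tried without Docker. The helpers `tags_safe_to_rewrite` and `rewrite_from_lines` and the tag names `tag-a` and `tag-b` are hypothetical.

```python
import re

SOURCE_REGISTRY = "jefzda/sweap-images"
TARGET_REGISTRY = "ghcr.io/sg-evals/sweap-images"

# Same pattern the script uses to discover tags in Dockerfiles.
FROM_PATTERN = re.compile(r"FROM jefzda/sweap-images:(\S+)")


def tags_safe_to_rewrite(all_tags: list[str], failed: list[str]) -> set[str]:
    """Return the tags whose images were pushed, i.e. not in the failed list."""
    return set(all_tags) - set(failed)


def rewrite_from_lines(content: str, pushed_tags: set[str]) -> str:
    """Point FROM lines at the target registry, but only for pushed tags."""

    def swap(match: re.Match) -> str:
        tag = match.group(1)
        if tag in pushed_tags:
            return f"FROM {TARGET_REGISTRY}:{tag}"
        # Leave the original reference in place when the push failed.
        return match.group(0)

    return FROM_PATTERN.sub(swap, content)


if __name__ == "__main__":
    dockerfile = (
        "FROM jefzda/sweap-images:tag-a AS base\n"
        "FROM jefzda/sweap-images:tag-b\n"
    )
    pushed = tags_safe_to_rewrite(["tag-a", "tag-b"], failed=["tag-b"])
    print(rewrite_from_lines(dockerfile, pushed))
```

Matching the full tag with a regex callback also avoids rewriting a tag that merely starts with another tag's name, which a plain substring replacement could do.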
