Commit fe4a526

sjarmak and claude committed
fix: switch TAC-based sg_only Dockerfiles from ubuntu base to TAC base image
Both bustub-hyperloglog-impl-001 and openhands-search-file-test-001 used ubuntu:22.04 as their sg_only base image. The verifier for these tasks runs /utils/eval.py via the python_default binary, both of which are only present in the TAC base image, so the MCP verifier always crashed with "Evaluator failed" and scored 0.

Root cause of MCP scoring 0.00 (while baseline scored >0):
  bustub: BL=0.17 -> MCP=0.00 (sde-implement-hyperloglog-image missing)
  openhands: BL=0.40 -> MCP=0.00 (sde-write-a-unit-test-*-image missing)

Fix: change FROM to the same TAC base image used in the baseline Dockerfile, then add sg_only-specific layers:
- Truncate source files (agent must use Sourcegraph MCP)
- Recommit truncated state (prevents git history leaking code)
- Add clone manifest (clone-at-verify strategy for verifier)
- Add sg_only mode markers (/tmp/.sg_only_mode, /tmp/.sg_only_workdir)
- Keep TAC env vars (TAC_SERVER_HOSTNAME, DECRYPTION_KEY)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
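
The failure described above (evaluator script and interpreter absent from the image) is the kind of thing a preflight check could catch before a run is scored. The following is a minimal sketch, not part of this commit: the function name `check_verifier_prereqs` is hypothetical, and the defaults simply reuse the path and binary name quoted in the commit message.

```shell
# Hypothetical preflight check (not part of the commit).
# Returns non-zero if the evaluator script or its interpreter is missing.
check_verifier_prereqs() {
    # Defaults are the names quoted in the commit message.
    local eval_script="${1:-/utils/eval.py}"
    local interpreter="${2:-python_default}"
    local status=0

    if [ ! -f "$eval_script" ]; then
        echo "missing evaluator script: $eval_script" >&2
        status=1
    fi

    # command -v succeeds only if the name resolves on PATH.
    if ! command -v "$interpreter" >/dev/null 2>&1; then
        echo "missing interpreter: $interpreter" >&2
        status=1
    fi

    return "$status"
}
```

Run inside an image built from ubuntu:22.04, a check like this would be expected to report both items missing, which matches the "Evaluator failed" symptom described in the message.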
1 parent 30c785c commit fe4a526

2 files changed: 45 additions, 40 deletions

benchmarks/ccb_feature/bustub-hyperloglog-impl-001/environment/Dockerfile.sg_only

Lines changed: 22 additions & 21 deletions
@@ -1,34 +1,35 @@
-# bustub-hyperloglog-impl-001 — sg_only_env variant
-# No local repo clone — agent uses Sourcegraph MCP exclusively for code access.
+# bustub-hyperloglog-impl-001 — sg_only_env variant (TAC base + v2 clone-at-verify)
+# Source files truncated — agent uses Sourcegraph MCP for code access.
+# Verifier clones mirror at verification time to restore source.
 
-FROM ubuntu:22.04
+FROM ghcr.io/theagentcompany/sde-implement-hyperloglog-image:1.0.0
 
 ENV SOURCEGRAPH_REPO_NAME=sg-evals/bustub--d5f79431
 
-ENV DEBIAN_FRONTEND=noninteractive
+# TAC environment variables (required by /utils/init.sh and /utils/eval.py)
+ENV TAC_SERVER_HOSTNAME=localhost
+ENV DECRYPTION_KEY="theagentcompany is all you need"
 
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    git \
-    ca-certificates \
-    python3 \
-    curl \
-    && rm -rf /var/lib/apt/lists/*
+RUN mkdir -p /logs
 
 WORKDIR /workspace
 
-# Empty git repo so agent can commit work
-RUN git init && \
-    git config user.email "agent@example.com" && \
-    git config user.name "Agent"
-
-RUN mkdir -p /logs/agent /logs/verifier
-
-# Mark sg_only mode so verifiers can skip local-path checks
-RUN touch /tmp/.sg_only_mode
-
-# Clone manifest for verifier repo restoration
+# Truncate C/C++ source files so agent cannot read them locally.
+# TAC config files and build artifacts are left intact.
+RUN find /workspace -type f \( -name "*.cpp" -o -name "*.cc" -o -name "*.cxx" -o -name "*.c" \
+    -o -name "*.h" -o -name "*.hh" -o -name "*.hpp" -o -name "*.hxx" \) \
+    ! -path "*/.git/*" -exec truncate -s 0 {} \; || true
+# Recommit truncated state so git history cannot recover full files.
+RUN cd /workspace && git config user.email "agent@example.com" && \
+    git config user.name "Agent" && \
+    git add -A && git commit -m "sg_only truncation" --allow-empty --quiet || true
+
+# Clone manifest for verifier (clone-at-verify strategy)
 RUN echo '{"workdir":"/workspace","repos":[{"mirror":"sg-evals/bustub--d5f79431","target_dir":"."}]}' > /tmp/.sg_only_clone_manifest.json
 
+# Mark sg_only mode
+RUN touch /tmp/.sg_only_mode && echo '/workspace' > /tmp/.sg_only_workdir
+
 # Pre-create claude user and set ownership at build time.
 RUN (adduser --disabled-password --gecos '' claude 2>/dev/null || true) && \
     for d in /workspace /app /testbed /logs; do [ -d "$d" ] && chown -R claude:claude "$d"; done || true
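
The truncate-then-recommit layers in this Dockerfile can be tried outside Docker on a throwaway repository. The sketch below is illustrative only: `truncate_and_recommit` is a hypothetical helper, it covers a subset of the extensions matched in the diff, and it assumes `git` and GNU `truncate` are on PATH.

```shell
# Hypothetical helper mirroring the two RUN steps above on an arbitrary repo.
truncate_and_recommit() {
    local repo="$1"

    # Empty C/C++ sources and headers, skipping anything under .git.
    # Using "+" batches files into as few truncate invocations as possible.
    find "$repo" -type f \( -name '*.c' -o -name '*.cc' -o -name '*.cpp' \
        -o -name '*.h' -o -name '*.hpp' \) \
        ! -path '*/.git/*' -exec truncate -s 0 {} +

    # Same identity the Dockerfile configures before committing.
    git -C "$repo" config user.email "agent@example.com"
    git -C "$repo" config user.name "Agent"

    # Record the emptied files so HEAD no longer holds the original content.
    git -C "$repo" add -A
    git -C "$repo" commit --quiet --allow-empty -m "sg_only truncation"
}
```

After the new commit, `git checkout -- .` and `git diff` no longer bring back the originals, because HEAD itself contains the empty files. Any earlier commits in the repository still hold the full content (for example via `git show HEAD~1:<path>`), so this step protects against working-tree restores rather than against inspection of older history. Whether the TAC images ship such history is not stated in the commit.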

benchmarks/ccb_test/openhands-search-file-test-001/environment/Dockerfile.sg_only

Lines changed: 23 additions & 19 deletions
@@ -1,30 +1,34 @@
-# openhands-search-file-test-001 — sg_only_env variant
-# No local repo clone — agent uses Sourcegraph MCP exclusively for code access.
+# openhands-search-file-test-001 — sg_only_env variant (TAC base + v2 clone-at-verify)
+# Source files truncated — agent uses Sourcegraph MCP for code access.
+# Verifier clones mirror at verification time to restore source.
 
-FROM ubuntu:22.04
+FROM ghcr.io/theagentcompany/sde-write-a-unit-test-for-search_file-function-image:1.0.0
 
 ENV SOURCEGRAPH_REPO_NAME=sg-evals/OpenHands--latest
 
-ENV DEBIAN_FRONTEND=noninteractive
+# TAC environment variables (required by /utils/init.sh and /utils/eval.py)
+ENV TAC_SERVER_HOSTNAME=localhost
+ENV DECRYPTION_KEY="theagentcompany is all you need"
 
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    git \
-    ca-certificates \
-    python3 \
-    curl \
-    && rm -rf /var/lib/apt/lists/*
+RUN mkdir -p /logs
 
 WORKDIR /workspace
 
-# Empty git repo so agent can commit work
-RUN git init && \
-    git config user.email "agent@example.com" && \
-    git config user.name "Agent"
-
-RUN mkdir -p /logs/agent /logs/verifier
-
-# Mark sg_only mode so verifiers can skip local-path checks
-RUN touch /tmp/.sg_only_mode
+# Truncate OpenHands Python source files so agent cannot read them locally.
+# TAC config files, utils, and verifier infrastructure are left intact.
+RUN find /workspace/openhands -type f -name "*.py" ! -path "*/.git/*" \
+    -exec truncate -s 0 {} \; 2>/dev/null || true
+# Recommit truncated state so git history cannot recover full files.
+RUN cd /workspace/openhands && git config user.email "agent@example.com" && \
+    git config user.name "Agent" && \
+    git add -A && git commit -m "sg_only truncation" --allow-empty --quiet 2>/dev/null || true
+
+# Clone manifest for verifier (clone-at-verify strategy)
+# OpenHands repo lives at /workspace/openhands in the TAC image
+RUN echo '{"workdir":"/workspace","repos":[{"mirror":"sg-evals/OpenHands--latest","target_dir":"openhands"}]}' > /tmp/.sg_only_clone_manifest.json
+
+# Mark sg_only mode
+RUN touch /tmp/.sg_only_mode && echo '/workspace' > /tmp/.sg_only_workdir
 
 # Pre-create claude user and set ownership at build time.
 RUN (adduser --disabled-password --gecos '' claude 2>/dev/null || true) && \
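
Both Dockerfiles write the same three marker files with different values. As a sketch of that shared shape, the hypothetical helper below (`write_sg_only_markers`, not part of the commit) produces the manifest and markers in a caller-chosen directory instead of `/tmp`, so it can be run without touching a real environment. Values are interpolated without JSON escaping, which is only safe for simple paths and repo names like those in the diff.

```shell
# Hypothetical helper: write the sg_only manifest and marker files.
#   $1 output directory (the Dockerfiles use /tmp)
#   $2 workdir, $3 mirror repo name, $4 target_dir relative to workdir
write_sg_only_markers() {
    local out_dir="$1" workdir="$2" mirror="$3" target_dir="$4"

    # Manifest layout copied from the RUN echo lines in the diff.
    printf '{"workdir":"%s","repos":[{"mirror":"%s","target_dir":"%s"}]}\n' \
        "$workdir" "$mirror" "$target_dir" \
        > "$out_dir/.sg_only_clone_manifest.json"

    # Empty flag file plus a file recording the working directory.
    : > "$out_dir/.sg_only_mode"
    printf '%s\n' "$workdir" > "$out_dir/.sg_only_workdir"
}
```

For the OpenHands task the equivalent call would be `write_sg_only_markers /tmp /workspace sg-evals/OpenHands--latest openhands`; for bustub the last two arguments would be `sg-evals/bustub--d5f79431` and `.`.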
