{"actor":"sjarmak","comment":null,"created_at":"2026-03-10T12:14:18Z","event_type":"updated","id":61,"issue_id":"CodeScaleBench-yb4","new_value":"{\"notes\":\"## Investigation Results (2026-03-10)\\n\\n### Issue 1: Harbor FileNotFoundError (django-select-for-update)\\n**Root cause**: NOT a Harbor/Daytona sandbox race. The actual error is `FileNotFoundError: [Errno 2] No such file or directory: 'pkill'` in `/tmp/oh_launcher.py` line 262. Some container images don't have `pkill` installed.\\n**Fix**: Added `shutil.which('pkill')` guard in `agent.py` — falls back to `os.system('kill $(ps aux | ...)')` when pkill is unavailable.\\n\\n### Issue 2: Jupyter fget crash (AttributeError: 'list' object has no attribute 'fget')\\n**Status**: Already fixed in d0fab95. Current code on main correctly uses list comprehension to filter sandbox_plugins.\\n\\n### Issue 3: Bustub-hyperloglog MCP 6.5hr timeout\\n**Root cause**: AgentTimeoutError after hitting 24000s max. Haiku sentinel run with only 7.86% MCP usage. Task-level/model issue, not infra bug. No code fix needed.\\n\\n### Issue 4: Bustub-hyperloglog DinD build failure\\n**Status**: Haiku sentinel run — DinD build never completed. Likely transient. Will be retried in rerun.\\n\\n### no_changes_guard audit\\n**Result**: No `no_changes_guard` references found in any official run result files. No false-positive contamination.\\n\\n### OH launcher org task support\\n**Verified**: `openhands_2config.sh` reads task_dir/benchmark from JSON directly. No filtering that skips csb_org_* tasks. 
The 3 org tasks in oh_full_rerun_20260310.json will work.\\n\\n### Remaining\\n- The pkill fix needs commit+push\\n- Then rerun all 12 tasks via: `--subset oh_full_rerun_20260310.json`\\n- Tainted staging runs (openhands_sonnet46_20260309_{210054,223658,232133,232947,233609}) must NOT be promoted\"}","old_value":"{\"id\":\"CodeScaleBench-yb4\",\"title\":\"Investigate OH/Harbor infrastructure failures before rerun\",\"description\":\"Three distinct infra failures need fixing before rerunning OH verification tasks:\\n\\n1. Harbor FileNotFoundError: django-select-for-update agent ran successfully (614 lines output, 0 crashes) but Harbor crashed writing command-2/return-code.txt. Likely Daytona sandbox cleanup race in ccb_harbor.daytona:GuardedDaytonaEnvironment.\\n\\n2. DinD build failure: bustub-hyperloglog baseline (Claude Haiku sentinel, csb_sdlc_feature_haiku_20260309_223654) — DinD build never completed, no task-level result dir created.\\n\\n3. MCP 6.5hr exception: bustub-hyperloglog MCP (same sentinel run) — ran 6.5 hours then exception_raised. flagged.json shows deepsearch_unused + only 7.86% MCP ratio.\\n\\nAfter fixing these, rerun all 12 tasks using configs/oh_full_rerun_20260310.json. The 9 original verification subset tasks crashed due to jupyter/fget bugs (now fixed in d0fab95). The 3 extra tasks (compliance-124, agentic-122, django-select-for-update) also need rerun. Note: 3 tasks are csb_org_* — verify OH launcher handles org tasks (prior rerun silently skipped them).\\n\\nAlso audit official runs for false positives from the no_changes_guard verifier bug (fixed in c5f261f):\\n grep -rl no_changes_guard runs/official/*/validation_result.json\\n\\nTainted runs (do NOT promote): openhands_sonnet46_20260309_{210054,223658,232947}\",\"status\":\"in_progress\",\"priority\":2,\"issue_type\":\"bug\",\"owner\":\"sjarmak@users.noreply.github.com\",\"created_at\":\"2026-03-10T11:27:18Z\",\"created_by\":\"sjarmak\",\"updated_at\":\"2026-03-10T12:13:47Z\"}"}