+{"actor":"sjarmak","comment":null,"created_at":"2026-03-09T14:29:12Z","event_type":"updated","id":24,"issue_id":"CodeScaleBench-2kz","new_value":"{\"notes\":\"2026-03-09 validation pass:\\\\n- Fixed stale task generators/templates so fresh org + SDLC scaffolded tasks now render and smoke clean without one-off harness patches.\\\\n- Temp scaffold validation: org template path renders, contract-check passes, and baseline/sg_only smoke runs produce reward artifacts as expected; feature/refactor scaffold outputs pass contract-only plus baseline/sg_only no-agent smoke.\\\\n- Curated local smoke subsets all passed via exact-selection flow: baseline (ccx-onboard-search-207, element-web-unread-indicators-diverge-fix-001, clickhouse-mergetree-arch-understand-001), sg_only (same trio), artifact_only (ccx-onboard-search-207, bustub-hyperloglog-impl-001, nodebb-plugin-validate-fix-001).\\\\n- Prepared rerun manifests: configs/claude_historical_failure_rerun_mcp_20260309.json and configs/openhands_historical_failure_rerun_baseline_20260309.json.\\\\n- Infra readiness checked: account_health.py status recommends proceed; check_infra.py now passes in current workspace.\\\\nRemaining: launch rerun manifests only after interactive confirmation, then classify any residual failures and decide permanent sentinel coverage.\\n2026-03-09 launch started after explicit confirmation.\\\\n- Claude MCP rerun batch launched via configs/run_selected_tasks.sh in Daytona mode using accounts account1/account2/account4 (account3 held, account5 reserved for OpenHands). Run dirs are rooted at runs/staging/csb_org_onboarding_sonnet_20260309_142738, runs/staging/csb_sdlc_feature_sonnet_20260309_142738, runs/staging/csb_sdlc_fix_sonnet_20260309_142738, runs/staging/csb_sdlc_secure_sonnet_20260309_142738, runs/staging/csb_sdlc_understand_sonnet_20260309_142738 under config mcp-remote-direct. 
Initial live tasks confirmed on disk for ccx-onboard-search-207/208/210.\\\\n- OpenHands baseline sentinel launched via configs/openhands_2config.sh in Daytona mode using account5 only. Run dir: runs/staging/openhands_sonnet46_20260309_142733/baseline-local-direct/.../ccx-onboard-search-212__CDJ962t.\\\\n- Remaining Claude tasks will submit as the 3-slot queue drains.\\\\nNext: monitor task completion/invalids, classify any residual failures, and decide which sentinels stay in permanent smoke coverage.\"}","old_value":"{\"id\":\"CodeScaleBench-2kz\",\"title\":\"Verify harness fixes by rerunning historical Claude/OpenHands failures\",\"description\":\"Run a focused verification batch to prove the current task-contract and harness hardening eliminates the earlier random patch churn.\\n\\nScope:\\n- Claude Code regression sentinels:\\n - mcp_ccx-onboard-search-207\\n - mcp_ccx-onboard-search-208\\n - mcp_ccx-onboard-search-210\\n - mcp_bustub-hyperloglog-impl-001\\n - mcp_django-sensitive-file-exclusion-001\\n - mcp_flink-window-late-data-fix-001\\n - mcp_element-web-unread-indicators-diverge-fix-001\\n - clickhouse-mergetree-arch-understand-001 (confirm Daytona/local routing now that storage metadata was corrected)\\n- OpenHands regression sentinel:\\n - ccx-onboard-search-212\\n\\nAcceptance criteria:\\n- Produce a small rerun manifest or manifests for the tasks above.\\n- Execute the reruns once accounts are ready.\\n- Confirm whether each task now completes as a valid run without ad hoc task-specific patches.\\n- Record any remaining failures as either harness bugs, task bugs, or infra issues with exact root cause.\\n- If clean, note which tasks should remain in the smoke/verification matrix as permanent regression sentinels.\\n\",\"notes\":\"2026-03-09 validation pass:\\\\n- Fixed stale task generators/templates so fresh org + SDLC scaffolded tasks now render and smoke clean without one-off harness patches.\\\\n- Temp scaffold validation: org template path 
renders, contract-check passes, and baseline/sg_only smoke runs produce reward artifacts as expected; feature/refactor scaffold outputs pass contract-only plus baseline/sg_only no-agent smoke.\\\\n- Curated local smoke subsets all passed via exact-selection flow: baseline (ccx-onboard-search-207, element-web-unread-indicators-diverge-fix-001, clickhouse-mergetree-arch-understand-001), sg_only (same trio), artifact_only (ccx-onboard-search-207, bustub-hyperloglog-impl-001, nodebb-plugin-validate-fix-001).\\\\n- Prepared rerun manifests: configs/claude_historical_failure_rerun_mcp_20260309.json and configs/openhands_historical_failure_rerun_baseline_20260309.json.\\\\n- Infra readiness checked: account_health.py status recommends proceed; check_infra.py now passes in current workspace.\\\\nRemaining: launch rerun manifests only after interactive confirmation, then classify any residual failures and decide permanent sentinel coverage.\",\"status\":\"in_progress\",\"priority\":1,\"issue_type\":\"task\",\"owner\":\"sjarmak@users.noreply.github.com\",\"created_at\":\"2026-03-09T13:11:58Z\",\"created_by\":\"sjarmak\",\"updated_at\":\"2026-03-09T14:10:34Z\"}"}