|
1 | 1 | {"acceptance_criteria":"","actor":"","agent_state":"","assignee":null,"await_id":"","await_type":"","close_reason":"Done","closed_at":"2026-03-09T13:05:07Z","closed_by_session":"","compacted_at":null,"compacted_at_commit":null,"compaction_level":0,"content_hash":"fd09967d3c4f6c86a02ac44968cff45a310e9ceb4d1626ef3f2592c790aaa6bf","created_at":"2026-03-09T12:51:57Z","created_by":"sjarmak","crystallizes":0,"defer_until":null,"description":"Investigate whether clickhouse-mergetree-arch-understand-001 truly needs \u003e10G storage and, if possible, replace the static exception with measured repo-size/routing metadata. Also expand the registry smoke matrix if future harness regressions show uncovered task families.","design":"","due_at":null,"ephemeral":0,"estimated_minutes":null,"event_kind":"","external_ref":null,"hook_bead":"","id":"CodeScaleBench-03c","is_template":0,"issue_type":"task","last_activity":null,"metadata":"{}","mol_type":"","notes":"","original_size":null,"owner":"sjarmak@users.noreply.github.com","payload":"","pinned":0,"priority":2,"quality_score":null,"rig":"","role_bead":"","role_type":"","sender":"","source_repo":"","source_system":"","spec_id":"","status":"closed","target":"","timeout_ns":0,"title":"Audit ClickHouse storage exception and expand smoke coverage","updated_at":"2026-03-09T13:05:07Z","waiters":"","wisp_type":"","work_type":""} |
2 | | -{"acceptance_criteria":"","actor":"","agent_state":"","assignee":null,"await_id":"","await_type":"","close_reason":"","closed_at":null,"closed_by_session":"","compacted_at":null,"compacted_at_commit":null,"compaction_level":0,"content_hash":"e71c16398c3f1178357f507e5f0abb54c84894426f6821e51b482bb84b4a1910","created_at":"2026-03-09T13:11:58Z","created_by":"sjarmak","crystallizes":0,"defer_until":null,"description":"Run a focused verification batch to prove the current task-contract and harness hardening eliminates the earlier random patch churn.\n\nScope:\n- Claude Code regression sentinels:\n - mcp_ccx-onboard-search-207\n - mcp_ccx-onboard-search-208\n - mcp_ccx-onboard-search-210\n - mcp_bustub-hyperloglog-impl-001\n - mcp_django-sensitive-file-exclusion-001\n - mcp_flink-window-late-data-fix-001\n - mcp_element-web-unread-indicators-diverge-fix-001\n - clickhouse-mergetree-arch-understand-001 (confirm Daytona/local routing now that storage metadata was corrected)\n- OpenHands regression sentinel:\n - ccx-onboard-search-212\n\nAcceptance criteria:\n- Produce a small rerun manifest or manifests for the tasks above.\n- Execute the reruns once accounts are ready.\n- Confirm whether each task now completes as a valid run without ad hoc task-specific patches.\n- Record any remaining failures as either harness bugs, task bugs, or infra issues with exact root cause.\n- If clean, note which tasks should remain in the smoke/verification matrix as permanent regression sentinels.\n","design":"","due_at":null,"ephemeral":0,"estimated_minutes":null,"event_kind":"","external_ref":null,"hook_bead":"","id":"CodeScaleBench-2kz","is_template":0,"issue_type":"task","last_activity":null,"metadata":"{}","mol_type":"","notes":"","original_size":null,"owner":"sjarmak@users.noreply.github.com","payload":"","pinned":0,"priority":1,"quality_score":null,"rig":"","role_bead":"","role_type":"","sender":"","source_repo":"","source_system":"","spec_id":"","status":"open","target":"","timeout_ns":0,"title":"Verify harness fixes by rerunning historical Claude/OpenHands failures","updated_at":"2026-03-09T13:11:58Z","waiters":"","wisp_type":"","work_type":""} |
| 2 | +{"acceptance_criteria":"","actor":"","agent_state":"","assignee":null,"await_id":"","await_type":"","close_reason":"","closed_at":null,"closed_by_session":"","compacted_at":null,"compacted_at_commit":null,"compaction_level":0,"content_hash":"e71c16398c3f1178357f507e5f0abb54c84894426f6821e51b482bb84b4a1910","created_at":"2026-03-09T13:11:58Z","created_by":"sjarmak","crystallizes":0,"defer_until":null,"description":"Run a focused verification batch to prove the current task-contract and harness hardening eliminates the earlier random patch churn.\n\nScope:\n- Claude Code regression sentinels:\n - mcp_ccx-onboard-search-207\n - mcp_ccx-onboard-search-208\n - mcp_ccx-onboard-search-210\n - mcp_bustub-hyperloglog-impl-001\n - mcp_django-sensitive-file-exclusion-001\n - mcp_flink-window-late-data-fix-001\n - mcp_element-web-unread-indicators-diverge-fix-001\n - clickhouse-mergetree-arch-understand-001 (confirm Daytona/local routing now that storage metadata was corrected)\n- OpenHands regression sentinel:\n - ccx-onboard-search-212\n\nAcceptance criteria:\n- Produce a small rerun manifest or manifests for the tasks above.\n- Execute the reruns once accounts are ready.\n- Confirm whether each task now completes as a valid run without ad hoc task-specific patches.\n- Record any remaining failures as either harness bugs, task bugs, or infra issues with exact root cause.\n- If clean, note which tasks should remain in the smoke/verification matrix as permanent regression sentinels.\n","design":"","due_at":null,"ephemeral":0,"estimated_minutes":null,"event_kind":"","external_ref":null,"hook_bead":"","id":"CodeScaleBench-2kz","is_template":0,"issue_type":"task","last_activity":null,"metadata":"{}","mol_type":"","notes":"2026-03-09 validation pass:\\n- Fixed stale task generators/templates so fresh org + SDLC scaffolded tasks now render and smoke clean without one-off harness patches.\\n- Temp scaffold validation: org template path renders, contract-check passes, and baseline/sg_only smoke runs produce reward artifacts as expected; feature/refactor scaffold outputs pass contract-only plus baseline/sg_only no-agent smoke.\\n- Curated local smoke subsets all passed via exact-selection flow: baseline (ccx-onboard-search-207, element-web-unread-indicators-diverge-fix-001, clickhouse-mergetree-arch-understand-001), sg_only (same trio), artifact_only (ccx-onboard-search-207, bustub-hyperloglog-impl-001, nodebb-plugin-validate-fix-001).\\n- Prepared rerun manifests: configs/claude_historical_failure_rerun_mcp_20260309.json and configs/openhands_historical_failure_rerun_baseline_20260309.json.\\n- Infra readiness checked: account_health.py status recommends proceed; check_infra.py now passes in current workspace.\\nRemaining: launch rerun manifests only after interactive confirmation, then classify any residual failures and decide permanent sentinel coverage.","original_size":null,"owner":"sjarmak@users.noreply.github.com","payload":"","pinned":0,"priority":1,"quality_score":null,"rig":"","role_bead":"","role_type":"","sender":"","source_repo":"","source_system":"","spec_id":"","status":"in_progress","target":"","timeout_ns":0,"title":"Verify harness fixes by rerunning historical Claude/OpenHands failures","updated_at":"2026-03-09T14:10:34Z","waiters":"","wisp_type":"","work_type":""} |
3 | 3 | {"acceptance_criteria":"","actor":"","agent_state":"","assignee":null,"await_id":"","await_type":"","close_reason":"Defined Org→SDLC mapping for all 11 org suites. Selected 67 promotion candidates: all multi-repo, 84% 2M+ LOC, balanced across 6 target SDLC suites. configs/org_promotion_manifest.json.","closed_at":"2026-03-07T23:01:01Z","closed_by_session":"","compacted_at":null,"compacted_at_commit":null,"compaction_level":0,"content_hash":"e57ed0ffb8999cc5708e3fbe9fa45f6a2e6461b45004c38ec33b54abfd14e753","created_at":"2026-03-07T22:56:46Z","created_by":"sjarmak","crystallizes":0,"defer_until":null,"description":"Analyze current SDLC coverage gaps: multi-repo (only 15/171), large codebases (only 2 tasks in 8M-40M, 0 in \u003e40M), and task-type balance. Select ~60-80 Org tasks for promotion that maximize: (1) multi-repo representation across all SDLC suites, (2) large codebase coverage (prioritize 2M+ LOC), (3) task-type balance across comprehension/implementation/quality. Produce a promotion manifest with target suite, verifier approach, and priority ranking.","design":"","due_at":null,"ephemeral":0,"estimated_minutes":null,"event_kind":"","external_ref":null,"hook_bead":"","id":"CodeScaleBench-5p1","is_template":0,"issue_type":"task","last_activity":null,"metadata":"{}","mol_type":"","notes":"","original_size":null,"owner":"sjarmak@users.noreply.github.com","payload":"","pinned":0,"priority":2,"quality_score":null,"rig":"","role_bead":"","role_type":"","sender":"","source_repo":"","source_system":"","spec_id":"","status":"closed","target":"","timeout_ns":0,"title":"Select Org→SDLC promotion candidates optimized for coverage gaps","updated_at":"2026-03-07T23:01:01Z","waiters":"","wisp_type":"","work_type":""} |
4 | 4 | {"acceptance_criteria":"","actor":"","agent_state":"","assignee":null,"await_id":"","await_type":"","close_reason":"Defined Org→SDLC mapping for all 11 org suites. Selected 67 promotion candidates: all multi-repo, 84% 2M+ LOC, balanced across 6 target SDLC suites. configs/org_promotion_manifest.json.","closed_at":"2026-03-07T23:01:01Z","closed_by_session":"","compacted_at":null,"compacted_at_commit":null,"compaction_level":0,"content_hash":"d581391bafd28d416539191f5b91d255b0832d75fccc535e206157b820ddbeec","created_at":"2026-03-07T22:56:46Z","created_by":"sjarmak","crystallizes":0,"defer_until":null,"description":"Select Org tasks that naturally map to SDLC phases and add deterministic verifiers. Priority: multi-repo tasks from large codebases that fill gaps in SDLC coverage. Natural mappings: incident→debug, security/compliance→secure, migration→refactor, onboarding/domain→understand, crossrepo/crossrepo_tracing→design. For each promoted task, identify the most straightforward deterministic verifier approach matching the target SDLC suite's pattern. Focus on tasks where the oracle_checks.py already does structured validation that can be made deterministic.","design":"","due_at":null,"ephemeral":0,"estimated_minutes":null,"event_kind":"","external_ref":null,"hook_bead":"","id":"CodeScaleBench-aav","is_template":0,"issue_type":"task","last_activity":null,"metadata":"{}","mol_type":"","notes":"","original_size":null,"owner":"sjarmak@users.noreply.github.com","payload":"","pinned":0,"priority":2,"quality_score":null,"rig":"","role_bead":"","role_type":"","sender":"","source_repo":"","source_system":"","spec_id":"","status":"closed","target":"","timeout_ns":0,"title":"Map and promote Org tasks to SDLC categories","updated_at":"2026-03-07T23:01:01Z","waiters":"","wisp_type":"","work_type":""} |
5 | 5 | {"acceptance_criteria":"","actor":"","agent_state":"","assignee":null,"await_id":"","await_type":"","close_reason":"Taxonomy defined: comprehension/implementation/quality. Mapped all 20 suites and 477 tasks. Manifest: 40/37/23% split. configs/task_type_taxonomy.json + task_type field on all tasks.","closed_at":"2026-03-07T22:59:37Z","closed_by_session":"","compacted_at":null,"compacted_at_commit":null,"compaction_level":0,"content_hash":"cac4323aa5802e3e8dca37694c0f3c50c9dacf7ab21a04cf5e65a0bd3b7712a2","created_at":"2026-03-07T22:56:46Z","created_by":"sjarmak","crystallizes":0,"defer_until":null,"description":"Formalize the three task-type buckets that cut across suites: Comprehension (understand, design, document, onboarding, domain), Implementation (feature, fix, refactor, migration), Quality (test, debug, secure, compliance, incident). Add task_type field to selected_benchmark_tasks.json. Map existing SUITE_TO_PROFILE curator profiles to these three buckets. This taxonomy enables power analysis and balanced selection across task types, not just suites.","design":"","due_at":null,"ephemeral":0,"estimated_minutes":null,"event_kind":"","external_ref":null,"hook_bead":"","id":"CodeScaleBench-abl","is_template":0,"issue_type":"task","last_activity":null,"metadata":"{}","mol_type":"","notes":"","original_size":null,"owner":"sjarmak@users.noreply.github.com","payload":"","pinned":0,"priority":2,"quality_score":null,"rig":"","role_bead":"","role_type":"","sender":"","source_repo":"","source_system":"","spec_id":"","status":"closed","target":"","timeout_ns":0,"title":"Define task-type taxonomy: comprehension / implementation / quality","updated_at":"2026-03-07T22:59:37Z","waiters":"","wisp_type":"","work_type":""} |
|