Collect eval findings and pass repo root to agent subprocesses#21

Merged
tikazyq merged 4 commits into main from claude/run-swebench-harness-44AVk on Mar 18, 2026

Conversation


@tikazyq tikazyq commented Mar 18, 2026

Summary

This change enhances harness run execution by collecting evaluation findings from the eval governance log and by passing the correct repository root to agent subprocesses via an environment variable.

Key Changes

  • Eval findings collection: Added collect_eval_findings() function that reads the eval.governance.jsonl log file and extracts findings from entries written during the current harness run. These findings are now included in both the governance record and final output JSON.
  • Repository root propagation: Modified run_agent() and run_agent_with_stdin() to accept and pass the repo_root parameter as the SYNODIC_ROOT environment variable to agent subprocesses.
  • Environment variable support in repo root detection: Updated find_repo_root() in util.rs to respect the SYNODIC_ROOT environment variable, allowing eval subprocesses to write governance logs to the correct project rather than the testbed.
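The SYNODIC_ROOT precedence described above can be sketched as follows. This is a std-only illustration, not the exact `util.rs` code; in particular, the `.git` walk-up fallback is an assumption about how detection behaves when the variable is unset:

```rust
use std::env;
use std::path::{Path, PathBuf};

// Sketch: an explicit SYNODIC_ROOT wins over CWD-based detection, so eval
// subprocesses spawned inside a testbed still resolve the harness project.
fn find_repo_root(start: &Path) -> Option<PathBuf> {
    if let Ok(root) = env::var("SYNODIC_ROOT") {
        return Some(PathBuf::from(root));
    }
    // Assumed fallback: walk up from `start` looking for a .git directory.
    let mut dir = start.to_path_buf();
    loop {
        if dir.join(".git").is_dir() {
            return Some(dir);
        }
        if !dir.pop() {
            return None;
        }
    }
}

fn main() {
    // set_var is `unsafe` as of the Rust 2024 edition; older editions
    // merely warn about the unnecessary `unsafe` block.
    unsafe { env::set_var("SYNODIC_ROOT", "/work/synodic") };
    let root = find_repo_root(Path::new("/tmp/testbed")).expect("root");
    assert_eq!(root, PathBuf::from("/work/synodic"));
    println!("resolved root: {}", root.display());
}
```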

Implementation Details

  • The collect_eval_findings() function reads up to the last 10 lines of the eval governance log in reverse order and filters entries by timestamp to only include those written after the harness run started.
  • Each collected finding includes instance_id, benchmark, resolved, and findings fields extracted from the governance log entries.
  • The eval findings are logged and included in both the gov_record (governance output) and final JSON output for tracking and analysis.
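A dependency-free sketch of the tail-reading filter described above. The real function parses JSON with serde_json and chrono; the naive string-field lookup and lexicographic RFC 3339 comparison here are simplifications (and note that the PR's final commits removed this function in favor of an exit-code gate):

```rust
// Simplified stand-in for a serde_json string-field lookup: "key":"value".
fn extract_field<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{}\":\"", key);
    let start = line.find(&needle)? + needle.len();
    let end = line[start..].find('"')? + start;
    Some(&line[start..end])
}

// Take the last 10 lines of the eval governance log, in reverse order, and
// keep only entries timestamped at or after the harness run start.
// RFC 3339 timestamps with the same UTC offset compare correctly as plain
// strings, which keeps this sketch free of the chrono dependency.
fn collect_eval_findings(log: &str, run_start: &str) -> Vec<String> {
    log.lines()
        .rev()
        .take(10)
        .filter(|line| matches!(extract_field(line, "timestamp"), Some(ts) if ts >= run_start))
        .map(str::to_owned)
        .collect()
}

fn main() {
    let log = concat!(
        "{\"timestamp\":\"2026-03-18T15:00:00Z\",\"resolved\":false}\n",
        "{\"timestamp\":\"2026-03-18T15:40:00Z\",\"resolved\":true}",
    );
    let findings = collect_eval_findings(log, "2026-03-18T15:36:00Z");
    assert_eq!(findings.len(), 1); // only the post-run-start entry survives
    println!("kept {} entry", findings.len());
}
```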

https://claude.ai/code/session_0157iwKYYLnrPU4dyNAxHgQ3

claude added 2 commits March 18, 2026 15:36
Ran synodic eval run swe:django-10097 wrapped in harness governance.
Layer 1 static gate passed. Scoring showed Python version incompatibility
with old Django codebase (codeset keyword removed in newer Python).

https://claude.ai/code/session_0157iwKYYLnrPU4dyNAxHgQ3
Two bugs fixed:

1. find_repo_root() now respects SYNODIC_ROOT env var, which the harness
   sets when spawning eval subprocesses. Previously, eval governance logs
   were written to the testbed's .harness/ (wrong git repo) because
   find_repo_root() resolved CWD to the testbed.

2. Harness governance log now collects eval findings from
   eval.governance.jsonl and includes them in the harness record and
   manifest. Previously the harness only recorded pass/fail status with
   no eval-level learnings.

https://claude.ai/code/session_0157iwKYYLnrPU4dyNAxHgQ3
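The first fix depends on the harness exporting SYNODIC_ROOT when it spawns eval subprocesses. A sketch with `std::process::Command`, where the `sh -c echo` child is a stand-in for the real agent binary:

```rust
use std::path::Path;
use std::process::Command;

// Sketch of run_agent(): hand the canonical repo root to the child via the
// SYNODIC_ROOT environment variable so the child's find_repo_root()
// resolves the harness project instead of the testbed CWD.
fn run_agent(repo_root: &Path) -> std::io::Result<String> {
    let out = Command::new("sh")
        .arg("-c")
        .arg("echo \"$SYNODIC_ROOT\"") // stand-in for the agent command
        .env("SYNODIC_ROOT", repo_root) // propagate the repo root
        .output()?;
    Ok(String::from_utf8_lossy(&out.stdout).trim().to_string())
}

fn main() -> std::io::Result<()> {
    let seen = run_agent(Path::new("/work/synodic"))?;
    assert_eq!(seen, "/work/synodic");
    println!("child saw SYNODIC_ROOT={}", seen);
    Ok(())
}
```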
Copilot AI review requested due to automatic review settings March 18, 2026 15:58

Copilot AI left a comment

Pull request overview

This PR enhances the harness run workflow by (1) propagating the canonical repository root to agent/eval subprocesses and (2) collecting eval findings from .harness/eval.governance.jsonl into the harness governance record and run manifest, improving traceability across runs.

Changes:

  • Update repo root detection to respect SYNODIC_ROOT so subprocesses can consistently target the correct .harness/ directory.
  • Pass SYNODIC_ROOT (repo root) into agent subprocess environments from harness run.
  • Add collect_eval_findings() and include its output in harness.governance.jsonl records and manifest.json.

Reviewed changes

Copilot reviewed 2 out of 4 changed files in this pull request and generated 2 comments.

Reviewed files:

  • cli/src/util.rs — Prefer SYNODIC_ROOT when locating the repo root, to avoid writing governance artifacts into testbeds.
  • cli/src/harness/run.rs — Propagate the repo root to subprocesses and collect recent eval governance findings into harness outputs.


The review comments were attached to this excerpt from cli/src/harness/run.rs (diff lines +495 to +499); the `continue;` and closing braces are reconstructed here to make the fragment readable:

```rust
// Only collect entries written after this harness run started
if let Some(ts) = record.get("timestamp").and_then(|v| v.as_str()) {
    if let Ok(entry_time) = chrono::DateTime::parse_from_rfc3339(ts) {
        if entry_time < *run_start {
            continue; // reconstructed: skip entries from earlier runs
        }
    }
}
let entry = json!({
    "instance_id": record.get("instance_id"),
    "benchmark": record.get("benchmark"),
    "resolved": record.get("resolved"),
    "findings": record.get("findings").unwrap_or(&json!([])),
});
```
claude added 2 commits March 18, 2026 16:10
Previous approach duplicated eval findings into the harness governance
log by reading back eval.governance.jsonl after the subprocess wrote it.
This was fragile (timestamp matching) and semantically wrong (harness
status was "passed" even when eval scored resolved=false).

Now:
- eval exits non-zero when resolved=false (exit 1)
- harness checks agent exit code as a final gate: if governance layers
  pass but agent reported failure, status becomes "error"
- findings stay in eval.governance.jsonl only (single source of truth)
- removed collect_eval_findings and eval_findings duplication

https://claude.ai/code/session_0157iwKYYLnrPU4dyNAxHgQ3
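The exit-code gate described in this commit can be sketched as below. The "error" and "passed" statuses come from the commit message; the "failed" status for a governance rejection is an assumption, not taken from the source:

```rust
// Final gate: even when all governance layers pass, a non-zero agent exit
// code (eval exits 1 on resolved=false) downgrades the status to "error".
fn final_status(governance_passed: bool, agent_exit_code: i32) -> &'static str {
    if !governance_passed {
        "failed" // assumed name: a governance layer rejected the run
    } else if agent_exit_code != 0 {
        "error" // layers passed, but the agent itself reported failure
    } else {
        "passed"
    }
}

fn main() {
    assert_eq!(final_status(true, 0), "passed");
    assert_eq!(final_status(true, 1), "error"); // eval scored resolved=false
    assert_eq!(final_status(false, 0), "failed");
    println!("exit-code gate behaves as described");
}
```

This keeps eval.governance.jsonl as the single source of truth for findings while still surfacing eval failure in the harness status.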
All entries were from iterative testing of the harness+eval integration.
No production data.

https://claude.ai/code/session_0157iwKYYLnrPU4dyNAxHgQ3
@tikazyq tikazyq merged commit 47365ce into main Mar 18, 2026
2 checks passed
@tikazyq tikazyq deleted the claude/run-swebench-harness-44AVk branch March 18, 2026 16:49