spec: Decouple eval framework as standalone testing tool by tikazyq · Pull Request #24 · codervisor/synodic

tikazyq · 2026-03-18T23:16:05Z

Summary

This spec proposes decoupling the eval framework from synodic's governance harness, enabling it to function as an independent, zero-dependency testing framework. Currently, eval is tightly coupled to governance concerns (writing to .harness/eval.governance.jsonl, reading SYNODIC_ROOT), preventing external use and independent versioning.

Design Changes

Separation of Concerns:

- Eval produces only structured JSON output (verdict + score reports) and exit codes
- Synodic's harness becomes the consumer: it invokes eval, reads output, and writes governance logs
- All governance-specific code removed from eval codebase
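
The consumer side of this split can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the `governance_line` helper and the record fields (`source`, `exit_code`, `eval`) are hypothetical, and only the `.harness/eval.governance.jsonl` path and the `append_governance_log` name come from the spec.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: wrap eval's JSON report in one governance JSONL record.
/// The record shape (`source`, `exit_code`, `eval`) is an assumption for illustration.
fn governance_line(exit_code: i32, eval_json: &str) -> String {
    // Collapse a pretty-printed report onto one line so the log stays valid JSONL.
    let compact: String = eval_json.lines().map(str::trim).collect();
    format!(r#"{{"source":"eval","exit_code":{},"eval":{}}}"#, exit_code, compact)
}

/// Append one record to `<root>/.harness/eval.governance.jsonl`,
/// creating the directory if it does not exist yet.
fn append_governance_log(root: &Path, line: &str) -> io::Result<()> {
    let dir = root.join(".harness");
    fs::create_dir_all(&dir)?;
    let mut file = OpenOptions::new()
        .create(true)
        .append(true)
        .open(dir.join("eval.governance.jsonl"))?;
    writeln!(file, "{}", line)
}

fn main() -> io::Result<()> {
    // Stand-in for the JSON that eval would have written to stdout or --output.
    let report = "{\n  \"instance_id\": \"django__django-16379\",\n  \"resolved\": true\n}";
    let line = governance_line(0, report);
    // A scratch directory keeps the sketch from touching a real project.
    let root = std::env::temp_dir().join("synodic-governance-sketch");
    append_governance_log(&root, &line)?;
    println!("{}", line);
    Ok(())
}
```

The point of the sketch is that eval never sees the `.harness/` path: everything governance-related happens after eval has exited.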

Architecture:

- Restructure as a Cargo workspace with two member crates:
  - synodic-eval: Standalone eval framework (no synodic dependencies)
  - synodic: Governance harness that consumes eval output
- New synodic/src/governance.rs handles reading eval JSON and writing governance JSONL

Code Removals from Eval:

- append_governance_log() function (eval/run.rs:486-534)
- extract_findings() helper (governance categorization moved to harness)
- All .harness/ directory creation and references
- SYNODIC_ROOT environment variable reads
- Harness-specific comments and cross-run learning references

Project Root Discovery:

- Split find_repo_root(): eval gets find_project_root() (looks for evals/ or .git), harness keeps original (looks for .harness/)
- Replace SYNODIC_ROOT with EVAL_ROOT env var for eval-specific configuration
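
A minimal sketch of what the eval-side lookup might look like, using only the standard library. The signature, the upward walk, and the rule that `EVAL_ROOT` takes precedence over marker detection are assumptions for illustration; the spec only names `find_project_root()`, the `evals/` and `.git` markers, and the `EVAL_ROOT` variable.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Sketch of eval-side root discovery: honor an explicit override, otherwise walk
/// upward from `start` until a directory containing `evals/` or `.git` is found.
/// Giving the override priority is an assumption, not something the spec states.
fn find_project_root(start: &Path, override_root: Option<PathBuf>) -> Option<PathBuf> {
    if let Some(root) = override_root {
        return Some(root);
    }
    start
        .ancestors() // yields `start` itself first, then each parent
        // `.git` may be a file (worktrees, submodules), so only check existence.
        .find(|dir| dir.join("evals").is_dir() || dir.join(".git").exists())
        .map(Path::to_path_buf)
}

fn main() {
    let cwd = env::current_dir().expect("current directory should be readable");
    // EVAL_ROOT replaces SYNODIC_ROOT for eval-specific configuration.
    let override_root = env::var_os("EVAL_ROOT").map(PathBuf::from);
    match find_project_root(&cwd, override_root) {
        Some(root) => println!("project root: {}", root.display()),
        None => eprintln!("no evals/ or .git found above {}", cwd.display()),
    }
}
```

Note that nothing here looks for `.harness/`, which is what lets eval run in a directory that has never seen the harness.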

Key Benefits

- Eval usable as standalone testing framework without governance infrastructure
- Independent versioning and release cycles
- External teams can adopt eval without synodic governance
- Cleaner separation enables easier maintenance and testing
- Governance log schema remains unchanged (backward compatible)

Testing Strategy

- All 29 existing eval tests pass in standalone crate
- Standalone binary builds with zero synodic dependencies
- Eval works in directories without .harness/
- Synodic harness integration still functional
- No harness/governance references leak into eval codebase

https://claude.ai/code/session_01NhfevEyKE5jXFdwWtSqVU2

Adds spec 047 with architecture for extracting eval into a standalone crate (synodic-eval) within a Cargo workspace, with EvalReporter trait to replace hardcoded governance log coupling. https://claude.ai/code/session_01NhfevEyKE5jXFdwWtSqVU2

Drop EvalReporter trait design in favor of complete separation — eval produces JSON output only, harness reads it and writes its own governance logs. No governance concepts leak into eval at all. https://claude.ai/code/session_01NhfevEyKE5jXFdwWtSqVU2

Copilot

Pull request overview

Adds a new project spec describing how to decouple the existing eval framework from Synodic’s governance harness so eval can run as a standalone tool that emits structured results (JSON + exit codes), with the harness consuming those results and producing governance logs.

Changes:

Introduces spec 047 outlining separation of concerns between eval execution and governance logging.
Proposes a Cargo workspace split into synodic (harness) and synodic-eval (standalone eval).
Defines (at a high level) an eval output contract and updated project-root discovery approach.


specs/047-decouple-eval-framework/README.md

+### Eval output contract
+
+Eval communicates results through two channels only:
+
+**1. Exit code** — `0` = resolved, `1` = not resolved, `2` = error
+**2. Structured JSON output** — written to `--output <path>` or stdout:
+
+```json
+{
+  "instance_id": "django__django-16379",
+  "benchmark": "swebench",
+  "skill": "factory",
+  "resolved": true,
+  "duration_s": 142,
+  "f2p": { "group": "FAIL_TO_PASS", "expected": 3, "passed": 3 },
+  "p2p": { "group": "PASS_TO_PASS", "expected": 47, "passed": 47 },
+  "score_report": "path/to/score_report.json"
+}
+```
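
The exit-code half of this contract could be modeled along these lines. This is a hedged sketch: the `Outcome` and `GroupResult` types and the `outcome_of` function are hypothetical names, treating unknown exit codes as errors is an assumption, and deriving "resolved" from all FAIL_TO_PASS and PASS_TO_PASS tests passing follows the usual SWE-bench convention rather than anything the spec excerpt confirms.

```rust
use std::process::ExitCode;

/// The three outcomes the contract distinguishes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Resolved,
    NotResolved,
    Error,
}

impl Outcome {
    /// Exit code per the contract: 0 = resolved, 1 = not resolved, 2 = error.
    fn code(self) -> u8 {
        match self {
            Outcome::Resolved => 0,
            Outcome::NotResolved => 1,
            Outcome::Error => 2,
        }
    }

    /// Inverse mapping for a consumer. Treating every other code
    /// (signals, panics) as an error is an assumption of this sketch.
    fn from_code(code: i32) -> Outcome {
        match code {
            0 => Outcome::Resolved,
            1 => Outcome::NotResolved,
            _ => Outcome::Error,
        }
    }
}

/// Hypothetical subset of the `f2p` / `p2p` objects from the JSON example.
struct GroupResult {
    expected: u32,
    passed: u32,
}

/// Assumed rule: resolved only if every expected test in both groups passed.
fn outcome_of(f2p: &GroupResult, p2p: &GroupResult) -> Outcome {
    if f2p.passed == f2p.expected && p2p.passed == p2p.expected {
        Outcome::Resolved
    } else {
        Outcome::NotResolved
    }
}

fn main() -> ExitCode {
    // Values taken from the JSON example in the spec excerpt.
    let f2p = GroupResult { expected: 3, passed: 3 };
    let p2p = GroupResult { expected: 47, passed: 47 };
    let outcome = outcome_of(&f2p, &p2p);
    // Round-trip check: a consumer decoding the code sees the same outcome.
    assert_eq!(Outcome::from_code(i32::from(outcome.code())), outcome);
    println!("{:?} -> exit code {}", outcome, outcome.code());
    ExitCode::from(outcome.code())
}
```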


specs/047-decouple-eval-framework/README.md

+
+### Project root discovery
+
+Eval needs to find the project root (for `evals/evals.json`). Currently it piggybacks on `find_repo_root()` which looks for `.harness/`. After decoupling:


specs/047-decouple-eval-framework/README.md

+The eval framework (setup → agent → score pipeline) is a general-purpose AI coding evaluation tool that should work independently of synodic's governance harness. Today eval directly writes to `.harness/eval.governance.jsonl` and reads `SYNODIC_ROOT` — it has no business knowing about governance at all.
+
+**Why now:** Eval is mature enough (29 tests, 3 benchmarks, batch mode) to stand alone. Complete separation enables:
+- Eval as a zero-dependency testing framework — no governance concepts leak in


tikazyq · 2026-03-18T23:19:55Z

@copilot implement

Copilot · 2026-03-18T23:20:02Z

@tikazyq I've opened a new pull request, #25, to work on those changes. Once the pull request is ready, I'll request review from you.

Split cli/ into a Cargo workspace with two crates:

- synodic-eval: standalone eval framework (setup → agent → score)
  - Zero governance dependencies: no .harness/, no SYNODIC_ROOT
  - run::execute() returns EvalResult instead of writing gov logs
  - Uses EVAL_ROOT env var and find_project_root() (evals/ or .git)
  - Produces standalone binary: synodic-eval run|score|list|batch|report
  - All 35 tests pass independently
- synodic: governance CLI depending on synodic-eval as library
  - New governance.rs: reads EvalResult, writes .harness/eval.governance.jsonl
  - extract_findings() and append_governance_log() moved here from eval
  - harness/run.rs sets EVAL_ROOT alongside SYNODIC_ROOT for agent subprocesses
  - synodic eval run still works (dispatches to synodic_eval, writes gov log)

Implements spec 047-decouple-eval-framework.

https://claude.ai/code/session_01NhfevEyKE5jXFdwWtSqVU2
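
To illustrate the last harness point, exporting both variables to an agent subprocess might look roughly like this. The `agent_command` helper, the `agent` program name, and the choice to point both variables at the same directory are assumptions for illustration; the commit message only says that harness/run.rs sets `EVAL_ROOT` alongside `SYNODIC_ROOT`.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: build an agent subprocess command that exports both
/// SYNODIC_ROOT (for the harness) and EVAL_ROOT (for the standalone eval crate).
/// Using the same directory for both is an assumption of this sketch.
fn agent_command(program: &str, root: &Path) -> Command {
    let mut cmd = Command::new(program);
    cmd.env("SYNODIC_ROOT", root).env("EVAL_ROOT", root);
    cmd
}

fn main() {
    // Only builds the command and inspects its environment; nothing is spawned.
    let cmd = agent_command("agent", Path::new("/work/repo"));
    for (key, value) in cmd.get_envs() {
        println!("{:?}={:?}", key, value);
    }
}
```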

tikazyq · 2026-03-19T00:20:21Z

@copilot resolve the merge conflicts

Copilot · 2026-03-19T00:20:30Z

@tikazyq I've opened a new pull request, #27, to work on those changes. Once the pull request is ready, I'll request review from you.

Co-authored-by: tikazyq <3393101+tikazyq@users.noreply.github.com>

claude added 2 commits March 18, 2026 22:30

Copilot AI review requested due to automatic review settings March 18, 2026 23:16

Copilot AI reviewed Mar 18, 2026

Copilot AI mentioned this pull request Mar 18, 2026

spec(047): address review feedback on eval decoupling spec #25

Closed

Initial plan

329c775

Copilot AI mentioned this pull request Mar 19, 2026

Merge main into copilot/sub-pr-24: integrate fractal algorithmic spine into workspace #27

Merged

Copilot AI and others added 3 commits March 19, 2026 00:24

Merge origin/main: add fractal algorithmic spine to synodic workspace

3847f0d

Merge origin/main: add fractal algorithmic spine to synodic workspace

c3da743

Co-authored-by: tikazyq <3393101+tikazyq@users.noreply.github.com>

Merge pull request #27 from codervisor/copilot/sub-pr-24

3fa37fb

tikazyq merged commit bc2df9f into main Mar 19, 2026
2 checks passed

tikazyq deleted the claude/decouple-eval-framework-GIOVK branch March 19, 2026 00:29

tikazyq merged 7 commits into main from claude/decouple-eval-framework-GIOVK

4 participants

