docs(reports): phase 2 validation report + table-wrapping fixes + README/arch refresh#77

Merged
jramos merged 11 commits into main from docs/phase-2-validation-report
May 26, 2026

@jramos (Owner) commented May 26, 2026

Summary

Ships the Phase 2 validation report PDF — reports/phase2_validation_report.pdf — alongside its prose source, generator fixes that make tables render cleanly for both Phase 1 and Phase 2, and README + architecture-diagram refreshes that reflect the dual-signal deploy gate.

What lands

New artifact

  • reports/phase2_validation_report.pdf: 9-page validation report headlining the May 23 closed-loop-aware deploy gate. Concrete case: the weakened-systematic-debugging skill had a slightly negative synthetic delta (-0.004), but its closed-loop pass count rose from 2/5 to 5/5 (+3). Decision: DEPLOYED via the closed-loop behavioral signal. This is the textbook case the Phase 2 gate was redesigned for.
  • reports/phase2_prose.yaml: Phase 2 prose source. Same template-substitution shape as phase1_prose.yaml; nine sections, including executive summary, background, approach, experiment, results, safety, roadmap, and next steps.
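
To make the headline case concrete, here is a minimal sketch of a dual-signal decision of the kind described above. It is not the project's gate: the function name `decide`, the `GateDecision` type, the tie band of 0.01, and the default required gain of 1 are all assumptions chosen for illustration. Only the field names `decision_signal`, `cl_tasks_gained`, and `cl_required_gain` come from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateDecision:
    deployed: bool
    decision_signal: str  # "synthetic", "closed_loop", or "none"
    cl_tasks_gained: int


def decide(
    synthetic_delta: float,
    cl_passed_before: int,
    cl_passed_after: int,
    *,
    synth_tie_band: float = 0.01,  # hypothetical threshold, not from the PR
    cl_required_gain: int = 1,     # hypothetical default, not from the PR
) -> GateDecision:
    """Simplified dual-signal gate: closed-loop decides when synthetic ties."""
    cl_gain = cl_passed_after - cl_passed_before
    # A clear synthetic improvement deploys on the synthetic signal.
    if synthetic_delta > synth_tie_band:
        return GateDecision(True, "synthetic", cl_gain)
    # A synthetic tie (small delta either way) defers to closed-loop gains.
    if abs(synthetic_delta) <= synth_tie_band and cl_gain >= cl_required_gain:
        return GateDecision(True, "closed_loop", cl_gain)
    # Otherwise neither signal justifies a deploy.
    return GateDecision(False, "none", cl_gain)


# The headline run: synthetic delta -0.004, closed-loop 2/5 -> 5/5.
headline = decide(-0.004, cl_passed_before=2, cl_passed_after=5)
```

Under these assumed thresholds the headline run deploys with `decision_signal == "closed_loop"` and `cl_tasks_gained == 3`, while the same synthetic delta with no closed-loop gain would be rejected.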

Generator improvements (generate_report.py)

  • Extract v5 schema CL-primary fields (decision_signal, cl_tasks_gained, cl_required_gain, etc.) so Phase 2 prose can reference them in template substitutions.
  • Wrap all table cells in Paragraph objects so long strings auto-wrap to column width. Without this, raw str cells overflow horizontally — running off the page or painting over adjacent cells. Affects every table in every report; both PDFs benefit.
  • New TableCell and TableHeaderCell ParagraphStyles + _wrap_cell helper.
  • Per-cell ParagraphStyles in _results to preserve the prior accent-on-DEPLOYED / bold-evolved-column visual fidelity.
  • Spacer between Engines table and GEPA narrative (was hugging too closely).
  • Engines table column re-balance so "License" header fits on one line.
  • Subtitle now supports <br/> for title-page line breaks; _footer strips it before joining so the footer stays one row.

README + architecture refresh

  • "How It Works" mermaid in README updated to show the dual-signal flow: both synthetic holdout and closed-loop behavioral suite feed the gate.
  • "Why this isn't just DSPy + GEPA" prose now lists three checks (added closed-loop behavioral validation as the third).
  • Roadmap table links both Phase 1 and Phase 2 validation PDFs.
  • docs/architecture.md: top-level flowchart and single-run sequence diagram updated to reflect the dual-signal gate.

Phase 1 regenerated

  • reports/phase1_validation_report.pdf regenerated against reconstructed run JSONs (the original gate_decision.json / metrics.json / run.log were cleaned up long ago; reconstructed values match the previously-committed PDF). Picks up the table-wrapping fix + Engines column re-balance.

Test plan

  • Full suite green locally (1166 passed; no production code changed)
  • No evolution/**/*.py changes: git diff main..HEAD -- 'evolution/**/*.py' empty
  • Both PDFs render cleanly: no horizontal overflow, no cells overlapping, headers fit
  • Configuration table's Parameter column visible (had been white-on-white during iteration)
  • docs/superpowers/ still gitignored (no regression on the local-only-plans convention)
  • CI green across 4 Python versions

jramos added 11 commits May 25, 2026 18:32
…ortfolio note

Phase 2 prose YAML mirrors the Phase 1 layout with nine top-level sections.
Headline: skill-side deploy via the closed-loop-aware gate on
weakened-systematic-debugging — synthetic delta tiny-and-negative
(-0.004) but closed-loop pass-rate went 2/5 → 5/5, so the gate deploys
via the closed-loop signal. Background, approach, and safety sections
add Phase 2 deliverables (CL-aware gate, saturation pre-flight,
improvement-or-equal acceptance, PR automation). Roadmap highlights
Phase 2 as validated.

Generator side: extend the context dict with cost_total_usd and a
model-agnostic lm_calls_metrics field (sums calls from
metrics.json::cost.by_model so runs that don't use the legacy
gpt-4.1-mini / gpt-5-mini pair still get an accurate call count).
Experiment-section configuration table now surfaces total cost,
closed-loop validator model, and closed-loop suite size when present.
Results table adds a Closed-loop tasks row when the v5 schema exposes
behavioral pass-rate data; decision-note in the deploy banner picks up
a "via closed-loop" variant when decision_signal == "closed_loop".
Section title is now driven by prose YAML (section_title key) with a
"Phase 1 Experiment" default for backwards compat.
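
A minimal sketch of the two context additions described above, under stated assumptions: the shape of metrics.json (a `cost.by_model` mapping whose entries carry a `calls` count) is inferred from this commit message, and the function names and sample model names are hypothetical.

```python
def lm_calls_from_metrics(metrics: dict) -> int:
    """Sum LM calls across every model listed under cost.by_model.

    Assumes each by_model entry is a dict with an integer "calls" key;
    missing keys count as zero so partial metrics do not raise.
    """
    by_model = metrics.get("cost", {}).get("by_model", {})
    return sum(int(entry.get("calls", 0)) for entry in by_model.values())


def section_title(prose: dict) -> str:
    """Read the section title from the prose YAML, with the Phase 1 default."""
    return prose.get("section_title", "Phase 1 Experiment")


# Hypothetical metrics payload with made-up model names and counts.
sample_metrics = {
    "cost": {
        "by_model": {
            "model-a": {"calls": 12},
            "model-b": {"calls": 30},
        }
    }
}
total_calls = lm_calls_from_metrics(sample_metrics)
```

Summing over whatever keys appear in `by_model` is what makes the count model-agnostic: no model name is hard-coded, so a run on a different model pair still reports every call.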
Rendered from reports/phase2_prose.yaml against the headline run
output/weakened-systematic-debugging/20260523_182457/ via
generate_report.py. Nine pages, ~26 KB.

Plain `str` cells in a reportlab Table don't wrap: they overflow
horizontally, painting over adjacent cells or running off the page.
Long strings in the Configuration, Safety, Background, and Engines
tables were producing visible defects in the Phase 2 PDF.

Switch all table cells to `Paragraph` flowables (the canonical
reportlab idiom): each cell wraps at its column width, and rows
grow vertically to fit. Add two `TableCell` / `TableHeaderCell`
ParagraphStyles, a small `_wrap_cell` helper, and thread `styles`
through `_highlight_table` + `_key_result_box`. Drop the now-redundant
FONTNAME/FONTSIZE/TEXTCOLOR directives from per-table TableStyles
(the Paragraph style supplies them). Shave Configuration column 1
(short labels) and bump column 2 (the overflow column); same idea
on the Engines table. Regenerate the Phase 2 PDF.

The prior commit's _experiment refactor accidentally applied
TableHeaderCell (white text, intended for the dark header row) to the
left-column labels of every body row. White-on-white = invisible
Parameter labels in the rendered PDF.

Drop the bold-label intent (Phase 1 used plain text and read fine);
plain TableCell on both columns gives a consistent body-cell style
and restores visible labels.

Insert <br/> at the em-dash so "Phase 2 Validation Report" and
"Closed-loop-aware deploy gate + tool-side parity" render as separate
lines on the title page. _footer strips the <br/> before joining so
the footer continues to read as a single em-dash-separated row.
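
The subtitle handling can be sketched like this. It assumes the subtitle string carries the `<br/>` immediately after the em-dash and that the footer joins its parts with an em-dash separator; the helper names and the "Example Project" label are hypothetical, not taken from generate_report.py.

```python
import re

# Matches <br>, <br/>, or <br /> plus any whitespace around the tag.
_BR_TAG = re.compile(r"\s*<br\s*/?>\s*", re.IGNORECASE)
EM_DASH = "\u2014"


def strip_line_breaks(text: str) -> str:
    """Replace each <br/> with a single space so the text stays on one row."""
    return _BR_TAG.sub(" ", text).strip()


def footer_text(*parts: str) -> str:
    """Join footer parts into a single em-dash-separated row."""
    return f" {EM_DASH} ".join(strip_line_breaks(p) for p in parts if p)


# The title page would render this subtitle on two lines.
subtitle = (
    f"Phase 2 Validation Report {EM_DASH}<br/>"
    "Closed-loop-aware deploy gate + tool-side parity"
)
footer = footer_text("Example Project", subtitle)  # hypothetical first part
```

Stripping the tag in the footer path only keeps the title-page line break while the footer remains a single row.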
…2 validation reports

- README "How It Works" mermaid now shows synthetic holdout + closed-loop
  suite both feeding a dual-signal deploy gate (CL-primary on synth-tie),
  matching today's gate behavior.
- "Why this isn't just DSPy + GEPA" goes from two checks to three; new
  bullet describes closed-loop behavioral validation and links the
  Phase 2 validation report PDF.
- Roadmap-table Status column links the Phase 1 and Phase 2 validation
  report PDFs (was "Implemented" text).
- docs/architecture.md top-level flowchart adds a closed-loop branch
  feeding the deploy gate alongside the synthetic holdout, and renames
  the gate node accordingly.
- docs/architecture.md single-run sequence diagram adds an optional
  ClosedLoopValidator step before validate_growth_with_quality and
  surfaces the decision_signal field returned by the gate.
…xt-step

Drop the "Documentation polish — link the Phase 2 report from the README
and refresh the architecture diagram" next-steps item; that work landed
in the previous commit, so leaving it in the report would falsely list
completed work as outstanding. Remaining four items stand on their own.

Regenerate the PDF against the same headline run
(output/weakened-systematic-debugging/20260523_182457/) so the rendered
artifact matches the trimmed prose.
…ative

Matches _background's spacer-before-closing-paragraph pattern; without
it, the table and the prose hug too closely and read as one block.

Phase 1 PDF was last generated before this branch's table-wrapping +
spacer fixes; regenerated against reconstructed run JSONs (synthesized
from the values baked into the previously-committed PDF; output dir is
gitignored so the synthetic JSONs stay local). New Phase 1 PDF picks
up Paragraph-wrapped cells, the dual-line subtitle pattern, and the
Engines-table column re-balancing.

License column was tightened to 0.6" in the earlier wrapping fix and
ended up wrapping the "License" header to "Licens / e". Bumping to 0.7"
(trading 0.1" off What It Optimizes, which still has slack) lets the
header sit on one line in both reports.
@jramos jramos merged commit 401a07e into main May 26, 2026
4 checks passed
@jramos jramos deleted the docs/phase-2-validation-report branch May 26, 2026 13:58