docs(reports): phase 2 validation report + table-wrapping fixes + README/arch refresh #77
Merged
…ortfolio note

Phase 2 prose YAML mirrors the Phase 1 layout with nine top-level sections. Headline: skill-side deploy via the closed-loop-aware gate on weakened-systematic-debugging. The synthetic delta is tiny and negative (-0.004), but the closed-loop pass rate went 2/5 → 5/5, so the gate deploys via the closed-loop signal. Background, approach, and safety sections add Phase 2 deliverables (CL-aware gate, saturation pre-flight, improvement-or-equal acceptance, PR automation). Roadmap highlights Phase 2 as validated.

Generator side: extend the context dict with cost_total_usd and a model-agnostic lm_calls_metrics field (sums calls from metrics.json::cost.by_model so runs that don't use the legacy gpt-4.1-mini / gpt-5-mini pair still get an accurate call count). Experiment-section configuration table now surfaces total cost, closed-loop validator model, and closed-loop suite size when present. Results table adds a Closed-loop tasks row when the v5 schema exposes behavioral pass-rate data; decision-note in the deploy banner picks up a "via closed-loop" variant when decision_signal == "closed_loop". Section title is now driven by prose YAML (section_title key) with a "Phase 1 Experiment" default for backwards compat.
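The model-agnostic call count described above can be sketched as follows. This is a minimal illustration, not the code in generate_report.py: it assumes each entry under `cost.by_model` is a dict with `calls` and `cost_usd` keys, and the helper names and model names are hypothetical.

```python
def sum_lm_calls(metrics: dict) -> int:
    """Sum LM calls across every model listed under cost.by_model.

    Assumption: each by_model entry looks like {"calls": int, "cost_usd": float}.
    Missing keys are treated as zero so older metrics files do not crash.
    """
    by_model = metrics.get("cost", {}).get("by_model", {})
    return sum(int(entry.get("calls", 0)) for entry in by_model.values())


def build_context(metrics: dict) -> dict:
    """Build the two context fields named in the commit message."""
    by_model = metrics.get("cost", {}).get("by_model", {})
    return {
        "lm_calls_metrics": sum_lm_calls(metrics),
        "cost_total_usd": sum(float(e.get("cost_usd", 0.0)) for e in by_model.values()),
    }


# Hypothetical metrics.json content; model names are placeholders.
example_metrics = {
    "cost": {
        "by_model": {
            "model-a": {"calls": 12, "cost_usd": 0.5},
            "model-b": {"calls": 30, "cost_usd": 0.25},
        }
    }
}
context = build_context(example_metrics)
```

Because the sum iterates over whatever keys are present, no model name is hard-coded, which is the property the commit message is after.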
Rendered from reports/phase2_prose.yaml against the headline run output/weakened-systematic-debugging/20260523_182457/ via generate_report.py. Nine pages, ~26 KB.
Plain `str` cells in a reportlab Table don't wrap — they overflow horizontally, painting over adjacent cells or running off the page. Long strings in the Configuration, Safety, Background, and Engines tables were producing visible defects in the Phase 2 PDF. Switch all table cells to `Paragraph` flowables (the canonical reportlab idiom): each cell wraps at its column width, and rows grow vertically to fit. Add two `TableCell` / `TableHeaderCell` ParagraphStyles, a small `_wrap_cell` helper, and thread `styles` through `_highlight_table` + `_key_result_box`. Drop the now-redundant FONTNAME/FONTSIZE/TEXTCOLOR directives from per-table TableStyles (the Paragraph style supplies them). Shave Configuration column 1 (short labels) and bump column 2 (the overflow column); same idea on the Engines table. Regenerate the Phase 2 PDF.
The prior commit's _experiment refactor accidentally applied TableHeaderCell (white text, intended for the dark header row) to the left-column labels of every body row. White-on-white = invisible Parameter labels in the rendered PDF. Drop the bold-label intent (Phase 1 used plain text and read fine); plain TableCell on both columns gives a consistent body-cell style and restores visible labels.
Insert <br/> at the em-dash so "Phase 2 Validation Report" and "Closed-loop-aware deploy gate + tool-side parity" render as separate lines on the title page. _footer strips the <br/> before joining so the footer continues to read as a single em-dash-separated row.
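The title-page and footer handling described above might look roughly like this. It is a sketch under assumptions: the exact subtitle string and the function names are hypothetical, and the real `_footer` may join additional fields.

```python
EM_DASH = "\u2014"

# Assumed subtitle string: the <br/> sits right after the em-dash.
SUBTITLE = (
    f"Phase 2 Validation Report {EM_DASH}<br/>"
    "Closed-loop-aware deploy gate + tool-side parity"
)


def title_lines(subtitle: str) -> list:
    """Split at <br/> to see the lines a Paragraph would render on the title page."""
    return [part.strip() for part in subtitle.split("<br/>")]


def footer_text(subtitle: str) -> str:
    """Strip the <br/> so the footer stays a single em-dash-separated row."""
    return " ".join(part.strip() for part in subtitle.split("<br/>"))


lines = title_lines(SUBTITLE)
footer = footer_text(SUBTITLE)
```

Keeping one source string and deriving both forms from it avoids the title page and the footer drifting apart.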
…2 validation reports

- README "How It Works" mermaid now shows synthetic holdout + closed-loop suite both feeding a dual-signal deploy gate (CL-primary on synth-tie), matching today's gate behavior.
- "Why this isn't just DSPy + GEPA" goes from two checks to three; new bullet describes closed-loop behavioral validation and links the Phase 2 validation report PDF.
- Roadmap-table Status column links the Phase 1 and Phase 2 validation report PDFs (was "Implemented" text).
- docs/architecture.md top-level flowchart adds a closed-loop branch feeding the deploy gate alongside the synthetic holdout, and renames the gate node accordingly.
- docs/architecture.md single-run sequence diagram adds an optional ClosedLoopValidator step before validate_growth_with_quality and surfaces the decision_signal field returned by the gate.
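To make "CL-primary on synth-tie" concrete, here is a hedged sketch of a dual-signal gate. It is not the project's gate implementation: the tie tolerance, the default required gain, and the function and class names are assumptions chosen for illustration; only the field names `decision_signal`, `cl_tasks_gained`, and `cl_required_gain` come from this PR.

```python
from dataclasses import dataclass


@dataclass
class GateDecision:
    deploy: bool
    decision_signal: str   # "synthetic" or "closed_loop"
    cl_tasks_gained: int


def dual_signal_gate(
    synthetic_delta: float,
    cl_passed_before: int,
    cl_passed_after: int,
    tie_tolerance: float = 0.01,   # assumed threshold for a synthetic "tie"
    cl_required_gain: int = 1,     # assumed minimum closed-loop gain
) -> GateDecision:
    """Deploy on a clear synthetic win, or on closed-loop gain when synthetic ties."""
    gained = cl_passed_after - cl_passed_before
    if synthetic_delta > tie_tolerance:
        # Clear synthetic improvement: the synthetic holdout decides.
        return GateDecision(True, "synthetic", gained)
    if abs(synthetic_delta) <= tie_tolerance:
        # Synthetic tie: the closed-loop suite becomes the primary signal.
        return GateDecision(gained >= cl_required_gain, "closed_loop", gained)
    # Clear synthetic regression: reject regardless of closed-loop results.
    return GateDecision(False, "synthetic", gained)


# Headline run from the report: synthetic delta -0.004, closed-loop 2/5 -> 5/5.
headline = dual_signal_gate(-0.004, cl_passed_before=2, cl_passed_after=5)
```

Under these assumed thresholds the headline run counts as a synthetic tie, so the decision comes from the closed-loop gain of 3 tasks.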
…xt-step

Drop the "Documentation polish — link the Phase 2 report from the README and refresh the architecture diagram" next-steps item; that work landed in the previous commit, so leaving it in the report would falsely list completed work as outstanding. Remaining four items stand on their own. Regenerate the PDF against the same headline run (output/weakened-systematic-debugging/20260523_182457/) so the rendered artifact matches the trimmed prose.
…ative

Matches _background's spacer-before-closing-paragraph pattern; without it, the table and the prose hug too closely and read as one block.
Phase 1 PDF was last generated before this branch's table-wrapping + spacer fixes; regenerated against reconstructed run JSONs (synthesized from the values baked into the previously-committed PDF; output dir is gitignored so the synthetic JSONs stay local). New Phase 1 PDF picks up Paragraph-wrapped cells, the dual-line subtitle pattern, and the Engines-table column re-balancing. License column was tightened to 0.6" in the earlier wrapping fix and ended up wrapping the "License" header to "Licens / e". Bumping to 0.7" (trading 0.1" off What It Optimizes, which still has slack) lets the header sit on one line in both reports.
Summary
Ships the Phase 2 validation report PDF (`reports/phase2_validation_report.pdf`) alongside its prose source, generator fixes that make tables render cleanly for both Phase 1 and Phase 2, and README + architecture-diagram refreshes that reflect the dual-signal deploy gate.

What lands

New artifact

- `reports/phase2_validation_report.pdf`: 9-page validation report headlining the May 23 closed-loop-aware deploy gate. Concrete case: weakened-systematic-debugging skill with synthetic delta -0.004 (slightly negative) but closed-loop tasks gained 2/5 → 5/5 (+3). Decision: DEPLOYED via the closed-loop behavioral signal. This is the textbook case the Phase 2 gate was redesigned for.
- `reports/phase2_prose.yaml`: Phase 2 prose source. Same template-substitution shape as `phase1_prose.yaml`; nine sections covering executive summary, background, approach, experiment, results, safety, roadmap, and next steps.

Generator improvements (`generate_report.py`)

- … (`decision_signal`, `cl_tasks_gained`, `cl_required_gain`, etc.) so Phase 2 prose can reference them in template substitutions.
- Table cells are now `Paragraph` objects so long strings auto-wrap to column width. Without this, raw `str` cells overflow horizontally, running off the page or painting over adjacent cells. Affects every table in every report; both PDFs benefit.
- `TableCell` and `TableHeaderCell` ParagraphStyles + `_wrap_cell` helper.
- … `_results` to preserve the prior accent-on-DEPLOYED / bold-evolved-column visual fidelity.
- … `<br/>` for title-page line breaks; `_footer` strips it before joining so the footer stays one row.

README + architecture refresh

- `docs/architecture.md`: top-level flowchart and single-run sequence diagram updated to reflect the dual-signal gate.

Phase 1 regenerated

- `reports/phase1_validation_report.pdf` regenerated against reconstructed run JSONs (the original `gate_decision.json` / `metrics.json` / `run.log` were cleaned up long ago; reconstructed values match the previously-committed PDF). Picks up the table-wrapping fix + Engines column re-balance.

Test plan

- … `evolution/**/*.py` changes: `git diff main..HEAD -- 'evolution/**/*.py'` empty
- `docs/superpowers/` still gitignored (no regression on the local-only-plans convention)