observatory: RFC + docs for Report (JSON) (json_report 3/3)

RFC_SESSIONS.md: new §4a 'Report (JSON)' documents --output-report-json,
the three-level lenses[lens][archive][session_id] nesting, the accuracy
worst_record semantics (argmin for quality metrics, argmax for error
metrics), worst_layers sort direction for per_layer_accuracy, and a
side-by-side table contrasting Archive / Report (HTML) / Report (JSON).

REFERENCE.md §1.3: adds a json_report row parallel to the existing
dashboard row, documenting the call structure, return contract, and
payload destination.

USAGE.md §4a: new 'Report (JSON) for CI and LLM triage' section with
a CLI example, a compact payload sketch, and a pointer to RFC §4a.

README.md: one-sentence mention with USAGE.md link added to step 4
of the workflow list.

accuracy.py: docstring corrected: worst_record for error metrics
(mse, abs_err) = argmax (highest value = worst quality), not argmin.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
1. **Capture**: Observatory wraps your export script. Built-in lenses (e.g. `pipeline_graph_collector`) install monkey-patches that call `collect(...)` at each compilation stage; you can also call `Observatory.collect(...)` directly anywhere in your code.
2. **Store**: Records + per-Session metadata are persisted as a single Archive (JSON) for later re-analysis or comparison.
3. **Analyze**: Each lens processes the Archive into findings, comparisons, and derived insights.
-4. **Visualize**: Results are assembled into an interactive HTML report (Report (HTML)) with multiple view types.
+4. **Visualize**: Results are assembled into an interactive HTML report (Report (HTML)) with multiple view types. Use `--output-report-json` to also emit a Report (JSON), a lens-summarised dict suitable for CI threshold checks, LLM-driven triage, and dashboard time-series ingestion (see [USAGE.md §4a](USAGE.md)).
5. **Share**: The Report is a single self-contained HTML file. Send it, attach it to a bug report, or host it on GitHub Pages.
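
Review note: the "CI threshold checks" use case in step 4 could look roughly like the sketch below. It relies only on the `lenses[lens][archive][session_id]` nesting described in the commit message. The helper names and the flat `psnr` key inside each per-session payload are assumptions made for illustration; the actual payload fields are whatever each lens returns.

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_session_payloads(
    report: Dict[str, Any],
) -> Iterator[Tuple[str, str, str, Any]]:
    """Yield (lens, archive, session_id, payload) from the three-level nesting."""
    for lens, archives in report.get("lenses", {}).items():
        for archive, sessions in archives.items():
            for session_id, payload in sessions.items():
                yield lens, archive, session_id, payload


def find_violations(
    report: Dict[str, Any], lens: str, metric: str, minimum: float
) -> List[str]:
    """Return one message per session whose metric falls below `minimum`."""
    failures = []
    for name, archive, session_id, payload in iter_session_payloads(report):
        if name != lens or not isinstance(payload, dict):
            continue
        value = payload.get(metric)  # hypothetical flat key, see note above
        if value is not None and value < minimum:
            failures.append(f"{archive}/{session_id}: {metric}={value} < {minimum}")
    return failures


# Hypothetical Report (JSON) fragment; only the nesting comes from the RFC.
report = {
    "lenses": {
        "accuracy": {
            "default": {
                "run_a": {"psnr": 41.2},
                "run_b": {"psnr": 24.5},
            }
        }
    }
}

violations = find_violations(report, "accuracy", "psnr", 30.0)
```

In a CI job, a non-empty `violations` list would typically be printed and turned into a non-zero exit code.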
| `dashboard` | `dashboard(session, session_records, analysis) -> Optional[ViewList]` | Framework calls once per `(Session, lens)` pair. `session` is the active `Session` (carries `id`, `name`, `archive`, `start_ts`, `end_ts`, per-lens `start_data` / `end_data`); `session_records` is `[r for r in records if r.session_id == session.id]`; `analysis` is the `AnalysisResult` for this lens. See [RFC_SESSIONS.md](RFC_SESSIONS.md) for the full contract. | `dashboard[lens][session_id]` |
+| `json_report` | `json_report(session, session_records, analysis) -> Optional[Dict[str, Any]]` | Same call structure as `dashboard`. Called once per `(Session, lens)` pair by `export_report_json`. Result lands at `report["lenses"][lens_name][archive_label][session_id]`. Return `None` to opt out (no ghost keys). Returned dict must be JSON-serialisable. See `lenses/accuracy.py` and `lenses/per_layer_accuracy.py` for reference implementations. | `lenses[lens][archive][session_id]` in Report (JSON) |
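
Review note: the `json_report` contract in the row above (return a JSON-serialisable dict, or `None` to opt out) can be illustrated with a minimal sketch. Everything named here is a stand-in written for this example: the `Session` dataclass, the `MaxErrorLens` class, the dict-shaped records with an `abs_err` field, and the `place` helper that imitates where the result lands. None of it is taken from the Observatory code base.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Session:
    """Hypothetical stand-in; the real Session carries more fields."""
    id: str
    name: str


class MaxErrorLens:
    """Hypothetical lens that summarises the largest absolute error."""

    def json_report(
        self,
        session: Session,
        session_records: List[Dict[str, Any]],
        analysis: Any,
    ) -> Optional[Dict[str, Any]]:
        if not session_records:
            return None  # opt out: nothing to report for this session
        errors = [float(r["abs_err"]) for r in session_records]
        # Only plain ints and floats, so json.dumps accepts the result.
        return {"records": len(errors), "max_abs_err": max(errors)}


def place(
    report: Dict[str, Any],
    lens_name: str,
    archive_label: str,
    session: Session,
    payload: Optional[Dict[str, Any]],
) -> None:
    """Imitate the documented destination; skip None so no ghost keys appear."""
    if payload is None:
        return
    lenses = report.setdefault("lenses", {})
    lenses.setdefault(lens_name, {}).setdefault(archive_label, {})[session.id] = payload


lens = MaxErrorLens()
session = Session(id="default", name="default")

report: Dict[str, Any] = {}
payload = lens.json_report(session, [{"abs_err": 0.1}, {"abs_err": 0.4}], None)
place(report, "max_error", "default", session, payload)
serialised = json.dumps(report)  # raises TypeError if a payload is not serialisable

empty_report: Dict[str, Any] = {}
place(empty_report, "max_error", "default", session, lens.json_report(session, [], None))
```

Running the payload through `json.dumps` in a unit test is a cheap way to catch non-serialisable values such as tensors or NumPy scalars before they reach the report.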
devtools/observatory/RFC_SESSIONS.md
68 additions & 0 deletions
@@ -96,6 +96,74 @@ Archive
There is no flat `start_data` / `end_data` in the archive either. `Observatory._load_archive_sessions` reads both the new shape and the legacy nested `session: {sessions, ...}` shape (forward-compat for previously-written archives) and synthesises an `archive` field for legacy entries.

+## 4a. Report (JSON), `--output-report-json`
+
+A third optional derived output alongside the HTML report. Intended for CI dashboards, LLM-driven regression triage, and automated comparison tooling. Produced by `Observatory.export_report_json`.
+
+**Shape:**
+
+```jsonc
+{
+  "title": "...",
+  "generated_at": "...",
+  "archives": [                  // same grouping as HTML report
+    { "label": "default", "session_ids": ["default"] }
+  ],
+  "sessions": [                  // identity+timing only (no start_data/end_data)
+**`worst_record` semantics for `accuracy`:** the record whose metric value was most unfavorable. For quality metrics (psnr, cosine_sim, top_k) this is the record with the *minimum* value; for error metrics (mse, abs_err) this is the record with the *maximum* value.
+
+**`worst_layers` sort order for `per_layer_accuracy`:** psnr/cosine_sim sorted ascending (lower = worse); mse/abs_err sorted descending (higher = worse). Depth controlled by `config["per_layer_accuracy"]["json_report_top_n"]` (default 10).
+
+**Lens hook:** lenses contribute by overriding `Frontend.json_report(session, session_records, analysis) -> Optional[Dict]`. Returning `None` opts out, so no ghost keys appear. See `devtools/observatory/lenses/accuracy.py` and `devtools/observatory/lenses/per_layer_accuracy.py` for reference implementations.
+
+**Distinguishing the three outputs:**
+
+| Output | Flag | What it contains | For whom |
+|---|---|---|---|
+| Archive | `--output-archive` | Raw records + sessions, lossless | Re-visualization, compare |
+| Report (HTML) | `--output-html` | Interactive HTML with graphs, lens panels | Human reviewers |
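
Review note: the `worst_record` and `worst_layers` rules in this hunk reduce to "pick the direction by metric family". The sketch below shows one way to express that. The function names and set constants are hypothetical and this is not the code in `accuracy.py` or `per_layer_accuracy.py`. The RFC text lists only psnr/cosine_sim and mse/abs_err for `worst_layers`; treating `top_k` the same way there is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

QUALITY_METRICS = {"psnr", "cosine_sim", "top_k"}  # higher value = better
ERROR_METRICS = {"mse", "abs_err"}                 # higher value = worse


def higher_is_worse(metric: str) -> bool:
    """Return True for error metrics, False for quality metrics."""
    if metric in ERROR_METRICS:
        return True
    if metric in QUALITY_METRICS:
        return False
    raise ValueError(f"unknown metric: {metric!r}")


def worst_record_index(metric: str, values: List[float]) -> int:
    """Index of the worst record: argmax for error metrics, argmin for quality."""
    pick = max if higher_is_worse(metric) else min
    return pick(range(len(values)), key=values.__getitem__)


def worst_layers(
    metric: str, per_layer: Dict[str, float], top_n: int = 10
) -> List[Tuple[str, float]]:
    """Worst-first (layer, value) pairs, truncated to `top_n` entries."""
    ranked = sorted(
        per_layer.items(),
        key=lambda item: item[1],
        reverse=higher_is_worse(metric),  # descending for mse/abs_err
    )
    return ranked[:top_n]


# Hypothetical values for illustration only.
psnr_worst = worst_record_index("psnr", [40.0, 22.5, 31.0])
mse_worst = worst_record_index("mse", [0.01, 0.3, 0.02])
mse_layers = worst_layers("mse", {"conv1": 0.1, "conv2": 0.5, "fc": 0.3}, top_n=2)
```

Keeping the direction in a single helper means the docstring fix in `accuracy.py` (argmax, not argmin, for error metrics) has exactly one place in code where it could be contradicted.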