feat(CrossRefAttribute): info-view widget for cross-reference tags#39664
feat(CrossRefAttribute): info-view widget for cross-reference tags#39664kim-em wants to merge 2 commits into
Conversation
PR summary 63fd5cc7edImport changes for modified filesNo significant changes to the import graph Import changes for all files
|
|
This PR/issue depends on:
|
Add two standalone scripts to support review of PRs that add `@[stacks]`, `@[kerodon]`, or `@[wikidata]` attributes: * `scripts/crossref-snippet.lean` fetches a one-line label/description for one or more tags from the upstream database. Wikidata uses the `wbgetentities` API (with recovery when one bad QID poisons a batch); Stacks and Kerodon use the documented Gerby `/data/tag/<TAG>/content/statement` endpoint with a tolerant HTML→text strip that survives `<` inside math. Exit codes distinguish all-resolved (0) from missing (2) from transient network failure (3). When `CROSSREF_CACHE_DIR` is set, responses are memoised per `(database, tag)` so repeat lookups are instant. * `scripts/extract-crossref-tags.lean` walks a `.lean` file (or the added lines of a git diff range) and emits TSV of every cross-reference attribute it finds, paired with the declaration it decorates. A byte-level scanner correctly skips string literals and line/block comments, handles multi-attribute blocks (`@[simp, stacks 01AB]`), multi-line attribute blocks, doc comments between attribute and declaration, and modifier keywords (`private`, `noncomputable`, …). Signatures are collapsed to one line for downstream consumption. Both scripts run via `lake env lean --run` and depend only on Lean core (plus `curl` on `PATH` for the fetcher), so no Mathlib build is required and a future CI workflow can drive them on untrusted PR files without elaborating PR code. Companion follow-ups (separate PRs) will add an LSP widget that surfaces these snippets in the editor and a GitHub Actions workflow that posts a once-per-PR comment with tag / signature / snippet tables and fails CI on unresolved tags. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a Mathlib declaration carries `@[stacks ...]`, `@[kerodon ...]`, or `@[wikidata ...]`, the info view now shows the upstream label / description when the cursor sits on the attribute. The fetch is on-demand via an RPC call from the widget to the Lean server, which shells out to `curl`; we deliberately don't fetch at attribute-elaboration time so offline builds stay clean and the only network access is when an editing session opens the panel. Two new modules and a small refactor to the existing one: * `Mathlib/Tactic/CrossRef/Fetch.lean` (new) holds the `Database` enum and the snippet-fetch logic. The companion `scripts/crossref-snippet.lean` in the previous PR reimplements the same logic standalone for CI usage; the two are kept in sync deliberately so the CI script doesn't need a Mathlib build. * `Mathlib/Tactic/Widget/CrossRefHover.lean` (new) defines the widget itself: an `RpcEncodable` props record, a `RequestM` RPC method that calls `fetchSnippet`, and a `mk_rpc_widget%` Component that renders the result as Html. * `Mathlib/Tactic/CrossRefAttribute.lean` (modified) imports the two new modules, removes the now-duplicated `Database` / `databaseURL` / `databaseLabel` definitions, and calls `Widget.savePanelWidgetInfo` in each attribute's `add` handler. Tag storage, docstring rewriting, and the trace commands are untouched. If `curl` is missing or the upstream site is unreachable, the widget renders an inline error instead of blocking the LSP. The existing `MathlibTest/CrossRefAttribute.lean` suite continues to pass — none of the behaviour visible to it changes. Stacked on leanprover-community#39662 (scripts only). A third follow-up will add a GitHub Actions workflow that posts a once-per-PR comment with tag / signature / snippet tables. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d0f86f7 to
63fd5cc
Compare
When a PR touches `Mathlib/**/*.lean` and adds `@[stacks ...]`, `@[kerodon ...]`, or `@[wikidata ...]` attributes, this workflow posts (or updates, or deletes) a single bot comment with one row per added tag, showing the upstream label / description fetched from the source database, the Mathlib declaration's first-line signature, and the author's optional attribute comment. CI fails when any tag cannot be resolved upstream and emits a warning (not a failure) when only transient network errors occur. Three new pieces: * `scripts/crossref-pr-comment.py` — pure-stdlib Python orchestrator that drives `extract-crossref-tags.lean` then `crossref-snippet.lean` per database, formats the Markdown comment body with a hidden HTML marker for in-place updates, and returns a 0/1/2/3 exit code so the workflow can distinguish "nothing to post" from "post but pass" from "post and fail" from "post but treat as a network blip." * `.github/workflows/crossref_review.yml` — `pull_request_target` workflow with `pull-requests: write` (and only that). Installs elan, runs the orchestrator on the PR's diff range, finds the existing bot comment via `gh api … --jq` matching the marker, and `POST`s / `PATCH`es / `DELETE`s through the GitHub API. A startup gate exits cleanly when the branch pre-dates the companion scripts (so the workflow doesn't fail on PRs to an older base). * `docs/workflows.md` — new entry in the workflow inventory, matching the existing row format. Stacked on leanprover-community#39664 which adds the LSP widget side, which itself depends on leanprover-community#39662 (the scripts). Security note: the workflow runs the PR's own version of the scripts in the privileged `pull_request_target` context. The exposed token is GITHUB_TOKEN with only `pull-requests: write`, and the scripts only read .lean files and shell out to `curl` against three fixed upstream APIs, so the blast radius of a malicious script edit is limited to posting / editing PR comments. If maintainers prefer the trusted-only execution pattern used by `PR_summary.yml`, the orchestrator can move to mathlib-ci in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| private def fetchUrl (url : String) : IO (Nat × String) := do | ||
| let output ← IO.Process.output { | ||
| cmd := "curl" | ||
| args := #["-sSL", "--max-time", "10", "-A", userAgent, | ||
| "-w", "\n%{http_code}", url] | ||
| } | ||
| if output.exitCode != 0 then return (0, "") |
There was a problem hiding this comment.
Can we use Std.Http here instead? Or do the query javacsript-side?
| private def fetchUrl (url : String) : IO (Nat × String) := do | ||
| let output ← IO.Process.output { | ||
| cmd := "curl" | ||
| args := #["-sSL", "--max-time", "10", "-A", userAgent, | ||
| "-w", "\n%{http_code}", url] | ||
| } | ||
| if output.exitCode != 0 then return (0, "") | ||
| let parts := output.stdout.splitOn "\n" | ||
| match parts.reverse with | ||
| | last :: rest => | ||
| let body := "\n".intercalate rest.reverse | ||
| let status := last.trimAscii.toString.toNat?.getD 0 | ||
| return (status, body) | ||
| | [] => return (0, "") |
There was a problem hiding this comment.
Splitting the whole string then rejoining seems pretty silly here;
| /-- The outcome of trying to fetch a snippet. -/ | ||
| public inductive SnippetOutcome where | ||
| /-- Upstream returned a (title, description). Either may be empty. -/ | ||
| | ok (title : String) (description : String) | ||
| /-- The tag was authoritatively missing upstream. -/ | ||
| | missing | ||
| /-- A transient problem (network, parse, …). -/ | ||
| | network (reason : String) | ||
| deriving Repr |
There was a problem hiding this comment.
I think clearer as
structure Snippet where
title : String
description : String
inductive SnippetError where
| network (reason : String)
abbrev SnippetOutcome := Except SnippetError (Option Snippet)which lets you use Except machinery.
| /-- Best-effort HTML→text. We treat `<x…>` as a tag only when `x` is a letter, | ||
| `/`, or `!`, so a literal `<` inside LaTeX (`0 < 1`) is preserved. -/ | ||
| private def stripHtml (html : String) : String := | ||
| let chars := html.toList | ||
| let rec go : List Char → Bool → String → String | ||
| | [], _, acc => acc | ||
| | '<' :: rest, false, acc => | ||
| match rest with | ||
| | c :: _ => | ||
| if c.isAlpha || c == '/' || c == '!' then go rest true acc | ||
| else go rest false (acc.push '<') | ||
| | [] => acc.push '<' | ||
| | '>' :: rest, true, acc => go rest false acc | ||
| | _ :: rest, true, acc => go rest true acc | ||
| | c :: rest, false, acc => go rest false (acc.push c) | ||
| let raw := go chars false "" | ||
| let decoded := raw | ||
| |>.replace " " " " | ||
| |>.replace "&" "&" | ||
| |>.replace "<" "<" | ||
| |>.replace ">" ">" | ||
| |>.replace """ "\"" | ||
| |>.replace "'" "'" | ||
| flattenWhitespace decoded |
There was a problem hiding this comment.
Why are we doing this when the infoview can display HTML?
This PR teaches the info view to show the upstream label and description when the cursor sits on a
@[stacks ...],@[kerodon ...], or@[wikidata ...]attribute. The fetch is on-demand via an RPC call from the widget to the Lean server, which shells out tocurl; we deliberately do not fetch at attribute-elaboration time so offline builds stay clean and the only network access is when an editing session opens the panel.Two new modules and a small refactor to the existing one:
Mathlib/Tactic/CrossRef/Fetch.lean(new) holds theDatabaseenum and the snippet-fetch logic. The companionscripts/crossref-snippet.leanin the stacked PR reimplements the same logic standalone for CI usage; the two are kept in sync deliberately so the CI script doesn't need a Mathlib build.Mathlib/Tactic/Widget/CrossRefHover.lean(new) defines the widget itself: anRpcEncodableprops record, aRequestMRPC method that callsfetchSnippet, and amk_rpc_widget%Component that renders the result asHtml.Mathlib/Tactic/CrossRefAttribute.lean(modified) imports the two new modules, removes the now-duplicatedDatabase/databaseURL/databaseLabeldefinitions, and callsWidget.savePanelWidgetInfoin each attribute'saddhandler. Tag storage, docstring rewriting, and the trace commands are untouched, and the existingMathlibTest/CrossRefAttribute.leansuite continues to pass — none of the behaviour visible to it changes.If
curlis missing or the upstream site is unreachable, the widget renders an inline error instead of blocking the LSP.🤖 Prepared with Claude Code