feat(scripts): add cross-reference tag tooling#39662
Conversation
PR summary 61c1da021fImport changes for modified filesNo significant changes to the import graph Import changes for all files
|
Add two standalone scripts to support review of PRs that add `@[stacks]`, `@[kerodon]`, or `@[wikidata]` attributes: * `scripts/crossref-snippet.lean` fetches a one-line label/description for one or more tags from the upstream database. Wikidata uses the `wbgetentities` API (with recovery when one bad QID poisons a batch); Stacks and Kerodon use the documented Gerby `/data/tag/<TAG>/content/statement` endpoint with a tolerant HTML→text strip that survives `<` inside math. Exit codes distinguish all-resolved (0) from missing (2) from transient network failure (3). When `CROSSREF_CACHE_DIR` is set, responses are memoised per `(database, tag)` so repeat lookups are instant. * `scripts/extract-crossref-tags.lean` walks a `.lean` file (or the added lines of a git diff range) and emits TSV of every cross-reference attribute it finds, paired with the declaration it decorates. A byte-level scanner correctly skips string literals and line/block comments, handles multi-attribute blocks (`@[simp, stacks 01AB]`), multi-line attribute blocks, doc comments between attribute and declaration, and modifier keywords (`private`, `noncomputable`, …). Signatures are collapsed to one line for downstream consumption. Both scripts run via `lake env lean --run` and depend only on Lean core (plus `curl` on `PATH` for the fetcher), so no Mathlib build is required and a future CI workflow can drive them on untrusted PR files without elaborating PR code. Companion follow-ups (separate PRs) will add an LSP widget that surfaces these snippets in the editor and a GitHub Actions workflow that posts a once-per-PR comment with tag / signature / snippet tables and fails CI on unresolved tags. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4ba4247 to
61c1da0
Compare
When a Mathlib declaration carries `@[stacks ...]`, `@[kerodon ...]`, or `@[wikidata ...]`, the info view now shows the upstream label / description when the cursor sits on the attribute. The fetch is on-demand via an RPC call from the widget to the Lean server, which shells out to `curl`; we deliberately don't fetch at attribute-elaboration time so offline builds stay clean and the only network access is when an editing session opens the panel. Two new modules and a small refactor to the existing one: * `Mathlib/Tactic/CrossRef/Fetch.lean` (new) holds the `Database` enum and the snippet-fetch logic. The companion `scripts/crossref-snippet.lean` in the previous PR reimplements the same logic standalone for CI usage; the two are kept in sync deliberately so the CI script doesn't need a Mathlib build. * `Mathlib/Tactic/Widget/CrossRefHover.lean` (new) defines the widget itself: an `RpcEncodable` props record, a `RequestM` RPC method that calls `fetchSnippet`, and a `mk_rpc_widget%` Component that renders the result as Html. * `Mathlib/Tactic/CrossRefAttribute.lean` (modified) imports the two new modules, removes the now-duplicated `Database` / `databaseURL` / `databaseLabel` definitions, and calls `Widget.savePanelWidgetInfo` in each attribute's `add` handler. Tag storage, docstring rewriting, and the trace commands are untouched. If `curl` is missing or the upstream site is unreachable, the widget renders an inline error instead of blocking the LSP. The existing `MathlibTest/CrossRefAttribute.lean` suite continues to pass — none of the behaviour visible to it changes. Stacked on leanprover-community#39662 (scripts only). A third follow-up will add a GitHub Actions workflow that posts a once-per-PR comment with tag / signature / snippet tables. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a PR touches `Mathlib/**/*.lean` and adds `@[stacks ...]`, `@[kerodon ...]`, or `@[wikidata ...]` attributes, this workflow posts (or updates, or deletes) a single bot comment with one row per added tag, showing the upstream label / description fetched from the source database, the Mathlib declaration's first-line signature, and the author's optional attribute comment. CI fails when any tag cannot be resolved upstream and emits a warning (not a failure) when only transient network errors occur. Three new pieces: * `scripts/crossref-pr-comment.py` — pure-stdlib Python orchestrator that drives `extract-crossref-tags.lean` then `crossref-snippet.lean` per database, formats the Markdown comment body with a hidden HTML marker for in-place updates, and returns a 0/1/2/3 exit code so the workflow can distinguish "nothing to post" from "post but pass" from "post and fail" from "post but treat as a network blip." * `.github/workflows/crossref_review.yml` — `pull_request_target` workflow with `pull-requests: write` (and only that). Installs elan, runs the orchestrator on the PR's diff range, finds the existing bot comment via `gh api … --jq` matching the marker, and `POST`s / `PATCH`es / `DELETE`s through the GitHub API. A startup gate exits cleanly when the branch pre-dates the companion scripts (so the workflow doesn't fail on PRs to an older base). * `docs/workflows.md` — new entry in the workflow inventory, matching the existing row format. Stacked on leanprover-community#39664 which adds the LSP widget side, which itself depends on leanprover-community#39662 (the scripts). Security note: the workflow runs the PR's own version of the scripts in the privileged `pull_request_target` context. The exposed token is GITHUB_TOKEN with only `pull-requests: write`, and the scripts only read .lean files and shell out to `curl` against three fixed upstream APIs, so the blast radius of a malicious script edit is limited to posting / editing PR comments. If maintainers prefer the trusted-only execution pattern used by `PR_summary.yml`, the orchestrator can move to mathlib-ci in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This PR adds two standalone scripts in
scripts/to support review of PRs that add@[stacks],@[kerodon], or@[wikidata]attributes. Neither needs alakefile.leanentry — both run vialake env lean --runand depend only on Lean core (pluscurlonPATHfor the fetcher).scripts/crossref-snippet.leanfetches a one-line label/description for one or more tags from the upstream database. Wikidata uses thewbgetentitiesAPI (with recovery when one bad QID poisons a batch); Stacks and Kerodon use the documented Gerby/data/tag/<TAG>/content/statementendpoint with a tolerant HTML→text strip that survives<inside math. Exit codes distinguish all-resolved (0) from missing (2) from transient network failure (3). WhenCROSSREF_CACHE_DIRis set, responses are memoised per(database, tag)so repeat lookups are instant.scripts/extract-crossref-tags.leanwalks a.leanfile (or the added lines of a git diff range) and emits TSV of every cross-reference attribute it finds, paired with the declaration it decorates. A byte-level scanner correctly skips string literals and line/block comments, handles multi-attribute blocks (@[simp, stacks 01AB]), multi-line attribute blocks, doc comments between attribute and declaration, and modifier keywords (private,noncomputable, …). Signatures are collapsed to one line for downstream consumption.Sample usage:
The scripts are deliberately text-only — they do not elaborate the files they inspect. That lets the companion CI workflow (a follow-up PR) operate on PR files as data without executing PR code under a privileged GitHub token.
Two stacked follow-up PRs (not yet open):
Mathlib/Tactic/CrossRef/Fetch.leanmodule.🤖 Prepared with Claude Code