Skip to content

feat(CrossRefAttribute): info-view widget for cross-reference tags#39664

Draft
kim-em wants to merge 2 commits into
leanprover-community:masterfrom
kim-em:crossref-tooling-widget
Draft

feat(CrossRefAttribute): info-view widget for cross-reference tags#39664
kim-em wants to merge 2 commits into
leanprover-community:masterfrom
kim-em:crossref-tooling-widget

Conversation

@kim-em
Copy link
Copy Markdown
Contributor

@kim-em kim-em commented May 21, 2026

This PR teaches the info view to show the upstream label and description when the cursor sits on a @[stacks ...], @[kerodon ...], or @[wikidata ...] attribute. The fetch is on-demand via an RPC call from the widget to the Lean server, which shells out to curl; we deliberately do not fetch at attribute-elaboration time so offline builds stay clean and the only network access is when an editing session opens the panel.

Two new modules and a small refactor to the existing one:

  • Mathlib/Tactic/CrossRef/Fetch.lean (new) holds the Database enum and the snippet-fetch logic. The companion scripts/crossref-snippet.lean in the stacked PR reimplements the same logic standalone for CI usage; the two are kept in sync deliberately so the CI script doesn't need a Mathlib build.
  • Mathlib/Tactic/Widget/CrossRefHover.lean (new) defines the widget itself: an RpcEncodable props record, a RequestM RPC method that calls fetchSnippet, and a mk_rpc_widget% Component that renders the result as Html.
  • Mathlib/Tactic/CrossRefAttribute.lean (modified) imports the two new modules, removes the now-duplicated Database / databaseURL / databaseLabel definitions, and calls Widget.savePanelWidgetInfo in each attribute's add handler. Tag storage, docstring rewriting, and the trace commands are untouched, and the existing MathlibTest/CrossRefAttribute.lean suite continues to pass — none of the behaviour visible to it changes.

If curl is missing or the upstream site is unreachable, the widget renders an inline error instead of blocking the LSP.

🤖 Prepared with Claude Code

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 21, 2026

PR summary 63fd5cc7ed

Import changes for modified files

No significant changes to the import graph

Import changes for all files
Files Import difference
Mathlib.Tactic 2
Mathlib.Tactic.CrossRef.Fetch (new file) 54

Declarations diff

+ AttrBlock
+ Clause
+ CrossRefHoverPanel
+ CrossRefHoverPanel.rpc
+ CrossRefHoverProps
+ Database.fetch
+ Database.gerbyBase?
+ Database.ofName?
+ Database.ofString?
+ DeclHeader
+ DiffHunks
+ Mode
+ Pos
+ Record
+ Record.emit
+ Result
+ Result.emit
+ SnippetOutcome
+ WikidataParse
+ attachCrossRefWidget
+ byteSlice
+ cacheFile?
+ cacheLoad
+ cacheStore
+ collapseSignature
+ databases
+ declKeywords
+ extractFromFile
+ fetchGerbyAll
+ fetchSnippet
+ findDecl
+ gerbyBase?
+ gitDiffLeanFiles
+ modifierKeywords
+ parseArgs
+ parseClause
+ parseDiff
+ parseWikidataResponse
+ readIdent
+ readWord
+ recordInHunks
+ renderOutcome
+ splitAttrInstances
+ stepPos
+ takeIdent
+ takeString
+ takeTag
+ tsvEscape
+ utf8Slice
+ wikidataBatchSize
+ wikidataResultOf
++ Database.name
++ afterFirst?
++ fetchGerby
++ fetchUrl
++ fetchWikidata
++ flattenWhitespace
++ jsonStrPath?
++ main
++ parseGerbyTitle
++ stripHtml
++ takeUntilChar
++ usage
++ userAgent
+-+ Database

You can run this locally as follows
## from your `mathlib4` directory:
git clone https://github.com/leanprover-community/mathlib-ci.git ../mathlib-ci

## summary with just the declaration names:
../mathlib-ci/scripts/pr_summary/declarations_diff.sh <optional_commit>

## more verbose report:
../mathlib-ci/scripts/pr_summary/declarations_diff.sh long <optional_commit>

The doc-module for scripts/pr_summary/declarations_diff.sh in the mathlib-ci repository contains some details about this script.


No changes to strong technical debt.
No changes to weak technical debt.

⚠️ Scripts folder reminder

This PR adds files under scripts/.
Please consider whether each added script belongs in this repository or in leanprover-community/mathlib-ci.

A script belongs in mathlib-ci if it is a CI automation script that interacts with GitHub (e.g. managing labels, posting comments, triggering bots), runs from a trusted external checkout in CI, or requires access to secrets.

A script belongs in this repository (scripts/) if it is a developer or maintainer tool to be run locally, a code maintenance or analysis utility, a style linting tool, or a data file used by the library's own linters.

See the mathlib-ci README for more details.

Added scripts files:

  • scripts/crossref-snippet.lean
  • scripts/extract-crossref-tags.lean

@mathlib-dependent-issues mathlib-dependent-issues Bot added the blocked-by-other-PR This PR depends on another PR (this label is automatically managed by a bot) label May 21, 2026
@mathlib-dependent-issues
Copy link
Copy Markdown

This PR/issue depends on:

kim-em and others added 2 commits May 21, 2026 18:59
Add two standalone scripts to support review of PRs that add `@[stacks]`,
`@[kerodon]`, or `@[wikidata]` attributes:

* `scripts/crossref-snippet.lean` fetches a one-line label/description for
  one or more tags from the upstream database. Wikidata uses the
  `wbgetentities` API (with recovery when one bad QID poisons a batch);
  Stacks and Kerodon use the documented Gerby `/data/tag/<TAG>/content/statement`
  endpoint with a tolerant HTML→text strip that survives `<` inside math.
  Exit codes distinguish all-resolved (0) from missing (2) from transient
  network failure (3). When `CROSSREF_CACHE_DIR` is set, responses are
  memoised per `(database, tag)` so repeat lookups are instant.

* `scripts/extract-crossref-tags.lean` walks a `.lean` file (or the added
  lines of a git diff range) and emits TSV of every cross-reference
  attribute it finds, paired with the declaration it decorates. A
  byte-level scanner correctly skips string literals and line/block
  comments, handles multi-attribute blocks (`@[simp, stacks 01AB]`),
  multi-line attribute blocks, doc comments between attribute and
  declaration, and modifier keywords (`private`, `noncomputable`, …).
  Signatures are collapsed to one line for downstream consumption.

Both scripts run via `lake env lean --run` and depend only on Lean core
(plus `curl` on `PATH` for the fetcher), so no Mathlib build is required
and a future CI workflow can drive them on untrusted PR files without
elaborating PR code.

Companion follow-ups (separate PRs) will add an LSP widget that surfaces
these snippets in the editor and a GitHub Actions workflow that posts a
once-per-PR comment with tag / signature / snippet tables and fails CI
on unresolved tags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a Mathlib declaration carries `@[stacks ...]`, `@[kerodon ...]`, or
`@[wikidata ...]`, the info view now shows the upstream label / description
when the cursor sits on the attribute. The fetch is on-demand via an RPC
call from the widget to the Lean server, which shells out to `curl`; we
deliberately don't fetch at attribute-elaboration time so offline builds
stay clean and the only network access is when an editing session opens
the panel.

Two new modules and a small refactor to the existing one:

* `Mathlib/Tactic/CrossRef/Fetch.lean` (new) holds the `Database` enum and
  the snippet-fetch logic. The companion `scripts/crossref-snippet.lean`
  in the previous PR reimplements the same logic standalone for CI usage;
  the two are kept in sync deliberately so the CI script doesn't need a
  Mathlib build.
* `Mathlib/Tactic/Widget/CrossRefHover.lean` (new) defines the widget
  itself: an `RpcEncodable` props record, a `RequestM` RPC method that
  calls `fetchSnippet`, and a `mk_rpc_widget%` Component that renders
  the result as Html.
* `Mathlib/Tactic/CrossRefAttribute.lean` (modified) imports the two new
  modules, removes the now-duplicated `Database` / `databaseURL` /
  `databaseLabel` definitions, and calls `Widget.savePanelWidgetInfo` in
  each attribute's `add` handler. Tag storage, docstring rewriting, and
  the trace commands are untouched.

If `curl` is missing or the upstream site is unreachable, the widget
renders an inline error instead of blocking the LSP. The existing
`MathlibTest/CrossRefAttribute.lean` suite continues to pass — none of
the behaviour visible to it changes.

Stacked on leanprover-community#39662
(scripts only). A third follow-up will add a GitHub Actions workflow
that posts a once-per-PR comment with tag / signature / snippet tables.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kim-em kim-em force-pushed the crossref-tooling-widget branch from d0f86f7 to 63fd5cc Compare May 21, 2026 19:30
kim-em added a commit to kim-em/mathlib4 that referenced this pull request May 21, 2026
When a PR touches `Mathlib/**/*.lean` and adds `@[stacks ...]`,
`@[kerodon ...]`, or `@[wikidata ...]` attributes, this workflow posts (or
updates, or deletes) a single bot comment with one row per added tag,
showing the upstream label / description fetched from the source database,
the Mathlib declaration's first-line signature, and the author's optional
attribute comment. CI fails when any tag cannot be resolved upstream and
emits a warning (not a failure) when only transient network errors occur.

Three new pieces:

* `scripts/crossref-pr-comment.py` — pure-stdlib Python orchestrator that
  drives `extract-crossref-tags.lean` then `crossref-snippet.lean` per
  database, formats the Markdown comment body with a hidden HTML marker
  for in-place updates, and returns a 0/1/2/3 exit code so the workflow
  can distinguish "nothing to post" from "post but pass" from
  "post and fail" from "post but treat as a network blip."

* `.github/workflows/crossref_review.yml` — `pull_request_target` workflow
  with `pull-requests: write` (and only that). Installs elan, runs the
  orchestrator on the PR's diff range, finds the existing bot comment via
  `gh api … --jq` matching the marker, and `POST`s / `PATCH`es / `DELETE`s
  through the GitHub API. A startup gate exits cleanly when the branch
  pre-dates the companion scripts (so the workflow doesn't fail on PRs
  to an older base).

* `docs/workflows.md` — new entry in the workflow inventory, matching
  the existing row format.

Stacked on leanprover-community#39664
which adds the LSP widget side, which itself depends on
leanprover-community#39662 (the scripts).

Security note: the workflow runs the PR's own version of the scripts in
the privileged `pull_request_target` context. The exposed token is
GITHUB_TOKEN with only `pull-requests: write`, and the scripts only read
.lean files and shell out to `curl` against three fixed upstream APIs, so
the blast radius of a malicious script edit is limited to posting / editing
PR comments. If maintainers prefer the trusted-only execution pattern used
by `PR_summary.yml`, the orchestrator can move to mathlib-ci in a
follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines +83 to +89
private def fetchUrl (url : String) : IO (Nat × String) := do
let output ← IO.Process.output {
cmd := "curl"
args := #["-sSL", "--max-time", "10", "-A", userAgent,
"-w", "\n%{http_code}", url]
}
if output.exitCode != 0 then return (0, "")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use Std.Http here instead? Or do the query javacsript-side?

Comment on lines +83 to +96
private def fetchUrl (url : String) : IO (Nat × String) := do
let output ← IO.Process.output {
cmd := "curl"
args := #["-sSL", "--max-time", "10", "-A", userAgent,
"-w", "\n%{http_code}", url]
}
if output.exitCode != 0 then return (0, "")
let parts := output.stdout.splitOn "\n"
match parts.reverse with
| last :: rest =>
let body := "\n".intercalate rest.reverse
let status := last.trimAscii.toString.toNat?.getD 0
return (status, body)
| [] => return (0, "")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting the whole string then rejoining seems pretty silly here;

Comment on lines +65 to +73
/-- The outcome of trying to fetch a snippet. -/
public inductive SnippetOutcome where
/-- Upstream returned a (title, description). Either may be empty. -/
| ok (title : String) (description : String)
/-- The tag was authoritatively missing upstream. -/
| missing
/-- A transient problem (network, parse, …). -/
| network (reason : String)
deriving Repr
Copy link
Copy Markdown
Member

@eric-wieser eric-wieser May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think clearer as

structure Snippet where
  title : String
  description : String

inductive SnippetError where
  | network (reason : String)

abbrev SnippetOutcome := Except SnippetError (Option Snippet)

which lets you use Except machinery.

Comment on lines +110 to +133
/-- Best-effort HTML→text. We treat `<x…>` as a tag only when `x` is a letter,
`/`, or `!`, so a literal `<` inside LaTeX (`0 < 1`) is preserved. -/
private def stripHtml (html : String) : String :=
let chars := html.toList
let rec go : List Char → Bool → String → String
| [], _, acc => acc
| '<' :: rest, false, acc =>
match rest with
| c :: _ =>
if c.isAlpha || c == '/' || c == '!' then go rest true acc
else go rest false (acc.push '<')
| [] => acc.push '<'
| '>' :: rest, true, acc => go rest false acc
| _ :: rest, true, acc => go rest true acc
| c :: rest, false, acc => go rest false (acc.push c)
let raw := go chars false ""
let decoded := raw
|>.replace "&nbsp;" " "
|>.replace "&amp;" "&"
|>.replace "&lt;" "<"
|>.replace "&gt;" ">"
|>.replace "&quot;" "\""
|>.replace "&#39;" "'"
flattenWhitespace decoded
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we doing this when the infoview can display HTML?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blocked-by-other-PR This PR depends on another PR (this label is automatically managed by a bot)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants