Merged
24 changes: 24 additions & 0 deletions .changeset/issue-128-links-notation.md
@@ -0,0 +1,24 @@
---
'meta-expression': minor
---

Address the issue #128 Translate-page report for "Hawaii is a state." (en→ru):

- Surface a Wikipedia link for English phrases (matching the Russian behaviour)
by defaulting the Translate page link target to Wikipedia and resolving
language-specific `*.wikipedia.org` URLs from Wikidata sitelinks.
- Render a merged entity definition in links notation that cross-references the
senses matched across Wikidata, Wikipedia, and Wiktionary so words can be
disambiguated from a single artefact.
- Report the elected contexts in both the formalization and translation links
notation in priority order with the exact computed probability and the number
of source words sharing each context.
- Trim the "Report Notes" and "Reproduction Steps" sections from generated issue
reports to leave more room for diagnostic data.
- Fix the GitHub Pages app version readout so it reflects the deployed package
version instead of always showing "unknown".
- Consolidate the recorded Wikimedia API request/response snapshots into a
deterministic `js/data/wikimedia-cache.lino` cache (regenerated by
`npm run cache:refresh`, verified by `npm run cache:check`) and seed the web
app's in-memory cache from the deployed file so the top-viewed-articles quality
gate replays offline in both Node and the browser.
16 changes: 16 additions & 0 deletions docs/FORMALIZE.md
@@ -631,6 +631,22 @@ overrides files so the document is always rooted under a stable name.

**Returns** `string`

### `parseLinoCacheEntries(text)`

Decode a `.lino` API request/response cache into a `Map<url, value>`.

The cache stores one entry per request as `{ key, url, response }` where the
response is a verbatim JSON string (issue #128). Parsing is browser-safe — it
only depends on the codec above — so the same decoder seeds the Node quality
gate and the web app's persistent cache. Malformed entries are skipped so a
partially-corrupt cache still yields whatever decoded cleanly.

| Parameter | Type | Description |
| --------- | -------- | ----------- |
| `text` | `string` | Raw text of the `.lino` cache document to decode. |

**Returns** `Map<string, unknown>`
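
The skip-on-error behaviour described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `entriesToMap` is a hypothetical helper that starts from records already decoded by the codec, and only the `{ key, url, response }` shape is taken from the description.

```javascript
// Hypothetical helper: turns already-decoded cache records into a Map.
// Only the record shape { key, url, response } comes from the docs above;
// the function name and the validation rules are illustrative assumptions.
function entriesToMap(entries) {
  const cache = new Map();
  for (const entry of Array.isArray(entries) ? entries : []) {
    if (!entry || typeof entry.url !== 'string' || typeof entry.response !== 'string') {
      continue; // malformed record: skip it and keep going
    }
    try {
      cache.set(entry.url, JSON.parse(entry.response));
    } catch {
      // the response was not valid JSON: skip this record, keep the rest
    }
  }
  return cache;
}

const decoded = entriesToMap([
  { key: 'a', url: 'https://example.org/a', response: '{"aliases":{}}' },
  { key: 'b', url: 'https://example.org/b', response: '{not json' },
  { key: 'c', response: '{}' },
]);
```

Only the first record survives here; the second has an unparsable response and the third has no URL, so both are dropped without failing the whole decode.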

## Doublets binary store

Source: [`js/src/doublets.js`](../js/src/doublets.js)
78 changes: 78 additions & 0 deletions docs/case-studies/issue-128/ONLINE-RESEARCH.md
@@ -0,0 +1,78 @@
# Issue 128 — Online Research & Existing Components

This records the existing components, libraries, and external facts considered
before implementing the fix, and the basis for the R9 conclusion (no upstream
issue is warranted).

## How the pieces fit today

- **Version readout.** The GitHub Pages build (`.github/workflows/js.yml`)
generates `_site/web/app-version.json` with the package version, commit SHA,
ref, and build time. `web/app-version.js` already exposes
`loadAppVersionInfo()` / `formatAppVersion()` to read it; the Translate debug
log simply was not using it.
- **Link target resolution.** The translation path resolves a language-specific
`*.wikipedia.org` article from the Wikidata entity's **sitelinks**, falling
back to the Wikidata entity URL. This is the same mechanism the English side
now uses when the Wikipedia link target is selected.
- **Links notation.** `js/src/formalize-renderers.js` and
`js/src/translation-renderers.js` are the single rendering path shared by the
Formalize page, Translate page, CLI, and server.
- **Snapshot store.** `js/src/formalize-snapshots.js` (issue #21) records
Wikimedia API responses as one JSON file per URL so tests replay offline. The
quality gate over the top-viewed articles (issue #43 / #96) uses it.

The project's runtime dependencies (`package.json`) are `doublets-web`,
`links-notation`, and `lino-arguments` — none of which do version display,
report generation, entity linking, or HTTP caching. All of the affected logic is
ours.

## Fact 1 — `.lino` is the project's house format for committed data

The repository already stores overrides and snapshot manifests in indented Links
Notation via the `links-notation` package (`js/src/lino.js`). The issue's
"in .lino format, as we usually do" points at this codec. Two facts shaped the
cache design:

- The generic codec **round-trips empty objects lossily** (`{}` decodes back to
`null`). Wikidata/Wiktionary payloads contain empty `aliases: {}` and similar,
so the cache stores each response as a **verbatim JSON string** under a
`response` key — the URL and key stay human-auditable while the payload is
lossless.
- A committed cache is only useful as a freshness gate if regeneration is
**deterministic**. The serializer sorts entries by URL and emits **no
wall-clock timestamp**, so `node scripts/refresh-wikimedia-cache.mjs --check`
is a meaningful git-diff check.
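
Both design points can be illustrated with a short sketch. It assumes a hypothetical `buildCacheEntries` helper and a simplified `{ url, response }` input; the real serializer also writes a `key` per entry, whose derivation is not described here and is therefore left out.

```javascript
// Hypothetical helper: prepares cache entries for serialization.
// Assumptions: input is [{ url, response }] in arbitrary order.
function buildCacheEntries(snapshots) {
  return snapshots
    // Store the payload as a verbatim JSON string so `{}` survives a codec
    // that would otherwise decode empty objects as null.
    .map(({ url, response }) => ({ url, response: JSON.stringify(response) }))
    // Sort by URL with a plain code-unit comparison (no locale, no clock),
    // so the same snapshots always produce the same output.
    .sort((a, b) => (a.url < b.url ? -1 : a.url > b.url ? 1 : 0));
}

const snapshots = [
  { url: 'https://example.org/b', response: { labels: { en: 'B' } } },
  { url: 'https://example.org/a', response: { aliases: {} } },
];

const forward = JSON.stringify(buildCacheEntries(snapshots));
const reversed = JSON.stringify(buildCacheEntries([...snapshots].reverse()));
const firstPayload = JSON.parse(buildCacheEntries(snapshots)[0].response);
```

Because input order does not affect the output, regenerating the cache from unchanged snapshots yields a byte-identical result, which is what lets a diff against the committed file act as a freshness check.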

## Fact 2 — GitHub Pages already deploys `js/data`

`.github/workflows/js.yml` copies `js/data` into the published site
(`cp -R js/data _site/js/`). Putting the cache at `js/data/wikimedia-cache.lino`
therefore ships it to the web app with no extra workflow step, which is what
makes "executing the same test in web app … will be faster" achievable: the web
app seeds its in-memory cache from the deployed file.

## Fact 3 — Wikidata sitelinks are the canonical Wikipedia-link source

Resolving `Q782` → `en.wikipedia.org/wiki/Hawaii` uses the entity's `sitelinks`
(`enwiki`, `ruwiki`, …), the standard Wikibase mechanism for mapping an entity to
its per-language Wikipedia article. Some entities have no article in some
languages, so a Wikidata-entity fallback is required. That is why the wording is
"Wikipedia (fallback Wikidata)" rather than an unconditional rewrite.
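
A minimal sketch of that resolution, assuming an entity shaped like a Wikidata `wbgetentities` result (`id` plus `sitelinks` keyed by site id, each with a `title`). `resolveLinkTarget` is a hypothetical name, and the `${language}wiki` key is a simplification: a few languages use a site id that does not match their language code.

```javascript
// Hypothetical helper, not the project's function.
// Assumes sitelinks are keyed as `${language}wiki`, e.g. enwiki, ruwiki.
function resolveLinkTarget(entity, language) {
  const sitelink = entity.sitelinks && entity.sitelinks[`${language}wiki`];
  if (sitelink && typeof sitelink.title === 'string') {
    // Wikipedia article URLs use underscores in place of spaces.
    const title = encodeURIComponent(sitelink.title.replace(/ /g, '_'));
    return `https://${language}.wikipedia.org/wiki/${title}`;
  }
  // No article in this language: fall back to the Wikidata entity page.
  return `https://www.wikidata.org/wiki/${entity.id}`;
}

const hawaii = {
  id: 'Q782',
  sitelinks: {
    enwiki: { site: 'enwiki', title: 'Hawaii' },
    ruwiki: { site: 'ruwiki', title: 'Гавайи' },
  },
};

const enLink = resolveLinkTarget(hawaii, 'en');
const ruLink = resolveLinkTarget(hawaii, 'ru');
const fallbackLink = resolveLinkTarget(hawaii, 'xx');
```

With this sample entity, `en` and `ru` resolve to Wikipedia articles, while the made-up language code `xx` falls back to the Wikidata entity URL.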

## R9 — Upstream issue assessment

**Conclusion: no external issue is warranted.**

- The `App version: unknown` readout was a missing read of our own
`app-version.json`, not a dependency bug.
- The redundant report sections were generated by our own `page-report.js`.
- The asymmetric link targets, the un-merged links notation, and the missing
context lines are all in our own renderers.
- The cache format and freshness are entirely ours; `links-notation` behaves as
documented (the empty-object round-trip is a known property we design around,
not a defect to report).

If a future failure traces to incorrect Wikidata data (e.g. a missing sitelink),
the right venue is editing that item on wikidata.org, not a code issue against a
dependency. No such data defect was found for "Hawaii is a state."
68 changes: 68 additions & 0 deletions docs/case-studies/issue-128/README.md
@@ -0,0 +1,68 @@
# Issue 128 Case Study

Issue: https://github.com/link-assistant/meta-expression/issues/128

PR: https://github.com/link-assistant/meta-expression/pull/129

## Summary

This case study covers a bug report filed from the Translate page
(`web/#/translate`) on `v0.9.0` (commit `8b336fb`, build time
`2026-05-28T10:16:32Z`) while translating the sentence **"Hawaii is a state."**
from English to Russian with all three sources (Wikipedia, Wikidata, Wiktionary)
enabled. The translation itself was correct (`Гавайи это штат.`), but the report
bundled several presentation and infrastructure problems:

1. **The version never increases.** The deployed app was stuck on `v0.9.0` "for
a long time", and the debug log printed `App version: unknown`.
2. **Two report sections waste space.** The generated issue carries a
`## Report Notes` and a `## Reproduction Steps` block that add no diagnostic
value and use up characters in the URL-length-bounded report.
3. **English shows no Wikipedia link.** The Russian output links
`Гавайи` to `ru.wikipedia.org`, but the English formalization links `Hawaii`
only to `wikidata.org/wiki/Q782` — "I think we should show a link to
wikipedia, if we have it".
4. **Links notation should merge the entity definition** — show everything that
matched the term across Wikipedia, Wikidata, and Wiktionary so the sources
cross-reference each other and help disambiguate.
5. **Links notation should show the selected contexts** — in priority order
(highest probability first), with the exact probability we computed and how
many source words share each context, in both the formalization and the
translation.
6. **The quality gate must run on the top-most-viewed articles** and, when
merged to main, **refresh an API request/response cache** so the same test
(and the web app) replays offline and faster next time. "Cache should be in
data folder and in .lino format, as we usually do."
7. **Compile this case study** and do a deep analysis (timeline, requirements,
root causes, solution plans, online research).

This folder reconstructs the timeline, enumerates every requirement, performs
root-cause analysis backed by the captured debug log, proposes solution plans,
and records the online/library research that informed the fix.

## Files

- [`data/issue.json`](data/issue.json) — the raw issue body + metadata as
captured from the GitHub API.
- [`data/debug-log.md`](data/debug-log.md) — the debug log embedded in the issue
body, preserved verbatim as research data.
- [`REQUIREMENTS.md`](REQUIREMENTS.md) — the complete, numbered requirement list
distilled from the issue body.
- [`TIMELINE.md`](TIMELINE.md) — reconstructed sequence of events.
- [`ROOT-CAUSES.md`](ROOT-CAUSES.md) — per-problem root-cause analysis with
evidence quoted from the debug log.
- [`SOLUTION-PLAN.md`](SOLUTION-PLAN.md) — solution plan for each requirement and
what this PR actually changed.
- [`ONLINE-RESEARCH.md`](ONLINE-RESEARCH.md) — existing components, libraries,
and external facts considered.

## Reproduction

The smallest reproduction is the sentence itself, captured as automated tests in
`js/tests/integration/issue-128.test.js` (Wikipedia link, merged definition, and
contexts in both the formalization and translation links notation) and
`js/tests/integration/issue-128-cache.test.js` (the `.lino` API cache round-trip,
freshness, and an offline replay of "Hawaii is a state."). To reproduce the
original report end-to-end, paste `Hawaii is a state.` into the Translate page
(`en` → `ru`, all three sources enabled), pick the **Wikipedia** link target, and
inspect the formalization links notation and the copied debug log.
156 changes: 156 additions & 0 deletions docs/case-studies/issue-128/REQUIREMENTS.md
@@ -0,0 +1,156 @@
# Issue 128 — Requirements

Every requirement extracted from the issue body
([`data/issue.json`](data/issue.json)), numbered for traceability. Each entry
notes the status in PR #129.

## R1 — Fix the version auto-bump (it shows "unknown")

> We still have 0.9.0 version for a long time for some reason auto bump does not
> work for our GitHub Pages app.

The debug log printed `App version: unknown`. Two distinct problems hide here:
the **display** of the version in the Translate debug log, and the **release
pipeline** that bumps `package.json`.

**Status: done (display) / verified (pipeline).** `web/translate-ui.js` now
resolves the deployed build via `loadAppVersionInfo()` / `formatAppVersion()`
(from `web/app-version.js`) and prints it in the debug log instead of the
placeholder. The release-pipeline gate that previously blocked the bump was
already corrected for issue #126 (decoupling the version bump from
`NPM_PUBLISH_ENABLED`); this PR adds the changeset that the fixed pipeline needs
to actually raise the number on the next merge to `main`. See
[`ROOT-CAUSES.md`](ROOT-CAUSES.md) RC1.

## R2 — Remove the "Report Notes" and "Reproduction Steps" sections

> These sections from issue reporting can be removed, giving more space for
> other data.

The `## Report Notes` and `## Reproduction Steps` blocks add no diagnostic value
and consume characters in the URL-length-bounded report.

**Status: done.** `web/page-report.js` no longer emits either section
(`createReproductionSteps`, `compactReportNotice`, and `omittedDiagnosticHeadings`
removed); the `notices`/`reproductionSteps` plumbing is gone.

## R3 — Show a Wikipedia link for English when one exists

> For ru we use Wikipedia link, but for en we don't have … I think we should
> show a link to wikipedia, if we have it.

English `Hawaii` linked only to `wikidata.org/wiki/Q782`, while Russian `Гавайи`
linked to `ru.wikipedia.org`.

**Status: done.** The Translate page's **Link target** now defaults to
**Wikipedia (fallback Wikidata)** (`web/index.html`, `web/i18n.js`), so English
phrases resolve to their `en.wikipedia.org` article from the Wikidata sitelinks
when one exists and fall back to the Wikidata entity otherwise. Verified by
`issue-128.test.js` → "renders an English Wikipedia link when the Wikipedia
target is selected (R3)".

## R4 — Merged entity definition in links notation

> when showing in links notation we should use merged entity definition, that
> should show everything that matched the term from wikipedia, wikidata, and
> wiktionary, so these can be used to cross reference each other and help with
> disambiguating words.

**Status: done.** `renderMergedDefinitionLines` / `mergeEntityDefinition` in
`js/src/formalize-renderers.js` emit, per phrase, a `…-definition` summary line
that unions the Wikidata / Wikipedia / Wiktionary links for the selected sense,
followed by one `…-sense-N` line per matched candidate (source, id, label, kind,
score, selected flag, link). Verified by `issue-128.test.js` → "exposes a merged
entity definition for each phrase (R4)".
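
The shape of that output can be sketched roughly as below. This is an illustration under assumptions, not the renderer itself: `sketchMergedDefinition` is a hypothetical name, the candidate fields are a subset of those listed above, and the sample values (including the scores) are made up.

```javascript
// Hypothetical sketch of the merge step, not the real mergeEntityDefinition.
// Assumes each candidate is { source, id, label, score, selected, link }.
function sketchMergedDefinition(candidates) {
  // Summary: the de-duplicated links of every candidate chosen for the sense.
  const links = [...new Set(candidates.filter((c) => c.selected).map((c) => c.link))];
  // Detail: one numbered record per matched candidate, selected or not.
  const senses = candidates.map((candidate, index) => ({ sense: index + 1, ...candidate }));
  return { links, senses };
}

const merged = sketchMergedDefinition([
  { source: 'wikidata', id: 'Q782', label: 'Hawaii', score: 0.9, selected: true,
    link: 'https://www.wikidata.org/wiki/Q782' },
  { source: 'wikipedia', id: 'Hawaii', label: 'Hawaii', score: 0.9, selected: true,
    link: 'https://en.wikipedia.org/wiki/Hawaii' },
  { source: 'wiktionary', id: 'Hawaii', label: 'Hawaii', score: 0.4, selected: false,
    link: 'https://en.wiktionary.org/wiki/Hawaii' },
]);
```

Keeping the unselected candidates in `senses` is the point of the merged view: a reader can see what else matched the term and why one sense was preferred.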

## R5 — Show selected contexts in links notation (priority + probability)

> we should show in links notation version of formalization and translation
> which contexts was selected (with sequence/order of priority, where first
> context should have high probability). We should also show exact number of
> probability of context we calculated, based on how many words in the same
> context we have in source text.

**Status: done.** `renderContextLines` in `js/src/formalize-renderers.js` emits
one line per selected context, sorted by probability descending, carrying
`priority N`, the exact `probability` (one decimal percent), the `weight`, the
number of `words` sharing the context, and the shared words themselves. The same
lines are appended to the translation links notation via
`js/src/translation-renderers.js`. Verified by `issue-128.test.js` → "reports
selected contexts with priority and probability (R5)" and "carries the contexts
into the translation links notation (R5)".
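
The ordering and formatting rules can be sketched as follows. The sketch assumes each context already carries a `probability` as a fraction between 0 and 1 and a list of shared `words`; how the project computes that probability and the `weight` field is not reproduced here, and `rankContexts` is a hypothetical name.

```javascript
// Hypothetical sketch, not the real renderContextLines.
// Assumes contexts look like { name, probability (0..1), words: string[] }.
function rankContexts(contexts) {
  return [...contexts] // copy so the caller's array is not reordered
    .sort((a, b) => b.probability - a.probability) // highest probability first
    .map((context, index) => ({
      priority: index + 1,
      name: context.name,
      probability: `${(context.probability * 100).toFixed(1)}%`, // one decimal percent
      words: context.words.length,
      sharedWords: context.words,
    }));
}

const ranked = rankContexts([
  { name: 'politics', probability: 0.25, words: ['state'] },
  { name: 'geography', probability: 0.75, words: ['Hawaii', 'state'] },
]);
```

Here `geography` receives priority 1 with `75.0%` and two shared words, and `politics` follows with priority 2 and `25.0%`. The context names and numbers are invented for the example.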

## R6 — Quality gate on top articles + refreshable `.lino` API cache

> make sure we actually execute quality check on top most viewed articles, and
> when merged in main branch, that test should update the cache, so executing
> the same test in web app or next time will be faster, as we cache
> requests/responses to APIs … Cache should be in data folder and in .lino
> format, as we usually do.

**Status: done.** The quality gate over the top-viewed Wikipedia articles
already exists (`js/tests/e2e/issue-43-*`, `js/tests/integration/issue-96-*`,
asserting `summary.failed === 0` against curated fixtures). This PR adds the
**data-folder `.lino` API cache**: `js/data/wikimedia-cache.lino` consolidates
the recorded request/response snapshots into one deterministic
`cache > entries` document; `scripts/refresh-wikimedia-cache.mjs`
(`npm run cache:refresh` / `cache:check`, wired into `npm run check`) regenerates
it offline and fails CI on drift; `web/persistent-cache.js` seeds the web app's
in-memory cache from the deployed file so the web app replays offline too. The
GitHub Pages build already copies `js/data` into the published site. Verified by
`issue-128-cache.test.js`. See [`ROOT-CAUSES.md`](ROOT-CAUSES.md) RC6 and
[`SOLUTION-PLAN.md`](SOLUTION-PLAN.md).

## R7 — Compile this case study with deep analysis

> make sure we compile that data to ./docs/case-studies/issue-{id} … reconstruct
> timeline/sequence of events, list of each and all requirements …, find root
> causes …, and propose possible solutions and solution plans … (we should also
> check known existing components/libraries …, also make sure to search online
> for additional facts).

**Status: done.** This folder.

## R8 — Add debug output / verbose mode if data is insufficient

> If there is not enough data to find actual root cause, add debug output and
> verbose mode if not present, that will allow us to find root cause on next
> iteration.

**Status: done.** The root cause of R1's display bug was directly visible in the
captured log (`App version: unknown`), so no extra tracing was needed there. The
debug log already surfaces context detection and per-word candidates (added for
issue #126); the merged-definition and context lines (R4/R5) further expose how
each phrase was disambiguated, in both the UI links notation and any future
report.

## R9 — Report related issues to other repositories

> If issue related to any other repository/project, where we can report issues
> on GitHub, please do so. Each issue must contain reproducible examples,
> workarounds and suggestions for fix the issue in code.

**Status: assessed — none warranted.** See
[`ONLINE-RESEARCH.md`](ONLINE-RESEARCH.md). Every defect is in this repository
(version display, report generation, links-notation rendering, the cache
format); none traces to an upstream dependency or external data error.

## R10 — Apply the fix across the entire codebase

> double check to fully apply requirements to entire codebase, so if we have
> issue in multiple places, it should be fixed in all them.

**Status: done.** The links-notation renderers (`formalize-renderers.js`,
`translation-renderers.js`) are shared by every entry point (Formalize page,
Translate page, CLI, server), so the merged definition and context lines (R4/R5)
appear everywhere links notation is produced. The Formalize page already defaults
its link target to Wikipedia, so R3 needed changing only on the Translate page.
The report change (R2) lives in the single shared `page-report.js`.

## R11 — Do everything in one pull request (#129)

> Please plan and execute everything in this single pull request … until each
> and every requirement fully addressed, and everything is totally done.

**Status: in progress.** All work lands on branch `issue-128-41095a0f356d` /
PR #129, with a single changeset (`.changeset/issue-128-links-notation.md`).