10 changes: 10 additions & 0 deletions Article-Generation.md
@@ -664,6 +664,16 @@ For each of the 14 supported languages, the renderer resolves the `<title>` and

Authoritative editorial rule: **localized title and description are highlights of the localized executive brief**. The runtime enforcement of this rule lives in [`scripts/render-lib/aggregator/seo/localized-brief.ts`](scripts/render-lib/aggregator/seo/localized-brief.ts) — a pure-function bounded context whose `extractLocalizedBriefSeo({ briefMarkdown, subfolder })` returns `{ title, description }` candidates derived from `executive-brief_<lang>.md` H1 + BLUF. Banned-phrase H1s (`REPLACE THIS H1`, `Executive Brief Template`, `AI_MUST_REPLACE`, `AI-generated political intelligence`) and bare boilerplate `Executive Brief` are rejected in lock-step with `scripts/agentic/analysis-gate.ts § checkExecutiveBrief`, so a translator stub cannot leak into the SERP `<title>` via the localized cascade. The merger in [`scripts/render-lib/article-merge.ts`](scripts/render-lib/article-merge.ts) overlays the brief-derived fields on top of the per-type agent's `article.<lang>.md` front-matter; each field is independent, so a clean BLUF localizes the description even when the brief H1 is rejected. This is enforced by `scripts/validate-executive-brief-translations.ts` for the brief itself and by `tests/localized-brief-seo.test.ts` + `tests/article-merge.test.ts` for the rendered HTML.
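
The field-by-field independence described above can be illustrated with a minimal sketch. This is not the repository's `localized-brief.ts`: the function name `sketchExtractBriefSeo`, the result shape, and the rule that the BLUF is the first non-empty, non-heading line after the H1 are all assumptions made for illustration. Only the banned phrases are taken from the paragraph above.

```typescript
// Hypothetical sketch, not the real extractLocalizedBriefSeo().
interface BriefSeoCandidates {
  readonly title: string | undefined;
  readonly description: string | undefined;
}

// Banned H1 phrases quoted in the paragraph above.
const BANNED_H1_PHRASES: readonly string[] = [
  'REPLACE THIS H1',
  'Executive Brief Template',
  'AI_MUST_REPLACE',
  'AI-generated political intelligence',
];

function isRejectedH1(h1: string): boolean {
  const normalized = h1.trim().toLowerCase();
  if (normalized.length === 0) return true;
  if (normalized === 'executive brief') return true; // bare boilerplate
  return BANNED_H1_PHRASES.some((p) => normalized.includes(p.toLowerCase()));
}

function sketchExtractBriefSeo(briefMarkdown: string): BriefSeoCandidates {
  const lines = briefMarkdown.split('\n');
  const h1Index = lines.findIndex((l) => /^#\s+\S/.test(l));
  const h1 = h1Index === -1 ? '' : (lines[h1Index] ?? '').replace(/^#\s+/, '').trim();

  // Assumption: the BLUF is the first non-empty, non-heading line after the H1.
  const bluf = lines
    .slice(h1Index + 1)
    .map((l) => l.trim())
    .find((l) => l.length > 0 && !l.startsWith('#'));

  // Each field is decided on its own: a rejected H1 does not discard a clean BLUF.
  return {
    title: isRejectedH1(h1) ? undefined : h1,
    description: bluf,
  };
}
```

With a translator stub such as `# Executive Brief` followed by a real Swedish summary line, this sketch yields no title candidate but still returns the summary as the description candidate, which is the behaviour the merger relies on.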

#### On-page lead localization — the body opens in the reader's language

The `<title>`/`<meta description>` cascade above governs **metadata**. The same "localized brief if it exists, English otherwise" rule applies to the **on-page lead** (the first section a reader actually sees). It is implemented by `localizeExecutiveBriefLead()` in [`scripts/render-lib/article-brief-lead.ts`](scripts/render-lib/article-brief-lead.ts), a pure (no-I/O) markdown transform that `renderArticleHtml` runs once per target language:

1. **Carrier stripping (all languages).** Because the aggregator splices *every* `.md` sibling into `article.md`, the 13 localized briefs are embedded as trailing `## Executive Brief Sv`, `## Executive Brief De`, … carrier sections. `stripEmbeddedLocalizedBriefSections()` removes **all** of them for every language so no page ships 13 foreign-language brief copies inline (this dropped a representative proposition page from ~428 KB to ~307 KB). The carriers are functionally dead weight in the HTML — the SEO cascade reads `executive-brief_<lang>.md` from disk, not from these embedded copies.
2. **Lead swap (non-English with a localized brief).** For a non-English target whose `executive-brief_<lang>.md` exists, the body of the first `<h2>` lead section (`## What Happened`) is replaced with the cleaned localized brief (same `cleanArtifactBody()` + `rewriteRelativeLinks()` pipeline the aggregator uses — `normalizeNarrativeTerminology()` is deliberately **not** run so English first-use glosses never leak into localized prose). The lead **heading** stays the language-stable English `## What Happened` (the in-article TOC localizes its label separately, per `section-title-i18n.ts`); only the lead **body** becomes localized. The lead's provenance comment is repointed from `executive-brief.md` to `executive-brief_<lang>.md`.
3. **English / missing / empty fallback.** English keeps its canonical `## What Happened` lead verbatim. A non-English target with no localized brief (or a whitespace-only one) also keeps the English lead — the page renders English lead content under a non-English `<html lang>` until the next `news-translate` run produces the localized brief, exactly mirroring metadata fallback layer #4.
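
The three steps above can be traced on a tiny document. The sketch below is a simplified, self-contained restatement, not the module itself: the helper names `stripCarriers` and `swapLead` are hypothetical, the suffix list is truncated to three languages for illustration, and the cleaning pipeline and provenance comment are left out.

```typescript
// Hypothetical, simplified restatement of steps 1-3.
// Assumption: only three carrier suffixes, to keep the example short.
const CARRIER_SUFFIXES: readonly string[] = ['Sv', 'De', 'Ja'];

// Heading line plus every following line up to, but excluding, the next `## `.
const CARRIER_RE = new RegExp(
  String.raw`^##\s+Executive Brief (?:${CARRIER_SUFFIXES.join('|')})\b[^\n]*\n(?:(?!^##\s)[^\n]*\n?)*`,
  'gim',
);

// Step 1: remove embedded carrier sections for every language.
function stripCarriers(body: string): string {
  const stripped = body.replace(CARRIER_RE, '');
  return `${stripped.replace(/\n{3,}/g, '\n\n').trimEnd()}\n`;
}

// Steps 2 and 3: swap the lead body when a localized brief exists,
// otherwise keep the English lead.
function swapLead(body: string, localizedLead: string | undefined): string {
  const stripped = stripCarriers(body);
  if (!localizedLead || localizedLead.trim().length === 0) return stripped;

  const lines = stripped.split('\n');
  const first = lines.findIndex((l) => /^##\s/.test(l));
  if (first === -1) return stripped;
  const next = lines.findIndex((l, i) => i > first && /^##\s/.test(l));
  const tail = next === -1 ? [] : lines.slice(next);

  // The heading line is kept; only the lead body changes.
  return [...lines.slice(0, first), lines[first], '', localizedLead.trim(), '', ...tail].join('\n');
}

const articleBody = [
  '## What Happened',
  '',
  'English lead.',
  '',
  '## Key Findings',
  '',
  'Details.',
  '',
  '## Executive Brief Sv',
  '',
  'Svensk sammanfattning.',
  '',
].join('\n');

const englishPage = swapLead(articleBody, undefined);
const swedishPage = swapLead(articleBody, 'Svensk sammanfattning.');
```

In this sketch `englishPage` keeps the English lead with the `## Executive Brief Sv` carrier removed, while `swedishPage` keeps the `## What Happened` heading but carries the Swedish text as its body.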

Localized briefs are also excluded from the artifact list (`resolveArtifactList()` in `scripts/render-articles.ts` and `isReaderGuideEligible()` in `aggregator/reader-guide.ts`) so they no longer appear as Reader Intelligence Guide navigation rows (which would dangle now that the carrier sections are stripped), Article Sources provenance cards, or JSON-LD `isBasedOn` — they are translations of `executive-brief.md`, not independent analytical artifacts. This is covered by `tests/article-brief-lead.test.ts`.
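
A short sketch of the exclusion rule, assuming the same filename pattern the two call sites use (`/^executive-brief_[a-z-]+\.md$/i`). The helper name `isIndependentArtifact` and the sample file list are hypothetical.

```typescript
// Pattern for localized brief carriers, e.g. executive-brief_sv.md.
const LOCALIZED_BRIEF_RE = /^executive-brief_[a-z-]+\.md$/i;

// Hypothetical helper: true when a file should stay in the artifact list.
function isIndependentArtifact(relPath: string): boolean {
  const base = relPath.includes('/')
    ? relPath.slice(relPath.lastIndexOf('/') + 1)
    : relPath;
  return !LOCALIZED_BRIEF_RE.test(base);
}

// Illustrative file names only.
const candidateFiles: readonly string[] = [
  'executive-brief.md',
  'executive-brief_sv.md',
  'risk/executive-brief_de.md',
  'synthesis-summary.md',
];

const keptFiles = candidateFiles.filter(isIndependentArtifact);
```

The English `executive-brief.md` has no `_<lang>` suffix, so it does not match the pattern and stays in the list; only the translations are dropped.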

#### `article.md` front-matter (canonical English source)

`scripts/render-lib/aggregator/aggregate.ts` writes the front-matter that the renderer subsequently consumes:
17 changes: 16 additions & 1 deletion analysis/methodologies/ai-driven-analysis-guide.md
@@ -208,6 +208,21 @@ Before article aggregation, every workflow must seed the story metadata that wil
3. Add `## 🌐 14-Language SEO Metadata Seeds` to the executive brief when the workflow has enough evidence. Fill all 14 rows with short localized title angles, description angles and keyword seeds. If a language cannot be human-quality localized, write the English story topic plus the language label and mark it `[machine-assisted — verify]`; do not leave it blank.
4. Pass 2 must read every language row back and confirm that the title/description is **contextual** (policy object, actor, consequence) rather than a date-stuffed duplicate.

> **The H1 is doing triple duty.** The `executive-brief.md` H1 and BLUF are not "just SEO". The renderer derives the page `<title>`, `<meta description>`, OG/Twitter cards and Schema.org `headline` from them **and** the brief body is now the **on-page lead** a reader meets first. For a localized page the renderer uses `executive-brief_<lang>.md` as the lead (localized if it exists, English otherwise — the same cascade as `<title>`/`<meta description>`), so the localized H1/BLUF must read as a finished, reader-attracting opening in that language, not a literal calque of the English. Write each H1 to earn the click and the first 10 seconds of reading attention.

#### 🪝 Headline tradecraft (write the title to be clicked, not just indexed)

Pass-1 may write a serviceable H1; Pass-2 must sharpen it. A strong title is **specific, consequential and curiosity-opening** without being clickbait:

- **Lead with the named actor and an active, concrete verb** — "Busch government tightens migration detention" beats "New propositions submitted".
- **Name the stake / so-what** — quantify or sharpen the consequence ("…risking ECHR Article 8 review", "…shifting 12% of the housing budget").
- **Front-load the highest-DIW finding**, never the document count or the date. The title must match the #1 finding the lede and significance scoring agree on.
- **Stay inside the SERP-safe envelope** — 50–70 characters renders without truncation (the renderer trims long H1s at a word boundary and strips trailing connectors, so a title that *needs* its tail to make sense will lose meaning at ~70 chars).
- **No date-stuffing, no admin metadata, no template scaffolding** (`REPLACE THIS H1`, `Executive Brief`, classification badges) — those are stripped/flagged and produce duplicate-looking SERP entries.
- **Localize the angle, not the words** — each non-English title row should carry the same actor/verb/stake in idiomatic phrasing; a reader scanning the SV or JA page should feel the headline was written for them.

Mini self-check (apply to every language row in Pass-2): *Does it name an actor? An active verb? A concrete stake? Would a non-expert understand the so-what in one read? Is it ≤ 70 chars and free of dates/admin text?* Any "no" forces a rewrite.
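
The mechanical half of the self-check (length, dates, template scaffolding) could be linted automatically. The sketch below is hypothetical and not part of the repository: the name `checkHeadline`, the date heuristic, and the use of UTF-16 length as the character count are assumptions. Whether a title names an actor, a verb and a stake still needs a human or Pass-2 read.

```typescript
// Hypothetical lint for the mechanical parts of the mini self-check.
interface HeadlineCheck {
  readonly ok: boolean;
  readonly problems: readonly string[];
}

const MAX_TITLE_CHARS = 70;

// Scaffolding strings named in the list above.
const TEMPLATE_SCAFFOLDING: readonly string[] = ['REPLACE THIS H1', 'Executive Brief'];

// Assumption: ISO dates and bare four-digit years count as date-stuffing.
const DATE_RE = /\b\d{4}-\d{2}-\d{2}\b|\b(?:19|20)\d{2}\b/;

function checkHeadline(title: string): HeadlineCheck {
  const problems: string[] = [];
  const t = title.trim();
  const lower = t.toLowerCase();

  // Assumption: UTF-16 length is a good enough proxy for SERP width.
  if (t.length > MAX_TITLE_CHARS) problems.push('longer than 70 characters');
  if (DATE_RE.test(t)) problems.push('contains a date');
  if (TEMPLATE_SCAFFOLDING.some((p) => lower.includes(p.toLowerCase()))) {
    problems.push('contains template scaffolding');
  }
  return { ok: problems.length === 0, problems };
}

const strongTitle = checkHeadline('Busch government tightens migration detention');
const datedTitle = checkHeadline('New propositions submitted 2025-03-14');
```

Here `strongTitle.ok` is `true`, while `datedTitle` is flagged for containing a date. Any reported problem would force the rewrite the self-check calls for.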

Minimum row schema:

```markdown
@@ -406,7 +421,7 @@ Score your own output against this rubric before commit:

#### Article and SEO handoff

-Before running `scripts/aggregate-analysis.ts`, ensure `executive-brief.md` has a publishable H1 and BLUF that can become `<title>` and `<meta description>` without repair: actor-first, active verb, no literal date, no admin metadata, 55–70 character title target and 140–200 character one-sentence description target. `synthesis-summary.md §Narrative Direction & Article Decision` should agree with that H1/BLUF so `article.md` reads as one coherent intelligence article.
+Before running `scripts/aggregate-analysis.ts`, ensure `executive-brief.md` has a publishable H1 and BLUF that can become `<title>` and `<meta description>` without repair: actor-first, active verb, no literal date, no admin metadata, 55–70 character title target and 140–200 character one-sentence description target. `synthesis-summary.md §Narrative Direction & Article Decision` should agree with that H1/BLUF so `article.md` reads as one coherent intelligence article. Apply the same bar to every `executive-brief_<lang>.md`: its H1/BLUF is the localized page's **on-page lead** (the first thing a SV/DE/JA/AR reader sees) as well as that page's localized `<title>`/`<meta description>` — so each localized brief must open with the same reader-attracting, actor-and-stake headline in idiomatic prose, never a literal calque. Run the 🪝 Headline-tradecraft self-check (Step 2B) against every language before aggregation.


Read every file you produced in Steps 3–5. For each one, **improve every section**:
8 changes: 8 additions & 0 deletions scripts/render-articles.ts
@@ -230,6 +230,14 @@ function resolveArtifactList(rc: RenderCase): readonly string[] {
    if (rel === 'pass1' || rel.startsWith('pass1/')) continue;
    walk(full, rel);
  } else if (/\.(md|json)$/i.test(e.name) && !/^article(?:\.[a-z-]+)?\.md$/i.test(e.name)) {
    // Skip localized executive-brief translation carriers
    // (`executive-brief_<lang>.md`). They are translations of the
    // English `executive-brief.md` — consumed by the SEO cascade and
    // the localized on-page lead — not independent analytical
    // artifacts, so they must not appear in the Reader Intelligence
    // Guide, the Article Sources provenance grid, or JSON-LD
    // `isBasedOn`.
    if (/^executive-brief_[a-z-]+\.md$/i.test(e.name)) continue;
    out.push(rel);
  }
}
5 changes: 5 additions & 0 deletions scripts/render-lib/aggregator/reader-guide.ts
@@ -148,6 +148,11 @@ export function selectReaderGuideArtifacts(available: ReadonlySet<string> | read
  const base = file.includes('/') ? file.slice(file.lastIndexOf('/') + 1) : file;
  if (base === 'README.md') return false;
  if (/^article(?:\.[a-z-]+)?\.md$/i.test(base)) return false;
  // Localized executive-brief translation carriers are derivative of
  // `executive-brief.md` (which already represents the lead). They are
  // not independent analytical sections, so they must never appear as
  // Reader Intelligence Guide navigation rows.
  if (/^executive-brief_[a-z-]+\.md$/i.test(base)) return false;
  return true;
};

185 changes: 185 additions & 0 deletions scripts/render-lib/article-brief-lead.ts
@@ -0,0 +1,185 @@
/**
* @module Infrastructure/RenderLib/ArticleBriefLead
* @category Intelligence Operations / Supporting Infrastructure
* @name Localized executive-brief lead substitution + carrier stripping
*
* @description
* The aggregated `analysis/daily/$DATE/$SUB/article.md` is an
* English-canonical document. It opens with the English executive brief
* (the `## What Happened` lead section) and — because the aggregator
* splices *every* `.md` sibling into the body — also embeds the 13
* localized briefs (`executive-brief_<lang>.md`) as trailing
* `## Executive Brief Sv`, `## Executive Brief De`, … carrier sections.
*
* Those carrier sections were never meant to render inline: they bloat
* every published page (each carries the full brief in a foreign language)
* and they leave a non-English reader meeting the *English* lead before
* their own-language summary. The SEO cascade already localizes the
* `<title>` / `<meta description>` from `executive-brief_<lang>.md`
* (see `aggregator/seo/localized-brief.ts`); this module brings the
* on-page **lead** into lock-step with that cascade.
*
* {@link localizeExecutiveBriefLead} is a pure (no-I/O) string transform
* applied by `renderArticleHtml` to the article-markdown body per target
* language. It:
*
* 1. removes every embedded `## Executive Brief <Lang>` carrier section
* for **all** languages (English included); and
* 2. for a non-English target with a localized brief, replaces the body
* of the first `<h2>` lead section (`## What Happened`) with the
* cleaned `executive-brief_<lang>.md` content so the reader's first
* screen is entirely in their own language. When the localized brief
* is absent, the English lead is left in place (the same "localized
* if exists, English otherwise" rule the SEO cascade follows).
*
* The localized body is cleaned with the **same** pipeline the aggregator
* uses for the carrier sections — `cleanArtifactBody` (front-matter / H1 /
* admin-byline strip + `##` → `###` heading demotion) followed by
* `rewriteRelativeLinks` — so the swapped-in lead is byte-identical to
* what the aggregator would have embedded. Crucially it does **not** run
* `normalizeNarrativeTerminology`, whose English first-use annotations
* (`Riksdag document #… (HD…)`, `Lede`, confidence glosses) must never be
* injected into localized prose.
*
* @author Hack23 AB (Infrastructure Team)
* @license Apache-2.0
*/

import type { Language } from '../types/language.js';
import { LANGUAGES } from './constants.js';
import { buildGithubBlobUrl } from './url-helpers.js';
import {
  cleanArtifactBody,
  rewriteRelativeLinks,
} from './aggregator/cleaning/structural.js';

/**
* Title-cased single-segment language codes for the 13 non-English
* locales, matching `prettifyFallbackTitle('executive-brief_<lang>.md')`
* in `aggregator/order.ts` (e.g. `sv` → `Sv`, `no` → `No`, `zh` → `Zh`).
* English is excluded — its brief renders as `## What Happened`, never as
* a `## Executive Brief <Lang>` carrier.
*/
const LOCALIZED_BRIEF_TITLE_SUFFIXES: readonly string[] = LANGUAGES
  .filter((l) => l !== 'en')
  .map((l) => l.charAt(0).toUpperCase() + l.slice(1));

/**
* Matches an embedded `## Executive Brief <Lang>` carrier section: the
* heading line through every following line up to (but excluding) the
* next `<h2>`. Mirrors the line-anchored sweep used by
* `stripBodyDuplicateSections` so `###`/`# `/code-fence lines inside the
* section are consumed while the next `## ` boundary stops the match.
*/
const EMBEDDED_BRIEF_SECTION_RE = new RegExp(
  String.raw`^##\s+Executive Brief (?:${LOCALIZED_BRIEF_TITLE_SUFFIXES.join('|')})\b[^\n]*\n(?:(?!^##\s)[^\n]*\n?)*`,
  'gim',
);

/**
* Strip all embedded `## Executive Brief <Lang>` carrier sections from an
* article-markdown body. Applied for every language, English included.
*/
export function stripEmbeddedLocalizedBriefSections(content: string): string {
  const stripped = content.replace(EMBEDDED_BRIEF_SECTION_RE, '');
  // Collapse the blank-line run left where the carrier block used to sit.
  return `${stripped.replace(/\n{3,}/g, '\n\n').trimEnd()}\n`;
}

interface LeadBounds {
  readonly headingLine: string;
  readonly firstH2: number;
  readonly secondH2: number;
}

/** Locate the first and second `## ` (h2) line indices in a markdown body. */
function findLeadBounds(lines: readonly string[]): LeadBounds | null {
  let firstH2 = -1;
  let secondH2 = -1;
  for (let i = 0; i < lines.length; i += 1) {
    if (/^##\s/.test(lines[i]!)) {
      if (firstH2 === -1) {
        firstH2 = i;
      } else {
        secondH2 = i;
        break;
      }
    }
  }
  if (firstH2 === -1) return null;
  return { headingLine: lines[firstH2]!, firstH2, secondH2 };
}

/**
* Replace the body of the first `<h2>` lead section with `localizedBody`,
* keeping the original heading and repointing the provenance comment at
* `executive-brief_<lang>.md`.
*/
function replaceLeadSectionBody(
  content: string,
  lang: Language,
  localizedBody: string,
  subfolderRepoRelPath: string,
): string {
  const lines = content.split('\n');
  const bounds = findLeadBounds(lines);
  if (!bounds) return content;

  const sourceRel = `executive-brief_${lang}.md`;
  const sourceUrl = subfolderRepoRelPath
    ? buildGithubBlobUrl(`${subfolderRepoRelPath}/${sourceRel}`)
    : sourceRel;

  const before = lines.slice(0, bounds.firstH2);
  const after = bounds.secondH2 === -1 ? [] : lines.slice(bounds.secondH2);

  const leadBlock = [
    bounds.headingLine,
    `<!-- source: ${sourceRel} :: ${sourceUrl} -->`,
    '',
    localizedBody.trim(),
    '',
  ];

  return [...before, ...leadBlock, ...after].join('\n');
}

export interface LocalizeExecutiveBriefLeadInput {
  /** Article-markdown body (front-matter already removed). */
  readonly content: string;
  /** Target language. */
  readonly lang: Language;
  /** Raw `executive-brief_<lang>.md` markdown when one exists on disk. */
  readonly localizedBriefMarkdown?: string;
  /** Repo-relative analysis folder, used to rewrite relative links. */
  readonly subfolderRepoRelPath?: string;
}

/**
* Localize the on-page executive-brief lead and strip embedded carrier
* sections. See the module JSDoc for the full contract.
*/
export function localizeExecutiveBriefLead(
  input: LocalizeExecutiveBriefLeadInput,
): string {
  const stripped = stripEmbeddedLocalizedBriefSections(input.content);

  // English keeps the canonical `## What Happened` lead verbatim.
  if (input.lang === 'en') return stripped;

  const brief = input.localizedBriefMarkdown;
  if (!brief || brief.trim().length === 0) return stripped;

  const cleaned = rewriteRelativeLinks(
    cleanArtifactBody(brief),
    input.subfolderRepoRelPath ?? '',
  );
  if (cleaned.trim().length === 0) return stripped;

  return replaceLeadSectionBody(
    stripped,
    input.lang,
    cleaned,
    input.subfolderRepoRelPath ?? '',
  );
}
8 changes: 7 additions & 1 deletion scripts/render-lib/article.ts
@@ -77,6 +77,7 @@ import {
} from './article-aside.js';
import { enrichArticleMarkdownWithPoliticalContext } from './political-context.js';
import { applyScannabilityTransforms, transformProgressiveDisclosure } from './article-scannability.js';
import { localizeExecutiveBriefLead } from './article-brief-lead.js';

/**
* CSS selectors identifying the voice-assistant TTS-readable regions of
@@ -544,7 +545,12 @@ export async function renderArticleHtml(input: RenderArticleInput): Promise<stri
  const modifiedIso = new Date().toISOString();
  const articleType = { type: articleTypeId, label: localizedArticleTypeLabel };

-  const cleanedContent = stripBodyDuplicateSections(parsed.content);
+  const cleanedContent = localizeExecutiveBriefLead({
+    content: stripBodyDuplicateSections(parsed.content),
+    lang: input.lang,
+    localizedBriefMarkdown: input.localizedBriefMarkdown,
+    subfolderRepoRelPath: input.subfolderRepoRelPath,
+  });

  const enrichedContent = enrichArticleMarkdownWithPoliticalContext(cleanedContent, input.lang);
