case · case · May 28, 2026 · May 26, 2026 · May 26, 2026 · May 26, 2026
diff --git a/README.md b/README.md
@@ -223,6 +223,11 @@ Here is its schema:
         "registry_agreement_types": ["string"], // Array of agreement types: "base" | "brand" | "community" | "sponsored" | "non_sponsored"
         "icann_translation_en": "string",  // ICANN's raw English Translation of an IDN label, source-faithful [OPTIONAL - IDN gTLDs only]
 
+        // IDN language metadata (derived from tld_script via Unicode CLDR likelySubtags,
+        // with per-(script, region) and per-TLD overrides where the default is wrong)
+        "language_code": "string",         // BCP-47 code (e.g. "ar", "hi", "zh-Hant-TW") [OPTIONAL - IDN only]
+        "language_name_en": "string",      // English name (e.g. "Arabic", "Hindi", "Chinese (Taiwan)") [OPTIONAL - IDN only]
+
         // AS Org infrastructure operators (resolved against organizations.json)
         "as_org_aliases": ["string"],      // Canonical DNS provider display_names hosting nameservers (e.g. ["Identity Digital", "VeriSign"])
         "as_org_slugs": ["string"],        // FKs into organizations.json, parallel to as_org_aliases
@@ -246,6 +251,18 @@ Every TLD is identified by its **A-label** — the ASCII form, including `xn--`
 
 The **U-label** — the rendered Unicode form (e.g. `москва`) — is display-only and appears solely in the `tld_unicode` field, alongside the A-label, never as a key or reference. Consumers that render a name resolve the A-label to `tld_unicode`; they never key on it.
 
+## The typed graph
+
+Alongside `tlds.json`, the build ships four derived reverse-index artifacts. Together, the five files model the root zone as a typed graph of four entity types plus one enum:
+
+- **Domains** — the TLDs themselves (`tlds.json`).
+- **Organizations** — registries, governance bodies, and infrastructure operators (`organizations.json`).
+- **Places** — countries, dependent territories, subdivisions, cities, and supranational regions (`places.json`).
+- **Cultures** — ethno-linguistic communities like the Basques or Welsh (`cultures.json`).
+- **Agreement types** — the ICANN registry-agreement enum (`agreements.json`).
+
+Each TLD relates to one or more Organizations through *roles* (Sponsor, Administrative Contact, Technical Contact, and, for gTLDs, ICANN Registry Operator), to zero or more Places (most ccTLDs map to one country; geographic gTLDs map to a city, subdivision, country, or supranational region), to an optional Culture, and to its agreement types. Each derived artifact is a deterministic reverse index of `tlds.json`: delete it and `make build` rebuilds it from `tlds.json` and, where one exists, the hand-curated seed under `data/manual/`. Referential-integrity tests cover every cross-file relationship, so a dangling foreign key or an orphaned record fails the test suite instead of shipping.
+
 ## `organizations.json`
 
 The `data/generated/organizations.json` file is the canonical record of the organizations that play roles for TLDs, with a reverse-index of those roles. It is built from a hand-curated identity seed (`data/manual/organizations.json`) joined against `tlds.json`, and replaces the old per-role alias files.
@@ -254,6 +271,24 @@ Each org carries an editorial `display_name` and a stable kebab-case `slug` (the
 
 > **Consolidated subset:** this currently covers the curated multi-source organizations only. The single-source long tail (orgs that appear under one exact name in one source) is not yet included, so the absence of a TLD's operator here does not mean it has none.
 
+## `places.json`
+
+The `data/generated/places.json` file is the canonical record of the places associated with TLDs, with a reverse-index of their TLDs. Countries are derived mechanically from ccTLDs (ISO 3166-1 via `pycountry`); subdivisions, cities, and supranational regions come from a hand-curated seed (`data/manual/places.json`).
+
+Each place carries a stable `slug` (ISO 3166-1 alpha-2 for countries, e.g. `gb`; a recognizable short name for subdivisions, e.g. `basque-country`; the TLD for cities, e.g. `amsterdam`), an English `name_en`, a `subtype` (`country` / `subdivision` / `city` / `supranational`), the `iso_code` where one exists, a `parent` slug for hierarchy (subdivision/city → country; dependent territory → sovereign), an optional `info_link`, and the `tlds` reverse index. A sparse `iso_designation` field carries ISO 3166-1 status for the special cases: `dependent_territory` (e.g. `bm` → `gb`), `exceptionally_reserved` (`ac`), `transitionally_reserved` (`su`), and `special_area` (`aq`). `places[]` is sorted by `slug`.
+
+The United Kingdom is one place slugged `gb` (its ISO alpha-2), carrying both `.gb` and `.uk`; IDN ccTLDs fold into their country (e.g. `xn--p1ai` joins `ru`). Slugs and `tlds` are A-labels/ASCII; Unicode rendering is left to consumers.
+
+## `cultures.json`
+
+The `data/generated/cultures.json` file records the ethno-linguistic communities that at least one TLD claims affiliation with, with a reverse-index of their TLDs. It is built from a hand-curated seed (`data/manual/cultures.json`) joined against each TLD's `cultural_affiliation` annotation.
+
+Each culture carries a stable `slug` (the foreign key that `cultural_affiliation` points at), an English `name_en`, an `info_link` to Wikipedia, an optional BCP-47 `language_code` (`null` for multilingual cultures like `swiss` / `desi` / `kiwi` / `scottish`), and the `tlds` reverse index. `cultures[]` is sorted by `slug`. The schema is intentionally minimal: descriptions and cross-artifact links belong on the canonical source (Wikipedia via `info_link`) and are not duplicated here.
+
+## `agreements.json`
+
+The `data/generated/agreements.json` file is the ICANN registry-agreement-type enum with a reverse-index of the gTLDs under each. Each record carries a canonical `slug` (`base` / `non_sponsored` / `brand` / `community` / `sponsored`), a friendly `display_name`, the verbatim ICANN string under `source_names.icann`, and the `tlds` reverse index. `agreements[]` is sorted by `slug`.
+
 ## Local usage
 
 - `make deps` - Install the project dependencies

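Reviewer sketch for the README's reverse-index and referential-integrity claims. This is a minimal illustration, not the project's build code: the `tld` key, the helper names, and the sample records are assumptions made for the example, while `cultural_affiliation` and `as_org_slugs` are the field names the README uses.

```python
from collections import defaultdict


def build_reverse_index(tlds, fk_field):
    """Map each foreign-key slug to the sorted TLD labels that reference it."""
    index = defaultdict(set)
    for record in tlds:
        value = record.get(fk_field)
        if value is None:
            continue
        # Accept scalar FKs (cultural_affiliation) and list FKs (as_org_slugs).
        slugs = value if isinstance(value, list) else [value]
        for slug in slugs:
            index[slug].add(record["tld"])  # "tld" key name is an assumption
    # Sorting keys and values keeps the output deterministic across rebuilds.
    return {slug: sorted(index[slug]) for slug in sorted(index)}


def dangling_keys(index, known_slugs):
    """Slugs that some TLD references but that no seed record defines."""
    return sorted(set(index) - set(known_slugs))


def orphaned_records(index, known_slugs):
    """Seed records that no TLD references."""
    return sorted(set(known_slugs) - set(index))


# Illustrative sample data only; not taken from the real dataset.
sample_tlds = [
    {"tld": "scot", "cultural_affiliation": "scottish"},
    {"tld": "swiss", "cultural_affiliation": "swiss"},
    {"tld": "com"},
]
cultures_index = build_reverse_index(sample_tlds, "cultural_affiliation")
```

A referential-integrity test in this style would assert that both `dangling_keys(...)` and `orphaned_records(...)` return empty lists for every derived artifact.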
diff --git a/bin/lint b/bin/lint
@@ -7,10 +7,13 @@ if [ ${#paths[@]} -eq 0 ]; then
   paths=(src/ tests/)
 fi
 
-# Run all three linters even if an earlier one fails, so the developer
+# Run all linters even if an earlier one fails, so the developer
 # sees the full set of findings in one pass instead of round-tripping.
 exit_code=0
 uv run ruff check "${paths[@]}" || exit_code=$?
 uv run ruff format --check "${paths[@]}" || exit_code=$?
 uv run pyright "${paths[@]}" || exit_code=$?
+# JSON parse check scans every *.json under the current directory (the repo root
+# in normal use), ignoring the path args, so a stray syntax error or committed merge-conflict marker fails the lint pass.
+python3 bin/lint-json.py || exit_code=$?
 exit $exit_code
diff --git a/bin/lint-json.py b/bin/lint-json.py
@@ -0,0 +1,47 @@
+#!/usr/bin/env python3
+"""Validate that every JSON file in the repo parses cleanly."""
+
+import json
+import sys
+from pathlib import Path
+
+EXCLUDED_DIRS = {".git", ".venv", "node_modules", "__pycache__"}
+
+# Test fixtures that are intentionally invalid JSON.
+EXCLUDED_FILES = {
+    Path("tests/fixtures/metadata/corrupted-metadata.json"),
+}
+
+
+def find_json_files(root: Path):
+    for path in sorted(root.rglob("*.json")):
+        # Match exclusions on the repo-relative path, not on parent directories of root.
+        rel = path.relative_to(root)
+        if rel in EXCLUDED_FILES or any(part in EXCLUDED_DIRS for part in rel.parts):
+            continue
+        yield path
+
+
+def main() -> int:
+    root = Path.cwd()
+    bad: list[tuple[Path, str]] = []
+    count = 0
+    for path in find_json_files(root):
+        count += 1
+        try:
+            json.loads(path.read_text(encoding="utf-8"))
+        except (ValueError, OSError) as e:  # ValueError also covers UnicodeDecodeError
+            bad.append((path.relative_to(root), str(e)))
+
+    if bad:
+        for rel, err in bad:
+            print(f"{rel}: {err}", file=sys.stderr)
+        print(f"\n{len(bad)} of {count} JSON file(s) failed to parse.", file=sys.stderr)
+        return 1
+
+    print(f"{count} JSON file(s) parse cleanly.")
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
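
Reviewer sketch of what the per-file check catches. It mirrors the script's `try`/`except` in a standalone helper (`parse_error` is a hypothetical name, not part of this PR) and runs it against three throwaway files: valid JSON, a file with merge-conflict markers, and a file that is not valid UTF-8. The helper catches `ValueError`, the common base class of `json.JSONDecodeError` and `UnicodeDecodeError`.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def parse_error(path: Path) -> Optional[str]:
    """Return the error text if the file fails to load as JSON, else None."""
    try:
        json.loads(path.read_text(encoding="utf-8"))
    except (ValueError, OSError) as e:
        return str(e)
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    good = root / "good.json"
    good.write_text('{"slug": "gb"}', encoding="utf-8")

    # A committed merge conflict leaves marker lines that are not JSON.
    conflicted = root / "conflicted.json"
    conflicted.write_text(
        '{\n<<<<<<< HEAD\n"slug": "gb"\n=======\n"slug": "uk"\n>>>>>>> branch\n}',
        encoding="utf-8",
    )

    # 0xFF can never start a UTF-8 sequence, so read_text raises UnicodeDecodeError.
    binary = root / "binary.json"
    binary.write_bytes(b"\xff\xfe")

    results = {p.name: parse_error(p) for p in (good, conflicted, binary)}
```

Only `good.json` maps to `None`; the other two map to an error string, which the lint script would print to stderr before exiting non-zero.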