docs(positioning): ADR-001 — Reposition as External Facts Context Layer for AI Agents (v3 scope / 22 CHANGE + 1 KEEP)#2

Closed
firstdata-dev wants to merge 399 commits into main from feat/positioning-adr-001

@firstdata-dev
Owner

Pull Request: ADR-001 — Reposition FirstData as "The External Facts Context Layer for AI Agents"

🚫 DO NOT MERGE with gh pr merge --admin

Order-44 applies to this PR. Admin-merge bypass is forbidden. Third violation triggers a permanent [NEEDS-APPROVAL-NOT-ADMIN] prefix on all future PRs from this author.

Merge path: reviewer-approved gh pr merge --squash (or GitHub UI) only after two approvals from @明察 and @明鉴, and after CHANGE == 0 gate (once PR-2 script lands).

If anything feels urgent, escalate via Order-44 ladder, not --admin.


TL;DR

This Draft PR proposes and implements ADR-001, repositioning FirstData's public category from:

"数据源知识库 (data source knowledge base) / Open Data Source Repository / knowledge base"

to:

"The External Facts Context Layer for AI Agents"

The change covers 22 copy edits across 8 files (zero code / zero schema / zero MCP server-name change). Scope lock v3 is authoritative and was frozen at 2026-05-07 02:23 GMT+8.


Why now

See docs/adr/ADR-001-positioning-context-layer.md §1 for the full context. Short version:

  1. DataHub (11.8K⭐) publicly declared the "data catalog" category dead on 2026-04-30 in its blog post "Context Platform vs. Data Catalog", and rebranded as "Context Platform" + "Agent Context Kit".
  2. OpenMetadata (13.8K⭐) overtook DataHub on GitHub stars via MCP-embedded v1.8.0 + "first enterprise-grade" narrative.
  3. Standalone MCP-only repos failed: acryldata/mcp-server-datahub 72⭐, metadata-ai-sdk 8⭐, okfn/mcp-ckan 0⭐ (165–1728× below the parent repo). The category fight is decided by the parent repo's narrative, not by an accessory MCP repo.
  4. CKAN window open 6–12 months: okfn/mcp-ckan still in "early research phase" (created 2026-02-03).

Full research note: memory/growth-studies/2026-05-07-competitor-watch-data-catalog-ai-pivot.md.

What this PR changes

Commit 1 — docs(positioning): ADR-001 (f22aa09)

Adds the decision records:

  • docs/adr/ADR-001-positioning-context-layer.md (201 lines, sha256 2a04c51bc7054359ac41652f56d0e441bcefbb663a168788081e1c3281bc467a)
  • docs/adr/README.md (ADR index, 29 lines, sha256 d957497300af20157fb81a4822393333f9f98036585da6441edeeba668ed5523)
  • docs/positioning-rollout-tracker.md (82 lines, sha256 7e9c022332ed87c15fbb9aa2f4c9dc71682efd65725d0bad1961dec7514160ab)

Commit 2 — docs(positioning): execute 22 CHANGE copy edits (3ef2375)

22 copy edits across 8 files:

| File | CHANGE | KEEP |
|---|---|---|
| README.md | 7 | 0 |
| README.en.md | 4 | 0 |
| README.ja.md | 5 | 1 (L592, contribution-flow wording) |
| pyproject.toml | 1 | 0 |
| AGENTS.md | 1 | 0 |
| CLAUDE.md | 1 | 0 |
| skills/firstdata/SKILL.md | 2 | 0 |
| firstdata/sources/china/README.md | 1 | 0 |
| Total | 22 | 1 |

Full per-file / per-line breakdown: docs/positioning-rollout-tracker.md.
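
As a quick sanity check on the table above, the per-file counts can be summed and compared with the scope-lock v3 totals (23 hits = 22 CHANGE + 1 KEEP across 8 files). This is only a sketch: the `PER_FILE` dictionary and the `totals` helper are illustrative names, not part of the repository.

```python
# Per-file (CHANGE, KEEP) counts copied from the table above.
# PER_FILE and totals() are hypothetical names used only for this sketch.
PER_FILE = {
    "README.md": (7, 0),
    "README.en.md": (4, 0),
    "README.ja.md": (5, 1),
    "pyproject.toml": (1, 0),
    "AGENTS.md": (1, 0),
    "CLAUDE.md": (1, 0),
    "skills/firstdata/SKILL.md": (2, 0),
    "firstdata/sources/china/README.md": (1, 0),
}


def totals(per_file):
    """Return (CHANGE, KEEP, hits), where hits = CHANGE + KEEP."""
    change = sum(c for c, _ in per_file.values())
    keep = sum(k for _, k in per_file.values())
    return change, keep, change + keep


if __name__ == "__main__":
    change, keep, hits = totals(PER_FILE)
    print(f"files={len(PER_FILE)} hits={hits} CHANGE={change} KEEP={keep}")
```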

What this PR does NOT change

  • sources/**/*.json (frozen by contract)
  • firstdata/indexes/*.json (build artefacts)
  • MCP server name firstdata (frozen; future rename requires ADR-002 + 2-week ChangeLog + email notice)
  • HTTP endpoint https://firstdata.deepminer.com.cn/mcp
  • GitHub repo name MLT-OSS/FirstData
  • ClawHub slug firstdata

Scope lock chain (audit trail)

The three parties (@墨子 / @明察 / @明鉴) locked scope v3 at 2026-05-07 02:23 GMT+8 and unanimously withdrew all subsequent override attempts:

| Version | Numbers (hits / CHANGE / KEEP) | Proposer | Status |
|---|---|---|---|
| v3 | 23 / 22 / 1 | @明察 (SOP-7 adjudication) | AUTHORITATIVE |
| v4 | 24 / 24 / 0 | @墨子 (symmetry flip) | withdrawn 03:05 |
| v7 | 22 / 22 / 1 | naming reuse, same numbers | retired |
| v8 | 26 / 26 / 0 | @明察 (regex upgrade) | withdrawn 03:15 |
| v9 | 25 / 23 / 2 | @明鉴 (v7 wide exec) | withdrawn 03:24 |
| v10 | 26 / 23 / 3 | @墨子 (compromise) | withdrawn 03:26 |

All withdrawals documented with message IDs in memory/reflections/2026-05-07-enumeration-discipline.md.

Verification

# narrow regex (v1.1), expected 1 hit = ja:592 KEEP whitelist
$ grep -rn --include="*.md" --include="*.toml" \
    --exclude-dir=.venv --exclude-dir=node_modules \
    --exclude-dir=memory --exclude-dir=docs \
    -E '知识库|ナレッジベース|知識ベース|オープンデータソースリポジトリ|データソースリポジトリ' . \
  | grep -vE '^\./(memory|docs/research|docs/adr|CHANGELOG)'
./README.ja.md:592:3. 評価が通れば、公式にデータソースリポジトリに収録されます   ← KEEP (ADR §2)

Post-edit diff: 8 files, 22 insertions(+), 22 deletions(-) — byte-level match with v3 per-file breakdown.
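
For reviewers without grep at hand, the narrow-regex check can be approximated in Python. The sketch below reuses the pattern from the command above and treats `README.ja.md:592` as the only whitelisted KEEP hit. The `scan` function and the in-memory `files` mapping are assumptions made for illustration; the real gate is expected to be `scripts/check-positioning-consistency.sh` once PR-2 lands.

```python
import re

# Narrow regex (v1.1), same alternation as the grep command above.
NARROW = re.compile(
    "知识库|ナレッジベース|知識ベース|オープンデータソースリポジトリ|データソースリポジトリ"
)

# KEEP whitelist as (path, line number); only ja:592 per scope lock v3.
KEEP = {("README.ja.md", 592)}


def scan(files):
    """Split regex hits into CHANGE and KEEP lists.

    `files` is a hypothetical mapping of relative path -> file text, so the
    function can be exercised without touching the working tree.
    """
    change, keep = [], []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if NARROW.search(line):
                bucket = keep if (path, lineno) in KEEP else change
                bucket.append((path, lineno))
    return change, keep


if __name__ == "__main__":
    sample = {
        "README.md": "FirstData 数据源知识库\nclean line",
        "README.ja.md": "\n" * 591 + "データソースリポジトリ",
    }
    change, keep = scan(sample)
    print("CHANGE hits:", change)
    print("KEEP hits:", keep)
```

The gate passes when the CHANGE list is empty and the KEEP list contains only the whitelisted line.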

Merge gate (all four MUST be green)

  • Byte-level diff matches docs/positioning-rollout-tracker.md per-file table
  • scripts/check-positioning-consistency.sh returns CHANGE == 0 on HEAD (applies once PR-2 lands; scriptless interim: reviewer runs narrow regex above and sees only ja:592)
  • Approval from @明察 (AI-0000002)
  • Approval from @明鉴 (AI-0000003)

Rollback

  • Owner: @ningzimu (no other party may initiate)
  • Procedure: git revert <merge-commit> of this PR, then git revert of the ADR commit
  • Cost: ≤ 30 min mechanical + 0.25 person-day comms

Follow-ups (NOT blockers)

  • PR-2 feat/positioning-tooling — commits scripts/check-positioning-consistency.sh + .pre-commit-config.yaml
  • PR-3 feat/positioning-ci — commits .github/workflows/positioning-check.yml
  • Blog matrix (ADR §P2) — FirstData vs DataHub, Context Layer vs Context Platform, Why CKAN Portals Need FirstData
  • CKAN plugin (ADR P1) — firstdata-ckan-plugin prototype within the 6–12 month window

Reviewers

  • @明察 (AI-0000002) — authoritative scope + regex
  • @明鉴 (AI-0000003) — methodology audit + anti-pattern sinking
  • @ningzimu — rollback owner + category word arbiter

🚫 Reminder: Do NOT merge with --admin. Order-44 applies.

…an-meti, us-federalreserve)

feat: add 5 new data sources
feat: add Greece ELSTAT and Colombia DANE data sources
feat: add Hungary KSH data source
- Romania National Institute of Statistics (INS / INSSE)
  id: romania-ins
  URL: https://insse.ro
  Covers: GDP, population census, CPI, employment, trade, agriculture

- Statistical Office of the Republic of Slovenia (SURS)
  id: slovenia-surs
  URL: https://www.stat.si
  Covers: GDP, population census, CPI, employment, trade, environment

Both are EU member national statistics offices compliant with Eurostat standards.
Total sources: 303
…of Slovak Republic (SUSR)

- croatia-dzs: Official statistical office of Croatia (Državni zavod za statistiku)
  covering demographics, economics, employment, trade, agriculture, environment
  EU/EUROSTAT harmonized statistics, NUTS regional data
  Website: https://dzs.gov.hr

- slovakia-susr: Official statistical office of Slovakia (Štatistický úrad SR)
  covering demographics, economics, employment, trade, agriculture, environment
  EU/EUROSTAT harmonized statistics, DataCube access
  Website: https://statistics.sk
feat: add Croatian Bureau of Statistics (DZS) and Statistical Office of Slovak Republic (SUSR)
Add government statistical sources for countries not previously covered:
- japan-mof: Ministry of Finance Japan (trade & fiscal data)
- romania-insse: National Institute of Statistics of Romania
- romania-bnr: National Bank of Romania (monetary & financial data)
- cambodia-nis: National Institute of Statistics of Cambodia
- laos-lsb: Lao Statistics Bureau
- brunei-deps: Department of Economic Planning and Statistics, Brunei

Total sources: 301 → 307
feat: add Romania INS and Slovenia SURS data sources
…ields, fix cambodia-nis data_url (cameroon→camstat), fix domains (statistics→economics)
…nei DEPS, Romania BNR (MLT-OSS#95)

feat: add 6 new data sources
fix: brunei-deps data_url to edata-library
- Peru INEI (Instituto Nacional de Estadística e Informática): Official
  statistical agency of Peru covering GDP, population census, employment,
  CPI, household surveys, trade, and social statistics.

- Bulgaria NSI (National Statistical Institute): Official Bulgarian
  statistics authority producing EU/Eurostat-harmonized data on
  demographics, economics, employment, prices, trade, and regional stats.

Closes MLT-OSS#95
firstdata-dev and others added 28 commits April 30, 2026 11:11
- china-ports-association: China Ports & Harbours Association (transport/logistics)
- china-cttic: China Transport Telecommunications & Information Center
- romania-bvb: Bucharest Stock Exchange (finance/securities)
- asean-centre-for-energy: ASEAN Centre for Energy (regional energy)
- asx: Australian Securities Exchange (finance/securities)
* feat: add 4 new data sources

- china-cdc: Chinese Center for Disease Control and Prevention
- china-cnpc: China National Petroleum Corporation
- china-sinopec: China Petrochemical Corporation (Sinopec Group)
- china-cnooc: China National Offshore Oil Corporation

* fix: remove Chinese tags and convert spaces to hyphens

Response to review: tags must be lowercase English with hyphens only.
No Chinese characters, no spaces.
Schema rule (PR MLT-OSS#175/MLT-OSS#176/MLT-OSS#178 lineage).

* fix: address review — URL accessibility notes for CDC/CNPC/Sinopec/CNOOC

- china-cdc: data_url → /gzdt/ (stable), note about /jkzt/ reorganization
- china-cnpc: note about WAF returning 412 to automated probes
- china-sinopec: data_url switched to http (https endpoint unstable from some networks)
- china-cnooc: data_url simplified to root landing (col/col6264 server-side redirect loop for non-browser clients)

All 4 files still pass schema validation.

* fix: restore Chinese tags (my earlier removal was over-correction)

Schema explicitly allows 'mixed Chinese/English keywords' for discoverability.
Earlier commit 86f6d35 wrongly stripped Chinese tags based on a misremembered
review rule from PR MLT-OSS#175/MLT-OSS#176/MLT-OSS#178 (which were actually about space→hyphen, not CN removal).

Chinese tags restored to match original feat commit, with space→hyphen applied
only to English multi-word tags. No lowercase changes.
Adds hard schema-level enforcement that tags must not contain whitespace.
Description updated with explicit rule + case convention (Option A + CJK exception).

Rule:
- MUST: no whitespace (^\\S+$)
- SHOULD: new pure-ASCII tags lowercase (gdp/ipo)
- MUST: mixed CJK+ASCII acronyms (AI产业/3C认证/A股) preserve ASCII case
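
A minimal sketch of how the rule above could be checked outside the schema, assuming tags arrive as plain strings. `check_tag` is a hypothetical helper, not the project's validator. It treats the whitespace rule as a hard failure and the lowercase convention as a warning, and it leaves mixed CJK+ASCII tags untouched.

```python
import re

# MUST rule: no whitespace anywhere in the tag. fullmatch is used instead of
# match with "^\S+$" because "$" would also accept a trailing newline.
NO_WHITESPACE = re.compile(r"\S+")


def check_tag(tag):
    """Return (ok, warnings) for a single tag string."""
    if not NO_WHITESPACE.fullmatch(tag):
        return False, ["contains whitespace or is empty"]
    warnings = []
    # SHOULD rule: only pure-ASCII tags are expected to be lowercase;
    # mixed CJK+ASCII tags such as "AI产业" keep their ASCII case.
    if tag.isascii() and tag != tag.lower():
        warnings.append("pure-ASCII tag should be lowercase")
    return True, warnings


if __name__ == "__main__":
    for tag in ["gdp", "GDP", "real estate", "AI产业", "A股"]:
        print(repr(tag), check_tag(tag))
```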

Direct commit per protect-schema workflow rule (PRs forbidden for schema).

Ref: 2026-04-30 three-way alignment (明鉴 final 11:12, boss approval 11:12)
- china-nncc: 中国国家禁毒委员会 (China National Narcotics Control Commission)
- china-catcm: 中国中药协会 (China Association of Traditional Chinese Medicine)
- china-cfpa: 中国消防协会 (China Fire Protection Association)
- china-cflac: 中国文学艺术界联合会 (China Federation of Literary and Art Circles)
- china-csei: 中国特种设备检测研究院 (China Special Equipment Inspection and Research Institute)
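- china-cscec: China State Construction Engineering Corporation (central SOE, world's largest construction group)
- china-crrc: CRRC Corporation Limited (world's largest rail transit equipment manufacturer)
- china-huaneng: China Huaneng Group (one of the five major central power-generation SOEs)
- china-cagis: China Association for Geographic Information Industry (geographic information industry association)
- china-cnaf: China National Arts Fund (national-level public-interest arts fund)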

Daily contribution by AI-0000001 (FirstData 墨子)
* feat: add 5 China real estate data sources

- china-cih-index: China Index Academy / CIH Cloud (CREIS, 100-city price index, TOP100 rankings)
- china-beike-research: Beike Research Institute (second-hand housing price index, rental market)
- china-cric: China Real Estate Information Corporation (developer sales rankings, debt monitoring)
- china-creprice: China Real Estate Price Information Network (city and community price data)
- china-fangjia: Fangjia.com housing price network (address standardization, mortgage valuation)

* fix(pr#200): change 4 broken data_urls to root paths (all 404 → 200)

Per 明察 review: data_url 4/5 return 404. Root paths all return 200.

- beike-research: /reports (404) → root (200)
- cih-index: /search (404) → root (200)
- creprice: /rank/ (404) → root (200)
- fangjia: /cities/ (404) → root (200)
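
The fallback applied in this fix can be sketched as follows: keep a `data_url` if it answers 200, otherwise reduce it to the site root. The helper names and the `status_of` callback are assumptions for illustration (the callback stands in for whatever HTTP probe the reviewer used), and the hosts in the demo are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def root_url(url):
    """Strip path, query and fragment, leaving scheme://host/."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))


def pick_data_url(url, status_of):
    """Keep `url` if it answers 200, otherwise fall back to the site root.

    `status_of` is a hypothetical callable mapping a URL to an HTTP status
    code; injecting it keeps the sketch runnable without network access.
    """
    if status_of(url) == 200:
        return url
    return root_url(url)


if __name__ == "__main__":
    # Placeholder statuses mirroring the 404 -> 200 pattern in the commit.
    fake_status = {
        "https://example.com/reports": 404,
        "https://example.com/": 200,
    }.get
    print(pick_data_url("https://example.com/reports", fake_status))
```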
…SS#201)

- china-sinograin: 中储粮集团 (Sinograin, national grain reserves corp)
- china-nfra-fire: 国家消防救援局 (National Fire and Rescue Administration)
- china-ches: 中国水利学会 (China Hydraulic Engineering Society)
- china-chinalco: 中国铝业集团有限公司 (Aluminum Corporation of China)
- china-phirda: 中国医药创新促进会 (China Pharma Innovation Association)
- china-film-admin: National Film Administration of China (国家电影局)
  Box office, cinema stats, film production data
- china-bof: Bureau of Fisheries, MARA (农业农村部渔业渔政管理局)
  China Fishery Statistical Yearbook, aquaculture, capture fisheries
- china-avic: Aviation Industry Corporation of China (中国航空工业集团)
  Aviation manufacturing, aerospace, defense industry data
- china-capco: China Association for Public Companies (中国上市公司协会)
  Listed company performance, governance, ESG disclosures
- china-cofco: COFCO Corporation (中粮集团)
  Grain trading, food processing, agricultural commodities
…T-OSS#203)

Add CEMIA (中国电子材料行业协会), the national MIIT-supervised industry
association for China's semiconductor materials, electronic specialty
gases, third-generation semiconductor (SiC/GaN), and photovoltaic
materials sectors. Its sub-committees publish key statistics, industry
reports, and standards.

- id: china-cemia
- authority_level: other (industry association under MIIT)
- country: CN
- domains: semiconductor-materials, electronic-materials, photovoltaic-materials, industry
…05-02) (MLT-OSS#204)

- china-casc: China Aerospace Science and Technology Corporation (CASC)
  Premier state-owned aerospace enterprise; Long March rockets, Shenzhou,
  Chang'e lunar program, Tianwen Mars mission, Beidou satellite system
- china-ctg: China Three Gorges Corporation (CTG)
  World's largest hydropower company; Three Gorges Dam, Baihetan, clean
  energy transition data, Yangtze River ecological reports
- china-chnenergy: China Energy Investment Corporation (CHN Energy)
  World's largest coal producer; merged from Shenhua and Guodian;
  coal production, power generation mix, carbon reduction data
- china-cdrf: China Development Research Foundation (CDRF)
  Affiliated with DRC; China Development Report, China Development
  Forum, early childhood development and poverty research
- china-tower: China Tower Corporation
  World's largest telecom tower operator; 2M+ sites, 5G deployment
  tracking, infrastructure sharing data (HKEX: 0788)
- china-iprcc: International Poverty Reduction Center in China (IPRCC)
- china-nhei: China National Health Development Research Center
- china-nies: Nanjing Institute of Environmental Sciences, MEE
- china-sass: Shanghai Academy of Social Sciences
- china-drcnet: DRCNET - Development Research Center Network

All 5 are authoritative Chinese research/government institutions covering
poverty reduction, health policy, environmental science, social sciences,
and macroeconomic policy research. Schema validation passed.
…S#206)

* feat: add 5 China authoritative sources (AM batch 2026-05-04)

- china-nifa: National Internet Finance Association of China (中国互联网金融协会)
  - Internet finance industry data, P2P/fintech statistics, NIFDS compliance data

- china-nifdc: National Institutes for Food and Drug Control (中国食品药品检定研究院)
  - Drug standards, Chinese Pharmacopoeia, biological product batch release data

- china-ctmo: China Trademark Office / CNIPA Trademark Bureau (国家知识产权局商标局)
  - China trademark registration database, trademark statistics

- china-ccs-crop: Chinese Crop Science Society (中国作物学会)
  - National crop variety database, germplasm resources, crop production data

- china-cbea: China Beverage Association (中国饮料工业协会)
  - Beverage industry production statistics, market data

- fix: china-boc.json JSON syntax error (unescaped quotes in Chinese text)

* fix(boc): replace literal '201c/201d' strings with proper Unicode quotes U+201C/U+201D

Previous fix attempt failed: the escape codepoint text was emitted as the literal strings '201c'/'201d' instead of the actual Unicode characters.
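
The three states described in these two fix commits can be reproduced with Python's standard `json` module. The JSON snippets below are made-up stand-ins, not the contents of china-boc.json: unescaped ASCII quotes inside a string break parsing, curly quotes U+201C/U+201D parse cleanly, and the literal text `201c`/`201d` also parses but leaves the wrong characters in the value.

```python
import json


def parses(text):
    """Return True if `text` is valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Hypothetical examples; the "note" field is illustrative only.
unescaped = '{"note": "the "quoted" term"}'           # original syntax error
curly = '{"note": "the \u201cquoted\u201d term"}'     # intended fix
literal = '{"note": "the 201cquoted201d term"}'       # failed first attempt

if __name__ == "__main__":
    print("unescaped parses:", parses(unescaped))
    print("curly parses:", parses(curly))
    print("literal value:", json.loads(literal)["note"])
```

Because the failed attempt still yields valid JSON, schema validation alone would not catch it; a check on the string content is needed.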
- china-ncc: National Climate Center of China (CMA)
- global-carbon-project: Global Carbon Project (GCP)
- cdp: Carbon Disclosure Project
- global-reporting-initiative: GRI Standards
- sasb-standards: SASB Standards (IFRS Foundation)

Focus areas: climate disclosure, ESG reporting standards, greenhouse gas accounting.
…S#208)

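- china-cantonfair: 中国进出口商品交易会 / 广交会 (China Import and Export Fair, Canton Fair): biannual trade fair data
- china-ciesc: 中国化工学会 (Chemical Industry and Engineering Society of China): chemical industry academic society (est. 1922)
- china-csg: 中国南方电网 (China Southern Power Grid): grid operation data for the five southern provinces
- china-chinacoal: 中国中煤能源集团 (China National Coal Group): coal and coal chemical SOE, HKEX/SSE listed
- china-cafiu: 中国国际交流协会 (Chinese Association for International Understanding): party/public diplomacy exchange platform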
…ontext Layer

Proposes repositioning from '数据源知识库 / Open Data Source Repository /
knowledge base' to 'The External Facts Context Layer for AI Agents'.

Context:
- DataHub declared 'data catalog' category dead (2026-04-30 blog)
- OpenMetadata overtook DataHub on GitHub stars via MCP narrative
- Standalone MCP-only repos fail to pull weight (165-1728x gap)

Scope lock v3 (authoritative, 2026-05-07 02:23 GMT+8):
  hits   = 23
  CHANGE = 22
  KEEP   = 1 (ja:592, business-process wording)
  files  = 8
  base   = bad4772

This commit contains ONLY the ADR + index + rollout tracker.
The 22 copy edits land in a follow-up PR-1 commit on the same branch.

Deciders: @ningzimu (rollback owner), @墨子 (proposer),
          @明察 + @明鉴 (reviewers)

Refs:
- memory/growth-studies/2026-05-07-competitor-watch-data-catalog-ai-pivot.md
- memory/reflections/2026-05-07-enumeration-discipline.md

Anti-patterns sunk during this scope lock:
- MLT-OSS#29 BB: Cross-language-self-title-blindspot
- MLT-OSS#30 CC: Memory-Ground-Truth-Drift

NEVER 'gh pr merge --admin' - Order-44 applies.
…k v3

Implements ADR-001 (commit f22aa09): replace '数据源知识库 / Open Data
Source Repository / knowledge base' with 'External Facts Context Layer
for AI Agents' copy across 8 files.

Scope lock v3 (authoritative):
  hits   = 23
  CHANGE = 22 (this commit)
  KEEP   = 1 (README.ja.md:592, business-process wording)
  base   = bad4772

Post-edit verification:
- narrow regex '知识库|ナレッジベース|知識ベース|オープンデータソースリポジトリ|データソースリポジトリ' returns 1 hit = ja:592 (expected KEEP)
- 8 files, 22 insertions, 22 deletions (byte-level match with v3 per-file breakdown in docs/positioning-rollout-tracker.md)

No changes to:
- sources/**/*.json
- firstdata/indexes/*.json
- MCP server name (firstdata)
- HTTP endpoint

NEVER 'gh pr merge --admin' - Order-44 applies.
@firstdata-dev
Owner Author

Wrong base. Will re-open against upstream MLT-OSS/FirstData.
