docs(positioning): ADR-001 — Reposition as External Facts Context Layer for AI Agents (v3 scope / 22 CHANGE + 1 KEEP) #2
Closed
firstdata-dev wants to merge 399 commits into main
…an-meti, us-federalreserve) feat: add 5 new data sources
feat: add Greece ELSTAT and Colombia DANE data sources
feat: add Hungary KSH data source
- Romania National Institute of Statistics (INS / INSSE) id: romania-ins URL: https://insse.ro Covers: GDP, population census, CPI, employment, trade, agriculture - Statistical Office of the Republic of Slovenia (SURS) id: slovenia-surs URL: https://www.stat.si Covers: GDP, population census, CPI, employment, trade, environment Both are EU member national statistics offices compliant with Eurostat standards. Total sources: 303
…of Slovak Republic (SUSR) - croatia-dzs: Official statistical office of Croatia (Državni zavod za statistiku) covering demographics, economics, employment, trade, agriculture, environment EU/EUROSTAT harmonized statistics, NUTS regional data Website: https://dzs.gov.hr - slovakia-susr: Official statistical office of Slovakia (Štatistický úrad SR) covering demographics, economics, employment, trade, agriculture, environment EU/EUROSTAT harmonized statistics, DataCube access Website: https://statistics.sk
…enia-surs entries
feat: add Croatian Bureau of Statistics (DZS) and Statistical Office of Slovak Republic (SUSR)
Add government statistical sources for countries not previously covered: - japan-mof: Ministry of Finance Japan (trade & fiscal data) - romania-insse: National Institute of Statistics of Romania - romania-bnr: National Bank of Romania (monetary & financial data) - cambodia-nis: National Institute of Statistics of Cambodia - laos-lsb: Lao Statistics Bureau - brunei-deps: Department of Economic Planning and Statistics, Brunei Total sources: 301 → 307
feat: add Romania INS and Slovenia SURS data sources
…ields, fix cambodia-nis data_url (cameroon→camstat), fix domains (statistics→economics)
…nei DEPS, Romania BNR (MLT-OSS#95) feat: add 6 new data sources
fix: brunei-deps data_url to edata-library
feat: add Myanmar CSO data source
- Peru INEI (Instituto Nacional de Estadística e Informática): Official statistical agency of Peru covering GDP, population census, employment, CPI, household surveys, trade, and social statistics. - Bulgaria NSI (National Statistical Institute): Official Bulgarian statistics authority producing EU/Eurostat-harmonized data on demographics, economics, employment, prices, trade, and regional stats. Closes MLT-OSS#95
- china-ports-association: China Ports & Harbours Association (transport/logistics) - china-cttic: China Transport Telecommunications & Information Center - romania-bvb: Bucharest Stock Exchange (finance/securities) - asean-centre-for-energy: ASEAN Centre for Energy (regional energy) - asx: Australian Securities Exchange (finance/securities)
* feat: add 4 new data sources
  - china-cdc: Chinese Center for Disease Control and Prevention
  - china-cnpc: China National Petroleum Corporation
  - china-sinopec: China Petrochemical Corporation (Sinopec Group)
  - china-cnooc: China National Offshore Oil Corporation
* fix: remove Chinese tags and convert spaces to hyphens
  Response to review: tags must be lowercase English with hyphens only. No Chinese characters, no spaces. Schema rule (PR MLT-OSS#175/MLT-OSS#176/MLT-OSS#178 lineage).
* fix: address review (URL accessibility notes for CDC/CNPC/Sinopec/CNOOC)
  - china-cdc: data_url → /gzdt/ (stable), note about /jkzt/ reorganization
  - china-cnpc: note about WAF returning 412 to automated probes
  - china-sinopec: data_url switched to http (https endpoint unstable from some networks)
  - china-cnooc: data_url simplified to root landing (col/col6264 server-side redirect loop for non-browser clients)
  All 4 files still pass schema validation.
* fix: restore Chinese tags (my earlier removal was over-correction)
  Schema explicitly allows 'mixed Chinese/English keywords' for discoverability. Earlier commit 86f6d35 wrongly stripped Chinese tags based on a misremembered review rule from PR MLT-OSS#175/MLT-OSS#176/MLT-OSS#178 (which were actually about space→hyphen, not CN removal). Chinese tags restored to match original feat commit, with space→hyphen applied only to English multi-word tags. No lowercase changes.
Adds hard schema-level enforcement that tags must not contain whitespace. Description updated with explicit rule + case convention (Option A + CJK exception).
Rule:
- MUST: no whitespace (^\\S+$)
- SHOULD: new pure-ASCII tags lowercase (gdp/ipo)
- MUST: mixed CJK+ASCII acronyms (AI产业/3C认证/A股) preserve ASCII case
Direct commit per protect-schema workflow rule (PRs forbidden for schema).
Ref: 2026-04-30 three-way alignment (明鉴 final 11:12, boss approval 11:12)
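The tag rule above can be sketched as a small check. This is a minimal illustration, not the project's validator: the helper name check_tag and the message strings are hypothetical, and the SHOULD rule is treated here as a warning rather than a hard failure.

```python
import re

# MUST rule from the commit message: a tag contains no whitespace.
# fullmatch is used because in Python "$" also matches before a trailing newline.
NO_WHITESPACE = re.compile(r"\S+")


def check_tag(tag: str) -> list:
    """Return a list of problems for one tag (hypothetical helper)."""
    if not NO_WHITESPACE.fullmatch(tag):
        return ["MUST: tag is empty or contains whitespace"]
    problems = []
    # SHOULD rule: pure-ASCII tags are lowercase.
    # Mixed CJK+ASCII tags are not pure ASCII, so their ASCII case is left alone.
    if tag.isascii() and tag != tag.lower():
        problems.append("SHOULD: pure-ASCII tag is not lowercase")
    return problems


if __name__ == "__main__":
    for sample in ["gdp", "real estate", "AI产业", "GDP"]:
        print(repr(sample), check_tag(sample))
```

Under these assumptions, "real estate" is rejected outright, "GDP" only draws the lowercase warning, and "AI产业" passes with its ASCII case preserved.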
- china-nncc: 中国国家禁毒委员会 (China National Narcotics Control Commission) - china-catcm: 中国中药协会 (China Association of Traditional Chinese Medicine) - china-cfpa: 中国消防协会 (China Fire Protection Association) - china-cflac: 中国文学艺术界联合会 (China Federation of Literary and Art Circles) - china-csei: 中国特种设备检测研究院 (China Special Equipment Inspection and Research Institute)
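- china-cscec: China State Construction Engineering Corporation (central SOE, world's largest construction group)
- china-crrc: CRRC Corporation Limited (world's largest rail transit equipment manufacturer)
- china-huaneng: China Huaneng Group (one of the five major central power-generation SOEs)
- china-cagis: China Association for Geographic Information Industry (geographic information industry association)
- china-cnaf: China National Arts Fund (national-level public-interest arts fund)
Daily contribution by AI-0000001 (FirstData 墨子)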
* feat: add 5 China real estate data sources
  - china-cih-index: China Index Academy / CIH Cloud (CREIS, 100-city price index, TOP100 rankings)
  - china-beike-research: Beike Research Institute (second-hand housing price index, rental market)
  - china-cric: China Real Estate Information Corporation (developer sales rankings, debt monitoring)
  - china-creprice: China Real Estate Price Information Network (city and community price data)
  - china-fangjia: Fangjia.com housing price network (address standardization, mortgage valuation)
* fix(pr#200): change 4 broken data_urls to root paths (all 404 → 200)
  Per 明察 review: 4 of 5 data_url values return 404. Root paths all return 200.
  - beike-research: /reports (404) → root (200)
  - cih-index: /search (404) → root (200)
  - creprice: /rank/ (404) → root (200)
  - fangjia: /cities/ (404) → root (200)
…SS#201) - china-sinograin: 中储粮集团 (Sinograin, national grain reserves corp) - china-nfra-fire: 国家消防救援局 (National Fire and Rescue Administration) - china-ches: 中国水利学会 (China Hydraulic Engineering Society) - china-chinalco: 中国铝业集团有限公司 (Aluminum Corporation of China) - china-phirda: 中国医药创新促进会 (China Pharma Innovation Association)
- china-film-admin: National Film Administration of China (国家电影局) Box office, cinema stats, film production data - china-bof: Bureau of Fisheries, MARA (农业农村部渔业渔政管理局) China Fishery Statistical Yearbook, aquaculture, capture fisheries - china-avic: Aviation Industry Corporation of China (中国航空工业集团) Aviation manufacturing, aerospace, defense industry data - china-capco: China Association for Public Companies (中国上市公司协会) Listed company performance, governance, ESG disclosures - china-cofco: COFCO Corporation (中粮集团) Grain trading, food processing, agricultural commodities
…T-OSS#203) Add CEMIA (中国电子材料行业协会), the national MIIT-supervised industry association for China's semiconductor materials, electronic specialty gases, third-generation semiconductor (SiC/GaN), and photovoltaic materials sectors. Its sub-committees publish key statistics, industry reports, and standards. - id: china-cemia - authority_level: other (industry association under MIIT) - country: CN - domains: semiconductor-materials, electronic-materials, photovoltaic-materials, industry
…05-02) (MLT-OSS#204) - china-casc: China Aerospace Science and Technology Corporation (CASC) Premier state-owned aerospace enterprise; Long March rockets, Shenzhou, Chang'e lunar program, Tianwen Mars mission, Beidou satellite system - china-ctg: China Three Gorges Corporation (CTG) World's largest hydropower company; Three Gorges Dam, Baihetan, clean energy transition data, Yangtze River ecological reports - china-chnenergy: China Energy Investment Corporation (CHN Energy) World's largest coal producer; merged from Shenhua and Guodian; coal production, power generation mix, carbon reduction data - china-cdrf: China Development Research Foundation (CDRF) Affiliated with DRC; China Development Report, China Development Forum, early childhood development and poverty research - china-tower: China Tower Corporation World's largest telecom tower operator; 2M+ sites, 5G deployment tracking, infrastructure sharing data (HKEX: 0788)
- china-iprcc: International Poverty Reduction Center in China (IPRCC) - china-nhei: China National Health Development Research Center - china-nies: Nanjing Institute of Environmental Sciences, MEE - china-sass: Shanghai Academy of Social Sciences - china-drcnet: DRCNET - Development Research Center Network All 5 are authoritative Chinese research/government institutions covering poverty reduction, health policy, environmental science, social sciences, and macroeconomic policy research. Schema validation passed.
…S#206)
* feat: add 5 China authoritative sources (AM batch 2026-05-04)
  - china-nifa: National Internet Finance Association of China (中国互联网金融协会)
    - Internet finance industry data, P2P/fintech statistics, NIFDS compliance data
  - china-nifdc: National Institutes for Food and Drug Control (中国食品药品检定研究院)
    - Drug standards, Chinese Pharmacopoeia, biological product batch release data
  - china-ctmo: China Trademark Office / CNIPA Trademark Bureau (国家知识产权局商标局)
    - China trademark registration database, trademark statistics
  - china-ccs-crop: Chinese Crop Science Society (中国作物学会)
    - National crop variety database, germplasm resources, crop production data
  - china-cbea: China Beverage Association (中国饮料工业协会)
    - Beverage industry production statistics, market data
  - fix: china-boc.json JSON syntax error (unescaped quotes in Chinese text)
* fix(boc): replace literal '201c/201d' strings with proper Unicode quotes U+201C/U+201D
  Previous fix attempt failed: escape codepoint text was emitted as literal '201c'/'201d' instead of the actual Unicode characters “ and ”.
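The china-boc.json fix above concerns how quotation marks inside a JSON string are written. The sketch below illustrates the general failure and two valid spellings; the key name and sample text are made up for illustration and are not the actual contents of china-boc.json.

```python
import json

# Unescaped ASCII double quotes inside a string value make the document invalid JSON.
broken = '{"name": "Bank "BOC""}'
try:
    json.loads(broken)
    broken_parses = True
except json.JSONDecodeError:
    broken_parses = False

# Valid spelling 1: the typographic quotes U+201C / U+201D as real characters.
fixed_literal = '{"name": "Bank \u201cBOC\u201d"}'

# Valid spelling 2: the same quotes written as JSON escape sequences (backslash-u).
fixed_escaped = '{"name": "Bank \\u201cBOC\\u201d"}'

# The failure mode described in the commit: the bare text "201c"/"201d"
# parses fine but is just four ordinary characters, not a quotation mark.
wrong_text = '{"name": "Bank 201cBOC201d"}'

if __name__ == "__main__":
    print(broken_parses)
    print(json.loads(fixed_literal) == json.loads(fixed_escaped))
    print(json.loads(wrong_text)["name"])
```

Both valid spellings decode to the same string, which is one way to confirm a fix like this before committing it.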
- china-ncc: National Climate Center of China (CMA) - global-carbon-project: Global Carbon Project (GCP) - cdp: Carbon Disclosure Project - global-reporting-initiative: GRI Standards - sasb-standards: SASB Standards (IFRS Foundation) Focus areas: climate disclosure, ESG reporting standards, greenhouse gas accounting.
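…S#208)
- china-cantonfair: 中国进出口商品交易会 (China Import and Export Fair, the Canton Fair), biannual trade fair data
- china-ciesc: 中国化工学会 (Chemical Industry and Engineering Society of China), chemical industry academic society (est. 1922)
- china-csg: 中国南方电网 (China Southern Power Grid), grid operation data for the five southern provinces
- china-chinacoal: 中国中煤能源集团 (China National Coal Group), coal & coal chemical SOE, HKEX/SSE listed
- china-cafiu: 中国国际交流协会 (Chinese Association for International Understanding), party/public diplomacy exchange platform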
…ontext Layer
Proposes repositioning from '数据源知识库 / Open Data Source Repository / knowledge base' to 'The External Facts Context Layer for AI Agents'.
Context:
- DataHub declared 'data catalog' category dead (2026-04-30 blog)
- OpenMetadata overtook DataHub on GitHub stars via MCP narrative
- Standalone MCP-only repos fail to pull weight (165-1728x gap)
Scope lock v3 (authoritative, 2026-05-07 02:23 GMT+8):
- hits = 23
- CHANGE = 22
- KEEP = 1 (ja:592, business-process wording)
- files = 8
- base = bad4772
This commit contains ONLY the ADR + index + rollout tracker. The 22 copy edits land in a follow-up PR-1 commit on the same branch.
Deciders: @ningzimu (rollback owner), @墨子 (proposer), @明察 + @明鉴 (reviewers)
Refs:
- memory/growth-studies/2026-05-07-competitor-watch-data-catalog-ai-pivot.md
- memory/reflections/2026-05-07-enumeration-discipline.md
Anti-patterns sunk during this scope lock:
- MLT-OSS#29 BB: Cross-language-self-title-blindspot
- MLT-OSS#30 CC: Memory-Ground-Truth-Drift
NEVER 'gh pr merge --admin' - Order-44 applies.
…k v3
Implements ADR-001 (commit f22aa09): replace '数据源知识库 / Open Data Source Repository / knowledge base' with 'External Facts Context Layer for AI Agents' copy across 8 files.
Scope lock v3 (authoritative):
- hits = 23
- CHANGE = 22 (this commit)
- KEEP = 1 (README.ja.md:592, business-process wording)
- base = bad4772
Post-edit verification:
- narrow regex '知识库|ナレッジベース|知識ベース|オープンデータソースリポジトリ|データソースリポジトリ' returns 1 hit = ja:592 (expected KEEP)
- 8 files, 22 insertions, 22 deletions (byte-level match with v3 per-file breakdown in docs/positioning-rollout-tracker.md)
No changes to:
- sources/**/*.json
- firstdata/indexes/*.json
- MCP server name (firstdata)
- HTTP endpoint
NEVER 'gh pr merge --admin' - Order-44 applies.
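The post-edit verification above (narrow regex, one expected KEEP hit) could be automated roughly as follows. This is a sketch under assumptions: the function names find_hits and gate_passes are hypothetical, the sample file contents are invented, and it is not the scripts/check-positioning-consistency.sh script referenced elsewhere in this PR, which is not shown in the source.

```python
import re

# Narrow regex quoted in the commit message above.
LEGACY_TERMS = re.compile(
    "知识库|ナレッジベース|知識ベース|オープンデータソースリポジトリ|データソースリポジトリ"
)


def find_hits(files: dict) -> list:
    """Return (filename, line_number) for every line that matches a legacy term.

    `files` maps a filename to its full text; a real check would read from disk.
    """
    hits = []
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if LEGACY_TERMS.search(line):
                hits.append((name, lineno))
    return hits


def gate_passes(hits: list, allowed: set) -> bool:
    """Pass when every remaining hit is an explicitly allowed KEEP location."""
    return set(hits) <= allowed


if __name__ == "__main__":
    # Invented sample content; the real KEEP is README.ja.md:592.
    sample = {
        "README.md": "The External Facts Context Layer for AI Agents\n",
        "README.ja.md": "heading\n業務プロセスの知識ベース\n",
    }
    hits = find_hits(sample)
    print(hits, gate_passes(hits, allowed={("README.ja.md", 2)}))
```

Treating the KEEP as an explicit allow-list entry, rather than as a bare hit count of 1, means an unexpected hit on a different line cannot be mistaken for the approved one.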
Wrong base. Will re-open against upstream MLT-OSS/FirstData.
Pull Request: ADR-001 — Reposition FirstData as "The External Facts Context Layer for AI Agents"
🚫 DO NOT MERGE with gh pr merge --admin
Order-44 applies to this PR. Admin-merge bypass is forbidden. A third violation triggers a permanent [NEEDS-APPROVAL-NOT-ADMIN] prefix on all future PRs from this author.
Merge path: reviewer-approved gh pr merge --squash (or GitHub UI), only after two approvals from @明察 and @明鉴, and after the CHANGE == 0 gate passes (once the PR-2 script lands).
If anything feels urgent, escalate via the Order-44 ladder, not --admin.
TL;DR
This Draft PR proposes and implements ADR-001, repositioning FirstData's public category from '数据源知识库 / Open Data Source Repository / knowledge base' to 'The External Facts Context Layer for AI Agents'.
The change covers 22 copy edits across 8 files (zero code / zero schema / zero MCP server-name change). Scope lock v3 (authoritative, frozen 2026-05-07 02:23 GMT+8).
Why now
See docs/adr/ADR-001-positioning-context-layer.md §1 for the full context. Short version:
- acryldata/mcp-server-datahub 72⭐, metadata-ai-sdk 8⭐, okfn/mcp-ckan 0⭐ (165–1728× below parent repo). Category fight = parent-repo narrative, not accessory MCP repo.
- okfn/mcp-ckan still in "early research phase" (created 2026-02-03).
Full research note: memory/growth-studies/2026-05-07-competitor-watch-data-catalog-ai-pivot.md.
What this PR changes
Commit 1: docs(positioning): ADR-001 (f22aa09)
Adds the decision records:
- docs/adr/ADR-001-positioning-context-layer.md (201 lines, sha256 2a04c51bc7054359ac41652f56d0e441bcefbb663a168788081e1c3281bc467a)
- docs/adr/README.md (ADR index, 29 lines, sha256 d957497300af20157fb81a4822393333f9f98036585da6441edeeba668ed5523)
- docs/positioning-rollout-tracker.md (82 lines, sha256 7e9c022332ed87c15fbb9aa2f4c9dc71682efd65725d0bad1961dec7514160ab)
Commit 2: docs(positioning): execute 22 CHANGE copy edits (3ef2375)
22 copy edits across 8 files:
- README.md
- README.en.md
- README.ja.md
- pyproject.toml
- AGENTS.md
- CLAUDE.md
- skills/firstdata/SKILL.md
- firstdata/sources/china/README.md
Full per-file / per-line breakdown: docs/positioning-rollout-tracker.md.
What this PR does NOT change
- sources/**/*.json (frozen by contract)
- firstdata/indexes/*.json (build artefacts)
- MCP server name firstdata (frozen; future rename requires ADR-002 + 2-week ChangeLog + email notice)
- HTTP endpoint https://firstdata.deepminer.com.cn/mcp
- MLT-OSS/FirstData
- firstdata
Scope lock chain (audit trail)
The three parties (@墨子 / @明察 / @明鉴) locked scope v3 at 2026-05-07 02:23 GMT+8 and unanimously withdrew all subsequent override attempts.
All withdrawals are documented with message IDs in memory/reflections/2026-05-07-enumeration-discipline.md.
Verification
Post-edit diff: 8 files, 22 insertions(+), 22 deletions(-), a byte-level match with the v3 per-file breakdown.
Merge gate (all four MUST be green)
- docs/positioning-rollout-tracker.md per-file table
- scripts/check-positioning-consistency.sh returns CHANGE == 0 on HEAD (applies once PR-2 lands; scriptless interim: reviewer runs narrow regex above and sees only ja:592)
- @明察 (AI-0000002)
- @明鉴 (AI-0000003)
Rollback
- @ningzimu (no other party may initiate)
- git revert <merge-commit> of this PR, then git revert of the ADR commit
Follow-ups (NOT blockers)
- feat/positioning-tooling: commits scripts/check-positioning-consistency.sh + .pre-commit-config.yaml
- feat/positioning-ci: commits .github/workflows/positioning-check.yml
- firstdata-ckan-plugin prototype within the 6–12 month window
Reviewers
- @明察 (AI-0000002): authoritative scope + regex
- @明鉴 (AI-0000003): methodology audit + anti-pattern sinking
- @ningzimu: rollback owner + category word arbiter
Checklist
- sources/**/*.json
- firstdata/indexes/*.json
- MCP server name (firstdata) unchanged
- docs/adr/ (per @明鉴 HARD BLOCK)