Closed
399 commits
0323da9
feat: add 5 new data sources (india-rbi, unesco-uis, japan-jetro, jap…
firstdata-dev Mar 25, 2026
4fef85f
chore(indexes): auto-update indexes
firstdata-dev Mar 25, 2026
06eb61d
feat: add Greece ELSTAT and Colombia DANE data sources
firstdata-dev Mar 25, 2026
52e7934
chore(indexes): auto-update indexes
firstdata-dev Mar 25, 2026
b8e9305
feat: add Hungary KSH data source
firstdata-dev Mar 26, 2026
87fc1d6
feat: add Hungary KSH data source
firstdata-dev Mar 26, 2026
7dc8c57
chore(indexes): auto-update indexes
firstdata-dev Mar 26, 2026
f1fe9d5
feat: add Romania INS and Slovenia SURS data sources
firstdata-dev Mar 26, 2026
c7e03b0
fix: remove native from name object (schema only allows en/zh)
firstdata-dev Mar 26, 2026
e99c9b2
feat: add Croatian Bureau of Statistics (DZS) and Statistical Office …
firstdata-dev Mar 27, 2026
d489932
fix: rebuild indexes to remove native field from romania-ins and slov…
firstdata-dev Mar 27, 2026
f0a6918
feat: add Croatia DZS and Slovakia SUSR data sources (#94)
firstdata-dev Mar 27, 2026
b4c9f86
chore(indexes): auto-update indexes
firstdata-dev Mar 27, 2026
2f648c6
feat: add 6 new data sources
firstdata-dev Mar 27, 2026
8bee56c
merge: resolve index conflicts with main after PR #94
firstdata-dev Mar 27, 2026
352387d
feat: add Romania INS and Slovenia SURS data sources (#92)
firstdata-dev Mar 27, 2026
4fe6f25
chore(indexes): auto-update indexes
firstdata-dev Mar 27, 2026
b92d857
fix: remove romania-insse (duplicate of romania-ins), remove native f…
firstdata-dev Mar 27, 2026
6ef4974
merge: resolve index conflicts with main after PR #92
firstdata-dev Mar 27, 2026
b247143
feat: add 5 new data sources - Japan MOF, Cambodia NIS, Laos LSB, Bru…
firstdata-dev Mar 27, 2026
697ff65
chore(indexes): auto-update indexes
firstdata-dev Mar 27, 2026
0a13b98
fix: brunei-deps data_url 404 → use main website
firstdata-dev Mar 27, 2026
ce0d074
chore(indexes): auto-update indexes
firstdata-dev Mar 27, 2026
cc95895
fix: brunei-deps data_url to edata-library (proper data portal)
firstdata-dev Mar 27, 2026
16a3f3c
fix: brunei-deps data_url to edata-library (#96)
firstdata-dev Mar 27, 2026
6fd2f04
chore(indexes): auto-update indexes
firstdata-dev Mar 27, 2026
b085870
feat: add Myanmar CSO data source (ASEAN 10/10)
firstdata-dev Mar 27, 2026
c259d25
feat: add Myanmar CSO data source - ASEAN 10/10 (#99)
firstdata-dev Mar 27, 2026
9a10fa9
chore(indexes): auto-update indexes
firstdata-dev Mar 27, 2026
a9e8b60
feat: add Peru INEI and Bulgaria NSI data sources
firstdata-dev Mar 28, 2026
627cbe0
feat: add Peru INEI and Bulgaria NSI data sources (#100)
firstdata-dev Mar 28, 2026
0e25b81
chore(indexes): auto-update indexes
firstdata-dev Mar 28, 2026
81194b5
fix: upgrade HTTP to HTTPS for 4 government data sources (#101)
mingcha-dev Mar 28, 2026
d1f3400
chore(indexes): auto-update indexes
firstdata-dev Mar 28, 2026
0b1abf9
fix: replace 62 underscore domains with hyphens in 22 source files
firstdata-dev Mar 29, 2026
008a3d4
fix: replace 62 underscore domains with hyphens in 22 files (#104)
firstdata-dev Mar 29, 2026
ca79c77
chore(indexes): auto-update indexes
firstdata-dev Mar 29, 2026
709ab4f
fix: upgrade 6 HTTP URLs to HTTPS in 4 source files
firstdata-dev Mar 29, 2026
9c19ef2
fix: upgrade 6 HTTP URLs to HTTPS in 4 source files (#105)
firstdata-dev Mar 29, 2026
4e5184a
chore(indexes): auto-update indexes
firstdata-dev Mar 29, 2026
55f9606
feat: add Kenya KNBS and Ghana GSS data sources (Africa expansion)
firstdata-dev Mar 30, 2026
91133dd
fix: ghana-gss data_url 404 → economic-statistics page (200)
firstdata-dev Mar 30, 2026
2f86e25
feat: add Kenya KNBS and Ghana GSS data sources (#106)
firstdata-dev Mar 30, 2026
65f989c
chore(indexes): auto-update indexes
firstdata-dev Mar 30, 2026
92ff739
feat: add 5 Chinese government data sources (PM batch 2026-03-30)
firstdata-dev Mar 30, 2026
e9894a0
fix: china-chinatax data_url 404 → n810219 (200)
firstdata-dev Mar 30, 2026
b34a30d
feat: add 5 Chinese government data sources (#107)
firstdata-dev Mar 30, 2026
957a1bf
chore(indexes): auto-update indexes
firstdata-dev Mar 30, 2026
5bfad67
feat: add china-mofa and china-cac data sources
firstdata-dev Mar 31, 2026
33b98c6
feat: add china-mofa and china-cac data sources (#108)
firstdata-dev Mar 31, 2026
bde7e0f
chore(indexes): auto-update indexes
firstdata-dev Mar 31, 2026
45e9ea5
feat: add 5 China data sources (GAS, Wanfang Data, Guangdong/Jiangsu/…
firstdata-dev Mar 31, 2026
6e92134
fix: 3 provincial stats data_url 404 (fujian/guangdong/jiangsu)
firstdata-dev Mar 31, 2026
efc0cbe
feat: add 5 Chinese sources - sports, wanfang, provincial stats (#109)
firstdata-dev Mar 31, 2026
fd55e26
chore(indexes): auto-update indexes
firstdata-dev Mar 31, 2026
1eb3d38
fix: remove cron residue files (jiangsu-stats + nanjing-stats duplica…
firstdata-dev Mar 31, 2026
7ca36da
fix: remove cron residue files (jiangsu-stats + nanjing-stats) (#111)
firstdata-dev Mar 31, 2026
780f8c2
chore(indexes): auto-update indexes
firstdata-dev Mar 31, 2026
651c648
docs: add MCP tool limitations, report_feedback example, and descript…
firstdata-dev Mar 31, 2026
c41aa9e
feat: add Beijing and Shanghai Municipal Bureau of Statistics data so…
firstdata-dev Apr 1, 2026
5da4579
feat: add China Foreign Exchange Trade System (CFETS) data source
firstdata-dev Apr 1, 2026
85f31a7
feat: add Beijing and Shanghai stats data sources (#113)
firstdata-dev Apr 1, 2026
bc668de
chore(indexes): auto-update indexes
firstdata-dev Apr 1, 2026
022252d
feat: add china-cfets data source (#114)
firstdata-dev Apr 1, 2026
2c940be
chore(indexes): auto-update indexes
firstdata-dev Apr 1, 2026
23e4b5f
feat: add 5 Chinese data sources (PM batch 2026-04-01)
firstdata-dev Apr 1, 2026
56c177b
fix: china-ha-stats data_url 404 → /tjfw/tjsj/ (200)
firstdata-dev Apr 1, 2026
c8565ca
feat: add 4 provincial stats + china-isc (#115)
firstdata-dev Apr 1, 2026
9ae7c84
chore(indexes): auto-update indexes
firstdata-dev Apr 1, 2026
6077126
feat: add 5 Chinese government data sources (AM batch, 2026-04-02)
firstdata-dev Apr 2, 2026
786e0ef
fix: china-sc-stats + china-spb data_url 404 (Mingcha QA)
firstdata-dev Apr 2, 2026
edce2a3
feat: add 3 provincial stats + cnao + spb (#116)
firstdata-dev Apr 2, 2026
088b450
chore(indexes): auto-update indexes
firstdata-dev Apr 2, 2026
8825482
feat(china): add 5 Chinese data sources - PM batch 2026-04-02
firstdata-dev Apr 2, 2026
2a9ac83
fix: china-cisa data_url → /gxportal/ portal (200, Mingcha QA)
firstdata-dev Apr 2, 2026
c2e39d4
feat: add 3 provincial stats + caam + cisa (#117)
firstdata-dev Apr 2, 2026
2959777
chore(indexes): auto-update indexes
firstdata-dev Apr 2, 2026
103b14d
feat: add 5 Chinese government data sources (AM batch, 2026-04-03)
firstdata-dev Apr 3, 2026
f1b958a
fix: china-gz-stats + china-hlj-stats data_url (Mingcha QA)
firstdata-dev Apr 3, 2026
c5de6a7
fix: china-gz-stats data_url → stjj.guizhou.gov.cn/tjsj/ (200)
firstdata-dev Apr 3, 2026
1b483fb
feat: add northeast provinces + guizhou + saac (#118)
firstdata-dev Apr 3, 2026
039cb22
chore(indexes): auto-update indexes
firstdata-dev Apr 3, 2026
1039280
feat: add 5 China data sources (PM batch 2026-04-03)
firstdata-dev Apr 3, 2026
d5573b3
fix: china-jx-stats data_url 404 → /col/col40939/ (Mingcha QA)
firstdata-dev Apr 3, 2026
1ea0bcf
feat: add hebei + jiangxi + guangxi + stma + cflp (#119)
firstdata-dev Apr 3, 2026
738c3b1
chore(indexes): auto-update indexes
firstdata-dev Apr 3, 2026
09453f4
feat: add 5 Chinese provincial statistics bureaus (AM batch, 2026-04-04)
firstdata-dev Apr 4, 2026
13cbd55
feat: add western provinces stats (#120)
firstdata-dev Apr 4, 2026
0ee848a
chore(indexes): auto-update indexes
firstdata-dev Apr 4, 2026
ba17b62
feat: add 5 China data sources (PM batch 2026-04-04)
firstdata-dev Apr 4, 2026
dd747e2
feat: add xizang + cfa + iac + nia + nhsa (#121)
firstdata-dev Apr 4, 2026
b0da2c3
chore(indexes): auto-update indexes
firstdata-dev Apr 4, 2026
178a556
feat: add 5 Chinese government data sources (AM batch, 2026-04-05)
firstdata-dev Apr 5, 2026
b154625
fix: nx-stats domain migration + remove coal-industry (domain hijacke…
firstdata-dev Apr 5, 2026
9d741d3
fix: china-sx-stats HTTP → HTTPS
firstdata-dev Apr 5, 2026
9d6f953
feat: add ningxia + hainan + shanxi + mva (#122)
firstdata-dev Apr 5, 2026
9f7cef4
chore(indexes): auto-update indexes
firstdata-dev Apr 5, 2026
6b3b04e
feat: add 5 China data sources (SASAC, CEA, NFSRA, CBA, ACFIC)
firstdata-dev Apr 5, 2026
92c0edd
fix: acfic + cba data_url 404 (Mingcha QA)
firstdata-dev Apr 5, 2026
bdffe2e
feat: add sasac + cea + nfsra + cba + acfic (#123)
firstdata-dev Apr 5, 2026
73f5d7a
chore(indexes): auto-update indexes
firstdata-dev Apr 5, 2026
811b89e
feat: add 5 Chinese government data sources (AM batch, 2026-04-06)
firstdata-dev Apr 6, 2026
ffa90c9
fix: china-ndcpa data_url 403 → /jbkzzx/c100016/ (Mingcha QA)
firstdata-dev Apr 6, 2026
d36fa17
feat: add nrta + nra + cas + cae + ndcpa (#124)
firstdata-dev Apr 6, 2026
b7c14e0
chore(indexes): auto-update indexes
firstdata-dev Apr 6, 2026
08aa826
feat: add 5 China data sources (pm batch 2026-04-06)
firstdata-dev Apr 6, 2026
4588a6f
fix: cfpa wrong domain (poverty alleviation foundation → fiscal research institute) + cnia wrong domain (nuclear instruments → nonferrous metals)
firstdata-dev Apr 6, 2026
bec53a0
fix: remove cfpa (soft 404) + cnia data_url path fix
firstdata-dev Apr 6, 2026
94fcccd
fix: china-cnia HTTP → HTTPS
firstdata-dev Apr 6, 2026
7656a6f
feat: add cffex + ncac + natcm + cnia (#125)
firstdata-dev Apr 6, 2026
16937d4
chore(indexes): auto-update indexes
firstdata-dev Apr 6, 2026
b80d18f
feat: add 5 Chinese government data sources (AM batch, 2026-04-07)
firstdata-dev Apr 7, 2026
1c23f20
fix: remove cdc(445)+coal-association(domain taken over) + fix cast/cntac data_url
firstdata-dev Apr 7, 2026
8e44f3f
fix: cast + cntac data_url to actual content pages
firstdata-dev Apr 7, 2026
8f31218
feat: add cnsa + cast + cntac (#126)
firstdata-dev Apr 7, 2026
6b81b90
chore(indexes): auto-update indexes
firstdata-dev Apr 7, 2026
7050d21
feat: add 5 China data sources (PM batch 2026-04-07)
firstdata-dev Apr 7, 2026
d919894
fix: china-nsfc data_url 404 → /p1/2961/2964/3655/cg.html (Mingcha QA)
firstdata-dev Apr 7, 2026
8b0e126
feat: add nsfc + npc-law + spc + spp + nmdis (#127)
firstdata-dev Apr 7, 2026
e224349
chore(indexes): auto-update indexes
firstdata-dev Apr 7, 2026
37745bd
feat: add 5 Chinese industry associations (AM batch 2026-04-08)
firstdata-dev Apr 8, 2026
c2d7e4f
fix: authority_level commercial→other for all 5 associations + cnfia …
firstdata-dev Apr 8, 2026
6567431
fix: china-csre rare earth industry association → rare earth society (Mingcha QA)
firstdata-dev Apr 8, 2026
692934a
feat: add cpia + cwea + cnfia + caamm + csre (#128)
firstdata-dev Apr 8, 2026
e5b8517
chore(indexes): auto-update indexes
firstdata-dev Apr 8, 2026
7064167
feat: add 5 Chinese government data sources (AM batch, 2026-04-08)
firstdata-dev Apr 8, 2026
eed38df
fix: remove gold(domain hijacked by a gossip site)+cec/cnca/acftu(connection timeout) + fix ccpit data_url
firstdata-dev Apr 8, 2026
cf52596
fix: ccpit data_url jgsz(organizational structure)→homepage (Mingcha QA)
firstdata-dev Apr 8, 2026
29378bd
feat: add ccpit (#129)
firstdata-dev Apr 8, 2026
846469d
chore(indexes): auto-update indexes
firstdata-dev Apr 8, 2026
53d50b0
feat: add 5 China data sources (afternoon batch 2026-04-08)
firstdata-dev Apr 8, 2026
a3aea89
fix: cpcif data_url /detail/→/list/ (500→200, Mingcha QA)
firstdata-dev Apr 8, 2026
3d64f78
feat: add ncha + cpcif + drc + caai + ccf (#130)
firstdata-dev Apr 8, 2026
f922894
chore(indexes): auto-update indexes
firstdata-dev Apr 8, 2026
8460854
feat: add 5 Chinese government data sources (AM batch, 2026-04-09) (#…
firstdata-dev Apr 9, 2026
a797978
chore(indexes): auto-update indexes
firstdata-dev Apr 9, 2026
9a1d451
feat: add 5 Chinese industry association data sources (PM batch) (#132)
firstdata-dev Apr 9, 2026
dac483c
chore(indexes): auto-update indexes
firstdata-dev Apr 9, 2026
ffff27a
feat(ci): add taxonomy health check script (#133)
mingcha-dev Apr 9, 2026
469d154
feat: add 5 Chinese government data sources (AM batch, 2026-04-10) (#…
firstdata-dev Apr 10, 2026
90cc306
chore(indexes): auto-update indexes
firstdata-dev Apr 10, 2026
3e334af
feat: add 5 China data sources (PM batch 2026-04-10) (#136)
firstdata-dev Apr 10, 2026
af2c535
chore(indexes): auto-update indexes
firstdata-dev Apr 10, 2026
dc480b2
feat: add 5 Chinese government data sources (AM batch, 2026-04-11) (#…
firstdata-dev Apr 11, 2026
1e53905
chore(indexes): auto-update indexes
firstdata-dev Apr 11, 2026
6e9fe00
fix: remove china-cia (duplicate of china-iac, same org iachina.cn) (…
firstdata-dev Apr 11, 2026
b99fe89
chore(indexes): auto-update indexes
firstdata-dev Apr 11, 2026
2e71a7e
feat: add 5 Chinese data sources (2026-04-11 PM batch) (#141)
firstdata-dev Apr 11, 2026
ddcdd25
chore(indexes): auto-update indexes
firstdata-dev Apr 11, 2026
546f0a8
feat: add 5 Chinese government data sources (AM batch, 2026-04-12) (#…
firstdata-dev Apr 12, 2026
a6556cd
feat: add 5 Chinese data sources (PM batch, 2026-04-12) (#143)
firstdata-dev Apr 12, 2026
8581c7c
chore(indexes): auto-update indexes
firstdata-dev Apr 12, 2026
594f00f
feat: add 5 Chinese government data sources (AM batch, 2026-04-13) (#…
firstdata-dev Apr 13, 2026
bebc331
fix: correct china-gz-stats website (Guangzhou→Guizhou) (#145)
firstdata-dev Apr 13, 2026
e81d7ac
chore(indexes): auto-update indexes
firstdata-dev Apr 13, 2026
eee24d9
feat: add 5 Chinese data sources (PM batch, 2026-04-13) (#146)
firstdata-dev Apr 13, 2026
34ace99
chore(indexes): auto-update indexes
firstdata-dev Apr 13, 2026
42759c7
feat: add 5 Chinese government data sources (AM batch, 2026-04-14) (#…
firstdata-dev Apr 15, 2026
8b742cf
chore(indexes): auto-update indexes
firstdata-dev Apr 15, 2026
1b339c7
feat: add 5 Chinese data sources (PM batch, 2026-04-14) (#148)
firstdata-dev Apr 15, 2026
a409edd
chore(indexes): auto-update indexes
firstdata-dev Apr 15, 2026
1ab0cae
feat: add 5 Chinese government data sources (AM batch, 2026-04-15) (#…
firstdata-dev Apr 17, 2026
a69f3fa
chore(indexes): auto-update indexes
firstdata-dev Apr 17, 2026
97b4aac
feat: add 5 Chinese data sources (PM batch, 2026-04-15) (#150)
firstdata-dev Apr 17, 2026
4154324
feat: add 5 Chinese government data sources (AM batch, 2026-04-16) (#…
firstdata-dev Apr 17, 2026
b9df836
feat: add 5 Chinese data sources (PM batch, 2026-04-16) (#152)
firstdata-dev Apr 17, 2026
193e99b
feat: add 5 Chinese government and industry data sources (AM batch, 2…
firstdata-dev Apr 17, 2026
75a9f05
fix: remove duplicate cpdrc (same ID two paths) + ncssf (same website…
firstdata-dev Apr 17, 2026
331a11f
chore(indexes): auto-update indexes
firstdata-dev Apr 17, 2026
e15ae04
feat: add 5 Chinese data sources (PM batch, 2026-04-17) (#155)
firstdata-dev Apr 17, 2026
65b81b7
chore(indexes): auto-update indexes
firstdata-dev Apr 17, 2026
e8206cf
feat: add 5 Chinese research survey and emissions data sources (#156)
firstdata-dev Apr 18, 2026
409f82a
chore(indexes): auto-update indexes
firstdata-dev Apr 18, 2026
51cf5ab
feat: add 5 Chinese data sources (PM batch, 2026-04-18) (#157)
firstdata-dev Apr 18, 2026
9ed3672
chore(indexes): auto-update indexes
firstdata-dev Apr 18, 2026
3c4334a
feat: add 5 Chinese research and government data sources (AM batch, 2…
firstdata-dev Apr 19, 2026
9484314
chore(indexes): auto-update indexes
firstdata-dev Apr 19, 2026
df3be40
feat: add 5 Chinese data sources (PM batch, 2026-04-19) (#160)
firstdata-dev Apr 19, 2026
f19f9f8
chore(indexes): auto-update indexes
firstdata-dev Apr 19, 2026
d8c8d9d
fix: update 4 stale URLs with migrated domains (closes #161) (#162)
firstdata-dev Apr 20, 2026
bc2d3a8
chore(indexes): auto-update indexes
firstdata-dev Apr 20, 2026
54a1fcd
feat: add 5 Chinese government data sources (AM batch, 2026-04-20) (#…
firstdata-dev Apr 20, 2026
65017d4
chore(indexes): auto-update indexes
firstdata-dev Apr 20, 2026
4e851ce
feat: add 5 Chinese data sources (PM batch, 2026-04-21) (#166)
firstdata-dev Apr 21, 2026
15942a6
chore(indexes): auto-update indexes
firstdata-dev Apr 21, 2026
7f35527
feat: add 5 Chinese authoritative data sources (2026-04-22 AM) (#167)
firstdata-dev Apr 22, 2026
dea743b
chore(indexes): auto-update indexes
firstdata-dev Apr 22, 2026
dd11134
feat: add 5 China authority data sources (20260423 AM batch) (#169)
firstdata-dev Apr 23, 2026
cc48a99
chore(indexes): auto-update indexes
firstdata-dev Apr 23, 2026
c4afe41
feat: add 5 China authoritative data sources (PM batch 2026-04-23) (#…
firstdata-dev Apr 23, 2026
fa6cf76
chore(indexes): auto-update indexes
firstdata-dev Apr 23, 2026
66c6738
feat: add 5 Chinese government data sources (AM batch, 2026-04-19) (#…
firstdata-dev Apr 23, 2026
9c69405
feat: add 5 Chinese data sources (PM batch, 2026-04-20) (#164)
firstdata-dev Apr 23, 2026
862754c
feat: add 5 Chinese government data sources (AM batch, 2026-04-21) (#…
firstdata-dev Apr 23, 2026
4df9c7f
feat: add 5 China authoritative data sources (afternoon batch 2026-04…
firstdata-dev Apr 23, 2026
cccbf76
chore(indexes): auto-update indexes
firstdata-dev Apr 23, 2026
7f2a96d
feat: add 5 judicial auction data sources
mingcha-dev Apr 23, 2026
8fcd6cc
chore(indexes): auto-update indexes
firstdata-dev Apr 23, 2026
d86ee38
fix: enforce domain format validation (lowercase + hyphens only)
firstdata-dev Apr 25, 2026
10aee13
chore(indexes): auto-update indexes
firstdata-dev Apr 25, 2026
dfaf8fc
feat: add 5 China authoritative data sources (AM batch 2026-04-24) (#…
firstdata-dev Apr 28, 2026
3d2e156
feat: add 5 China sources (AM batch 2026-04-25) (#177)
firstdata-dev Apr 28, 2026
a0ee340
feat: add 5 China authoritative data sources (PM batch 2026-04-25) (#…
firstdata-dev Apr 28, 2026
d5027da
feat: add 5 Chinese authoritative data sources (AM batch 2026-04-26) …
firstdata-dev Apr 28, 2026
d03071a
feat: add 5 Chinese authoritative data sources (health, research, env…
firstdata-dev Apr 28, 2026
9195a7a
feat: add 5 China authority data sources (AM batch 2026-04-28) (#185)
firstdata-dev Apr 28, 2026
99c3e4c
chore(indexes): auto-update indexes
firstdata-dev Apr 28, 2026
7c5361c
feat: add 5 new data sources (#183)
firstdata-dev Apr 28, 2026
fa1ba79
feat: add 5 China authoritative data sources (PM batch 2026-04-27) (#…
firstdata-dev Apr 28, 2026
e2832e4
feat: add 5 new data sources (#186)
firstdata-dev Apr 28, 2026
d432bac
feat: add 5 China authoritative data sources (PM batch 2026-04-28) (#…
firstdata-dev Apr 28, 2026
234181c
feat: add 5 new data sources (#175)
firstdata-dev Apr 28, 2026
0d3f9c9
feat: add 5 China authoritative sources (PM batch 2026-04-24) (#176)
firstdata-dev Apr 28, 2026
cc2712d
feat: add 4 new data sources (#178)
firstdata-dev Apr 28, 2026
653f849
ci: add secrecy check workflow for PR descriptions and source files (…
firstdata-dev Apr 28, 2026
1d13a25
feat: add 5 China authoritative data sources (2026-04-29 AM) (#190)
firstdata-dev Apr 30, 2026
95a408d
feat: add 5 China authoritative data sources (2026-04-29 PM) (#192)
firstdata-dev Apr 30, 2026
0254927
feat(china): add 5 authoritative China data sources (2026-04-30 AM) (…
firstdata-dev Apr 30, 2026
17f153d
fix(schema): move access_notes into description to satisfy schema (#195)
firstdata-dev Apr 30, 2026
8a0e379
chore(indexes): auto-update indexes
firstdata-dev Apr 30, 2026
9db08f3
chore: normalize whitespace in tags across all sources (keep Chinese)…
mingcha-dev Apr 30, 2026
65dd757
chore(indexes): auto-update indexes
firstdata-dev Apr 30, 2026
bb03edd
feat: add 5 new data sources (#194)
firstdata-dev Apr 30, 2026
af6fb8e
feat: add 4 new data sources (#191)
firstdata-dev Apr 30, 2026
9e7c80c
chore(indexes): auto-update indexes
firstdata-dev Apr 30, 2026
4814d1d
feat(schema): enforce no-whitespace in tags via pattern ^\\S+$
Apr 30, 2026
74d78b1
feat: add 5 China authority sources (afternoon batch) (#198)
firstdata-dev Apr 30, 2026
e923c24
chore(indexes): auto-update indexes
firstdata-dev Apr 30, 2026
a0b0e9c
feat: add 5 China authoritative data sources (AM batch) (#199)
firstdata-dev May 1, 2026
cf5ea10
chore(indexes): auto-update indexes
firstdata-dev May 1, 2026
f3b54e7
feat: add 5 new data sources (#200)
firstdata-dev May 1, 2026
3358a3e
chore(indexes): auto-update indexes
firstdata-dev May 1, 2026
1fe17bd
feat: add 5 China authority data sources (2026-05-01 PM batch) (#201)
firstdata-dev May 1, 2026
7c9acbf
chore(indexes): auto-update indexes
firstdata-dev May 1, 2026
c108fce
feat: add 5 China authoritative data sources (AM batch) (#202)
firstdata-dev May 2, 2026
749d1b5
chore(indexes): auto-update indexes
firstdata-dev May 2, 2026
fc3ca5a
feat: add China Electronic Materials Industry Association (CEMIA) (#203)
firstdata-dev May 2, 2026
e6d7c11
chore(indexes): auto-update indexes
firstdata-dev May 2, 2026
6519e5e
feat(china): add 5 Chinese authoritative data sources (PM batch 2026-…
firstdata-dev May 2, 2026
0615ad9
chore(indexes): auto-update indexes
firstdata-dev May 2, 2026
57b5557
feat: add 5 Chinese authoritative data sources (PM batch) (#205)
firstdata-dev May 3, 2026
1df2a1a
chore(indexes): auto-update indexes
firstdata-dev May 3, 2026
4a5b704
feat: add 5 China authoritative sources (AM batch 2026-05-04) (#206)
firstdata-dev May 4, 2026
ce54621
chore(indexes): auto-update indexes
firstdata-dev May 4, 2026
a862b69
feat: add 5 new data sources (#207)
firstdata-dev May 4, 2026
211f753
chore(indexes): auto-update indexes
firstdata-dev May 4, 2026
213440a
feat: add 5 China authoritative sources (PM batch 2026-05-04) (#208)
firstdata-dev May 4, 2026
bad4772
chore(indexes): auto-update indexes
firstdata-dev May 4, 2026
f22aa09
docs(positioning): ADR-001 — Reposition FirstData as External Facts C…
firstdata-dev May 6, 2026
3ef2375
docs(positioning): execute 22 CHANGE copy edits per ADR-001 scope loc…
firstdata-dev May 6, 2026
77 changes: 77 additions & 0 deletions .github/workflows/secrecy-check.yml
@@ -0,0 +1,77 @@
name: Secrecy Check

on:
  pull_request:
    types: [opened, edited, synchronize]

jobs:
  check-secrecy:
    runs-on: ubuntu-latest
    steps:
      - name: Check PR metadata for confidential terms
        env:
          PR_BODY: ${{ github.event.pull_request.body }}
          PR_TITLE: ${{ github.event.pull_request.title }}
          PR_BRANCH: ${{ github.event.pull_request.head.ref }}
        run: |
          BANNED_TERMS=(
            "langfuse"
            "insight pipeline"
            "gitlab"
            "code.mlamp.cn"
            "codex.mlamp.cn"
            "glab"
            "im.deepminer"
            "im-test.xming"
          )

          found=0

          check_field() {
            local label="$1"
            local text="$2"
            local lower_text
            lower_text=$(printf '%s' "$text" | tr '[:upper:]' '[:lower:]')

            for term in "${BANNED_TERMS[@]}"; do
              lower_term=$(printf '%s' "$term" | tr '[:upper:]' '[:lower:]')
              if [[ "$lower_text" == *"$lower_term"* ]]; then
                echo "::error::🔴 BLOCKED: '$term' found in $label"
                found=1
              fi
            done
          }

          check_field "branch name" "$PR_BRANCH"
          check_field "PR title" "$PR_TITLE"
          check_field "PR description" "$PR_BODY"

          if [ "$found" -eq 1 ]; then
            echo "::error::PR contains confidential term(s). Remove internal tool references before merging."
            exit 1
          fi

          echo "✅ PR metadata secrecy check passed."

      - name: Checkout code
        uses: actions/checkout@v4

      - name: Check source files for confidential terms
        run: |
          BANNED_TERMS=("langfuse" "insight pipeline" "gitlab" "code.mlamp.cn" "codex.mlamp.cn" "glab" "im.deepminer" "im-test.xming")
          found=0

          for term in "${BANNED_TERMS[@]}"; do
            matches=$(grep -ril "$term" firstdata/sources/ 2>/dev/null || true)
            if [ -n "$matches" ]; then
              echo "::error::🔴 '$term' found in source files: $matches"
              found=1
            fi
          done

          if [ "$found" -eq 1 ]; then
            echo "::error::Source files contain confidential terms."
            exit 1
          fi

          echo "✅ Source files secrecy check passed."
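
Review note: the metadata step above is a case-insensitive substring match of each banned term against the branch name, title, and description. A minimal Python sketch of the same logic, for trying terms locally before opening a PR, might look like the following. The function names `find_banned_terms` and `check_pr_metadata` are hypothetical and not part of the repository, and Python's `str.lower()` is assumed to be a close enough stand-in for the `tr` lowercasing used in the workflow.

```python
# Hypothetical local re-implementation of the workflow's check_field logic.
# The term list is copied from secrecy-check.yml; everything else is illustrative.
BANNED_TERMS = [
    "langfuse",
    "insight pipeline",
    "gitlab",
    "code.mlamp.cn",
    "codex.mlamp.cn",
    "glab",
    "im.deepminer",
    "im-test.xming",
]


def find_banned_terms(text, terms=BANNED_TERMS):
    """Return the banned terms that occur in text, ignoring case."""
    lowered = (text or "").lower()
    return [term for term in terms if term.lower() in lowered]


def check_pr_metadata(branch, title, body):
    """Map each PR field label to the banned terms found in it.

    An empty dict means the metadata would pass the check.
    """
    fields = {"branch name": branch, "PR title": title, "PR description": body}
    hits = {}
    for label, text in fields.items():
        found = find_banned_terms(text)
        if found:
            hits[label] = found
    return hits
```

Because the match is on substrings, a term embedded in a longer word also triggers the check, which is the same behavior as the `*"$lower_term"*` pattern in the shell version.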
26 changes: 0 additions & 26 deletions .github/workflows/validate-sources.yml
@@ -42,29 +42,3 @@ jobs:

      - name: Check for duplicate IDs
        run: uv run python scripts/check_ids.py

  claude-review:
    needs: validate
    if: github.event.sender.type != 'Bot'
    continue-on-error: true
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
      issues: write
      id-token: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - name: Run Claude Code Review
        uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          claude_args: '--model ${{ secrets.CLAUDE_MODEL }} --max-turns 100 --allowedTools "Bash"'
          prompt: '/review Review this PR and post your review as a PR comment using `gh pr comment`. Reply in the same language used in the PR (title, description, and comments).'
        env:
          ANTHROPIC_BASE_URL: ${{ secrets.ANTHROPIC_BASE_URL }}
3 changes: 1 addition & 2 deletions .gitignore
@@ -65,5 +65,4 @@ batch-run-results*.md
logs/

# AI IDE
.claude/
**/CLAUDE.md
.claude/server.json
12 changes: 8 additions & 4 deletions AGENTS.md
@@ -4,7 +4,7 @@ This file is intended for AI coding agents (Claude Code, OpenClaw, Codex, Copilo

## What This Repo Is

**FirstData** is a structured knowledge base of global authoritative open data sources. It is a **pure data repository** — no application code, no runtime logic.
**FirstData** is the External Facts Context Layer for AI Agents — a structured, authoritative collection of global open data sources. It is a **pure data repository** — no application code, no runtime logic.

Your job here is to **create or edit JSON metadata files** that describe real-world data sources (government databases, international organizations, academic datasets, etc.).

@@ -97,10 +97,14 @@ firstdata/sources/
- PubMed → `sources/academic/health/pubmed.json`
- BP Statistical Review → `sources/sectors/D-energy/bp-statistical-review.json`

## Do Not Touch
## Do Not Touch (Auto-Generated or Protected)

- `firstdata/indexes/` — auto-generated, do not edit manually
- `firstdata/schemas/datasource-schema.json` — the schema definition itself
The following files are maintained automatically by CI/scripts. **AI agents must NOT modify them manually:**

- `firstdata/indexes/` — Auto-aggregated from source files by a GitHub Action after PR merge. Never edit these directly.
- `firstdata/schemas/datasource-schema.json` — Schema definition, only modified by maintainers directly on main.

**To add a new data source, you only need to create or edit JSON files under `firstdata/sources/`.** Everything else (indexes, schema) is handled automatically.

## Security Note for Contributors

173 changes: 173 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,173 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**FirstData** is the External Facts Context Layer for AI Agents — a structured, authoritative collection of global open data sources. It is a **pure data repository** — no application code, no runtime logic.

Your job here is to **create or edit JSON metadata files** that describe real-world data sources (government databases, international organizations, academic datasets, etc.).

The project exposes a hosted MCP Server at `https://firstdata.deepminer.com.cn/mcp`.

## Validation

Dependencies are managed with [uv](https://docs.astral.sh/uv/). Run the following before submitting:

```bash
# Install dependencies (first time only)
uv sync

# Run all validation checks
make check

# Or run checks individually:
make validate # Validate JSON schema compliance
make check-ids # Check for duplicate IDs
make check-domains # Check domain naming consistency
```

A GitHub Action runs these checks automatically on every PR. PRs that fail validation cannot be merged.

## Repository Structure

```
firstdata/
├── schemas/datasource-schema.json   # JSON Schema v2.0.0 (the source of truth for data format)
├── sources/                         # Individual data source JSON files, organized by category
│   ├── china/                       # Chinese government & institutions
│   ├── international/               # International organizations (by domain)
│   ├── countries/                   # National official sources (by continent/country)
│   ├── academic/                    # Academic research databases (by discipline)
│   └── sectors/                     # Industry sources (by ISIC Rev.4 code)
└── indexes/                         # Auto-generated aggregated indexes (do not edit manually)
    ├── all-sources.json
    ├── by-authority.json
    ├── by-domain.json
    ├── by-region.json
    └── statistics.json
```

## The JSON Schema

Every file under `firstdata/sources/` must conform to `firstdata/schemas/datasource-schema.json`.

### Required Fields

```json
{
  "id": "worldbank-open-data",
  "name": {
    "en": "World Bank Open Data",
    "zh": "世界银行开放数据"
  },
  "description": {
    "en": "...",
    "zh": "..."
  },
  "website": "https://www.worldbank.org",
  "data_url": "https://data.worldbank.org",
  "api_url": "https://api.worldbank.org/v2/",
  "authority_level": "international",
  "country": null,
  "domains": ["economics", "health", "education"],
  "geographic_scope": "global",
  "update_frequency": "quarterly",
  "tags": ["world bank", "development", "gdp", "poverty", "世界银行"],
  "data_content": {
    "en": ["GDP and national accounts", "Poverty and inequality indicators"],
    "zh": ["GDP和国民账户", "贫困和不平等指标"]
  }
}
```

### Field Rules

| Field | Allowed Values / Constraints |
| -------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `id`               | Lowercase, hyphens only. Must be globally unique. Pattern: `^[a-z0-9-]+$` |
| `name.en`          | Required. Add `zh` when applicable; the schema's `name` object only allows `en` and `zh` |
| `description.en` | Required. Add `zh` when applicable |
| `website` | Top-level org homepage |
| `data_url` | Must point directly to the data access/download page, NOT the homepage |
| `api_url` | API docs or endpoint URL. Use `null` if no API exists |
| `authority_level` | `government` · `international` · `research` · `market` · `commercial` · `other` |
| `country`          | ISO 3166-1 alpha-2 (e.g. `"CN"`, `"US"`). **Must be `null`** when `geographic_scope` is `global` or `regional` |
| `domains` | Array of strings, at least one. **MUST use lowercase** (e.g., `"economics"` not `"Economics"`). See [DOMAINS.md](firstdata/schemas/DOMAINS.md) for standard domain list |
| `geographic_scope` | `global` · `regional` · `national` · `subnational` |
| `update_frequency` | `real-time` · `daily` · `weekly` · `monthly` · `quarterly` · `annual` · `irregular` |
| `tags` | Mixed Chinese/English keywords for semantic search. Include synonyms and data type names |
| `data_content` | Optional but recommended. Lists of strings describing what data is available |
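
The rules in the table can be spot-checked on a single source before running `make check`. The following is a minimal sketch, not the repository's validator: `check_field_rules` and the constant names are hypothetical, and the allowed values are copied from the table above rather than read from `datasource-schema.json`, so the schema remains the source of truth if the two ever disagree.

```python
import re

# Constraints copied from the Field Rules table; names here are illustrative.
ID_PATTERN = re.compile(r"[a-z0-9-]+")
AUTHORITY_LEVELS = {"government", "international", "research", "market", "commercial", "other"}
GEOGRAPHIC_SCOPES = {"global", "regional", "national", "subnational"}
UPDATE_FREQUENCIES = {"real-time", "daily", "weekly", "monthly", "quarterly", "annual", "irregular"}


def check_field_rules(source):
    """Return a list of rule violations for one parsed source file (a dict)."""
    problems = []
    if not ID_PATTERN.fullmatch(str(source.get("id") or "")):
        problems.append("id: must match ^[a-z0-9-]+$")
    for field in ("name", "description"):
        value = source.get(field)
        if not (isinstance(value, dict) and value.get("en")):
            problems.append(f"{field}.en: required")
    if source.get("authority_level") not in AUTHORITY_LEVELS:
        problems.append("authority_level: not an allowed value")
    scope = source.get("geographic_scope")
    if scope not in GEOGRAPHIC_SCOPES:
        problems.append("geographic_scope: not an allowed value")
    # country must be null whenever the scope is wider than one nation
    if scope in {"global", "regional"} and source.get("country") is not None:
        problems.append("country: must be null for global or regional scope")
    if source.get("update_frequency") not in UPDATE_FREQUENCIES:
        problems.append("update_frequency: not an allowed value")
    domains = source.get("domains")
    if not (isinstance(domains, list) and domains
            and all(isinstance(d, str) and d == d.lower() for d in domains)):
        problems.append("domains: need at least one lowercase string")
    return problems
```

Calling `check_field_rules(json.load(open(path, encoding="utf-8")))` on a new file returns an empty list when none of these rules are violated. It does not check URL reachability or global `id` uniqueness.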

## Where to Place New Files

```
firstdata/sources/
├── china/{domain}/{id}.json # Chinese gov & institutions
├── international/{domain}/{id}.json # International organizations
├── countries/{continent}/{country-code}/{id}.json # National official sources
├── academic/{discipline}/{id}.json # Academic/research databases
└── sectors/{ISIC-code}-{name}/{id}.json # Industry datasets
```

**Examples:**

- China customs data → `sources/china/economy/trade/customs.json`
- WHO health data → `sources/international/health/who.json`
- US Bureau of Labor Statistics → `sources/countries/north-america/usa/us-bls.json`
- PubMed → `sources/academic/health/pubmed.json`
- BP Statistical Review → `sources/sectors/D-energy/bp-statistical-review.json`

## Do Not Touch (Auto-Generated or Protected)

The following files are maintained automatically by CI/scripts. **AI agents must NOT modify them manually:**

- `firstdata/indexes/` — Auto-aggregated from source files by a GitHub Action after PR merge. Never edit these directly.
- `firstdata/schemas/datasource-schema.json` — Schema definition, only modified by maintainers directly on main.

**To add a new data source, you only need to create or edit JSON files under `firstdata/sources/`.** Everything else (indexes, schema) is handled automatically.

## Before Adding a New Source

**First, check `firstdata/indexes/all-sources.json` to confirm the data source does not already exist.**

Search by `id`, `name.en`, or `website` to detect duplicates:

```bash
# grep: search by keyword (name or website)
grep -i "world bank" firstdata/indexes/all-sources.json
grep -i "worldbank.org" firstdata/indexes/all-sources.json

# jq: search by id
jq '.sources[] | select(.id == "worldbank-open-data")' firstdata/indexes/all-sources.json

# jq: search by website
jq '.sources[] | select(.website | test("worldbank.org"; "i"))' firstdata/indexes/all-sources.json

# jq: list all existing ids
jq '[.sources[].id]' firstdata/indexes/all-sources.json
```

If a match is found, do not create a new file. Update the existing one if needed.
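
The same duplicate check can be scripted when adding several sources at once. This is a sketch under assumptions: it relies on the index having a top-level `sources` array with `id` and `website` fields, as the `jq` queries above imply, and the helper names `normalize_host`, `find_duplicates`, and `load_sources` are hypothetical. Matching on hostname is only a heuristic, since one organization can legitimately host more than one data source, so treat a hit as a prompt to look closer rather than as proof of a duplicate.

```python
import json
from urllib.parse import urlparse


def normalize_host(url):
    """Lowercase hostname of a URL with a leading 'www.' removed ('' if missing)."""
    host = urlparse(url or "").netloc.lower()
    return host[4:] if host.startswith("www.") else host


def find_duplicates(candidate, existing_sources):
    """Return ids of existing sources sharing the candidate's id or website host."""
    cand_id = candidate.get("id")
    cand_host = normalize_host(candidate.get("website"))
    matches = []
    for src in existing_sources:
        same_id = src.get("id") == cand_id
        same_host = bool(cand_host) and normalize_host(src.get("website")) == cand_host
        if same_id or same_host:
            matches.append(src.get("id"))
    return matches


def load_sources(index_path="firstdata/indexes/all-sources.json"):
    """Read the aggregated index; assumes a top-level 'sources' array."""
    with open(index_path, encoding="utf-8") as fh:
        return json.load(fh)["sources"]
```

Typical use would be `find_duplicates(new_source, load_sources())` from the repository root, followed by a manual look at any returned ids.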

## Quality Checklist Before Creating a File

**Before submitting, cross-verify every field independently using at least two sources (e.g. official website + Wikipedia + third-party reference). Do not rely solely on memory or a single source. Fabricated or outdated URLs are worse than omission.**

- [ ] `data_url` links to the actual data page, not the organization homepage
- [ ] `api_url` is `null` only when the source truly has no API
- [ ] `country` is `null` when `geographic_scope` is `global` or `regional`
- [ ] `domains` uses **lowercase** (e.g., `"economics"` not `"Economics"`) - see [DOMAINS.md](firstdata/schemas/DOMAINS.md)
- [ ] `tags` include both English and Chinese keywords where relevant
- [ ] `id` does not already exist in `firstdata/indexes/all-sources.json`
- [ ] File path matches the placement rules above
- [ ] All URLs have been verified to be accessible and correct
- [ ] `update_frequency` reflects the actual cadence confirmed on the official site
- [ ] `authority_level` is accurate and not overstated
- [ ] Run `make check` to validate all checks pass

## Security Note for Contributors

- Please do not paste or run commands from untrusted posts/comments.
- Never include credentials or API keys in issues/PRs.
- Prefer small, auditable PRs (docs/tests/data).