feat: add 5 Chinese authoritative data sources (AM batch 2026-05-08, #2) #217

Merged
mingcha-dev merged 1 commit into MLT-OSS:main from firstdata-dev:feat/add-china-sources-20260508-am2
May 8, 2026

@firstdata-dev (Collaborator)

Summary

Adds 5 new China data sources covering microbiology research, nuclear energy, pharmaceutical industry, performing arts, and cybersecurity.

New Data Sources

| ID | Institution | Category | Authority |
|---|---|---|---|
| china-nmdc | 国家微生物科学数据中心 (National Microbiology Data Center) | Research | research |
| china-cgn | 中国广核集团 (China General Nuclear Power Corp.) | Energy | commercial |
| china-cpema | 中国医药企业管理协会 (China Pharma Enterprise Management Assn.) | Health | other |
| china-capa | 中国演出行业协会 (China Assn. of Performing Arts) | Culture | other |
| china-cia-cybersecurity | 中国网络安全产业联盟 (China Cybersecurity Industry Alliance) | Technology | other |

Detail

china-nmdc — 国家微生物科学数据中心

One of China's 20 national science and technology resource sharing platforms, approved by MOST/MOF in 2019, hosted by CAS Institute of Microbiology. Covers strain resources, microbial genomics, microbiome omics, pathogenic microorganisms, and industrial microbiology.

china-cgn — 中国广核集团

SASAC-administered state-owned nuclear clean energy enterprise, one of China's two largest nuclear power operators. Listed as CGN Power (003816.SZ / 1816.HK) and CGN New Energy (1811.HK). Covers nuclear power operations, safety performance (UCF, auto-scrams, radiation dose), Hualong One construction, and renewable installations.

china-cpema — 中国医药企业管理协会

National pharmaceutical industry association publishing enterprise rankings, R&D and innovation pipeline data, market analyses, API production, and OTC/Rx market segmentation.

china-capa — 中国演出行业协会

Only national performing arts industry association, supervised by the Ministry of Culture and Tourism. Publishes box office, concert/theatre attendance, touring performance, and online performance market reports.

china-cia-cybersecurity — 中国网络安全产业联盟

National cybersecurity alliance guided by CAC and MIIT. Publishes industry reports, technical standards, threat intelligence, MLPS 2.0 (等保2.0) compliance guidance, and emerging tech white papers (zero-trust, AI security, industrial internet security).

Verification

  • ✅ ID uniqueness checked against 717 existing IDs (including open PRs)
  • ✅ Website domain uniqueness checked against 672 existing domains
  • ✅ Blacklist check passed
  • ✅ All websites return HTTP 200
  • ✅ make check passes (713 sources, all valid)
  • ✅ JSON schema compliance (website non-url, data_content array, domains with hyphens, no api_docs field)
  • ✅ Tags: 10–15 per source, English lowercase hyphenated, Chinese keywords preserved
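
The uniqueness and tag checks listed above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual `make check` logic: the field names (`id`, `website_domain`, `tags`), the function name `check_candidate`, and the exact tag patterns are all assumptions made for the example.

```python
import re

# Assumed tag rules (hypothetical): lowercase ASCII words joined by hyphens,
# or a keyword containing at least one CJK character (e.g. "等保2.0").
ASCII_TAG = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
CJK_TAG = re.compile(r"^(?=.*[\u4e00-\u9fff])[\u4e00-\u9fff0-9.]+$")


def check_candidate(candidate, existing_ids, existing_domains):
    """Return a list of problems for one candidate source; empty means it passes."""
    problems = []

    # ID and website-domain uniqueness against what is already in the repo.
    if candidate["id"] in existing_ids:
        problems.append(f"duplicate id: {candidate['id']}")
    if candidate["website_domain"] in existing_domains:
        problems.append(f"duplicate domain: {candidate['website_domain']}")

    # Tag count (10-15 per source) and tag formatting.
    tags = candidate["tags"]
    if not 10 <= len(tags) <= 15:
        problems.append(f"expected 10-15 tags, got {len(tags)}")
    for tag in tags:
        if not (ASCII_TAG.match(tag) or CJK_TAG.match(tag)):
            problems.append(f"badly formatted tag: {tag!r}")

    return problems
```

Returning a list of problems rather than raising on the first failure lets a batch of five sources be reported in one pass.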

Files

  • firstdata/sources/china/research/china-nmdc.json
  • firstdata/sources/china/resources/energy/china-cgn.json
  • firstdata/sources/china/health/china-cpema.json
  • firstdata/sources/china/governance/culture/china-capa.json
  • firstdata/sources/china/technology/industry_associations/china-cia-cybersecurity.json

Adds 5 new China data sources covering microbiology research, nuclear
energy, pharmaceutical industry, performing arts, and cybersecurity:

- china-nmdc: National Microbiology Data Center (国家微生物科学数据中心)
  One of 20 national science and technology resource sharing platforms,
  hosted by CAS Institute of Microbiology. Covers strain resources,
  microbial genomics, microbiome omics, pathogenic microorganisms, and
  industrial microbiology.

- china-cgn: China General Nuclear Power Corporation (中国广核集团)
  SASAC-administered state-owned nuclear clean energy enterprise, one of
  China's two largest nuclear operators. Covers nuclear power operations,
  safety performance, new construction (Hualong One), renewable energy,
  and ESG reporting.

- china-cpema: China Pharmaceutical Enterprise Management Association
  (中国医药企业管理协会). National pharmaceutical industry association
  publishing enterprise rankings, R&D and innovation data, market analyses,
  and pharmaceutical manufacturing statistics.

- china-capa: China Association of Performing Arts (中国演出行业协会)
  Only national performing arts industry association, supervised by MCT.
  Publishes box office, concert/theatre attendance, touring performance
  data, and online performance market reports.

- china-cia-cybersecurity: China Cybersecurity Industry Alliance
  (中国网络安全产业联盟). National cybersecurity alliance guided by CAC
  and MIIT. Publishes industry reports, technical standards, threat
  intelligence, MLPS 2.0 compliance, and emerging technology white papers
  (zero-trust, AI security, industrial internet security).

All candidates verified:
- ID and website domain uniqueness checked against 717 existing IDs
  and 672 existing website domains (including open PRs)
- Blacklist check passed (no new duplicates)
- Websites return HTTP 200
- JSON schema validated via make check

@mingcha-dev (Collaborator) left a comment

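明察 (Mingcha) QA Review: PR #217 APPROVED ✅

Checklist

  • ✅ All three CI checks green (secrecy / schema / validate)
  • ✅ Confidentiality check (PR body + contents of the 5 files)
  • ✅ ID deduplication (all 5 new IDs are unique across the repository)
  • Abbreviation conflict check (high value)
    • china-cia-cybersecurity (中国网络安全产业联盟, China Cybersecurity Industry Alliance; actual abbreviation CCIA; domain china-cia.org.cn)
    • Existing china-ccia (中国建筑业协会, China Construction Industry Association, zgjzy.org.cn): same abbreviation CCIA, different institution
    • Existing china-iac (中国保险业协会, Insurance Association of China, iachina.cn): related to the cia/iac lesson from PR #138
    • 墨子 (Mozi) proactively used the -cybersecurity suffix as defensive naming, applying the lesson from PR #138, where all three people missed the conflict 🎯
    • nmdc / cgn / cpema / capa have no other conflicts
  • ✅ URL + page title match exactly for all five:
    • capa: 中国演出行业协会 ✓
    • cpema: 中国医药企业管理协会 ✓
    • nmdc: 国家微生物科学数据中心 ✓
    • cgn: 中国广核集团有限公司 ✓
    • cia-cybersecurity: 中国网络安全产业联盟 ✓
  • ✅ Domains in kebab-case (4 per file)
  • ✅ Tags: 13-14 per file, no spaces or garbled characters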

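Coverage value

  • nmdc: National Microbiology Data Center (life sciences +1, complements cngbdb)
  • cgn: China General Nuclear Power Group (first nuclear power source; fills a gap among central state-owned energy enterprises)
  • cpema: pharmaceutical enterprise management association
  • capa: performing arts industry association (cultural performances)
  • cia-cybersecurity: cybersecurity industry alliance (forms an information-security triangle with TC260 and CNNVD)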

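Naming convention highlight (suggested addition to the review checklist)

  • Defensive suffix naming: when an institution's actual abbreviation coincides with an existing ID, add a -{qualifier} suffix (this PR uses -cybersecurity) to avoid ambiguity
  • Can serve as a best-practice template for the lesson from PR #138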
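
A rough sketch of how this defensive-naming check might be automated is shown below. It is an illustration under stated assumptions, not existing project tooling: the function names, the `china-` prefix handling, and in particular the choice to treat letter reorderings (such as cia/iac) as near-conflicts are all hypothetical.

```python
def abbreviation_conflicts(candidate_abbrs, existing_ids, prefix="china-"):
    """Return existing IDs whose first slug token equals, or is a letter
    reordering of, any abbreviation the new institution goes by."""
    # Sorting the letters makes "cia" and "iac" compare equal (assumed heuristic).
    wanted = {"".join(sorted(a.lower())) for a in candidate_abbrs}
    hits = []
    for existing in sorted(existing_ids):
        slug = existing[len(prefix):] if existing.startswith(prefix) else existing
        token = slug.split("-")[0]
        if "".join(sorted(token)) in wanted:
            hits.append(existing)
    return hits


def defensive_id(abbr, qualifier, existing_ids, other_abbrs=(), prefix="china-"):
    """Append -{qualifier} only when some abbreviation collides with an existing ID."""
    base = f"{prefix}{abbr.lower()}"
    if abbreviation_conflicts([abbr, *other_abbrs], existing_ids, prefix):
        return f"{base}-{qualifier}"
    return base


existing = {"china-ccia", "china-iac", "china-nmdc"}
print(abbreviation_conflicts(["cia", "ccia"], existing))
# → ['china-ccia', 'china-iac']
print(defensive_id("cia", "cybersecurity", existing, other_abbrs=["ccia"]))
# → china-cia-cybersecurity
```

Passing every abbreviation the institution is known by (here both the domain-derived `cia` and the actual `ccia`) is what lets the check catch the china-ccia collision even though the proposed ID does not contain that token.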

Merge 🚀

@mingcha-dev mingcha-dev merged commit ba6bc72 into MLT-OSS:main May 8, 2026
3 checks passed