feat: add 5 Chinese authoritative data sources (AM batch 2026-05-08, #2) #217
Merged
mingcha-dev merged 1 commit into MLT-OSS:main on May 8, 2026
Adds 5 new China data sources covering microbiology research, nuclear energy, pharmaceutical industry, performing arts, and cybersecurity:

- china-nmdc: National Microbiology Data Center (国家微生物科学数据中心). One of 20 national science and technology resource sharing platforms, hosted by CAS Institute of Microbiology. Covers strain resources, microbial genomics, microbiome omics, pathogenic microorganisms, and industrial microbiology.
- china-cgn: China General Nuclear Power Corporation (中国广核集团). SASAC-administered state-owned nuclear clean energy enterprise, one of China's two largest nuclear operators. Covers nuclear power operations, safety performance, new construction (Hualong One), renewable energy, and ESG reporting.
- china-cpema: China Pharmaceutical Enterprise Management Association (中国医药企业管理协会). National pharmaceutical industry association publishing enterprise rankings, R&D and innovation data, market analyses, and pharmaceutical manufacturing statistics.
- china-capa: China Association of Performing Arts (中国演出行业协会). Only national performing arts industry association, supervised by MCT. Publishes box office, concert/theatre attendance, touring performance data, and online performance market reports.
- china-cia-cybersecurity: China Cybersecurity Industry Alliance (中国网络安全产业联盟). National cybersecurity alliance guided by CAC and MIIT. Publishes industry reports, technical standards, threat intelligence, MLPS 2.0 compliance, and emerging technology white papers (zero-trust, AI security, industrial internet security).

All candidates verified:
- ID and website domain uniqueness checked against 717 existing IDs and 672 existing website domains (including open PRs)
- Blacklist check passed (no new duplicates)
- Websites return HTTP 200
- JSON schema validated via make check
mingcha-dev (Collaborator) approved these changes on May 8, 2026 and left a comment:
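Mingcha (明察) QA Review: PR #217 APPROVED ✅
Checklist
- ✅ All three CI checks green (secrecy / schema / validate)
- ✅ Confidentiality (PR body + contents of the 5 files)
- ✅ ID deduplication (all 5 new IDs are unique across the repository)
- ✅ Abbreviation-conflict check (high value):
- ✅ URL + title all match exactly:
  - capa: 中国演出行业协会 ✓
  - cpema: 中国医药企业管理协会 ✓
  - nmdc: 国家微生物科学数据中心 ✓
  - cgn: 中国广核集团有限公司 ✓
  - cia-cybersecurity: 中国网络安全产业联盟 ✓
- ✅ Domains in kebab-case (4 per file)
- ✅ Tags: 13-14 per file, no spaces or garbled characters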
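The kebab-case and tag checks in the list above could be automated roughly as follows. This is a minimal sketch, not the repository's actual validator: the field names `domains` and `tags`, the regular expression, and the function `check_source` are all assumptions made for illustration.

```python
import re

# Assumed definition of kebab-case: lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def check_source(entry: dict) -> list[str]:
    """Return a list of problems found in one source entry (empty if clean).

    The keys "domains" and "tags" are hypothetical; the real schema is not shown in this PR.
    """
    problems = []
    for domain in entry.get("domains", []):
        if not KEBAB_CASE.fullmatch(domain):
            problems.append(f"domain not kebab-case: {domain!r}")
    for tag in entry.get("tags", []):
        # Reject whitespace and U+FFFD, the replacement character that often marks mojibake.
        if any(ch.isspace() for ch in tag) or "\ufffd" in tag:
            problems.append(f"tag has whitespace or garbled text: {tag!r}")
    return problems
```

Non-ASCII tags such as Chinese terms pass the tag check, since only whitespace and the replacement character are rejected.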
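Coverage value
- nmdc: National Microbiology Data Center (life sciences +1, complements cngbdb)
- cgn: China General Nuclear Power Corporation (first nuclear power source; fills a gap in central state-owned energy enterprises)
- cpema: China Pharmaceutical Enterprise Management Association
- capa: China Association of Performing Arts (cultural performance)
- cia-cybersecurity: China Cybersecurity Industry Alliance (forms an information-security triangle with TC260 and CNNVD)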
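Naming-convention highlight (suggest adding to the review checklist)
- Defensive suffix naming: when the real abbreviation collides with an existing ID, add a -{qualifier} suffix (this PR uses -cybersecurity) to avoid ambiguity
- Can serve as a best-practice template for the lesson learned in PR #138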
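The defensive-suffix rule described above can be sketched in a few lines. The function `defensive_id`, its parameters, and the example set of existing IDs are hypothetical; the review does not say which existing ID the abbreviation collided with.

```python
def defensive_id(prefix: str, abbreviation: str, qualifier: str, existing_ids: set[str]) -> str:
    """Build a source ID, appending -{qualifier} only when the plain ID is already taken."""
    candidate = f"{prefix}-{abbreviation}"
    if candidate not in existing_ids:
        return candidate
    qualified = f"{candidate}-{qualifier}"
    if qualified in existing_ids:
        # Even the qualified form collides: a human needs to pick a different qualifier.
        raise ValueError(f"both {candidate!r} and {qualified!r} already exist")
    return qualified


# Hypothetical registry in which the bare abbreviation is already in use.
existing = {"china-cia"}
new_id = defensive_id("china", "cia", "cybersecurity", existing)
```

Under that assumed registry, `new_id` is `china-cia-cybersecurity`, matching the ID used in this PR.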
Merge 🚀
Summary
Adds 5 new China data sources covering microbiology research, nuclear energy, pharmaceutical industry, performing arts, and cybersecurity.
New Data Sources
Detail
china-nmdc — 国家微生物科学数据中心
One of China's 20 national science and technology resource sharing platforms, approved by MOST/MOF in 2019, hosted by CAS Institute of Microbiology. Covers strain resources, microbial genomics, microbiome omics, pathogenic microorganisms, and industrial microbiology.
china-cgn — 中国广核集团
SASAC-administered state-owned nuclear clean energy enterprise, one of China's two largest nuclear power operators. Listed as CGN Power (003816.SZ / 1816.HK) and CGN New Energy (1811.HK). Covers nuclear power operations, safety performance (UCF, auto-scrams, radiation dose), Hualong One construction, and renewable installations.
china-cpema — 中国医药企业管理协会
National pharmaceutical industry association publishing enterprise rankings, R&D and innovation pipeline data, market analyses, API production, and OTC/Rx market segmentation.
china-capa — 中国演出行业协会
Only national performing arts industry association, supervised by the Ministry of Culture and Tourism. Publishes box office, concert/theatre attendance, touring performance, and online performance market reports.
china-cia-cybersecurity — 中国网络安全产业联盟
National cybersecurity alliance guided by CAC and MIIT. Publishes industry reports, technical standards, threat intelligence, MLPS 2.0 (等保2.0) compliance guidance, and emerging tech white papers (zero-trust, AI security, industrial internet security).
Verification
make check passes (713 sources, all valid)
Files
- firstdata/sources/china/research/china-nmdc.json
- firstdata/sources/china/resources/energy/china-cgn.json
- firstdata/sources/china/health/china-cpema.json
- firstdata/sources/china/governance/culture/china-capa.json
- firstdata/sources/china/technology/industry_associations/china-cia-cybersecurity.json
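
The commit message says ID and website domain uniqueness was checked against the existing IDs and domains before this batch was submitted. A check of that kind might look like the sketch below. The entry keys `id` and `website`, the `www.` normalisation, and the helper names are assumptions, and the URLs are placeholders, not the sources' real addresses.

```python
from urllib.parse import urlsplit


def website_domain(url: str) -> str:
    """Extract a comparable host name; urlsplit already lowercases the hostname."""
    host = urlsplit(url).hostname or ""
    return host.removeprefix("www.")


def find_conflicts(candidates: list[dict], existing_ids: set[str], existing_domains: set[str]) -> list[tuple[str, str]]:
    """Return (kind, value) pairs for every ID or domain that is already taken.

    Candidates are also compared with each other, so duplicates inside one batch are caught.
    """
    seen_ids = set(existing_ids)
    seen_domains = set(existing_domains)
    conflicts = []
    for entry in candidates:
        domain = website_domain(entry["website"])
        if entry["id"] in seen_ids:
            conflicts.append(("id", entry["id"]))
        if domain in seen_domains:
            conflicts.append(("domain", domain))
        seen_ids.add(entry["id"])
        seen_domains.add(domain)
    return conflicts


# Placeholder batch: two distinct IDs that point at the same (example) domain.
batch = [
    {"id": "china-nmdc", "website": "https://www.example.org/"},
    {"id": "china-cgn", "website": "https://example.org/about"},
]
conflicts = find_conflicts(batch, existing_ids={"china-capa"}, existing_domains=set())
```

An empty result means the batch is clear to add; here the second entry is flagged because its domain repeats the first.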