Fix CMIP6 experiment global attribute to match the WCRP/esgvoc CV label #414
rhaegar325 wants to merge 2 commits into
Codecov Report
✅ All modified and coverable lines are covered by tests.

| | main | #414 | +/- |
|---|---|---|---|
| Coverage | 75.1% | 75.3% | +0.2% |
| Files | 28 | 28 | |
| Lines | 5281 | 5298 | +17 |
| Branches | 973 | 975 | +2 |
| Hits | 3966 | 3988 | +22 |
| Misses | 1091 | 1089 | -2 |
| Partials | 224 | 221 | -3 |
Thanks, this explains why the current WCRP checker fails, but I think this should also be raised upstream. The value ACCESS-MOPPy currently writes for
I’m okay with a local workaround if we need to pass the current checker, but I’d prefer we label it as temporary checker compatibility and open an upstream issue/PR. Also, can we make the helper mirror the checker’s lookup logic, including the
Could we add a note/test around the exact esgvoc version/data being targeted, and open an upstream issue asking whether CMIP6
Summary
Every CMORised CMIP6 file wrote the global `experiment` attribute from the legacy bundled `CMIP6_CVs` JSON, whose `experiment` field is a long descriptive phrase (e.g. `"all-forcing simulation of the recent past"`). The WCRP compliance checker (`wcrp_cmip6:1.0`, cc-plugin-wcrp + esgvoc) validates that attribute against esgvoc's CMIP6 controlled vocabulary, whose label is the short canonical name (e.g. `"Historical simulation"`). The two vocabularies disagree, so the checker raised a MED-priority `[ATTR007]` cross-attribute consistency failure on every affected file.
This PR resolves the `experiment` label from esgvoc, the same source the checker uses, so the written value matches by construction, with a safe fallback to the legacy value.
Problem
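The bundled legacy CV and esgvoc's CMIP6 universe carry different values in the `experiment` field:

| Source | `experiment` for `historical` |
|---|---|
| Bundled `CMIP6_CVs/CMIP6_experiment_id.json` (what we wrote) | all-forcing simulation of the recent past |
| esgvoc `cmip6` CV (what the checker validates against) | Historical simulation |

The checker, `checks/consistency_checks/check_experiment_consistency.py` (`[ATTR007]`), roughly looks up the esgvoc term for the experiment, takes `term.experiment` as `expected_experiment`, and compares the written `experiment` attribute against it. Two consequences matter: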
- The checker compares against esgvoc's `term.experiment`, not the bundled CV's.
- The comparison only runs when `expected_experiment` is non-empty. esgvoc returns `None` for most experiments (e.g. `piControl`, `amip`), so the check is skipped for those and the legacy value is accepted. Only experiments where esgvoc has a non-empty label (e.g. `historical`, `esm-hist`) failed.

The attribute is written once, in
`CMIP6Vocabulary.get_required_global_attributes`.
Fix
Resolve the label from esgvoc and fall back to the bundled CV value. The write
site now calls a small helper:
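The PR's own snippet is not reproduced in this text, so the following is only a minimal sketch of what such a write site could look like. `WriteSiteSketch` is a hypothetical stand-in for `CMIP6Vocabulary`, and the helper body is a placeholder that returns the bundled value; the method and attribute names come from the description above.

```python
class WriteSiteSketch:
    """Hypothetical stand-in for CMIP6Vocabulary, reduced to one attribute."""

    def __init__(self, experiment_entry):
        # Entry for one experiment from the bundled CMIP6_experiment_id.json.
        self.experiment = experiment_entry

    def _resolve_experiment_label(self):
        # Placeholder body (assumption): the helper described in this PR
        # consults esgvoc first and only then falls back to this value.
        return self.experiment["experiment"]

    def get_required_global_attributes(self):
        return {
            # Previously: "experiment": self.experiment["experiment"]
            "experiment": self._resolve_experiment_label(),
        }
```

Keeping the lookup behind one method means the write site stays a single line, and the fallback policy lives in one place.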
The esgvoc.api integration
The new helper `CMIP6Vocabulary._resolve_experiment_label()` is the core of this change. Design points:

- Same source as the checker. It calls `esgvoc.api.get_term_in_collection(project_id="cmip6", collection_id="experiment_id", term_id=experiment_id)` and reads `term.experiment`, the exact call and field `[ATTR007]` compares against. By construction the written value equals what the checker expects, for any experiment esgvoc has a label for (not special-cased for `historical`).
- Lazy, optional import. `import esgvoc.api as voc` happens inside the method, not at module top. esgvoc is a checker/pixi dependency, not a hard runtime dependency of the core CMORiser, so importing lazily keeps it optional.
- Safe fallback. Any failure, whether esgvoc is not installed (`ImportError`), the term is not found, or the label is empty, falls back to the bundled CV value (`self.experiment["experiment"]`). When esgvoc has no label the checker skips the comparison anyway, so the legacy value remains valid.
- Cached per experiment. Results are memoised in a class-level `_EXPERIMENT_LABEL_CACHE` keyed by `experiment_id`, because esgvoc lookups touch a local database; each experiment is resolved at most once per process.
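The four design points above can be sketched together as follows. This is an illustration under stated assumptions, not the code in the diff: `CMIP6VocabularySketch` and its constructor are hypothetical, and catching `Exception` broadly is one assumed way to implement "any failure falls back". The `esgvoc.api.get_term_in_collection` call and the cache name are taken from the description.

```python
class CMIP6VocabularySketch:
    """Hypothetical stand-in for CMIP6Vocabulary, reduced to label resolution."""

    # Class-level cache keyed by experiment_id, shared by all instances.
    _EXPERIMENT_LABEL_CACHE = {}

    def __init__(self, experiment_id, experiment_entry):
        self.experiment_id = experiment_id
        self.experiment = experiment_entry  # entry from the bundled CV JSON

    def _resolve_experiment_label(self):
        cache = type(self)._EXPERIMENT_LABEL_CACHE
        if self.experiment_id in cache:
            return cache[self.experiment_id]

        try:
            # Lazy import: esgvoc stays an optional dependency.
            import esgvoc.api as voc

            term = voc.get_term_in_collection(
                project_id="cmip6",
                collection_id="experiment_id",
                term_id=self.experiment_id,
            )
            label = getattr(term, "experiment", None)
        except Exception:
            # Assumption: ImportError and any lookup error are treated alike.
            label = None

        if not label:
            # Missing term or empty label: keep the bundled CV value.
            label = self.experiment["experiment"]

        cache[self.experiment_id] = label
        return label
```

Because the cache is class-level, a process that CMORises many variables for the same experiment would hit the esgvoc database once. Note that this sketch also caches the fallback value, which is a design choice of the sketch rather than something the description states.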
Why esgvoc instead of editing the bundled JSON
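- The `CMIP6_CVs` directory is a git submodule tracking upstream `WCRP-CMIP/CMIP6_CVs`. Editing `CMIP6_experiment_id.json` in place diverges from upstream and can be wiped by `git submodule update`.
- Resolving the label from esgvoc, the same source the checker uses, keeps the written value consistent with the checker with zero maintenance of a hand-copied label table.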
Scope
`CMIP7Vocabulary` does not emit an `experiment` global attribute, so it is untouched.
for `historical`.
Tests
Four unit tests in `tests/unit/test_vocabulary_processors.py` mock `esgvoc.api` via `sys.modules`, so they pass with or without esgvoc installed:

- `test_resolve_experiment_label_uses_esgvoc`: the esgvoc label overrides the legacy description.
- `test_resolve_experiment_label_falls_back_when_esgvoc_label_empty`: an empty esgvoc label keeps the legacy value.
- `test_resolve_experiment_label_falls_back_when_esgvoc_missing`: a missing esgvoc (`ImportError`) keeps the legacy value.
- `test_resolve_experiment_label_is_cached`: the lookup runs at most once per `experiment_id`.

Verification
- esgvoc lookups: `historical -> "Historical simulation"` and `esm-hist -> "ESM historical simulation"`; `piControl -> None` (checker skips).
- `conda/analysis3-26.06`: `CMIP6Vocabulary("Amon.tas", "historical", ...)._resolve_experiment_label()` returns `"Historical simulation"`.
- The `tas_Amon_ACCESS-ESM1-5_historical_r1i1p1f1_gn` file now carries `experiment = "Historical simulation"`, and `compliance-checker --test wcrp_cmip6:1.0` reports no `[ATTR007]` experiment inconsistency.