Fix CMIP6 experiment global attribute to match the WCRP/esgvoc CV label by rhaegar325 · Pull Request #414 · ACCESS-NRI/ACCESS-MOPPy

rhaegar325 · 2026-06-02T06:12:16Z

Summary

Every CMORised CMIP6 file took its global experiment attribute from the
legacy bundled CMIP6_CVs JSON, whose experiment field is a long
descriptive phrase (e.g. "all-forcing simulation of the recent past"). The
WCRP compliance checker (wcrp_cmip6:1.0, cc-plugin-wcrp + esgvoc) validates
that attribute against esgvoc's CMIP6 controlled vocabulary, whose label is
the short canonical name (e.g. "Historical simulation"). The two vocabularies
disagree, so the checker raised a MED-priority [ATTR007] cross-attribute
consistency failure on every affected file.

This PR resolves the experiment label from esgvoc — the same source the
checker uses — so the written value matches by construction, with a safe
fallback to the legacy value.

Problem

The bundled legacy CV and esgvoc's CMIP6 universe carry different values in the
experiment field:

| Source | `experiment` for `historical` |
| --- | --- |
| Bundled `CMIP6_CVs/CMIP6_experiment_id.json` (what we wrote) | `all-forcing simulation of the recent past` |
| esgvoc `cmip6` CV (what the checker validates against) | `Historical simulation` |

The checker, checks/consistency_checks/check_experiment_consistency.py
([ATTR007]), does roughly:

reference_term = voc.get_term_in_collection("cmip6", "experiment_id", experiment_id)
expected_experiment = getattr(reference_term, "experiment", None)
if expected_experiment and actual_experiment is not None:
    if actual_experiment != str(expected_experiment).strip():
        failures.append(f"Inconsistency for 'experiment': CV expects "
                        f"'{expected_experiment}', file has '{actual_experiment}'.")

Two consequences matter:

1. The authoritative value is esgvoc's term.experiment, not the bundled
   CV's.
2. The comparison only runs when expected_experiment is non-empty. esgvoc
   returns None for most experiments (e.g. piControl, amip), so the check
   is skipped for those and the legacy value is accepted. Only experiments
   where esgvoc has a non-empty label (e.g. historical, esm-hist) failed.
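
The two consequences can be reproduced with a small stand-alone sketch of the comparison quoted above. The function name `experiment_failures` and the placeholder legacy strings are illustrative assumptions, not part of cc-plugin-wcrp; only the `if` conditions mirror the quoted checker snippet.

```python
from typing import List, Optional


def experiment_failures(expected: Optional[str], actual: Optional[str]) -> List[str]:
    """Mimic the quoted [ATTR007] comparison for one file (illustrative only)."""
    failures: List[str] = []
    # The comparison is skipped when the CV label is None or empty.
    if expected and actual is not None:
        if actual != str(expected).strip():
            failures.append(
                f"Inconsistency for 'experiment': CV expects "
                f"'{expected}', file has '{actual}'."
            )
    return failures


legacy = "all-forcing simulation of the recent past"

# historical: esgvoc carries a label, so the legacy phrase is flagged.
print(experiment_failures("Historical simulation", legacy))

# No esgvoc label (None): whatever was written is accepted.
print(experiment_failures(None, "any legacy phrase"))

# After the fix the written value equals the label, so nothing is flagged.
print(experiment_failures("Historical simulation", "Historical simulation"))
```

Only the first call yields a failure message, which is why the problem surfaced for historical and esm-hist but not for experiments without an esgvoc label.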

The attribute is written once, in CMIP6Vocabulary.get_required_global_attributes:

"experiment": self.experiment["experiment"],   # legacy CV description -> mismatch

Fix

Resolve the label from esgvoc and fall back to the bundled CV value. The write
site now calls a small helper:

"experiment": self._resolve_experiment_label(),

The esgvoc.api integration

The new helper CMIP6Vocabulary._resolve_experiment_label() is the core of this
change. Design points:

- Same source as the checker. It calls
  esgvoc.api.get_term_in_collection(project_id="cmip6", collection_id="experiment_id", term_id=experiment_id) and reads
  term.experiment, the exact call and field [ATTR007] compares against. By
  construction the written value equals what the checker expects, for any
  experiment esgvoc has a label for (not special-cased for historical).
- Lazy, optional import. import esgvoc.api as voc happens inside the
  method, not at module top. esgvoc is a checker/pixi dependency, not a hard
  runtime dependency of the core CMORiser, so importing lazily keeps it optional.
- Safe fallback. Any failure, whether esgvoc is not installed (ImportError),
  the term is not found, or the label is empty, falls back to the bundled CV
  value (self.experiment["experiment"]). When esgvoc has no label the checker
  skips the comparison anyway, so the legacy value remains valid.
- Cached per experiment. Results are memoised in a class-level
  _EXPERIMENT_LABEL_CACHE keyed by experiment_id, because esgvoc lookups
  touch a local database; each experiment is resolved at most once per process.

# Canonical CMIP6-CV ``experiment`` labels resolved via esgvoc, keyed by
# experiment_id. esgvoc lookups touch a database, so resolve each
# experiment at most once per process.
_EXPERIMENT_LABEL_CACHE: Dict[str, Optional[str]] = {}

def _resolve_experiment_label(self) -> str:
    """Return the canonical ``experiment`` global-attribute value.

    The WCRP compliance checker (cc-plugin-wcrp + esgvoc) compares the
    global ``experiment`` attribute against esgvoc's CMIP6 controlled
    vocabulary, whose label (e.g. ``"Historical simulation"``) differs from
    the descriptive phrase carried in the legacy CMIP6_CVs JSON bundled with
    this package (e.g. ``"all-forcing simulation of the recent past"``).

    Resolve the label from esgvoc so the written attribute matches what the
    checker validates. Fall back to the bundled CV value when esgvoc is
    unavailable or carries no label for this experiment -- in the latter
    case the checker skips the comparison, so the legacy value is accepted.
    """
    legacy_label = self.experiment.get("experiment", "")

    eid = self.experiment_id
    if eid not in CMIP6Vocabulary._EXPERIMENT_LABEL_CACHE:
        label: Optional[str] = None
        try:
            import esgvoc.api as voc

            term = voc.get_term_in_collection(
                project_id="cmip6",
                collection_id="experiment_id",
                term_id=eid,
            )
            if term is not None:
                label = getattr(term, "experiment", None)
        except Exception:
            label = None
        CMIP6Vocabulary._EXPERIMENT_LABEL_CACHE[eid] = label

    resolved = CMIP6Vocabulary._EXPERIMENT_LABEL_CACHE[eid]
    return resolved if resolved else legacy_label

Why esgvoc instead of editing the bundled JSON

- The bundled CMIP6_CVs directory is a git submodule tracking upstream
  WCRP-CMIP/CMIP6_CVs. Editing CMIP6_experiment_id.json in place diverges
  from upstream and can be wiped by git submodule update.
- esgvoc is the checker's own vocabulary, so it stays correct as the CV evolves,
  with zero maintenance of a hand-copied label table.

Scope

- CMIP6 only. CMIP7Vocabulary does not emit an experiment global attribute,
  so it is untouched.
- Single attribute write path; general across all experiments, not special-cased
  for historical.

Tests

Four unit tests in tests/unit/test_vocabulary_processors.py mock esgvoc.api
via sys.modules, so they pass with or without esgvoc installed:

- test_resolve_experiment_label_uses_esgvoc: esgvoc label overrides the legacy
  description.
- test_resolve_experiment_label_falls_back_when_esgvoc_label_empty: empty
  esgvoc label keeps the legacy value.
- test_resolve_experiment_label_falls_back_when_esgvoc_missing: missing esgvoc
  (ImportError) keeps the legacy value.
- test_resolve_experiment_label_is_cached: the lookup runs at most once per
  experiment_id.
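
For readers unfamiliar with mocking a module through sys.modules, the following is a minimal sketch of the technique, not the PR's actual test code. `resolve_label` is a simplified, uncached stand-in for the helper, and `fake_esgvoc` is a hypothetical fixture; both names are assumptions made for illustration. Setting a sys.modules entry to None makes the import raise ImportError, which simulates esgvoc being absent even on a machine where it is installed.

```python
import sys
import types
from typing import Optional
from unittest import mock


def resolve_label(experiment_id: str, legacy_label: str) -> str:
    """Simplified, uncached stand-in for the helper (illustrative only)."""
    label: Optional[str] = None
    try:
        import esgvoc.api as voc  # lazy import, resolved through sys.modules

        term = voc.get_term_in_collection(
            project_id="cmip6",
            collection_id="experiment_id",
            term_id=experiment_id,
        )
        if term is not None:
            label = getattr(term, "experiment", None)
    except Exception:
        label = None
    return label if label else legacy_label


def fake_esgvoc(label: Optional[str]) -> dict:
    """Build sys.modules entries for a fake esgvoc package (hypothetical fixture)."""
    api = types.ModuleType("esgvoc.api")
    api.get_term_in_collection = mock.Mock(
        return_value=types.SimpleNamespace(experiment=label)
    )
    pkg = types.ModuleType("esgvoc")
    pkg.api = api
    return {"esgvoc": pkg, "esgvoc.api": api}


legacy = "all-forcing simulation of the recent past"

# esgvoc label present: it overrides the legacy description.
with mock.patch.dict(sys.modules, fake_esgvoc("Historical simulation")):
    assert resolve_label("historical", legacy) == "Historical simulation"

# Empty esgvoc label: the legacy value is kept.
with mock.patch.dict(sys.modules, fake_esgvoc("")):
    assert resolve_label("historical", legacy) == legacy

# A None entry in sys.modules makes the import raise ImportError.
with mock.patch.dict(sys.modules, {"esgvoc": None, "esgvoc.api": None}):
    assert resolve_label("historical", legacy) == legacy
```

Because mock.patch.dict restores sys.modules on exit, each case runs in isolation and leaves any real esgvoc installation untouched.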

Verification

- esgvoc resolves historical -> "Historical simulation" and
  esm-hist -> "ESM historical simulation"; piControl -> None (checker skips).
- Live end-to-end in conda/analysis3-26.06:
  CMIP6Vocabulary("Amon.tas", "historical", ...)._resolve_experiment_label()
  returns "Historical simulation".
- A re-CMORised tas_Amon_ACCESS-ESM1-5_historical_r1i1p1f1_gn file now carries
  :experiment = "Historical simulation", and compliance-checker --test wcrp_cmip6:1.0 reports no [ATTR007] experiment inconsistency.

codecov · 2026-06-02T06:14:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.3%. Comparing base (34d58b9) to head (e91ee68).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##            main    #414     +/-   ##
=======================================
+ Coverage   75.1%   75.3%   +0.2%     
=======================================
  Files         28      28             
  Lines       5281    5298     +17     
  Branches     973     975      +2     
=======================================
+ Hits        3966    3988     +22     
+ Misses      1091    1089      -2     
+ Partials     224     221      -3

Flag	Coverage Δ
unit	`75.3% <100.0%> (+0.2%)`	⬆️

rbeucher · 2026-06-03T03:32:33Z

Thanks — this explains why the current WCRP checker fails, but I think this should also be raised upstream. The value ACCESS-MOPPy currently writes for historical (all-forcing simulation of the recent past) matches the official CMIP6_CVs table and published CMIP6 files, while the esgvoc-formatted CMIP6 term currently exposes experiment = Historical simulation, which cc-plugin-wcrp then treats as the expected CMIP6 global attribute. That looks like an esgvoc/cc-plugin-wcrp inconsistency, not purely an ACCESS-MOPPy bug.

I’m okay with a local workaround if we need to pass the current checker, but I’d prefer we label it as temporary checker compatibility and open an upstream issue/PR. Also, can we make the helper mirror the checker’s lookup logic, including the drs_name fallback? Otherwise mixed-case IDs such as piControl may still diverge if the checker resolves them but ACCESS-MOPPy falls back to the bundled CV.

Could we add a note/test around the exact esgvoc version/data being targeted, and open an upstream issue asking whether CMIP6 experiment should match the legacy CMIP6_CVs value or the new esgvoc label?
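
One possible reading of the requested drs_name fallback is sketched below. The checker's real lookup logic is not shown in this thread, so everything here is an assumption: the function name `find_term`, the injected `get_by_id` and `all_terms` arguments, and the premise that term ids are lower-case while drs_name preserves the mixed-case form. The stub terms stand in for esgvoc data and are not real esgvoc objects.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, Optional


def find_term(
    experiment_id: str,
    get_by_id: Callable[[str], Optional[object]],
    all_terms: Iterable[object],
) -> Optional[object]:
    """Look up by term id first, then fall back to matching drs_name."""
    term = get_by_id(experiment_id)
    if term is not None:
        return term
    # Assumed fallback: scan the collection for a matching drs_name.
    for candidate in all_terms:
        if getattr(candidate, "drs_name", None) == experiment_id:
            return candidate
    return None


# Stub vocabulary (assumption): lower-case ids, mixed-case drs_name.
terms = {
    "historical": SimpleNamespace(
        id="historical", drs_name="historical", experiment="Historical simulation"
    ),
    "picontrol": SimpleNamespace(
        id="picontrol", drs_name="piControl", experiment=None
    ),
}

# A direct id lookup for "piControl" misses; the drs_name scan finds the term.
term = find_term("piControl", terms.get, terms.values())
print(term.id, term.experiment)
```

If the helper and the checker resolve terms by the same route, they agree on whether a label exists for a mixed-case ID, which is the divergence the comment above is concerned about.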

rhaegar325 added 2 commits June 2, 2026 16:03

import esgvoc.api to load correct experiment and experiment_id for cm…

89cd3a2

…orised data

import esgvoc.api to load correct experiment and experiment_id for cm…

e91ee68

…orised data

rhaegar325 requested a review from rbeucher June 2, 2026 06:14

rhaegar325 wants to merge 2 commits into main from fix_experiment_format_issue
