2 changes: 2 additions & 0 deletions .github/workflows/python-weather-diagnostics-toolkit-ci.yml
@@ -53,4 +53,6 @@ jobs:
run: |
python scripts/run_thermodynamic_check.py --help
python scripts/run_dynamics_summary.py --help
python scripts/run_precipitation_workflow.py --help
python scripts/run_climate_statistics.py --help
python scripts/run_synthetic_ensemble.py --help
26 changes: 23 additions & 3 deletions projects/python-weather-diagnostics-toolkit/README.md
@@ -20,7 +20,9 @@ artifacts. It focuses on:
- 2 m temperature, 10 m wind, 500 hPa height, and 850 hPa wind/temperature fields
- Magnus-formula dewpoint diagnostics and round-trip humidity checks
- geopotential-height conversion
- relative-vorticity, horizontal-advection, and moisture-flux diagnostics
- station-to-grid interpolation and precipitation accumulation conversion
- anomaly, composite, and grid-point correlation helpers
- cosine-latitude regional means
- a deterministic time-ordered ridge-regression baseline for 24-hour temperature prediction
- synthetic ensemble summaries for Nino-style forecast-plume interpretation
@@ -34,6 +36,8 @@ python-weather-diagnostics-toolkit/
| +-- data-policy.md
| +-- calculation-methods.md
| +-- diagnostic-analysis.md
| +-- station-precipitation-workflows.md
| +-- climate-statistical-diagnostics.md
| +-- methodology.md
| +-- reproducibility.md
| +-- reviewer-path.md
@@ -43,6 +47,8 @@ python-weather-diagnostics-toolkit/
| +-- synthetic-weather-diagnostics-report.md
+-- scripts/
| +-- run_dynamics_summary.py
| +-- run_precipitation_workflow.py
| +-- run_climate_statistics.py
| +-- run_synthetic_ensemble.py
| +-- run_thermodynamic_check.py
+-- src/python_weather_diagnostics_toolkit/
@@ -78,6 +84,8 @@ Inspect the public CLI surfaces:
```bash
python scripts/run_thermodynamic_check.py --help
python scripts/run_dynamics_summary.py --help
python scripts/run_precipitation_workflow.py --help
python scripts/run_climate_statistics.py --help
python scripts/run_synthetic_ensemble.py --help
```

@@ -104,11 +112,20 @@ Dynamic layer:
- estimates latitude/longitude grid spacing from spherical Earth geometry
- computes relative vorticity as `dv/dx - du/dy`
- computes horizontal scalar advection as `-(u dS/dx + v dS/dy)`
- computes moisture flux divergence as `d(q u)/dx + d(q v)/dy`
- keeps finite-difference assumptions explicit for reviewer inspection
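
The dynamic-layer steps listed above can be sketched in plain Python. This is a minimal illustration, not the toolkit's code: the names `grid_spacing`, `relative_vorticity`, and `horizontal_advection`, the 6,371 km Earth radius, the `field[j][i]` list layout, and the assumption that rows run south to north are all choices made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius for this sketch


def grid_spacing(lat_deg, dlat_deg, dlon_deg):
    """Return (dx, dy) in metres for a regular latitude/longitude cell."""
    dx = EARTH_RADIUS_M * math.cos(math.radians(lat_deg)) * math.radians(dlon_deg)
    dy = EARTH_RADIUS_M * math.radians(dlat_deg)
    return dx, dy


def relative_vorticity(u, v, lats, dlat_deg, dlon_deg):
    """Centred-difference dv/dx - du/dy; boundary points are left as NaN.

    Assumes field[j][i] with rows ordered south to north, columns west to east.
    """
    ny, nx = len(u), len(u[0])
    zeta = [[math.nan] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        dx, dy = grid_spacing(lats[j], dlat_deg, dlon_deg)
        for i in range(1, nx - 1):
            dv_dx = (v[j][i + 1] - v[j][i - 1]) / (2.0 * dx)
            du_dy = (u[j + 1][i] - u[j - 1][i]) / (2.0 * dy)
            zeta[j][i] = dv_dx - du_dy
    return zeta


def horizontal_advection(s, u, v, lats, dlat_deg, dlon_deg):
    """Centred-difference -(u dS/dx + v dS/dy); boundary points are left as NaN."""
    ny, nx = len(s), len(s[0])
    adv = [[math.nan] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        dx, dy = grid_spacing(lats[j], dlat_deg, dlon_deg)
        for i in range(1, nx - 1):
            ds_dx = (s[j][i + 1] - s[j][i - 1]) / (2.0 * dx)
            ds_dy = (s[j + 1][i] - s[j - 1][i]) / (2.0 * dy)
            adv[j][i] = -(u[j][i] * ds_dx + v[j][i] * ds_dy)
    return adv
```

With ERA5 input, whose latitudes are stored north to south, the rows would need to be flipped (or the sign of `dy` reversed) before using this sketch.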

Station and precipitation layer:

- replaces sentinel-coded missing values with `NaN`
- interpolates station values to a target grid with inverse-distance weighting
- converts accumulated precipitation into per-step amounts and rates
- summarizes event totals and threshold exceedance masks

Statistical layer:

- reduces gridded fields to cosine-latitude area means
- computes anomalies, standardized anomalies, composites, and grid-point correlations
- constructs time-ordered forecast tables from regional features
- fits a deterministic ridge-regression baseline without random shuffling
- reports RMSE, MAE, bias, and correlation as workflow diagnostics
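
The area mean and the evaluation metrics in the list above reduce to a few lines. The sketch below is illustrative only: `cos_lat_mean` and `forecast_metrics` are hypothetical names, non-finite cells are assumed to be skipped in the mean, bias is taken as predicted minus observed, and the correlation returns NaN when either series has zero variance.

```python
import math


def cos_lat_mean(field, lats):
    """Cosine-latitude weighted mean of field[j][i]; non-finite cells are skipped."""
    total = 0.0
    weight_sum = 0.0
    for row, lat in zip(field, lats):
        weight = math.cos(math.radians(lat))
        for value in row:
            if math.isfinite(value):
                total += weight * value
                weight_sum += weight
    return total / weight_sum if weight_sum > 0 else math.nan


def forecast_metrics(predicted, observed):
    """RMSE, MAE, bias (predicted - observed), and Pearson correlation."""
    n = len(predicted)
    errors = [p - o for p, o in zip(predicted, observed)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    bias = sum(errors) / n
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    corr = cov / math.sqrt(var_p * var_o) if var_p > 0 and var_o > 0 else math.nan
    return {"rmse": rmse, "mae": mae, "bias": bias, "corr": corr}
```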
@@ -150,8 +167,9 @@ For real analysis, users provide their own local ERA5-style NetCDF files through
`configs/example.yaml`. The toolkit expects common variables such as:

- single-level fields: `t2m`, `d2m`, `u10`, `v10`, `tp`, or their long ERA5 names
- pressure-level fields: `t`, `u`, `v`, `z`, `r`, `q`, `w`, `vo`, or their long ERA5 names
- coordinates: `time` or `valid_time`, `latitude`, `longitude`, and optionally `pressure_level`
- optional station tables supplied locally with longitude, latitude, and value columns

## Generated Outputs

@@ -188,7 +206,9 @@ The more detailed technical route is:

1. [`docs/calculation-methods.md`](docs/calculation-methods.md)
2. [`docs/diagnostic-analysis.md`](docs/diagnostic-analysis.md)
3. [`docs/station-precipitation-workflows.md`](docs/station-precipitation-workflows.md)
4. [`docs/climate-statistical-diagnostics.md`](docs/climate-statistical-diagnostics.md)
5. [`docs/source-to-public-mapping.md`](docs/source-to-public-mapping.md)

## Privacy-Safe Scope

@@ -51,6 +51,10 @@ The public project preserves the reusable calculation ideas:
- geopotential-height conversion
- relative-vorticity calculation
- horizontal temperature-advection diagnostics
- moisture flux divergence diagnostics
- station missing-value handling and lightweight station-to-grid interpolation
- accumulated precipitation conversion and event-total summaries
- anomaly, standardized-anomaly, composite, and correlation-field diagnostics
- cosine-latitude regional means
- time-ordered ridge-regression baseline evaluation
- synthetic ensemble summary mechanics
@@ -22,6 +22,10 @@ diagnostics:
geopotential_height_500hpa: true
vorticity_500hpa: true
temperature_advection_850hpa: true
moisture_flux_divergence_850hpa: true
station_interpolation: false
precipitation_event_total: false
climate_statistics: false
ridge_temperature_baseline: true

baseline_model:
@@ -135,6 +135,21 @@ warming tendency. In a full thermodynamic budget, this is only one term. The
public mini-lab intentionally keeps the default implementation to horizontal
advection so that tests remain small and dependency-light.

## Moisture Flux Divergence

For lower-tropospheric precipitation diagnostics, the toolkit includes a
horizontal specific-humidity flux divergence:

```text
div(qV) = d(q u)/dx + d(q v)/dy
```

where `q` is specific humidity, `u` is zonal wind, and `v` is meridional wind.
Negative values indicate horizontal moisture-flux convergence, provided `x` and
`y` increase eastward and northward; ERA5 stores latitude from north to south,
so the meridional derivative must account for that ordering. This diagnostic
should be interpreted with precipitation, vertical motion, and synoptic context
rather than as a complete rainfall budget.
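
A minimal sketch of the flux-divergence formula with centred differences follows. It mirrors the Cartesian form written above; the full spherical divergence would also include a metric term, `-q v tan(latitude) / R`, which this sketch omits. The function name, list layout, Earth radius, and south-to-north row ordering are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius


def moisture_flux_divergence(q, u, v, lats, dlat_deg, dlon_deg):
    """Centred-difference d(q u)/dx + d(q v)/dy on field[j][i]; edges are NaN.

    Rows are assumed to run south to north and columns west to east.
    q in kg/kg and winds in m/s give a result in (kg/kg) per second.
    """
    ny, nx = len(q), len(q[0])
    dy = EARTH_RADIUS_M * math.radians(dlat_deg)
    div = [[math.nan] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        dx = EARTH_RADIUS_M * math.cos(math.radians(lats[j])) * math.radians(dlon_deg)
        for i in range(1, nx - 1):
            dqu_dx = (q[j][i + 1] * u[j][i + 1] - q[j][i - 1] * u[j][i - 1]) / (2.0 * dx)
            dqv_dy = (q[j + 1][i] * v[j + 1][i] - q[j - 1][i] * v[j - 1][i]) / (2.0 * dy)
            div[j][i] = dqu_dx + dqv_dy
    return div
```

For example, uniform `q` with winds blowing inward toward a column from both its western and eastern neighbours gives a negative value at that column, i.e. convergence.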

## Area-Weighted Regional Mean

Regional features use cosine-latitude weighting:
@@ -230,3 +245,45 @@ Example synthetic summary rows:

These values are synthetic. They are useful for verifying table generation and
reviewer interpretation, not for climate diagnosis.

## Station And Precipitation Utilities

The precipitation helpers cover common data-preparation steps:

```text
missing sentinel -> NaN
accumulated precipitation -> per-step amount
per-step amount -> mm/day-equivalent rate
event total -> sum over finite event samples
threshold exceedance -> finite value >= configured threshold
```

Accumulated precipitation is required to be non-decreasing along the selected
lead axis. A decreasing finite sequence raises an error because it usually
indicates mixed forecast cycles, an incorrect lead dimension, or an unhandled
product reset.
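
The chain above can be sketched with the standard library. The function names, the `-9999.0` sentinel, the assumption that the accumulation is zero before the first lead, and the 6-hour step in the usage note are illustrative choices, not the toolkit's settings.

```python
import math


def mask_sentinel(values, sentinel=-9999.0):
    """Replace sentinel-coded missing values with NaN."""
    return [math.nan if v == sentinel else float(v) for v in values]


def accumulation_to_steps(accumulated, initial=0.0):
    """Differences of a running total along the lead axis.

    `initial` is the assumed accumulation before the first lead. A finite value
    smaller than the last finite value raises, since that usually signals mixed
    forecast cycles, a wrong lead axis, or a product reset.
    """
    steps = []
    previous = initial
    last_finite = initial
    for value in accumulated:
        if math.isfinite(value):
            if value < last_finite:
                raise ValueError("accumulated precipitation decreases along the lead axis")
            last_finite = value
        steps.append(value - previous)  # NaN on either side propagates to NaN
        previous = value
    return steps


def to_daily_rate(step_amounts, step_hours):
    """Convert per-step amounts (mm) into mm/day-equivalent rates."""
    return [amount * 24.0 / step_hours for amount in step_amounts]


def event_total(amounts):
    """Sum over finite event samples only."""
    return sum(a for a in amounts if math.isfinite(a))


def exceeds(values, threshold):
    """True where a value is finite and >= threshold."""
    return [math.isfinite(v) and v >= threshold for v in values]
```

For a running total of `[0.0, 1.5, 4.0, 4.0]` mm at 6-hour leads, `accumulation_to_steps` gives `[0.0, 1.5, 2.5, 0.0]` and `to_daily_rate(steps, 6.0)` gives `[0.0, 6.0, 10.0, 0.0]` mm/day.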

Station-to-grid examples use inverse-distance weighting:

```text
weight = 1 / distance**power
grid_value = sum(weight * station_value) / sum(weight)
```

This is a transparent interpolation baseline, not a claim that IDW is optimal
for every terrain, network density, or precipitation regime.
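
A minimal IDW sketch, assuming great-circle (haversine) distances in kilometres and stations given as `(longitude, latitude, value)` tuples to match the column order described for station tables. The distance metric, the default `power=2.0`, and the function names are assumptions; the text does not specify them.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def idw_point(stations, target_lon, target_lat, power=2.0):
    """Inverse-distance-weighted value at one target point.

    Non-finite station values are skipped. A station that coincides with the
    target returns its own value, avoiding a division by zero.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lon, lat, value in stations:
        if not math.isfinite(value):
            continue
        distance = haversine_km(lon, lat, target_lon, target_lat)
        if distance == 0.0:
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else math.nan
```

Raising `power` makes the estimate follow the nearest station more closely; at `power=0` every finite station gets equal weight.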

## Climate Statistics

Climate-statistics helpers include anomalies, standardized anomalies,
composites, and grid-point correlations:

```text
anomaly = value - climatology_mean
standardized = anomaly / climatology_std
composite = mean(field[event_mask])
r = cov(index, field_point) / (std(index) * std(field_point))
```

Zero-spread baselines, too-small samples, and zero-variance correlation points
return `NaN` so review surfaces show where the statistic is undefined.
@@ -0,0 +1,76 @@
# Climate Statistical Diagnostics

The source materials included basic climate-statistics workflows: anomalies,
correlation, regression, composites, and simple machine-learning baselines. The
public toolkit now exposes a small set of deterministic helpers for these
tasks, while keeping datasets and claims outside the repository.

## Anomalies

An anomaly is a departure from a reference baseline:

```text
anomaly = value - climatology_mean
```

A standardized anomaly divides that departure by a reference spread:

```text
standardized_anomaly = (value - climatology_mean) / climatology_std
```

If the baseline spread is zero, the public implementation returns `NaN` rather
than an infinite value. That makes degenerate baselines visible during review.
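
A sketch of both formulas, computing the climatology from a user-chosen baseline sample. The names are hypothetical, and using the population standard deviation (`statistics.pstdev`) for `climatology_std` is an assumption; a sample standard deviation is an equally defensible choice if it is documented.

```python
import math
import statistics


def climatology(baseline):
    """Mean and population standard deviation of a baseline sample."""
    return statistics.fmean(baseline), statistics.pstdev(baseline)


def anomalies(values, clim_mean):
    """Departure of each value from the baseline mean."""
    return [v - clim_mean for v in values]


def standardized_anomalies(values, clim_mean, clim_std):
    """Anomalies divided by the baseline spread; NaN when the spread is not positive."""
    if not clim_std > 0:
        return [math.nan for _ in values]
    return [(v - clim_mean) / clim_std for v in values]
```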

## Composite Means

Composite analysis averages samples selected by an event mask:

```text
composite = mean(field[event_mask])
```

The event mask should be defined before looking at the composite field. Useful
examples include warm-event days, heavy-precipitation days, or high-index
periods. Public examples should document:

- how the event mask was defined
- the number of selected samples
- whether the composite is a mean field, difference field, or anomaly field
- whether statistical significance was assessed separately
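
A composite sketch that also returns the selected-sample count, since the checklist above asks for it. Samples are assumed to be a time-ordered list of flattened fields, non-finite grid values are skipped per point, and the function name is hypothetical.

```python
import math


def composite_mean(samples, event_mask):
    """Mean field over samples where event_mask is True, plus the sample count.

    samples: list of fields (each a list of grid values), one per time step.
    event_mask: list of booleans of the same length, defined before inspection.
    """
    if len(samples) != len(event_mask):
        raise ValueError("event mask must have one entry per sample")
    selected = [field for field, is_event in zip(samples, event_mask) if is_event]
    n_points = len(samples[0]) if samples else 0
    composite = []
    for k in range(n_points):
        finite = [field[k] for field in selected if math.isfinite(field[k])]
        composite.append(sum(finite) / len(finite) if finite else math.nan)
    return composite, len(selected)
```

A difference composite can then be formed point by point as the event composite minus the composite over `[not m for m in mask]`.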

## Correlation Fields

The helper `pearson_correlation_field` computes the Pearson correlation between
a one-dimensional index and every grid point in a field:

```text
r = cov(index, field_point) / (std(index) * std(field_point))
```

The implementation handles missing values pairwise and returns `NaN` for grid
points with too few finite pairs or zero variance.
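
The pairwise-missing behaviour described above can be sketched as follows. This is not the toolkit's `pearson_correlation_field`; the name `correlation_field_sketch`, the `field[t][k]` layout, and the default `min_pairs=3` are assumptions for illustration.

```python
import math


def correlation_field_sketch(index, field, min_pairs=3):
    """Pearson r between a 1-D index and each grid point of field[t][k].

    Missing values are dropped pairwise: time t is used for point k only when
    both index[t] and field[t][k] are finite. Points with fewer than min_pairs
    pairs, or zero variance in either series, return NaN.
    """
    n_points = len(field[0])
    result = []
    for k in range(n_points):
        pairs = [(x, row[k]) for x, row in zip(index, field)
                 if math.isfinite(x) and math.isfinite(row[k])]
        if len(pairs) < min_pairs:
            result.append(math.nan)
            continue
        mean_x = sum(x for x, _ in pairs) / len(pairs)
        mean_y = sum(y for _, y in pairs) / len(pairs)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
        var_y = sum((y - mean_y) ** 2 for _, y in pairs)
        if var_x == 0 or var_y == 0:
            result.append(math.nan)
        else:
            result.append(cov / math.sqrt(var_x * var_y))
    return result
```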

## Regression And Prediction Boundaries

The toolkit already includes a ridge-regression baseline for time-ordered
temperature prediction. The climate-statistics helpers are meant to support
feature exploration before modeling:

```text
anomaly map -> regional feature -> time-ordered split -> transparent baseline
```

They do not establish forecast skill on their own. A public result should
include an explicit validation period, comparison baseline, sampling design,
and error metric before making predictive claims.
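
The route above can be sketched with one regional feature. The helper names, the 70/30 split, and the single-feature closed-form ridge (`slope = Sxy / (Sxx + alpha)` with an unpenalized intercept) are assumptions chosen to keep the example short; they are not the toolkit's baseline code.

```python
import statistics


def lead_shifted_rows(feature, target, lead):
    """Pair feature[t] with target[t + lead]; rows stay in time order."""
    return [(t, feature[t], target[t + lead]) for t in range(len(feature) - lead)]


def time_ordered_split(rows, train_fraction=0.7):
    """First part for training, remainder for validation; no shuffling."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_ridge_1d(rows, alpha=1.0):
    """Closed-form ridge fit of target on one feature with an unpenalized intercept."""
    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)  # alpha > 0 shrinks the slope toward zero
    return slope, mean_y - slope * mean_x


def predict(rows, slope, intercept):
    """Apply the fitted line to the feature column of each row."""
    return [intercept + slope * x for _, x, _ in rows]
```

Because every validation row comes after every training row, the fitted coefficients never see later targets, which is the property the time-ordered split is meant to protect.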

## Review Checklist

For any climate-statistics result, verify:

- the baseline period is stated
- missing-value handling is documented
- the sample axis is time ordered when used for prediction
- event definitions are chosen before composite interpretation
- correlation or regression output is not described as causation
@@ -109,6 +109,28 @@ fluxes, and analysis increments. The public project keeps the default
calculation narrow so it remains reproducible and testable without heavy
external data.

## Moisture Transport And Heavy Precipitation

For heavy-precipitation case studies, the toolkit supports horizontal moisture
flux divergence:

```text
d(q u)/dx + d(q v)/dy
```

Interpretation pattern:

```text
moisture convergence + sustained lift + favorable circulation -> plausible rainfall support
```

Care points:

- moisture convergence is not rainfall by itself
- vertically integrated transport may be more appropriate than one pressure level
- precipitation totals should be checked against observation or reanalysis products
- terrain, convection, and microphysics are outside this compact diagnostic

## Regional Temperature Baseline

The baseline model is intentionally simple:
@@ -159,6 +181,41 @@ In the deterministic synthetic example, the ensemble starts warm, becomes
mixed near lead month 12, and shifts cold by lead month 24. This is an example
of interpreting an artificial plume, not a statement about the real ocean.

## Station, Precipitation, And Extremes

Station observations and gridded precipitation require an explicit quality
control chain before interpretation:

```text
missing sentinel -> finite-value mask -> interpolation or event total -> threshold check
```

Threshold exceedance should be described according to the threshold source:

- absolute threshold: value meets or exceeds a fixed user-supplied amount
- percentile threshold: value exceeds a local historical percentile
- standardized anomaly: value exceeds a baseline-relative spread multiple

The public toolkit provides mechanics for these checks. It does not define
official warnings or redistribute station records.
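
The three threshold types can be sketched side by side. The function names are hypothetical; the percentile uses linear interpolation between ranked finite values (the convention of NumPy's default `linear` method), the comparisons follow the wording of the list above (`>=` for an absolute amount, `>` for the relative thresholds), and raising on a non-positive spread is an assumption made so a degenerate baseline stays visible.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) of finite values by linear interpolation."""
    finite = sorted(v for v in values if math.isfinite(v))
    if not finite:
        return math.nan
    position = (len(finite) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    return finite[lower] + (finite[upper] - finite[lower]) * (position - lower)


def absolute_exceedance(values, amount):
    """Value meets or exceeds a fixed user-supplied amount."""
    return [math.isfinite(v) and v >= amount for v in values]


def percentile_exceedance(values, history, q):
    """Value exceeds the local historical q-th percentile."""
    threshold = percentile(history, q)
    return [math.isfinite(v) and v > threshold for v in values]


def standardized_exceedance(values, clim_mean, clim_std, multiple):
    """Value exceeds clim_mean by more than `multiple` baseline standard deviations."""
    if not clim_std > 0:
        raise ValueError("baseline spread must be positive")
    return [math.isfinite(v) and (v - clim_mean) / clim_std > multiple for v in values]
```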

## Climate Statistics

Anomaly, composite, and correlation outputs are exploratory diagnostics.

Interpretation pattern:

```text
well-defined baseline + transparent event mask + sufficient samples -> interpretable statistic
```

Care points:

- correlation is not causation
- composite masks should be selected before interpreting the resulting field
- undefined baselines or zero-variance points should remain visible as `NaN`
- prediction workflows should keep time-ordered validation

## Reviewer Questions

Useful reviewer questions:
34 changes: 31 additions & 3 deletions projects/python-weather-diagnostics-toolkit/docs/methodology.md
@@ -68,7 +68,35 @@ table can include:
- current 2 m temperature
- future 2 m temperature target shifted by a configured lead

## 5. Station And Precipitation Preparation

Station and precipitation workflows start with quality control:

```text
sentinel-coded missing values -> NaN
station observations -> finite station rows -> IDW grid
forecast accumulations -> per-step precipitation -> rate or event total
```

The public implementation uses small NumPy helpers rather than provider-specific
download code. This keeps the method reviewable while leaving data access,
licensing, and provenance to the user.

## 6. Climate Statistics

Climate diagnostics use explicit baselines and event masks:

```text
anomaly = value - climatology_mean
standardized anomaly = anomaly / climatology_std
composite = mean(selected event samples)
correlation field = corr(index, grid point)
```

Undefined or under-sampled statistics return `NaN` instead of a misleading
number.

## 7. Baseline Prediction

The included baseline is a transparent ridge regression:

@@ -83,7 +111,7 @@ the result reproducible. Metrics include RMSE, MAE, bias, and correlation.
The baseline is included for workflow demonstration only. It is not a claim of
forecast skill.

## 8. Synthetic Ensemble Summary

The synthetic Nino-style ensemble utility creates deterministic plume data with
a fixed random seed. It demonstrates:
@@ -96,7 +124,7 @@ a fixed random seed. It demonstrates:
The generated values are synthetic and should be read only as an example of
summary mechanics.

## 9. Interpretation Boundaries

A public interpretation should say what was computed and what the diagnostic
suggests, while avoiding unsupported claims. For example: