stacknil · May 23, 2026
diff --git a/.github/workflows/python-weather-diagnostics-toolkit-ci.yml b/.github/workflows/python-weather-diagnostics-toolkit-ci.yml
@@ -53,4 +53,6 @@ jobs:
         run: |
           python scripts/run_thermodynamic_check.py --help
           python scripts/run_dynamics_summary.py --help
+          python scripts/run_precipitation_workflow.py --help
+          python scripts/run_climate_statistics.py --help
           python scripts/run_synthetic_ensemble.py --help
diff --git a/projects/python-weather-diagnostics-toolkit/README.md b/projects/python-weather-diagnostics-toolkit/README.md
@@ -20,7 +20,9 @@ artifacts. It focuses on:
 - 2 m temperature, 10 m wind, 500 hPa height, and 850 hPa wind/temperature fields
 - Magnus-formula dewpoint diagnostics and round-trip humidity checks
 - geopotential-height conversion
-- relative-vorticity and horizontal-advection diagnostics
+- relative-vorticity, horizontal-advection, and moisture-flux diagnostics
+- station-to-grid interpolation and precipitation accumulation conversion
+- anomaly, composite, and grid-point correlation helpers
 - cosine-latitude regional means
 - a deterministic time-ordered ridge-regression baseline for 24-hour temperature prediction
 - synthetic ensemble summaries for Nino-style forecast-plume interpretation
@@ -34,6 +36,8 @@ python-weather-diagnostics-toolkit/
 |   +-- data-policy.md
 |   +-- calculation-methods.md
 |   +-- diagnostic-analysis.md
+|   +-- station-precipitation-workflows.md
+|   +-- climate-statistical-diagnostics.md
 |   +-- methodology.md
 |   +-- reproducibility.md
 |   +-- reviewer-path.md
@@ -43,6 +47,8 @@ python-weather-diagnostics-toolkit/
 |   +-- synthetic-weather-diagnostics-report.md
 +-- scripts/
 |   +-- run_dynamics_summary.py
+|   +-- run_precipitation_workflow.py
+|   +-- run_climate_statistics.py
 |   +-- run_synthetic_ensemble.py
 |   +-- run_thermodynamic_check.py
 +-- src/python_weather_diagnostics_toolkit/
@@ -78,6 +84,8 @@ Inspect the public CLI surfaces:
 ```bash
 python scripts/run_thermodynamic_check.py --help
 python scripts/run_dynamics_summary.py --help
+python scripts/run_precipitation_workflow.py --help
+python scripts/run_climate_statistics.py --help
 python scripts/run_synthetic_ensemble.py --help
 ```
 
@@ -104,11 +112,20 @@ Dynamic layer:
 - estimates latitude/longitude grid spacing from spherical Earth geometry
 - computes relative vorticity as `dv/dx - du/dy`
 - computes horizontal scalar advection as `-(u dS/dx + v dS/dy)`
+- computes moisture flux divergence as `d(q u)/dx + d(q v)/dy`
 - keeps finite-difference assumptions explicit for reviewer inspection
 
+Station and precipitation layer:
+
+- replaces sentinel-coded missing values with `NaN`
+- interpolates station values to a target grid with inverse-distance weighting
+- converts accumulated precipitation into per-step amounts and rates
+- summarizes event totals and threshold exceedance masks
+
 Statistical layer:
 
 - reduces gridded fields to cosine-latitude area means
+- computes anomalies, standardized anomalies, composites, and grid-point correlations
 - constructs time-ordered forecast tables from regional features
 - fits a deterministic ridge-regression baseline without random shuffling
 - reports RMSE, MAE, bias, and correlation as workflow diagnostics
@@ -150,8 +167,9 @@ For real analysis, users provide their own local ERA5-style NetCDF files through
 `configs/example.yaml`. The toolkit expects common variables such as:
 
 - single-level fields: `t2m`, `d2m`, `u10`, `v10`, `tp`, or their long ERA5 names
-- pressure-level fields: `t`, `u`, `v`, `z`, `r`, `w`, `vo`, or their long ERA5 names
+- pressure-level fields: `t`, `u`, `v`, `z`, `r`, `q`, `w`, `vo`, or their long ERA5 names
 - coordinates: `time` or `valid_time`, `latitude`, `longitude`, and optionally `pressure_level`
+- optional station tables supplied locally with longitude, latitude, and value columns
 
 ## Generated Outputs
 
@@ -188,7 +206,9 @@ The more detailed technical route is:
 
 1. [`docs/calculation-methods.md`](docs/calculation-methods.md)
 2. [`docs/diagnostic-analysis.md`](docs/diagnostic-analysis.md)
-3. [`docs/source-to-public-mapping.md`](docs/source-to-public-mapping.md)
+3. [`docs/station-precipitation-workflows.md`](docs/station-precipitation-workflows.md)
+4. [`docs/climate-statistical-diagnostics.md`](docs/climate-statistical-diagnostics.md)
+5. [`docs/source-to-public-mapping.md`](docs/source-to-public-mapping.md)
 
 ## Privacy-Safe Scope
 

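The README's dynamic-layer bullets (grid spacing from spherical Earth geometry, then `d(q u)/dx + d(q v)/dy`) can be illustrated with a short NumPy sketch. This is an assumption-laden illustration, not the toolkit's implementation: the function name `moisture_flux_divergence`, the `(lat, lon)` array layout, and the Earth-radius constant are hypothetical, and the spherical metric term is omitted so the code matches the documented two-term form.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def moisture_flux_divergence(q, u, v, lat_deg, lon_deg):
    """Return d(q u)/dx + d(q v)/dy for fields shaped (lat, lon).

    Hypothetical helper: q is specific humidity (kg/kg), u and v are winds
    in m/s, and the result has units of 1/s. Latitudes must avoid the poles,
    where cos(latitude) is zero.
    """
    q, u, v = (np.asarray(a, dtype=float) for a in (q, u, v))
    lat_rad = np.deg2rad(np.asarray(lat_deg, dtype=float))
    lon_rad = np.deg2rad(np.asarray(lon_deg, dtype=float))

    # Meridional distance in metres. Passing coordinates to np.gradient
    # keeps the sign correct for ascending or descending latitude.
    y_m = EARTH_RADIUS_M * lat_rad
    dqv_dy = np.gradient(q * v, y_m, axis=0)

    # Zonal spacing shrinks with latitude: dx = R * cos(lat) * dlon.
    dqu_dlon = np.gradient(q * u, lon_rad, axis=1)
    dqu_dx = dqu_dlon / (EARTH_RADIUS_M * np.cos(lat_rad)[:, np.newaxis])

    return dqu_dx + dqv_dy
```

Under these assumptions, negative output marks horizontal moisture-flux convergence at that level.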
diff --git a/projects/python-weather-diagnostics-toolkit/SANITIZATION_REPORT.md b/projects/python-weather-diagnostics-toolkit/SANITIZATION_REPORT.md
@@ -51,6 +51,10 @@ The public project preserves the reusable calculation ideas:
 - geopotential-height conversion
 - relative-vorticity calculation
 - horizontal temperature-advection diagnostics
+- moisture flux divergence diagnostics
+- station missing-value handling and lightweight station-to-grid interpolation
+- accumulated precipitation conversion and event-total summaries
+- anomaly, standardized-anomaly, composite, and correlation-field diagnostics
 - cosine-latitude regional means
 - time-ordered ridge-regression baseline evaluation
 - synthetic ensemble summary mechanics
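
The station items preserved in the list above (missing-value handling and lightweight station-to-grid interpolation) can be sketched as follows. The helper names, the `-9999.0` sentinel, and the planar distance in degrees are assumptions made for illustration; the toolkit's actual helpers may differ.

```python
import numpy as np


def mask_sentinel(values, sentinel=-9999.0):
    """Replace a sentinel-coded missing value with NaN (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    return np.where(values == sentinel, np.nan, values)


def idw_to_grid(st_lon, st_lat, st_val, grid_lon, grid_lat, power=2.0):
    """Inverse-distance-weighted interpolation onto a (lat, lon) grid.

    Distances are planar and measured in degrees, which is an assumption
    that only suits small regional domains.
    """
    st_lon = np.asarray(st_lon, dtype=float)
    st_lat = np.asarray(st_lat, dtype=float)
    st_val = np.asarray(st_val, dtype=float)

    # Keep only station rows where position and value are all finite.
    keep = np.isfinite(st_lon) & np.isfinite(st_lat) & np.isfinite(st_val)
    st_lon, st_lat, st_val = st_lon[keep], st_lat[keep], st_val[keep]

    glon, glat = np.meshgrid(grid_lon, grid_lat)
    if st_val.size == 0:
        return np.full(glon.shape, np.nan)

    # dist has shape (lat, lon, station).
    dist = np.hypot(glon[..., np.newaxis] - st_lon,
                    glat[..., np.newaxis] - st_lat)
    with np.errstate(divide="ignore"):
        weight = 1.0 / dist**power

    # A grid point that coincides with a station takes that station's value.
    exact = dist == 0.0
    has_exact = exact.any(axis=-1)
    weight = np.where(has_exact[..., np.newaxis], exact.astype(float), weight)

    return (weight * st_val).sum(axis=-1) / weight.sum(axis=-1)
```

Dropping non-finite rows before weighting keeps one bad station from turning the whole grid into `NaN`.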

diff --git a/projects/python-weather-diagnostics-toolkit/configs/example.yaml b/projects/python-weather-diagnostics-toolkit/configs/example.yaml
@@ -22,6 +22,10 @@ diagnostics:
   geopotential_height_500hpa: true
   vorticity_500hpa: true
   temperature_advection_850hpa: true
+  moisture_flux_divergence_850hpa: true
+  station_interpolation: false
+  precipitation_event_total: false
+  climate_statistics: false
   ridge_temperature_baseline: true
 
 baseline_model:

diff --git a/projects/python-weather-diagnostics-toolkit/docs/calculation-methods.md b/projects/python-weather-diagnostics-toolkit/docs/calculation-methods.md
@@ -135,6 +135,21 @@ warming tendency. In a full thermodynamic budget, this is only one term. The
 public mini-lab intentionally keeps the default implementation to horizontal
 advection so that tests remain small and dependency-light.
 
+## Moisture Flux Divergence
+
+For lower-tropospheric precipitation diagnostics, the toolkit includes a
+horizontal specific-humidity flux divergence:
+
+```text
+div(qV) = d(q u)/dx + d(q v)/dy
+```
+
+where `q` is specific humidity, `u` is zonal wind, and `v` is meridional wind.
+Negative values are often read as moisture-flux convergence under the chosen
+coordinate and unit assumptions. This diagnostic should be interpreted with
+precipitation, vertical motion, and synoptic context rather than as a complete
+rainfall budget.
+
 ## Area-Weighted Regional Mean
 
 Regional features use cosine-latitude weighting:
@@ -230,3 +245,45 @@ Example synthetic summary rows:
 
 These values are synthetic. They are useful for verifying table generation and
 reviewer interpretation, not for climate diagnosis.
+
+## Station And Precipitation Utilities
+
+The precipitation helpers cover common data-preparation steps:
+
+```text
+missing sentinel -> NaN
+accumulated precipitation -> per-step amount
+per-step amount -> mm/day-equivalent rate
+event total -> sum over finite event samples
+threshold exceedance -> finite value >= configured threshold
+```
+
+Accumulated precipitation must be non-decreasing along the selected
+lead axis. A decrease between finite values raises an error because it usually
+indicates mixed forecast cycles, an incorrect lead dimension, or an unhandled
+product reset.
+
+Station-to-grid examples use inverse-distance weighting:
+
+```text
+weight = 1 / distance**power
+grid_value = sum(weight * station_value) / sum(weight)
+```
+
+This is a transparent interpolation baseline, not a claim that IDW is optimal
+for every terrain, network density, or precipitation regime.
+
+## Climate Statistics
+
+Climate-statistics helpers include anomalies, standardized anomalies,
+composites, and grid-point correlations:
+
+```text
+anomaly = value - climatology_mean
+standardized = anomaly / climatology_std
+composite = mean(field[event_mask])
+r = cov(index, field_point) / (std(index) * std(field_point))
+```
+
+Zero-spread baselines, too-small samples, and zero-variance correlation points
+return `NaN` so review surfaces show where the statistic is undefined.
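
The precipitation steps listed under "Station And Precipitation Utilities" in `docs/calculation-methods.md` (accumulation to per-step amount, amount to mm/day-equivalent rate, event total, threshold exceedance) can be sketched as below. The function names are hypothetical. Treating the first accumulated value as the first step's amount is also an assumption: it presumes the accumulation starts from zero before the first lead.

```python
import numpy as np


def deaccumulate(accum, axis=0):
    """Convert accumulated precipitation to per-step amounts along `axis`."""
    accum = np.asarray(accum, dtype=float)
    steps = np.diff(accum, axis=axis)
    # NaN differences compare False, so only finite decreases raise.
    if np.any(steps < 0):
        raise ValueError("accumulated precipitation decreases along the lead axis")
    first = np.take(accum, [0], axis=axis)
    return np.concatenate([first, steps], axis=axis)


def to_mm_per_day(step_amount_mm, step_hours):
    """Scale a per-step amount in mm to a mm/day-equivalent rate."""
    return np.asarray(step_amount_mm, dtype=float) * (24.0 / step_hours)


def event_total(step_amounts, axis=0):
    """Sum per-step amounts over the event, skipping NaN samples."""
    return np.nansum(np.asarray(step_amounts, dtype=float), axis=axis)


def exceeds(values, threshold):
    """True where a finite value is greater than or equal to the threshold."""
    values = np.asarray(values, dtype=float)
    return np.isfinite(values) & (values >= threshold)
```

Raising on a finite decrease, rather than clipping it to zero, keeps mixed forecast cycles or an unhandled reset from passing silently into event totals.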
diff --git a/...ects/python-weather-diagnostics-toolkit/docs/climate-statistical-diagnostics.md b/...ects/python-weather-diagnostics-toolkit/docs/climate-statistical-diagnostics.md
@@ -0,0 +1,76 @@
+# Climate Statistical Diagnostics
+
+The source materials included basic climate-statistics workflows: anomalies,
+correlation, regression, composites, and simple machine-learning baselines. The
+public toolkit now exposes deterministic helpers for anomalies, composites, and
+correlation fields, while keeping datasets and claims outside the repository.
+
+## Anomalies
+
+An anomaly is a departure from a reference baseline:
+
+```text
+anomaly = value - climatology_mean
+```
+
+A standardized anomaly divides that departure by a reference spread:
+
+```text
+standardized_anomaly = (value - climatology_mean) / climatology_std
+```
+
+If the baseline spread is zero, the public implementation returns `NaN` rather
+than an infinite value. That makes degenerate baselines visible during review.
+
+## Composite Means
+
+Composite analysis averages samples selected by an event mask:
+
+```text
+composite = mean(field[event_mask])
+```
+
+The event mask should be defined before looking at the composite field. Useful
+examples include warm-event days, heavy-precipitation days, or high-index
+periods. Public examples should document:
+
+- how the event mask was defined
+- the number of selected samples
+- whether the composite is a mean field, difference field, or anomaly field
+- whether statistical significance was assessed separately
+
+## Correlation Fields
+
+The helper `pearson_correlation_field` computes the Pearson correlation between
+a one-dimensional index and every grid point in a field:
+
+```text
+r = cov(index, field_point) / (std(index) * std(field_point))
+```
+
+The implementation handles missing values pairwise and returns `NaN` for grid
+points with too few finite pairs or zero variance.
+
+## Regression And Prediction Boundaries
+
+The toolkit already includes a ridge-regression baseline for time-ordered
+temperature prediction. The climate-statistics helpers are meant to support
+feature exploration before modeling:
+
+```text
+anomaly map -> regional feature -> time-ordered split -> transparent baseline
+```
+
+They do not establish forecast skill on their own. A public result should
+include an explicit validation period, comparison baseline, sampling design,
+and error metric before making predictive claims.
+
+## Review Checklist
+
+For any climate-statistics result, verify:
+
+- the baseline period is stated
+- missing-value handling is documented
+- the sample axis is time ordered when used for prediction
+- event definitions are chosen before composite interpretation
+- correlation or regression output is not described as causation
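
The pairwise missing-value rule described under "Correlation Fields" in the new document can be sketched with NumPy. This is not the source of `pearson_correlation_field`, whose signature is not shown in the diff. The name `pearson_field_sketch` and the `min_pairs` parameter are hypothetical, and the field is assumed to have time as its leading axis.

```python
import numpy as np


def pearson_field_sketch(index, field, min_pairs=3):
    """Correlate a 1-D index with every grid point of `field` (time first).

    Pairs are dropped wherever either the index or the grid-point value is
    non-finite. Grid points with fewer than `min_pairs` finite pairs, or with
    zero variance on either side, return NaN.
    """
    index = np.asarray(index, dtype=float)
    field = np.asarray(field, dtype=float)
    idx = index.reshape((-1,) + (1,) * (field.ndim - 1))

    valid = np.isfinite(idx) & np.isfinite(field)
    n = valid.sum(axis=0)

    with np.errstate(invalid="ignore", divide="ignore"):
        # Means over the finite pairs only, computed per grid point.
        mean_x = np.where(valid, idx, 0.0).sum(axis=0) / n
        mean_y = np.where(valid, field, 0.0).sum(axis=0) / n
        dx = np.where(valid, idx - mean_x, 0.0)
        dy = np.where(valid, field - mean_y, 0.0)
        cov = (dx * dy).sum(axis=0)
        denom = np.sqrt((dx**2).sum(axis=0) * (dy**2).sum(axis=0))
        r = cov / denom

    return np.where((n >= min_pairs) & (denom > 0), r, np.nan)
```

Because the means are recomputed per grid point from the surviving pairs, a gap at one location does not change the statistic anywhere else.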
diff --git a/projects/python-weather-diagnostics-toolkit/docs/diagnostic-analysis.md b/projects/python-weather-diagnostics-toolkit/docs/diagnostic-analysis.md
@@ -109,6 +109,28 @@ fluxes, and analysis increments. The public project keeps the default
 calculation narrow so it remains reproducible and testable without heavy
 external data.
 
+## Moisture Transport And Heavy Precipitation
+
+For heavy-precipitation case studies, the toolkit supports horizontal moisture
+flux divergence:
+
+```text
+d(q u)/dx + d(q v)/dy
+```
+
+Interpretation pattern:
+
+```text
+moisture convergence + sustained lift + favorable circulation -> plausible rainfall support
+```
+
+Care points:
+
+- moisture convergence is not rainfall by itself
+- vertically integrated transport may be more appropriate than one pressure level
+- precipitation totals should be checked against observation or reanalysis products
+- terrain, convection, and microphysics are outside this compact diagnostic
+
 ## Regional Temperature Baseline
 
 The baseline model is intentionally simple:
@@ -159,6 +181,41 @@ In the deterministic synthetic example, the ensemble starts warm, becomes
 mixed near lead month 12, and shifts cold by lead month 24. This is an example
 of interpreting an artificial plume, not a statement about the real ocean.
 
+## Station, Precipitation, And Extremes
+
+Station observations and gridded precipitation require an explicit
+quality-control chain before interpretation:
+
+```text
+missing sentinel -> finite-value mask -> interpolation or event total -> threshold check
+```
+
+Threshold exceedance should be described according to the threshold source:
+
+- absolute threshold: value meets or exceeds a fixed user-supplied amount
+- percentile threshold: value exceeds a local historical percentile
+- standardized anomaly: value exceeds a baseline-relative spread multiple
+
+The public toolkit provides mechanics for these checks. It does not define
+official warnings or redistribute station records.
+
+## Climate Statistics
+
+Anomaly, composite, and correlation outputs are exploratory diagnostics.
+
+Interpretation pattern:
+
+```text
+well-defined baseline + transparent event mask + sufficient samples -> interpretable statistic
+```
+
+Care points:
+
+- correlation is not causation
+- composite masks should be selected before interpreting the resulting field
+- undefined baselines or zero-variance points should remain visible as `NaN`
+- prediction workflows should keep time-ordered validation
+
 ## Reviewer Questions
 
 Useful reviewer questions:

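The climate-statistics care points added to `docs/diagnostic-analysis.md` (a visible `NaN` for undefined baselines, and an event mask chosen before the composite is read) can be sketched as two small helpers. The names and the `min_samples` parameter are hypothetical, and time is assumed to be the leading axis of the field.

```python
import numpy as np


def standardized_anomaly(values, clim_mean, clim_std):
    """Return (value - mean) / std, with NaN where the baseline spread is zero."""
    values = np.asarray(values, dtype=float)
    clim_std = np.asarray(clim_std, dtype=float)
    anomaly = values - np.asarray(clim_mean, dtype=float)
    # Swap non-positive spread for NaN so the division never yields infinity.
    safe_std = np.where(clim_std > 0, clim_std, np.nan)
    return anomaly / safe_std


def composite_mean(field, event_mask, min_samples=1):
    """Average the time steps selected by a boolean event mask."""
    field = np.asarray(field, dtype=float)
    event_mask = np.asarray(event_mask, dtype=bool)
    selected = field[event_mask]  # mask applies to the leading time axis
    if selected.shape[0] < min_samples:
        return np.full(field.shape[1:], np.nan)
    return selected.mean(axis=0)
```

Reporting `selected.shape[0]` next to the composite covers the "number of selected samples" item that a public example should document.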
diff --git a/projects/python-weather-diagnostics-toolkit/docs/methodology.md b/projects/python-weather-diagnostics-toolkit/docs/methodology.md
@@ -68,7 +68,35 @@ table can include:
 - current 2 m temperature
 - future 2 m temperature target shifted by a configured lead
 
-## 5. Baseline Prediction
+## 5. Station And Precipitation Preparation
+
+Station and precipitation workflows start with quality control:
+
+```text
+sentinel-coded missing values -> NaN
+station observations -> finite station rows -> IDW grid
+forecast accumulations -> per-step precipitation -> rate or event total
+```
+
+The public implementation uses small NumPy helpers rather than provider-specific
+download code. This keeps the method reviewable while leaving data access,
+licensing, and provenance to the user.
+
+## 6. Climate Statistics
+
+Climate diagnostics use explicit baselines and event masks:
+
+```text
+anomaly = value - climatology_mean
+standardized anomaly = anomaly / climatology_std
+composite = mean(selected event samples)
+correlation field = corr(index, grid point)
+```
+
+Undefined or under-sampled statistics return `NaN` instead of a misleading
+number.
+
+## 7. Baseline Prediction
 
 The included baseline is a transparent ridge regression:
 
@@ -83,7 +111,7 @@ the result reproducible. Metrics include RMSE, MAE, bias, and correlation.
 The baseline is included for workflow demonstration only. It is not a claim of
 forecast skill.
 
-## 6. Synthetic Ensemble Summary
+## 8. Synthetic Ensemble Summary
 
 The synthetic Nino-style ensemble utility creates deterministic plume data with
 a fixed random seed. It demonstrates:
@@ -96,7 +124,7 @@ a fixed random seed. It demonstrates:
 The generated values are synthetic and should be read only as an example of
 summary mechanics.
 
-## 7. Interpretation Boundaries
+## 9. Interpretation Boundaries
 
 A public interpretation should say what was computed and what the diagnostic
 suggests, while avoiding unsupported claims. For example: