| 1 | +# Aviation-WX Strict-Parsing Pilot — Engineering Report |
| 2 | + |
| 3 | +**Date:** 2026-05-09 |
| 4 | +**Author:** OS4CSAPI / OSHConnect-Python team |
| 5 | +**Branch:** `fix/aviation-wx-strict-parsing-2026-05-09` |
| 6 | +**Commits:** `ca2b794` (research §9 update), `9206be0` (publisher fix) |
| 7 | +**Target server:** `https://129-80-248-53.sslip.io/csapi-go-v2/` (`connected-systems-go` post-`a467aba0` strict parsing, `c2ab201` partial typo fix) |
| 8 | +**Companion document:** [Strict_Parsing_Migration_Spec_Grounded_Reanalysis_2026-05-09.md](Strict_Parsing_Migration_Spec_Grounded_Reanalysis_2026-05-09.md) §9 |
| 9 | + |
| 10 | +--- |
| 11 | + |
| 12 | +## 1. Summary |
| 13 | + |
| 14 | +The aviation-wx publisher has been refactored to publish cleanly against |
| 15 | +the strict-parsing csapi-go-v2 server. The end-to-end live run produced |
| 16 | +**zero rejections**: 1 procedure, 5 systems with full SensorML, 5 |
| 17 | +datastreams, and a 3-tier deployment tree (root + group + 5 station |
| 18 | +deployments). Round-trip verification confirms keywords, identifiers, |
| 19 | +classifiers, contacts, documents, position, and links are all preserved |
| 20 | +on systems. |
| 21 | + |
| 22 | +This document captures the empirically-derived schema for each resource |
| 23 | +type so the same recipe can be propagated mechanically to the remaining |
| 24 | +8 publishers (`coops`, `iss`, `ndbc`, `nws`, `opensky`, `usgs_eq`, |
| 25 | +`usgs_nims`, `usgs_water`). |
| 26 | + |
| 27 | +--- |
| 28 | + |
| 29 | +## 2. The encoding contract (one-line statement) |
| 30 | + |
| 31 | +> Strict csapi-go-v2 enforces OGC 23-001's closed `feature.properties` |
| 32 | +> schema. **All SensorML metadata must be PUT separately** as |
| 33 | +> `application/sml+json` against the resource path after the |
| 34 | +> `application/geo+json` POST creates the resource. |
| 35 | + |
| 36 | +The two-step pattern (already implemented in |
| 37 | +[publishers/bootstrap_helpers.py](../../publishers/bootstrap_helpers.py) |
| 38 | +as `ensure_procedure(stub, sml_body=...)`, |
| 39 | +`ensure_system(stub, sml_body=...)`, |
| 40 | +`ensure_deployment(stub, sml_body=..., parent_id=...)`): |
| 41 | + |
| 42 | +1. `POST /{collection}` with `Content-Type: application/geo+json` — |
| 43 | + stub body has only the closed properties set. |
| 44 | +2. `PUT /{collection}/{id}` with `Content-Type: application/sml+json` — |
| 45 | + SensorML JSON encoding (`uniqueId`, `label`, `definition`, …). |
| 46 | + |
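The two-step pattern above can be sketched as a pair of request descriptions. This is a minimal illustration, not the `bootstrap_helpers.py` implementation: `plan_two_step` is a hypothetical name, no network call is made, and how the server-assigned `id` is obtained after the POST is left to the caller.

```python
import json

GEOJSON = "application/geo+json"
SML_JSON = "application/sml+json"


def plan_two_step(collection, stub, sml_body, resource_id):
    """Describe the POST-then-PUT pair for one resource (hypothetical helper).

    Returns plain dicts so the plan can be inspected without any network I/O.
    `resource_id` is the id the server assigned when the stub was created.
    """
    post = {
        "method": "POST",
        "path": f"/{collection}",
        "headers": {"Content-Type": GEOJSON},
        "body": json.dumps(stub),  # closed GeoJSON properties only
    }
    put = {
        "method": "PUT",
        "path": f"/{collection}/{resource_id}",
        "headers": {"Content-Type": SML_JSON},
        "body": json.dumps(sml_body),  # SensorML JSON encoding
    }
    return [post, put]
```

Keeping the plan as data makes it easy to log both requests in a dry run before anything is sent to the server.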
| 47 | +--- |
| 48 | + |
| 49 | +## 3. Empirically-derived schemas (per-resource) |
| 50 | + |
| 51 | +### 3.1 `POST /systems` — GeoJSON Feature |
| 52 | + |
| 53 | +| `properties` field | Strict server | |
| 54 | +| --- | --- | |
| 55 | +| `featureType`, `uid`, `name`, `description` | ✅ accepted | |
| 56 | +| `typeOf@link`, `procedure@link`, `links`, `validTime`, `keywords`, `documentation`, `contacts`, anything else | ❌ **400 unknown field** | |
| 57 | + |
| 58 | +### 3.2 `PUT /systems/{id}` — SensorML JSON encoding |
| 59 | + |
| 60 | +| Field | Status | |
| 61 | +| --- | --- | |
| 62 | +| `type`, `uniqueId`, `label`, `definition`, `description` | ✅ | |
| 63 | +| `keywords`, `identifiers`, `classifiers`, `contacts`, `documents` (✱) | ✅ | |
| 64 | +| `position`, `validTime`, `history`, `securityConstraints`, `legalConstraints`, `links` | ✅ | |
| 65 | +| `characteristics`, `capabilities` | ❌ **400 unknown field**; surprising, because both are OGC 12-000r2 SensorML JSON encoding fields | |
| 66 | +| `typeOf` | ⚠️ **HTTP 500 "Failed to update system"** — server-side defect | |
| 67 | +| GeoJSON-encoding names (`uid`, `name`, `featureType`) | ❌ **400** | |
| 68 | + |
| 69 | +(✱) On `/systems` use the canonical name **`documents`** — fixed by upstream commit `c2ab201`. |
| 70 | + |
| 71 | +### 3.3 `POST /procedures` — GeoJSON Feature |
| 72 | + |
| 73 | +| `properties` field | Strict server | |
| 74 | +| --- | --- | |
| 75 | +| `featureType`, `uid`, `name`, `description`, **`validTime`** | ✅ accepted (note: `validTime` IS allowed here) | |
| 76 | +| `keywords`, `links`, anything else | ❌ **400 unknown field** | |
| 77 | + |
| 78 | +### 3.4 `PUT /procedures/{id}` — SensorML JSON encoding |
| 79 | + |
| 80 | +| Field | Status | |
| 81 | +| --- | --- | |
| 82 | +| `type`, `uniqueId`, `label`, `definition`, `description` | ✅ | |
| 83 | +| `keywords`, `identifiers`, `contacts`, `validTime` | ✅ | |
| 84 | +| **`documentation`** (typo) | ✅ accepted | |
| 85 | +| **`documents`** (canonical) | ❌ **400 unknown field** | |
| 86 | + |
| 87 | +> **⚠ Asymmetry vs. `/systems`:** the `c2ab201` upstream fix landed |
| 88 | +> only on `SystemSensorMLFeature`, not on `ProcedureSensorMLFeature`. |
| 89 | +> Until the follow-up commit lands, the procedure SML PUT requires |
| 90 | +> the typo'd field name `documentation`. Track this as a follow-up |
| 91 | +> to upstream issue OS4CSAPI/connected-systems-go #10. |
| 92 | + |
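The field-name asymmetry and the rejected system fields from §3.2 can be handled in one place before the SensorML PUT. The sketch below assumes the behaviour observed in this report; `prepare_sml`, `SYSTEM_REJECTED`, and `DOCS_FIELD` are hypothetical names, not part of the publishers' code.

```python
import copy

# Fields the strict server was observed to reject on PUT /systems/{id} (§3.2).
SYSTEM_REJECTED = ("characteristics", "capabilities", "typeOf")

# Name the server currently expects for the documents array, per collection.
DOCS_FIELD = {"systems": "documents", "procedures": "documentation"}


def prepare_sml(collection, sml_body):
    """Return (cleaned_body, dropped_field_names) without mutating the input."""
    cleaned = copy.deepcopy(sml_body)
    dropped = []
    if collection == "systems":
        for key in SYSTEM_REJECTED:
            if key in cleaned:
                del cleaned[key]
                dropped.append(key)
    wanted = DOCS_FIELD.get(collection)
    if wanted is not None:
        other = "documentation" if wanted == "documents" else "documents"
        # Rename only when the expected key is not already present.
        if other in cleaned and wanted not in cleaned:
            cleaned[wanted] = cleaned.pop(other)
    return cleaned, dropped
```

Returning the dropped names lets a publisher log what was withheld, and the `DOCS_FIELD` entry for procedures is the single line to change once the upstream follow-up lands.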
| 93 | +### 3.5 `POST /deployments` — GeoJSON Feature |
| 94 | + |
| 95 | +| `properties` field | Strict server | |
| 96 | +| --- | --- | |
| 97 | +| `featureType`, `uid`, `name`, `description`, **`validTime`**, **`platform@link`** | ✅ | |
| 98 | +| `documentation`, `parent@link`, `links`, anything else | ❌ **400 unknown field** | |
| 99 | + |
| 100 | +`platform@link` accepts `{href, uid, title}`. Sub-deployments use |
| 101 | +`POST /deployments/{parent_id}/subdeployments`. |
| 102 | + |
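The closed `properties` sets from §3.1, §3.3, and §3.5 can be encoded as a lookup and checked before any POST. This is a rough sketch built only from the tables in this report; `ALLOWED_PROPERTIES` and `unknown_properties` are hypothetical names.

```python
# Closed `properties` sets observed on the strict server (§3.1, §3.3, §3.5).
_BASE = {"featureType", "uid", "name", "description"}

ALLOWED_PROPERTIES = {
    "systems": set(_BASE),
    "procedures": _BASE | {"validTime"},
    "deployments": _BASE | {"validTime", "platform@link"},
}


def unknown_properties(collection, feature):
    """Return the sorted property names the strict server would reject."""
    allowed = ALLOWED_PROPERTIES[collection]
    props = feature.get("properties", {})
    return sorted(name for name in props if name not in allowed)
```

For example, a system stub that still carries `validTime` and `links` yields `["links", "validTime"]`, while the same `validTime` on a procedure stub passes.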
| 103 | +### 3.6 `POST /systems/{system_id}/datastreams` — Part 2 schema body |
| 104 | + |
| 105 | +| Body field | Status | |
| 106 | +| --- | --- | |
| 107 | +| `name`, `description`, `outputName`, `phenomenonTime`, `observedProperties`, `formats` | ✅ | |
| 108 | +| `schema.obsFormat` = `application/om+json` | ✅ | |
| 109 | +| `schema.resultSchema.type` = `DataRecord` | ✅ | |
| 110 | +| `schema.resultSchema.fields[].uom` (Time + Quantity) | ✅ | |
| 111 | +| `documentation`, `characteristics` | ❌ **400 unknown field** | |
| 112 | +| Time field `referenceTime` | ❌ **400 unknown field in schema.resultSchema.fields[N]** | |
| 113 | +| Datastream `uid` (top-level) | ❌ rejected — server assigns its own | |
| 114 | + |
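The rejections in the table above suggest a small clean-up pass over each datastream body before it is posted. The following is a sketch under the assumption that `documentation`, `characteristics`, and `uid` sit at the top level of the body and that `referenceTime` appears on `Time` fields inside `schema.resultSchema.fields`; `strip_datastream_body` is a hypothetical helper.

```python
import copy

# Top-level body fields the strict server rejected (§3.6).
REJECTED_TOP_LEVEL = ("documentation", "characteristics", "uid")


def strip_datastream_body(body):
    """Return a copy of the datastream body with rejected fields removed."""
    cleaned = copy.deepcopy(body)
    for key in REJECTED_TOP_LEVEL:
        cleaned.pop(key, None)
    fields = cleaned.get("schema", {}).get("resultSchema", {}).get("fields", [])
    for field in fields:
        # Only Time fields were observed to carry `referenceTime`.
        if field.get("type") == "Time":
            field.pop("referenceTime", None)
    return cleaned
```

The function works on a deep copy so the publisher's original schema definition stays intact and can be re-enabled if the server later accepts these fields.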
| 115 | +--- |
| 116 | + |
| 117 | +## 4. Concrete changes applied to aviation-wx |
| 118 | + |
| 119 | +| File / location | Before | After | |
| 120 | +| --- | --- | --- | |
| 121 | +| `PROCEDURE_BODY` (single dict) | SensorML metadata (`keywords`, `documentation`, `contacts`, `lineage`, `usageConstraints`) inside `properties` | Split into `PROCEDURE_BODY_STUB` (closed properties + `validTime`) and `PROCEDURE_SML` (full SensorML JSON, uses `documentation` typo per §3.4) | |
| 122 | +| `_system_stub()` | `properties.{typeOf@link, links, validTime}` | Closed properties only — typeOf/links/validTime moved into `_system_sml()` (where supported) | |
| 123 | +| `_system_sml()` | Included `characteristics` + `capabilities` arrays | Dropped both (server rejects); equivalent info preserved via `identifiers`, `classifiers`, `position`, `documents` | |
| 124 | +| `_datastream_schema()` | `documentation`, `characteristics`, top-level `uid`, Time field `referenceTime` | All removed; only server-accepted fields retained | |
| 125 | +| `_deploy_root() / _deploy_group()` | `documentation` array | Removed; closed properties + `validTime` only | |
| 126 | +| `_deploy_station()` | `links` array | Removed; `platform@link` retained | |
| 127 | +| `bootstrap()` proc call | `ensure_procedure(..., PROCEDURE_BODY)` | `ensure_procedure(..., PROCEDURE_BODY_STUB, sml_body=PROCEDURE_SML, force_sml=force_sml)` | |
| 128 | + |
| 129 | +--- |
| 130 | + |
| 131 | +## 5. Live verification |
| 132 | + |
| 133 | +Command: `python -m publishers.aviation_wx.bootstrap_aviation_wx --clean` |
| 134 | +against `BOOTSTRAP_URL=https://129-80-248-53.sslip.io/csapi-go-v2`. |
| 135 | + |
| 136 | +``` |
| 137 | +── Procedures ── |
| 138 | + [OK] Created procedure urn:os4csapi:procedure:metar-decoder:v1 → id=765585ac… |
| 139 | +── Systems + Datastreams ── |
| 140 | + [OK] Created system urn:os4csapi:system:awx:ktus:v1 → id=d84d684b… |
| 141 | + [OK] Created datastream 'metarObs' → id=b82bc9d4… |
| 142 | + [… × 5 stations …] |
| 143 | +── Deployments ── |
| 144 | + [OK] Created deployment urn:os4csapi:deployment:awx-metar-demo:v1 → id=084e5584… |
| 145 | + [OK] Created deployment urn:os4csapi:deployment:awx-stations:v1 → id=ce8b5807… |
| 146 | + [OK] Created deployment urn:os4csapi:deployment:awx-ktus:v1 → id=b3bcc331… |
| 147 | + [… × 5 stations …] |
| 148 | +``` |
| 149 | + |
| 150 | +A round-trip GET (`Accept: application/sml+json`) for KDMA returned full SensorML |
| 151 | +with `keywords`, `identifiers`, `classifiers`, `contacts`, `documents`, |
| 152 | +`position`, `links`, `validTime`, `definition`, `uniqueId`, `label`, |
| 153 | +`description` all populated as published. |
| 154 | + |
| 155 | +--- |
| 156 | + |
| 157 | +## 6. Information loss disclosure |
| 158 | + |
| 159 | +Two SensorML structures previously published cannot currently round-trip: |
| 160 | + |
| 161 | +1. **`characteristics`** (operator, station type, FAA identifier, |
| 162 | + field elevation as a grouped SWE DataRecord). Equivalent atoms are |
| 163 | + preserved via `identifiers` (Short/Long Name, ICAO ID), |
| 164 | + `classifiers` (Sensor Type, Intended Application), and `position` |
| 165 | + (geodetic). Field elevation is currently lost from SML; `description` |
| 166 | + text retains it as prose. |
| 167 | +2. **`capabilities`** (publisher publish-interval, data-source). |
| 168 | + Equivalent provenance is preserved in `documents` and in the |
| 169 | + procedure SML's `documentation` array. |
| 170 | + |
| 171 | +Both losses are server-side limitations of csapi-go-v2 (see §3.2); when |
| 172 | +upstream restores `characteristics`/`capabilities` we can re-enable |
| 173 | +those blocks unchanged. |
| 174 | + |
| 175 | +--- |
| 176 | + |
| 177 | +## 7. Recipe for fleet propagation |
| 178 | + |
| 179 | +Apply per-publisher in order (same pattern, mechanical): |
| 180 | + |
| 181 | +1. **Identify GeoJSON stubs** for procedures, systems, deployments. |
| 182 | + Anything in `properties` outside §3.1 / §3.3 / §3.5 must move out. |
| 183 | +2. **Build companion SML bodies** using OGC SensorML JSON encoding: |
| 184 | + `uniqueId` / `label` / `definition` / `description` / |
| 185 | + `keywords` / `identifiers` / `classifiers` / `contacts` / |
| 186 | + `documents` (or `documentation` for procedures, see §3.4) / |
| 187 | + `position` / `links`. |
| 188 | +3. **Wire `sml_body=` argument** on `ensure_procedure` / |
| 189 | + `ensure_system` / `ensure_deployment` calls. Pass |
| 190 | + `force_sml=force_sml` so the `--force-sml` CLI flag works. |
| 191 | +4. **Strip datastream schemas** of `documentation`, `characteristics`, |
| 192 | + top-level `uid`, and Time field `referenceTime`. |
| 193 | +5. **Validate**: dry-run first |
| 194 | + (`OS4CSAPI_STRICT_BOOTSTRAP=1 python -m publishers.<NAME>.bootstrap_<NAME> --dry-run`). |
| 195 | + Then run live with `--clean` (idempotent reset), followed by a plain |
| 196 | + bootstrap to confirm SKIP-on-second-run. Finally, round-trip at least one resource per type with `curl … -H 'Accept: application/sml+json'`. |
| 197 | +6. **Commit per-publisher** with a message like |
| 198 | + `fix(<publisher>): split GeoJSON stubs from SensorML bodies for strict csapi-go-v2`. |
| 199 | + |
| 200 | +The `OS4CSAPI_STRICT_BOOTSTRAP=1` guardrail in |
| 201 | +[bootstrap_helpers.py](../../publishers/bootstrap_helpers.py) |
| 202 | +(`_warn_if_sml_fields_in_stub`) raises `RuntimeError` on any leaked |
| 203 | +SML field — recommended for the dry-run. |
| 204 | + |
| 205 | +--- |
| 206 | + |
| 207 | +## 8. Outstanding upstream issues (file separately) |
| 208 | + |
| 209 | +1. **OS4CSAPI/connected-systems-go #10 follow-up**: replicate the |
| 210 | + `c2ab201` `documents`/`documentation` rename onto |
| 211 | + `ProcedureSensorMLFeature` (see §3.4). |
| 212 | +2. **`ProcedureSensorMLFeature` HTTP 500 path**: any unknown SML field |
| 213 | + surfaces as `{"error":"Failed to update procedure"}` HTTP 500 instead |
| 214 | + of a clean 400 with a field name. (Compare clean 400 on `/systems` |
| 215 | + PUT.) Defensive parsing or a clearer error path is warranted. |
| 216 | +3. **`SystemSensorMLFeature.characteristics` / `.capabilities` rejection**: |
| 217 | + these fields are first-class in OGC 12-000r2 SensorML JSON encoding |
| 218 | + and OGC 23-001 incorporates the SML schema by reference. Either the |
| 219 | + server should accept them or document that it does not. (See §3.2.) |
| 220 | +4. **`SystemSensorMLFeature.typeOf` HTTP 500**: same defensive-parsing |
| 221 | + point as #2 — should be 400 with a field-name error or it should |
| 222 | + simply work. (See §3.2.) |
| 223 | + |
| 224 | +--- |
| 225 | + |
| 226 | +## 9. Next actions |
| 227 | + |
| 228 | +- [x] aviation-wx: refactored, dry-run + live verified, committed. |
| 229 | +- [ ] coops: apply recipe. |
| 230 | +- [ ] iss: apply recipe. |
| 231 | +- [ ] ndbc: apply recipe. |
| 232 | +- [ ] nws: apply recipe. |
| 233 | +- [ ] opensky: apply recipe. |
| 234 | +- [ ] usgs_eq: apply recipe. |
| 235 | +- [ ] usgs_nims: apply recipe. |
| 236 | +- [ ] usgs_water: apply recipe. |
| 237 | +- [ ] After all 9 land, open PR vs. `main`, link this report and the §9 reanalysis. |
| 238 | +- [ ] File the four upstream issues from §8 against `OS4CSAPI/connected-systems-go`. |