
Commit c6574d8

docs(research): aviation-wx strict-parsing pilot engineering report
Captures the empirically-derived schemas for /systems, /procedures, /deployments, and /datastreams under csapi-go-v2 strict parsing, the concrete refactor applied to aviation-wx, live verification results, information-loss disclosure (characteristics/capabilities), the recipe for fleet propagation to the remaining 8 publishers, and the four outstanding upstream issues to file against OS4CSAPI/connected-systems-go.
1 parent 9206be0 commit c6574d8

1 file changed

Lines changed: 238 additions & 0 deletions

@@ -0,0 +1,238 @@
1+
# Aviation-WX Strict-Parsing Pilot — Engineering Report
2+
3+
**Date:** 2026-05-09
4+
**Author:** OS4CSAPI / OSHConnect-Python team
5+
**Branch:** `fix/aviation-wx-strict-parsing-2026-05-09`
6+
**Commits:** `ca2b794` (research §9 update), `9206be0` (publisher fix)
7+
**Target server:** `https://129-80-248-53.sslip.io/csapi-go-v2/` (`connected-systems-go` post-`a467aba0` strict parsing, `c2ab201` partial typo fix)
8+
**Companion document:** [Strict_Parsing_Migration_Spec_Grounded_Reanalysis_2026-05-09.md](Strict_Parsing_Migration_Spec_Grounded_Reanalysis_2026-05-09.md) §9
9+
10+
---
11+
12+
## 1. Summary
13+
14+
The aviation-wx publisher has been refactored to publish cleanly against
15+
the strict-parsing csapi-go-v2 server. An end-to-end live run produced
16+
**zero rejections**: 1 procedure, 5 systems with full SensorML, 5
17+
datastreams, and a 3-tier deployment tree (root + group + 5 station
18+
deployments). Round-trip verification confirms keywords, identifiers,
19+
classifiers, contacts, documents, position, and links are all preserved
20+
on systems.
21+
22+
This document captures the empirically-derived schema for each resource
23+
type so the same recipe can be propagated mechanically to the remaining
24+
8 publishers (`coops`, `iss`, `ndbc`, `nws`, `opensky`, `usgs_eq`,
25+
`usgs_nims`, `usgs_water`).
26+
27+
---
28+
29+
## 2. The encoding contract (one-line statement)
30+
31+
> Strict csapi-go-v2 enforces OGC 23-001's closed `feature.properties`
32+
> schema. **All SensorML metadata must be PUT separately** as
33+
> `application/sml+json` against the resource path after the
34+
> `application/geo+json` POST creates the resource.
35+
36+
The two-step pattern (already implemented in
37+
[publishers/bootstrap_helpers.py](../../publishers/bootstrap_helpers.py)
38+
as `ensure_procedure(stub, sml_body=...)`,
39+
`ensure_system(stub, sml_body=...)`,
40+
`ensure_deployment(stub, sml_body=..., parent_id=...)`):
41+
42+
1. `POST /{collection}` with `Content-Type: application/geo+json`
43+
   (the stub body has only the closed properties set).
44+
2. `PUT /{collection}/{id}` with `Content-Type: application/sml+json`
45+
   (the body uses the SensorML JSON encoding: `uniqueId`, `label`, `definition`, …).
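As a rough illustration of the two-step pattern above, the sketch below only builds (it does not send) the two requests with the standard library. `BASE_URL`, the helper names, the resource id, and the sample payloads are placeholders chosen for illustration; they are not taken from `bootstrap_helpers.py`, and how the server reports the new resource id is not covered here.

```python
import json
import urllib.request

BASE_URL = "https://example.invalid/csapi"  # hypothetical server root


def build_create_request(collection: str, stub: dict) -> urllib.request.Request:
    """Step 1: POST the GeoJSON stub (closed properties only) to /{collection}."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{collection}",
        data=json.dumps(stub).encode("utf-8"),
        headers={"Content-Type": "application/geo+json"},
        method="POST",
    )


def build_sml_request(collection: str, resource_id: str, sml_body: dict) -> urllib.request.Request:
    """Step 2: PUT the SensorML JSON body to /{collection}/{id}."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{collection}/{resource_id}",
        data=json.dumps(sml_body).encode("utf-8"),
        headers={"Content-Type": "application/sml+json"},
        method="PUT",
    )


# Illustrative payloads; the identifiers are made up for this example.
stub = {
    "type": "Feature",
    "geometry": None,
    "properties": {"uid": "urn:example:system:demo", "name": "Demo station"},
}
sml = {"type": "PhysicalSystem", "uniqueId": "urn:example:system:demo", "label": "Demo station"}

create_req = build_create_request("systems", stub)
update_req = build_sml_request("systems", "abc123", sml)
```

Passing either object to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the content types easy to check in a dry run.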
46+
47+
---
48+
49+
## 3. Empirically-derived schemas (per-resource)
50+
51+
### 3.1 `POST /systems` — GeoJSON Feature
52+
53+
| `properties` field | Strict server |
54+
| --- | --- |
55+
| `featureType`, `uid`, `name`, `description` | ✅ accepted |
56+
| `typeOf@link`, `procedure@link`, `links`, `validTime`, `keywords`, `documentation`, `contacts`, anything else | ❌ **400 unknown field** |
57+
58+
### 3.2 `PUT /systems/{id}` — SensorML JSON encoding
59+
60+
| Field | Status |
61+
| --- | --- |
62+
| `type`, `uniqueId`, `label`, `definition`, `description` | ✅ accepted |
63+
| `keywords`, `identifiers`, `classifiers`, `contacts`, `documents` (✱) | ✅ accepted |
64+
| `position`, `validTime`, `history`, `securityConstraints`, `legalConstraints`, `links` | ✅ accepted |
65+
| `characteristics`, `capabilities` | ❌ **400 unknown field** (surprising: both are OGC 12-000r2 SensorML JSON encoding fields) |
66+
| `typeOf` | ⚠️ **HTTP 500 "Failed to update system"** — server-side defect |
67+
| GeoJSON-encoding names (`uid`, `name`, `featureType`) | ❌ **400** |
68+
69+
(✱) On `/systems` use the canonical name **`documents`** — fixed by upstream commit `c2ab201`.
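One way to apply the §3.2 table before a PUT is to filter the SensorML body against the rejected names. The sketch below is a minimal example under that assumption; `SYSTEM_SML_REJECTED` and `strip_rejected_system_sml` are hypothetical names, and the set simply transcribes the rows of the table above that return 400 or 500.

```python
# Names the strict server rejects on PUT /systems/{id}, per the table above.
SYSTEM_SML_REJECTED = {
    "characteristics",  # 400 unknown field
    "capabilities",     # 400 unknown field
    "typeOf",           # HTTP 500
    "uid",              # GeoJSON-encoding name, 400
    "name",             # GeoJSON-encoding name, 400
    "featureType",      # GeoJSON-encoding name, 400
}


def strip_rejected_system_sml(sml_body: dict) -> tuple[dict, list[str]]:
    """Return (body without rejected keys, sorted list of keys that were dropped)."""
    kept = {k: v for k, v in sml_body.items() if k not in SYSTEM_SML_REJECTED}
    dropped = sorted(k for k in sml_body if k in SYSTEM_SML_REJECTED)
    return kept, dropped


example_body = {
    "type": "PhysicalSystem",
    "uniqueId": "urn:example:system:demo",
    "label": "Demo station",
    "characteristics": [],
    "typeOf": {"href": "https://example.invalid/procedures/1"},
}
clean_body, dropped_keys = strip_rejected_system_sml(example_body)
```

Returning the dropped keys alongside the cleaned body lets a publisher log exactly what was withheld, which is useful for the information-loss accounting in §6.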
70+
71+
### 3.3 `POST /procedures` — GeoJSON Feature
72+
73+
| `properties` field | Strict server |
74+
| --- | --- |
75+
| `featureType`, `uid`, `name`, `description`, **`validTime`** | ✅ accepted (note: `validTime` IS allowed here) |
76+
| `keywords`, `links`, anything else | ❌ **400 unknown field** |
77+
78+
### 3.4 `PUT /procedures/{id}` — SensorML JSON encoding
79+
80+
| Field | Status |
81+
| --- | --- |
82+
| `type`, `uniqueId`, `label`, `definition`, `description` | ✅ accepted |
83+
| `keywords`, `identifiers`, `contacts`, `validTime` | ✅ accepted |
84+
| **`documentation`** (typo) | ✅ accepted |
85+
| **`documents`** (canonical) | ❌ **400 unknown field** |
86+
87+
> **⚠ Asymmetry vs. `/systems`:** the `c2ab201` upstream fix landed
88+
> only on `SystemSensorMLFeature`, not on `ProcedureSensorMLFeature`.
89+
> Until the follow-up commit lands, the procedure SML PUT requires
90+
> the typo'd field name `documentation`. Track this as a follow-up to
91+
> upstream issue OS4CSAPI/connected-systems-go #10.
92+
93+
### 3.5 `POST /deployments` — GeoJSON Feature
94+
95+
| `properties` field | Strict server |
96+
| --- | --- |
97+
| `featureType`, `uid`, `name`, `description`, **`validTime`**, **`platform@link`** | ✅ accepted |
98+
| `documentation`, `parent@link`, `links`, anything else | ❌ **400 unknown field** |
99+
100+
`platform@link` accepts `{href, uid, title}`. Sub-deployments use
101+
`POST /deployments/{parent_id}/subdeployments`.
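The three GeoJSON tables (§3.1, §3.3, §3.5) can be read as per-collection allow-lists. A minimal sketch of checking a stub against them is shown below; `ALLOWED_STUB_PROPERTIES` and `leaked_properties` are illustrative names, and the sets transcribe only what the tables above report as accepted.

```python
# Closed `properties` sets observed on the strict server, per collection.
ALLOWED_STUB_PROPERTIES = {
    "systems": {"featureType", "uid", "name", "description"},
    "procedures": {"featureType", "uid", "name", "description", "validTime"},
    "deployments": {
        "featureType", "uid", "name", "description", "validTime", "platform@link",
    },
}


def leaked_properties(collection: str, feature: dict) -> list[str]:
    """Return the sorted property names the strict server would reject for this collection."""
    props = feature.get("properties", {})
    return sorted(set(props) - ALLOWED_STUB_PROPERTIES[collection])


deployment_stub = {
    "type": "Feature",
    "properties": {
        "uid": "urn:example:deployment:demo",
        "name": "Demo deployment",
        "validTime": ["2026-01-01T00:00:00Z", ".."],
        "links": [],
    },
}
system_leaks = leaked_properties("systems", deployment_stub)
deployment_leaks = leaked_properties("deployments", deployment_stub)
```

The same feature yields different results per collection, which mirrors the tables: `validTime` is fine on a deployment or procedure stub but not on a system stub.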
102+
103+
### 3.6 `POST /systems/{system_id}/datastreams` — Part 2 schema body
104+
105+
| Body field | Status |
106+
| --- | --- |
107+
| `name`, `description`, `outputName`, `phenomenonTime`, `observedProperties`, `formats` | ✅ accepted |
108+
| `schema.obsFormat` = `application/om+json` | ✅ accepted |
109+
| `schema.resultSchema.type` = `DataRecord` | ✅ accepted |
110+
| `schema.resultSchema.fields[].uom` (Time + Quantity) | ✅ accepted |
111+
| `documentation`, `characteristics` | ❌ **400 unknown field** |
112+
| Time field `referenceTime` | ❌ **400 unknown field in schema.resultSchema.fields[N]** |
113+
| Datastream `uid` (top-level) | ❌ rejected — server assigns its own |
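The rejected rows of this table translate into a small clean-up pass over a datastream body. The sketch below assumes the body is a plain dict shaped like the table (top-level fields plus `schema.resultSchema.fields`); `strip_datastream_body` is a hypothetical helper and the sample body is made up.

```python
import copy


def strip_datastream_body(body: dict) -> dict:
    """Return a deep copy of body without the fields the strict server rejects."""
    cleaned = copy.deepcopy(body)
    # Top-level fields rejected on POST /systems/{system_id}/datastreams.
    for key in ("documentation", "characteristics", "uid"):
        cleaned.pop(key, None)
    # `referenceTime` is rejected on Time fields inside the result schema.
    fields = cleaned.get("schema", {}).get("resultSchema", {}).get("fields", [])
    for field in fields:
        if field.get("type") == "Time":
            field.pop("referenceTime", None)
    return cleaned


legacy_body = {
    "name": "metarObs",
    "outputName": "metarObs",
    "uid": "urn:example:datastream:demo",
    "documentation": [],
    "schema": {
        "obsFormat": "application/om+json",
        "resultSchema": {
            "type": "DataRecord",
            "fields": [
                {"type": "Time", "name": "time", "referenceTime": "1970-01-01T00:00:00Z"},
                {"type": "Quantity", "name": "temp"},
            ],
        },
    },
}
strict_body = strip_datastream_body(legacy_body)
```

Working on a deep copy keeps the legacy body intact, so the removed fields remain available if upstream later accepts them.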
114+
115+
---
116+
117+
## 4. Concrete changes applied to aviation-wx
118+
119+
| File / location | Before | After |
120+
| --- | --- | --- |
121+
| `PROCEDURE_BODY` (single dict) | SensorML metadata (`keywords`, `documentation`, `contacts`, `lineage`, `usageConstraints`) inside `properties` | Split into `PROCEDURE_BODY_STUB` (closed properties + `validTime`) and `PROCEDURE_SML` (full SensorML JSON, uses `documentation` typo per §3.4) |
122+
| `_system_stub()` | `properties.{typeOf@link, links, validTime}` | Closed properties only — typeOf/links/validTime moved into `_system_sml()` (where supported) |
123+
| `_system_sml()` | Included `characteristics` + `capabilities` arrays | Dropped both (server rejects); most of the information is preserved via `identifiers`, `classifiers`, `position`, and `documents` (losses are listed in §6) |
124+
| `_datastream_schema()` | `documentation`, `characteristics`, top-level `uid`, Time field `referenceTime` | All removed; only server-accepted fields retained |
125+
| `_deploy_root() / _deploy_group()` | `documentation` array | Removed; closed properties + `validTime` only |
126+
| `_deploy_station()` | `links` array | Removed; `platform@link` retained |
127+
| `bootstrap()` proc call | `ensure_procedure(..., PROCEDURE_BODY)` | `ensure_procedure(..., PROCEDURE_BODY_STUB, sml_body=PROCEDURE_SML, force_sml=force_sml)` |
128+
129+
---
130+
131+
## 5. Live verification
132+
133+
Command: `python -m publishers.aviation_wx.bootstrap_aviation_wx --clean`
134+
against `BOOTSTRAP_URL=https://129-80-248-53.sslip.io/csapi-go-v2`.
135+
136+
```
137+
── Procedures ──
138+
[OK] Created procedure urn:os4csapi:procedure:metar-decoder:v1 → id=765585ac…
139+
── Systems + Datastreams ──
140+
[OK] Created system urn:os4csapi:system:awx:ktus:v1 → id=d84d684b…
141+
[OK] Created datastream 'metarObs' → id=b82bc9d4…
142+
[… × 5 stations …]
143+
── Deployments ──
144+
[OK] Created deployment urn:os4csapi:deployment:awx-metar-demo:v1 → id=084e5584…
145+
[OK] Created deployment urn:os4csapi:deployment:awx-stations:v1 → id=ce8b5807…
146+
[OK] Created deployment urn:os4csapi:deployment:awx-ktus:v1 → id=b3bcc331…
147+
[… × 5 stations …]
148+
```
149+
150+
Round-trip GET `application/sml+json` for KDMA returned full SensorML
151+
with `keywords`, `identifiers`, `classifiers`, `contacts`, `documents`,
152+
`position`, `links`, `validTime`, `definition`, `uniqueId`, `label`,
153+
`description` all populated as published.
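A round-trip check of this kind can be automated by comparing the SensorML body that was PUT with the body returned by the GET. The sketch below works on two already-parsed dicts; `roundtrip_gaps` is a hypothetical helper, and it assumes exact equality of values, which may be too strict if the server normalizes anything on the way back.

```python
def roundtrip_gaps(published: dict, fetched: dict) -> dict:
    """Compare top-level SensorML fields of the published body against the fetched one."""
    missing = sorted(k for k in published if k not in fetched)
    changed = sorted(
        k for k in published if k in fetched and fetched[k] != published[k]
    )
    return {"missing": missing, "changed": changed}


# Made-up bodies for illustration only.
published_sml = {
    "uniqueId": "urn:example:system:demo",
    "label": "Demo station",
    "keywords": ["metar", "aviation"],
    "characteristics": [],
}
fetched_sml = {
    "id": "abc123",  # server-assigned fields are ignored by the comparison
    "uniqueId": "urn:example:system:demo",
    "label": "Demo Station",
    "keywords": ["metar", "aviation"],
}
report = roundtrip_gaps(published_sml, fetched_sml)
```

Only keys present in the published body are examined, so server-added fields do not produce false alarms.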
154+
155+
---
156+
157+
## 6. Information loss disclosure
158+
159+
Two SensorML structures previously published cannot currently round-trip:
160+
161+
1. **`characteristics`** (operator, station type, FAA identifier,
162+
field elevation as a grouped SWE DataRecord). Equivalent atoms are
163+
preserved via `identifiers` (Short/Long Name, ICAO ID),
164+
`classifiers` (Sensor Type, Intended Application), and `position`
165+
(geodetic). Field elevation is currently lost from SML; `description`
166+
text retains it as prose.
167+
2. **`capabilities`** (publisher publish-interval, data-source).
168+
Equivalent provenance is preserved in `documents` and in the
169+
procedure SML's `documentation` array.
170+
171+
Both losses are server-side limitations of csapi-go-v2 (see §3.2); when
172+
upstream restores `characteristics`/`capabilities` we can re-enable
173+
those blocks unchanged.
174+
175+
---
176+
177+
## 7. Recipe for fleet propagation
178+
179+
Apply per-publisher in order (same pattern, mechanical):
180+
181+
1. **Identify GeoJSON stubs** for procedures, systems, deployments.
182+
Anything in `properties` outside §3.1 / §3.3 / §3.5 must move out.
183+
2. **Build companion SML bodies** using OGC SensorML JSON encoding:
184+
`uniqueId` / `label` / `definition` / `description` /
185+
`keywords` / `identifiers` / `classifiers` / `contacts` /
186+
`documents` (or `documentation` for procedures, see §3.4) /
187+
`position` / `links`.
188+
3. **Wire `sml_body=` argument** on `ensure_procedure` /
189+
`ensure_system` / `ensure_deployment` calls. Pass
190+
`force_sml=force_sml` so the `--force-sml` CLI flag works.
191+
4. **Strip datastream schemas** of `documentation`, `characteristics`,
192+
top-level `uid`, and Time field `referenceTime`.
193+
5. **Validate**: dry-run first
194+
(`OS4CSAPI_STRICT_BOOTSTRAP=1 python -m publishers.<NAME>.bootstrap_<NAME> --dry-run`).
195+
Then live: `--clean` (idempotent reset), then plain
196+
bootstrap to confirm SKIP-on-second-run. Then `curl … -H 'Accept: application/sml+json'` round-trip on at least one resource per type.
197+
6. **Commit per-publisher** with a message like
198+
`fix(<publisher>): split GeoJSON stubs from SensorML bodies for strict csapi-go-v2`.
199+
200+
The `OS4CSAPI_STRICT_BOOTSTRAP=1` guardrail in
201+
[bootstrap_helpers.py](../../publishers/bootstrap_helpers.py)
202+
(`_warn_if_sml_fields_in_stub`) raises `RuntimeError` on any leaked
203+
SML field — recommended for the dry-run.
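To make the guardrail's behaviour concrete, here is a rough sketch of an environment-driven check in the same spirit. It is not the implementation of `_warn_if_sml_fields_in_stub`: the function name `check_stub`, the `strict` parameter, the warn-when-not-strict behaviour, and the contents of `SML_ONLY_FIELDS` are all assumptions made for illustration.

```python
import os
import warnings

# Hypothetical list of SensorML-only names; the real helper may check a different set.
SML_ONLY_FIELDS = {
    "keywords", "identifiers", "classifiers", "contacts",
    "documents", "documentation", "characteristics", "capabilities",
}


def check_stub(properties: dict, strict=None) -> list[str]:
    """Raise in strict mode, otherwise warn, when SensorML fields appear in a GeoJSON stub."""
    if strict is None:
        # Assumed convention: the guardrail is active when the variable is set to "1".
        strict = os.environ.get("OS4CSAPI_STRICT_BOOTSTRAP") == "1"
    leaked = sorted(set(properties) & SML_ONLY_FIELDS)
    if leaked and strict:
        raise RuntimeError(f"SensorML fields leaked into GeoJSON stub: {leaked}")
    if leaked:
        warnings.warn(f"SensorML fields in GeoJSON stub: {leaked}")
    return leaked
```

Failing hard during a dry run surfaces every leaked field before any request reaches the server, which is cheaper than discovering them one 400 at a time.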
204+
205+
---
206+
207+
## 8. Outstanding upstream issues (file separately)
208+
209+
1. **OS4CSAPI/connected-systems-go #10 follow-up**: replicate the
210+
`c2ab201` `documents`/`documentation` rename onto
211+
`ProcedureSensorMLFeature` (see §3.4).
212+
2. **`ProcedureSensorMLFeature` HTTP 500 path**: any unknown SML field
213+
surfaces as `{"error":"Failed to update procedure"}` HTTP 500 instead
214+
of a clean 400 with a field name. (Compare clean 400 on `/systems`
215+
PUT.) Defensive parsing or a clearer error path is warranted.
216+
3. **`SystemSensorMLFeature.characteristics` / `.capabilities` rejection**:
217+
these fields are first-class in OGC 12-000r2 SensorML JSON encoding
218+
   and OGC 23-001 incorporates the SML schema by reference. Either the
219+
server should accept them or document that it does not. (See §3.2.)
220+
4. **`SystemSensorMLFeature.typeOf` HTTP 500**: same defensive-parsing
221+
point as #2 — should be 400 with a field-name error or it should
222+
simply work. (See §3.2.)
223+
224+
---
225+
226+
## 9. Next actions
227+
228+
- [x] aviation-wx: refactored, dry-run + live verified, committed.
229+
- [ ] coops: apply recipe.
230+
- [ ] iss: apply recipe.
231+
- [ ] ndbc: apply recipe.
232+
- [ ] nws: apply recipe.
233+
- [ ] opensky: apply recipe.
234+
- [ ] usgs_eq: apply recipe.
235+
- [ ] usgs_nims: apply recipe.
236+
- [ ] usgs_water: apply recipe.
237+
- [ ] After all 9 land, open PR vs. `main`, link this report and the §9 reanalysis.
238+
- [ ] File the four upstream issues from §8 against `OS4CSAPI/connected-systems-go`.
