| 1 | +# Phase 5 — Multi-Server Publisher Plan |
| 2 | + |
| 3 | +**Status:** Proposed |
| 4 | +**Date:** 2026-05-09 |
| 5 | +**Author:** OS4CSAPI / Sam Bolling |
| 6 | +**Predecessors:** Phase 1 (Bootstrap), Phase 2 (Datastreams + ControlStreams), Phase 3 (Simulator route redesign), Phase 4 (NDJSON Replay Engine) |
| 7 | +**Successors:** TBD — to be defined at Phase 5 close |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## 1. Objective |
| 12 | + |
| 13 | +Make the OSHConnect-Python publisher fleet usable against **any** CSAPI server we encounter, not just OpenSensorHub. Today the fleet hard-codes OSH conventions (Basic auth, GeoJSON `Feature` envelope, OM-JSON observations, lowercase `controlstreams` path, etc.) and only partially adapts to the Go server through ad-hoc patches scattered across `publishers/bootstrap_helpers.py`. With the Phase 9 deployment of the live 52°North `connected-systems-pygeoapi` server, the fleet now has **three** distinct CSAPI implementations to publish to: the existing OSH and Go targets plus the new pygeoapi server. This phase replaces ad-hoc per-server patching with a documented, profile-driven abstraction. |
| 14 | + |
| 15 | +This is *not* an attempt to upstream changes to the original `Botts-Innovative-Research/OSHConnect-Python` library. As of Phase 5 the OS4CSAPI fork is a standalone project: a CSAPI client *library* plus a CSAPI *publisher fleet* that we own end-to-end. |
| 16 | + |
| 17 | +--- |
| 18 | + |
| 19 | +## 2. Background and Provenance |
| 20 | + |
| 21 | +This plan synthesizes findings from: |
| 22 | + |
| 23 | +- **OSHConnect-Python issue #5** (open, P1): `ensure_procedure` / `ensure_deployment` silently lose all SensorML metadata; POSTs use the wrong content-type and payload shape. |
| 24 | +- **OSHConnect-Python issue #4** (open): bootstrap idempotency `find_by_uid` reads only the first page; same single-page pattern repeated in `find_datastream` and `_discover_system_ds`; `limit=1000` is a fragile workaround. |
| 25 | +- **`docs/research/Silent_SensorML_Field_Loss_Engineering_Report_2026-05-06.md`** — the engineering report behind #5. |
| 26 | +- **`docs/research/CSAPI_Go_Server_Integration_Report_2026-04-17.md`** — the per-server-quirk catalog accumulated during the Go server migration. |
| 27 | +- **`docs/research/Publisher_Fleet_Portability_Plan.md`** — earlier planning that named portability as a goal but pre-dated the Go and pygeoapi servers. |
| 28 | +- **External:** `ogc-client-CSAPI_2/docs/research/phase-9/03-52north-pygeoapi-deployment-findings.md` — Phase 9 deployment of `52North/connected-systems-pygeoapi` on Oracle Cloud, including the documented publisher-integration blocker (§8 of that report). |
| 29 | +- **External:** `ogc-csapi-explorer/docs/governance/known-server-quirks.md` — the authoritative three-server quirks matrix (OSH, csa.demo.52north.org, pygeoapi-live), validated via the explorer's CRUD Smoke Test. |
| 30 | + |
| 31 | +The Phase 9 deployment doc explicitly recommended: |
| 32 | + |
| 33 | +> *"Instead of adapting the Go publisher, write a thin Python publisher that consumes the same OS4CSAPI event stream and emits pygeoapi-shaped payloads directly. The seeder already implements ~80% of that translation."* |
| 34 | +
|
| 35 | +Phase 5 makes that "thin Python publisher" a first-class capability of *this* repo rather than a separate one-off. |
| 36 | + |
| 37 | +--- |
| 38 | + |
| 39 | +## 3. Target Server Matrix |
| 40 | + |
| 41 | +| # | Server | Base URL | Auth | Status | |
| 42 | +| - | ----------------------------- | --------------------------------------------------------- | ------ | ------------- | |
| 43 | +| 1 | OpenSensorHub (OSH) | `http://45.55.99.236:8080/sensorhub/api` | Basic | Production target — current default | |
| 44 | +| 2 | OS4CSAPI Go server | `https://129-80-248-53.sslip.io/csapi-go` | None | Production target — partially supported via in-line patches | |
| 45 | +| 3 | 52°North pygeoapi (live) | `https://129-80-248-53.sslip.io/csapi-pygeoapi` | None | New in Phase 9 — **not yet a publisher target** (blocker documented) | |
| 46 | +| 4 | csa.demo.52north.org (public) | `https://csa.demo.52north.org` | None | Read-only target — used for content-negotiation regression testing only | |
| 47 | + |
| 48 | +Phase 5 success criteria are defined against servers 1, 2, and 3. |
| 49 | + |
| 50 | +--- |
| 51 | + |
| 52 | +## 4. Problem Statement (per-server divergence the fleet currently can't handle) |
| 53 | + |
| 54 | +Each row is a real, captured observation from the explorer's CRUD Smoke Test or from `seed_pygeoapi.py`: |
| 55 | + |
| 56 | +| Concern | OSH | Go server | pygeoapi (Phase 9) | |
| 57 | +| ------------------------ | ----------------------------------------------------------- | ----------------------------------------------- | --------------------------------------------------------------------------------- | |
| 58 | +| `POST /systems` shape | CSAPI GeoJSON `Feature` (`type:"Feature"`, `properties{}`) | CSAPI GeoJSON `Feature` | **Stripped JSON; no `Feature` envelope** — `Feature` triggers `AttrDict.get` crash | |
| 59 | +| `POST /procedures` shape | CSAPI GeoJSON `Feature` | CSAPI GeoJSON `Feature` | **SensorML JSON only** — `{ type, id, definition, ... }` | |
| 60 | +| `POST /deployments` shape| CSAPI GeoJSON `Feature` | CSAPI GeoJSON `Feature` | **SensorML JSON only**, and `deployedSystems` field causes server `KeyError` | |
| 61 | +| `POST /samplingFeatures` | accepted | accepted | **405 Method Not Allowed** (read-only on this build) | |
| 62 | +| `controlstreams` path | **lowercase only** (`/controlstreams`) | camelCase (`/controlStreams`) | absent (not implemented) | |
| 63 | +| `commands` endpoint | only via `/controlstreams/{id}/commands` | top-level `/commands` | absent | |
| 64 | +| Pagination | `limit=1000` workaround currently masks `next`-link bugs | same | `next` link present, must be followed | |
| 65 | +| Auth | Basic auth required | none | none | |
| 66 | +| `Accept` header behavior | **ignored** — must use `?f=` query parameter | honored | honored, but `Accept: application/json` returns CSAPI envelope; smljson/geojson route through alternate stores | |
| 67 | +| Conformance advertised | 20+ CSAPI classes | partial CSAPI classes | only `ogcapi-common-1/1.0/conf/core` — no CSAPI classes despite working endpoints | |
| 68 | +| SensorML round-trip | accepts smljson on PUT but lossy on POST (issue #5 root) | unverified | required form for POSTs to systems/procedures/deployments | |
| 69 | + |
| 70 | +The publisher fleet currently encodes a single, OSH-shaped path through this matrix. Every server quirk the team has hit so far has resulted in either an in-line `if`-branch in `bootstrap_helpers.py` or a workaround like `limit=1000`. This does not scale to a third target and is the structural reason Phase 9's publisher integration was abandoned. |
| 71 | + |
| 72 | +--- |
| 73 | + |
| 74 | +## 5. Proposed Architecture |
| 75 | + |
| 76 | +### 5.1 ServerProfile (new module) |
| 77 | + |
| 78 | +A `ServerProfile` is a versioned, declarative description of *one* CSAPI server's quirks. The publisher fleet consumes a profile through the existing `Node` abstraction; there are no per-publisher conditionals. |
| 79 | + |
| 80 | +**Proposed location:** `src/oshconnect/profiles/`. |
| 81 | + |
| 82 | +**Proposed shape (sketch, not final API):** |
| 83 | + |
| 84 | +```python |
| 85 | +@dataclass(frozen=True) |
| 86 | +class ServerProfile: |
| 87 | +    name: str                                # "osh" | "csapi-go" | "pygeoapi-live" | ... |
| 88 | +    base_url_pattern: str                    # for matching/auto-detection |
| 89 | +    auth: AuthStrategy                       # BasicAuth | NoAuth | BearerToken | ApiKey |
| 90 | +    endpoints: EndpointMap                   # canonical kind -> path (handles /controlstreams casing) |
| 91 | +    content_negotiation: ContentNegotiation  # Accept-honored vs ?f= override |
| 92 | +    payload_shapes: PayloadShapeMap          # per-resource POST/PUT body shape |
| 93 | +    pagination: PaginationStrategy           # next-link walker | limit=1000 | offset |
| 94 | +    conformance_required: list[str]          # used to fail-fast on unfit targets |
| 95 | +    known_quirks: list[str]                  # human-readable; logged on connect |
| 96 | +``` |
| 97 | + |
| 98 | +Three profiles ship with Phase 5: `osh`, `csapi-go`, `pygeoapi-live`. They are loaded by name from a YAML registry (stretch goal: share the registry with `ogc-csapi-explorer/docs/governance/known-server-quirks.md` so quirks are single-sourced). |
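To make the profile idea concrete, here is a minimal runnable sketch of a trimmed-down `ServerProfile` with an in-memory registry. It is an illustration under assumptions, not the final API: `Auth`, `accept_honored`, `path_for`, `REGISTRY`, and `load_profile` are hypothetical names, and plain dicts stand in for `EndpointMap` and the YAML registry. The endpoint and auth values are the ones recorded in the §4 table.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Auth(Enum):
    """Hypothetical stand-in for the plan's AuthStrategy."""
    BASIC = "basic"
    NONE = "none"


@dataclass(frozen=True)
class ServerProfile:
    name: str
    auth: Auth
    endpoints: dict[str, str]            # canonical kind -> server path
    accept_honored: bool                 # False means formats go through ?f=
    known_quirks: tuple[str, ...] = ()   # human-readable, logged on connect

    def path_for(self, kind: str) -> str:
        """Resolve a canonical resource kind, or fail if the server lacks it."""
        try:
            return self.endpoints[kind]
        except KeyError:
            raise LookupError(f"{self.name}: no endpoint for {kind!r}") from None


# Values below are taken from the divergence table in section 4.
_PROFILES = (
    ServerProfile("osh", Auth.BASIC,
                  {"systems": "/systems", "controlstreams": "/controlstreams"},
                  accept_honored=False,
                  known_quirks=("Accept header ignored; use ?f=",)),
    ServerProfile("csapi-go", Auth.NONE,
                  {"systems": "/systems", "controlstreams": "/controlStreams"},
                  accept_honored=True),
    ServerProfile("pygeoapi-live", Auth.NONE,
                  {"systems": "/systems"},
                  accept_honored=True,
                  known_quirks=("controlstreams not implemented",)),
)
REGISTRY = {p.name: p for p in _PROFILES}


def load_profile(name: str) -> ServerProfile:
    """Look a profile up by name; a YAML-backed registry would slot in here."""
    return REGISTRY[name]
```

With this shape, a publisher asks `load_profile("csapi-go").path_for("controlstreams")` and never needs to know about the casing difference; asking the `pygeoapi-live` profile for the same kind fails loudly instead of producing a 404 at publish time.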
| 99 | + |
| 100 | +### 5.2 Node opens with a profile and probes /conformance |
| 101 | + |
| 102 | +On `Node.connect()` the client: |
| 103 | + |
| 104 | +1. GETs `/` and `/conformance`. |
| 105 | +2. Compares the advertised conformance classes to the profile's `conformance_required`. |
| 106 | +3. Logs a single "server profile loaded" line listing the active quirks. |
| 107 | +4. If a required class is absent, raises `ConformanceError` *before* the first publish — Phase 9's pygeoapi blocker would have surfaced here on connect rather than after four failed payload-rewrite attempts. |
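The comparison in steps 2 and 4 can be sketched as a pure function, kept separate from the HTTP call so it is easy to unit-test. This is a sketch under assumptions: `parse_conforms_to` and `check_conformance` are hypothetical helper names, `ConformanceError` is declared standalone here (in the plan it sits under `CSAPIError`), and the class identifiers in the usage note are illustrative placeholders. The `conformsTo` key is the one OGC API Common defines for the `/conformance` response.

```python
class ConformanceError(Exception):
    """Raised when a server does not advertise a class the profile requires."""


def parse_conforms_to(body: dict) -> list:
    """Extract the advertised class URIs from a decoded /conformance body."""
    return list(body.get("conformsTo", []))


def check_conformance(advertised, required) -> None:
    """Raise ConformanceError listing every required class that is absent."""
    missing = sorted(set(required) - set(advertised))
    if missing:
        raise ConformanceError(
            "missing conformance classes: " + ", ".join(missing)
        )
```

For example, `check_conformance(parse_conforms_to({"conformsTo": ["core"]}), ["core"])` returns silently, while requiring an extra placeholder class such as `"system"` raises before any payload is sent.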
| 108 | + |
| 109 | +### 5.3 PayloadShape adapters |
| 110 | + |
| 111 | +Three concrete adapters cover today's matrix: |
| 112 | + |
| 113 | +- `CSAPIFeatureShape` — current OSH/Go default. |
| 114 | +- `SensorMLJSONShape` — pygeoapi's required form for `/systems`, `/procedures`, `/deployments`. |
| 115 | +- `StrippedJSONShape` — pygeoapi's `/systems` workaround for the `AttrDict` crash. |
| 116 | + |
| 117 | +`bootstrap_helpers.py` calls `node.profile.shape_for("procedures").build(model)` instead of inlining the body. The existing SensorML round-trip work on the `fix/sml-content-type-and-shape` branch becomes the Pydantic v2 *source* model that the adapters serialize from — fixing #5 by construction rather than by patching the existing code path. |
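A rough sketch of the adapter dispatch follows, using plain dicts in place of the Pydantic source models. Everything here is an assumption made for illustration: the `SHAPES` table, the free function `shape_for`, the `drop_fields` parameter, and the field names in the sample model are hypothetical, and real SensorML serialization is considerably richer than a key filter. The only behaviors taken from the plan are the `Feature` envelope for OSH/Go and the removal of `deployedSystems` for pygeoapi deployments.

```python
from dataclasses import dataclass


class CSAPIFeatureShape:
    """Wrap a resource in a GeoJSON Feature envelope (OSH / Go default)."""

    def build(self, model: dict) -> dict:
        props = {k: v for k, v in model.items() if k != "geometry"}
        return {
            "type": "Feature",
            "geometry": model.get("geometry"),
            "properties": props,
        }


@dataclass(frozen=True)
class SensorMLJSONShape:
    """Emit a flat JSON body with no Feature envelope."""
    drop_fields: tuple = ()  # fields the target server is known to reject

    def build(self, model: dict) -> dict:
        return {k: v for k, v in model.items() if k not in self.drop_fields}


# Hypothetical per-profile shape table; only two profiles shown.
SHAPES = {
    "osh": {
        "procedures": CSAPIFeatureShape(),
        "deployments": CSAPIFeatureShape(),
    },
    "pygeoapi-live": {
        "procedures": SensorMLJSONShape(),
        "deployments": SensorMLJSONShape(drop_fields=("deployedSystems",)),
    },
}


def shape_for(profile_name: str, kind: str):
    """Return the adapter a profile uses for one resource kind."""
    try:
        return SHAPES[profile_name][kind]
    except KeyError:
        raise LookupError(f"{profile_name}: no payload shape for {kind!r}") from None
```

The caller builds one neutral model and lets the profile decide the wire format, so the same `ensure_deployment` code path can serve both servers.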
| 118 | + |
| 119 | +### 5.4 Pagination iterator |
| 120 | + |
| 121 | +A single `paginate(node, url, params=None) -> Iterator[T]` replaces every `limit=1000` call site. It honors `links: rel=next` when present and falls back to offset-paging when absent. Closes #4 across `find_by_uid`, `find_datastream`, `_discover_system_ds`, and the `bootstrap_helpers` siblings. |
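One possible shape for the iterator is sketched below. To stay runnable without a server it takes a `fetch(url, query)` callable instead of a `Node`, which is a deliberate deviation from the signature above. Other assumptions: the collection is read from an `items` or `features` key, the fallback uses `limit` and `offset` query parameters, and `max_pages` is a hypothetical safety cap against servers that ignore `offset` or loop their `next` link.

```python
from urllib.parse import urljoin


def _next_link(page: dict):
    """Return the href of the rel=next link, or None when there is none."""
    for link in page.get("links") or []:
        if link.get("rel") == "next":
            return link.get("href")
    return None


def paginate(fetch, url, params=None, page_size=100, max_pages=1000):
    """Yield every item of a collection, one page at a time.

    fetch(url, query) must return the decoded JSON body of one page.
    """
    base_query = {**(params or {}), "limit": page_size}
    query = dict(base_query)
    offset = 0
    followed_link = False
    visited = {url}
    for _ in range(max_pages):
        page = fetch(url, query)
        items = page.get("items") or page.get("features") or []
        yield from items
        next_href = _next_link(page)
        if next_href is not None:
            next_url = urljoin(url, next_href)
            if next_url in visited:
                return  # server is looping its next link
            visited.add(next_url)
            # The next link already carries its own query string.
            url, query, followed_link = next_url, None, True
        elif not followed_link and len(items) == page_size:
            # No next link but the page was full: try offset paging.
            offset += page_size
            query = {**base_query, "offset": offset}
        else:
            return
```

Because it is a generator, a caller such as `find_by_uid` can stop at the first match without fetching the remaining pages.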
| 122 | + |
| 123 | +### 5.5 HTTP resilience layer |
| 124 | + |
| 125 | +`HTTPHelper` is wrapped (or replaced) with a layer that adds: |
| 126 | + |
| 127 | +- per-request timeout (configurable, default 30s). |
| 128 | +- `tenacity`-style retry on 429/503/connection errors with exponential backoff. |
| 129 | +- bounded concurrency (Phase 4 replay engine already needs this; today it's ad-hoc). |
| 130 | +- a typed exception hierarchy: `CSAPIError → ServerProfileError | ConformanceError | PayloadShapeError | RateLimitError | TransportError`. |
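The retry and exception bullets can be sketched with the standard library alone. This hand-rolled loop is a stand-in for a `tenacity`-based implementation, not a proposal to avoid that library. Assumptions: `send_with_retry` and its parameters are hypothetical, the transport is abstracted as a `send()` callable returning `(status, body)`, and mapping a persistent 503 to `TransportError` is a choice made here for illustration.

```python
import time


class CSAPIError(Exception):
    """Root of the typed hierarchy."""

class ServerProfileError(CSAPIError): pass
class ConformanceError(CSAPIError): pass
class PayloadShapeError(CSAPIError): pass
class RateLimitError(CSAPIError): pass
class TransportError(CSAPIError): pass


RETRYABLE_STATUS = {429, 503}


def send_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it succeeds or the attempt budget is spent.

    send() returns (status, body) and may raise ConnectionError.
    Delays grow as base_delay * 2**attempt (0.5s, 1s, 2s, ...).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            status, body = send()
        except ConnectionError as exc:
            last_error = TransportError(str(exc))
        else:
            if status not in RETRYABLE_STATUS:
                return status, body
            if status == 429:
                last_error = RateLimitError("HTTP 429")
            else:
                last_error = TransportError(f"HTTP {status}")
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise last_error
```

Injecting `sleep` keeps unit tests instant, and raising a typed error after the final attempt keeps persistent failures visible to callers instead of being swallowed as transient.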
| 131 | + |
| 132 | +### 5.6 Full PUT and DELETE coverage |
| 133 | + |
| 134 | +Phase 5 closes the CRUD matrix for the resource types the fleet actively uses. Per the OS4CSAPI library audit (`ogc-client-CSAPI_2/docs/research/requirements/csapi-oshconnect-python-analysis.md` §3.3), the original library implements only CREATE+READ for most resources. Reconciliation (delete-and-republish) is therefore impossible today without falling back to raw HTTP, even though it is the workflow a publisher needs whenever its source data is corrected upstream. |
| 135 | + |
| 136 | +### 5.7 Smoke-test parity with the explorer |
| 137 | + |
| 138 | +Add `python -m oshconnect.smoke_test --profile <name>` that runs the same CRUD matrix the explorer's Smoke Test page runs. Reuses `ServerProfile` and `PayloadShape`, and emits a result table identical in structure to the explorer's, so a publisher engineer and a UI engineer are reading the same dashboard when they ask "is this server fit to publish to?" |
| 139 | + |
| 140 | +--- |
| 141 | + |
| 142 | +## 6. Scope |
| 143 | + |
| 144 | +### In scope |
| 145 | + |
| 146 | +- New `oshconnect.profiles` module with `osh`, `csapi-go`, `pygeoapi-live` profiles. |
| 147 | +- Conformance probe on connect. |
| 148 | +- PayloadShape adapters covering the three target servers. |
| 149 | +- Pagination iterator (closes #4). |
| 150 | +- SensorML Pydantic v2 models + round-trip POST (closes #5, supersedes the in-flight `fix/sml-content-type-and-shape` branch). |
| 151 | +- HTTP resilience layer with typed exceptions. |
| 152 | +- PUT and DELETE for `systems`, `procedures`, `deployments`, `datastreams`, `controlstreams`, `samplingFeatures` — guarded by the active profile (not all servers support all paths). |
| 153 | +- `oshconnect.smoke_test` CLI parity with the explorer. |
| 154 | +- One existing publisher (`USGS_Water` is the smallest with full bootstrap coverage) ported end-to-end onto the profile abstraction as the reference port. |
| 155 | + |
| 156 | +### Out of scope |
| 157 | + |
| 158 | +- Reworking the Phase 4 NDJSON Replay Engine — it consumes the new resilience layer, but its architecture is unchanged. |
| 159 | +- Streaming (WebSocket / MQTT) auth strategies: Basic auth (OSH) and unauthenticated access (Go, pygeoapi) cover every Phase 5 target; bearer/OAuth is a Phase 6 concern. |
| 160 | +- Upstreaming any of this to `Botts-Innovative-Research/OSHConnect-Python`. The fork is a standalone project. |
| 161 | +- A pygeoapi-side fix for the `AttrDict` crash, the `samplingFeatures` 405, or the missing CSAPI conformance classes — those are upstream-server bugs, tracked in `52North/connected-systems-pygeoapi` and the Phase 9 deployment doc. |
| 162 | +- Sharing the YAML quirks registry with the explorer in this phase — listed as a stretch goal in §5.1. |
| 163 | + |
| 164 | +--- |
| 165 | + |
| 166 | +## 7. Deliverables |
| 167 | + |
| 168 | +1. `src/oshconnect/profiles/{__init__.py, base.py, osh.py, csapi_go.py, pygeoapi_live.py, registry.yaml}`. |
| 169 | +2. `src/oshconnect/payload_shapes/{__init__.py, csapi_feature.py, sensorml_json.py, stripped_json.py}`. |
| 170 | +3. `src/oshconnect/pagination.py` — single `paginate()` iterator. |
| 171 | +4. `src/oshconnect/http/{client.py, retry.py, exceptions.py}` — replaces / wraps `HTTPHelper`. |
| 172 | +5. `src/oshconnect/sensorml/` — Pydantic v2 models for SystemSML, ProcedureSML, DeploymentSML; lossless round-trip tests against the seed data captured in `ogc-client-CSAPI_2/docs/research/phase-9/captures/oracle-pygeoapi/`. |
| 173 | +6. `src/oshconnect/smoke_test.py` — CLI runner. |
| 174 | +7. `publishers/usgs_water/` — ported as the reference profile-driven publisher; existing OSH-targeted behavior preserved by selecting `--profile osh`. |
| 175 | +8. Tests: |
| 176 | + - Unit tests for each profile + payload shape (using `respx` against recorded responses). |
| 177 | + - Integration tests against OSH, Go, and pygeoapi-live (gated by `OSHCONNECT_LIVE=1`). |
| 178 | + - Round-trip SensorML fidelity tests fed by the captures in `ogc-client-CSAPI_2/docs/research/phase-9/captures/oracle-pygeoapi/`. |
| 179 | +9. `docs/research/Phase5_Results_Report.md` — published at phase close, mirroring the format of `Phase1_Bootstrap_Results.md` and `Phase4_Replay_Engine_Results.md`. |
| 180 | + |
| 181 | +--- |
| 182 | + |
| 183 | +## 8. Verification Matrix (acceptance criteria) |
| 184 | + |
| 185 | +| # | Criterion | How verified | |
| 186 | +| - | ---------------------------------------------------------------------------------------------------- | ------------ | |
| 187 | +| 1 | `Node.connect(profile="pygeoapi-live")` succeeds and logs the active quirks | Live integration test | |
| 188 | +| 2 | `usgs_water` publisher runs end-to-end against OSH, Go server, and pygeoapi-live without code change | Three live runs, captures committed | |
| 189 | +| 3 | `ensure_procedure` / `ensure_deployment` round-trip SensorML metadata losslessly (closes #5) | Round-trip test against seed captures | |
| 190 | +| 4 | All `find_by_uid`, `find_datastream`, `_discover_system_ds` call sites use `paginate()` (closes #4) | `grep` audit + unit test on multi-page server stub | |
| 191 | +| 5 | A publisher run that loses connectivity for 30s recovers without manual intervention | Toxiproxy / fault-injection test | |
| 192 | +| 6 | `python -m oshconnect.smoke_test --profile pygeoapi-live` produces a results table comparable to the explorer's CRUD Smoke Test | Side-by-side capture | |
| 193 | +| 7 | PUT and DELETE coverage exists for the six target resource types on at least one profile | Unit tests + one live profile per resource | |
| 194 | +| 8 | Removing the OSH `limit=1000` workarounds does not regress bootstrap idempotency | Existing bootstrap idempotency test passes | |
| 195 | + |
| 196 | +--- |
| 197 | + |
| 198 | +## 9. Risks and Mitigations |
| 199 | + |
| 200 | +| Risk | Mitigation | |
| 201 | +| --------------------------------------------------------------------------------- | ---------- | |
| 202 | +| pygeoapi-live's quirks shift between releases (e.g. `samplingFeatures` regains POST) | Profiles are versioned; `Node.connect` re-probes `/conformance` on every run and warns on drift | |
| 203 | +| Profile abstraction balloons into "yet another framework" | Hard cap: 3 profiles ship in Phase 5; no plugin loader, no DSL, no decorators. YAML registry only | |
| 204 | +| Resilience layer hides real server bugs | Retries are bounded and emit a structured warning per retry; integration tests explicitly assert `RateLimitError` / `TransportError` rather than treating all errors as transient | |
| 205 | +| SensorML model layer drifts from real server payloads | Round-trip tests are seeded *from* live captures, not from hand-written fixtures | |
| 206 | +| Reference port (`usgs_water`) succeeds but other publishers reveal hidden OSH-isms | After the reference port lands, run all 9 publishers in `--profile osh` mode against staging OSH; treat any difference from baseline as a bug | |
| 207 | + |
| 208 | +--- |
| 209 | + |
| 210 | +## 10. Sequencing |
| 211 | + |
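| 212 | +A suggested sequence, to be adjusted as work uncovers new constraints. The step labels 5.1 to 5.8 are milestone numbers and do not map one-to-one onto the §5.x subsections above: |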
| 213 | + |
| 214 | +1. **5.1** — `ServerProfile` skeleton, three profiles, `/conformance` probe, exception hierarchy. No payload changes yet. |
| 215 | +2. **5.2** — Pagination iterator + retire `limit=1000` workarounds (closes #4 mid-phase). |
| 216 | +3. **5.3** — PayloadShape adapters + SensorML Pydantic models + round-trip tests (closes #5 by construction; supersedes `fix/sml-content-type-and-shape` branch). |
| 217 | +4. **5.4** — HTTP resilience layer + typed exceptions. |
| 218 | +5. **5.5** — PUT/DELETE coverage. |
| 219 | +6. **5.6** — `oshconnect.smoke_test` CLI. |
| 220 | +7. **5.7** — Reference port: `usgs_water` driven by profile; then full-fleet `--profile osh` regression run. |
| 221 | +8. **5.8** — First successful `usgs_water` run against `pygeoapi-live` (Phase 5 acceptance gate). |
| 222 | + |
| 223 | +--- |
| 224 | + |
| 225 | +## 11. What Phase 5 explicitly does *not* attempt |
| 226 | + |
| 227 | +- It does **not** introduce a streaming auth refactor (OAuth2 / API-Key). |
| 228 | +- It does **not** change the Phase 4 replay engine's architecture. |
| 229 | +- It does **not** try to patch the live pygeoapi server. Its 405/AttrDict/`deployedSystems`-KeyError quirks are accepted as facts of life and encoded in the `pygeoapi-live` profile. |
| 230 | +- It does **not** introduce a plugin system. Profiles are concrete classes plus a YAML file. Adding a fourth profile in a future phase is a code change. |
| 231 | +- It does **not** depend on the TypeScript explorer or the `ogc-client-CSAPI_2` library at runtime. Cross-references in §2 are *informational only*; this repo remains a standalone Python project. |
| 232 | + |
| 233 | +--- |
| 234 | + |
| 235 | +## 12. Open Questions |
| 236 | + |
| 237 | +1. Should the YAML profile registry be vendored into this repo, or pulled at install time from `ogc-csapi-explorer/docs/governance/known-server-quirks.md`? Phase 5 vendors. Phase 6 may reconsider. |
| 238 | +2. Do we want a single `Node` per profile, or should a `Node` accept profile *overrides* per call (e.g. forcing `Accept: application/sml+json` for one query)? Phase 5 ships per-Node only; per-call overrides are deferred. |
| 239 | +3. Where does the OSH-specific `?f=` query rewriter live — in the profile, or in `HTTPHelper`? Leaning toward profile, because it's a quirk fact, not a transport fact. |
| 240 | +4. The `fix/sml-content-type-and-shape` branch contains a partial fix to issue #5. Should it be merged before Phase 5 starts, or absorbed into 5.3? Recommend absorbing — the Pydantic model layer is a more durable fix than the patch series on that branch. |