Commit 74d097b

Copilot committed
docs(phase-5): plan for multi-server publisher fleet (OSH, Go, pygeoapi-live)

Adds docs/research/Phase5_Multi_Server_Publisher_Plan.md proposing a ServerProfile-driven refactor of the publisher fleet so it can target any of the three live CSAPI servers we now operate (OpenSensorHub, the OS4CSAPI Go server, and the Phase-9 52North pygeoapi deployment) without per-server inline patches. Plan covers:

- ServerProfile module + /conformance probe on connect
- PayloadShape adapters (CSAPI Feature, SensorML JSON, stripped JSON)
- Single paginate() iterator (supersedes limit=1000 workaround, closes #4)
- Pydantic v2 SensorML models with round-trip fidelity (closes #5)
- HTTP resilience layer + typed exception hierarchy
- PUT/DELETE coverage for the six actively-used resource types
- oshconnect.smoke_test CLI for parity with the explorer's CRUD smoke test
- Reference port of the usgs_water publisher onto the new abstraction

Frames the OS4CSAPI fork as a standalone project; no upstream merge is intended. Cross-references the Phase-9 deployment findings doc and the explorer's known-server-quirks matrix as informational sources only.
1 parent 04b7354 commit 74d097b

1 file changed

Lines changed: 240 additions & 0 deletions

# Phase 5 — Multi-Server Publisher Plan

**Status:** Proposed
**Date:** 2026-05-09
**Author:** OS4CSAPI / Sam Bolling
**Predecessors:** Phase 1 (Bootstrap), Phase 2 (Datastreams + ControlStreams), Phase 3 (Simulator route redesign), Phase 4 (NDJSON Replay Engine)
**Successors:** TBD — to be defined at Phase 5 close

---

## 1. Objective

Make the OSHConnect-Python publisher fleet usable against **any** CSAPI server we encounter — not just OpenSensorHub. Today the fleet hard-codes OSH conventions (Basic auth, GeoJSON `Feature` envelope, OM-JSON observations, lowercase `controlstreams` path, etc.) and adapts only partially to the Go server, through ad-hoc patches scattered throughout `publishers/bootstrap_helpers.py`. With the Phase 9 deployment of the live 52°North `connected-systems-pygeoapi` server, we now have **three** distinct CSAPI implementations the fleet should be able to publish to: the existing OSH and Go targets, plus pygeoapi. This phase replaces ad-hoc per-server patching with a documented, profile-driven abstraction.

This is *not* an attempt to upstream changes to the original `Botts-Innovative-Research/OSHConnect-Python` library. As of Phase 5 the OS4CSAPI fork is a standalone project: a CSAPI client *library* plus a CSAPI *publisher fleet* that we own end-to-end.

---

## 2. Background and Provenance

This plan synthesizes findings from:

- **OSHConnect-Python issue #5** (open, P1): `ensure_procedure` / `ensure_deployment` silently lose all SensorML metadata; POSTs use the wrong content-type and payload shape.
- **OSHConnect-Python issue #4** (open): bootstrap idempotency `find_by_uid` reads only the first page; the same single-page pattern is repeated in `find_datastream` and `_discover_system_ds`; `limit=1000` is a fragile workaround.
- **`docs/research/Silent_SensorML_Field_Loss_Engineering_Report_2026-05-06.md`** — the engineering report behind #5.
- **`docs/research/CSAPI_Go_Server_Integration_Report_2026-04-17.md`** — the per-server-quirk catalog accumulated during the Go server migration.
- **`docs/research/Publisher_Fleet_Portability_Plan.md`** — earlier planning that named portability as a goal but pre-dated the Go and pygeoapi servers.
- **External:** `ogc-client-CSAPI_2/docs/research/phase-9/03-52north-pygeoapi-deployment-findings.md` — Phase 9 deployment of `52North/connected-systems-pygeoapi` on Oracle Cloud, including the documented publisher-integration blocker (§8 of that report).
- **External:** `ogc-csapi-explorer/docs/governance/known-server-quirks.md` — the authoritative three-server quirks matrix (OSH, csa.demo.52north.org, pygeoapi-live), validated via the explorer's CRUD Smoke Test.

The Phase 9 deployment doc explicitly recommended:

> *"Instead of adapting the Go publisher, write a thin Python publisher that consumes the same OS4CSAPI event stream and emits pygeoapi-shaped payloads directly. The seeder already implements ~80% of that translation."*

Phase 5 makes that "thin Python publisher" a first-class capability of *this* repo rather than a separate one-off.

---

## 3. Target Server Matrix

| # | Server                        | Base URL                                                  | Auth   | Status        |
| - | ----------------------------- | --------------------------------------------------------- | ------ | ------------- |
| 1 | OpenSensorHub (OSH)           | `http://45.55.99.236:8080/sensorhub/api`                  | Basic  | Production target — current default |
| 2 | OS4CSAPI Go server            | `https://129-80-248-53.sslip.io/csapi-go`                 | None   | Production target — partially supported via in-line patches |
| 3 | 52°North pygeoapi (live)      | `https://129-80-248-53.sslip.io/csapi-pygeoapi`           | None   | New in Phase 9 — **not yet a publisher target** (blocker documented) |
| 4 | csa.demo.52north.org (public) | `https://csa.demo.52north.org`                            | None   | Read-only target — used for content-negotiation regression testing only |

Phase 5 success criteria are defined against servers 1, 2, and 3.
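
The base URLs above differ enough that the `base_url_pattern` field proposed in §5.1 could select a profile automatically. This is a minimal sketch that assumes regex patterns keyed on the URL path. `PROFILE_PATTERNS` and `detect_profile` are hypothetical names, not existing OSHConnect APIs.

```python
import re
from typing import Optional

# Hypothetical patterns derived from the target server matrix. Servers 2 and 3
# share a host, so the patterns key on the path rather than the hostname.
PROFILE_PATTERNS = {
    "osh": re.compile(r"/sensorhub/api/?$"),
    "csapi-go": re.compile(r"/csapi-go/?$"),
    "pygeoapi-live": re.compile(r"/csapi-pygeoapi/?$"),
}


def detect_profile(base_url: str) -> Optional[str]:
    """Return the first profile whose pattern matches base_url, else None.

    None means the caller must name a profile explicitly (for example, for the
    read-only public demo server, which has no Phase 5 profile).
    """
    for name, pattern in PROFILE_PATTERNS.items():
        if pattern.search(base_url):
            return name
    return None


print(detect_profile("https://129-80-248-53.sslip.io/csapi-go"))  # csapi-go
print(detect_profile("https://csa.demo.52north.org"))  # None
```

Returning `None` instead of guessing keeps auto-detection a convenience. An unknown server still requires an explicit profile choice.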

---

## 4. Problem Statement (per-server divergence the fleet currently can't handle)

Each row is a real, captured observation from the explorer's CRUD Smoke Test or from `seed_pygeoapi.py`:

| Concern                  | OSH                                                         | Go server                                       | pygeoapi (Phase 9)                                                                |
| ------------------------ | ----------------------------------------------------------- | ----------------------------------------------- | --------------------------------------------------------------------------------- |
| `POST /systems` shape    | CSAPI GeoJSON `Feature` (`type:"Feature"`, `properties{}`)  | CSAPI GeoJSON `Feature`                         | **Stripped JSON; no `Feature` envelope** — `Feature` triggers `AttrDict.get` crash |
| `POST /procedures` shape | CSAPI GeoJSON `Feature`                                     | CSAPI GeoJSON `Feature`                         | **SensorML JSON only** — `{ type, id, definition, ... }`                          |
| `POST /deployments` shape| CSAPI GeoJSON `Feature`                                     | CSAPI GeoJSON `Feature`                         | **SensorML JSON only**, and `deployedSystems` field causes server `KeyError`      |
| `POST /samplingFeatures` | accepted                                                    | accepted                                        | **405 Method Not Allowed** (read-only on this build)                              |
| `controlstreams` path    | **lowercase only** (`/controlstreams`)                      | camelCase (`/controlStreams`)                   | absent (not implemented)                                                          |
| `commands` endpoint      | only via `/controlstreams/{id}/commands`                    | top-level `/commands`                           | absent                                                                            |
| Pagination               | `limit=1000` workaround currently masks `next`-link bugs    | same                                            | `next` link present, must be followed                                             |
| Auth                     | Basic auth required                                         | none                                            | none                                                                              |
| `Accept` header behavior | **ignored** — must use `?f=` query parameter                | honored                                         | honored, but `Accept: application/json` returns CSAPI envelope; smljson/geojson route through alternate stores |
| Conformance advertised   | 20+ CSAPI classes                                           | partial CSAPI classes                           | only `ogcapi-common-1/1.0/conf/core` — no CSAPI classes despite working endpoints |
| SensorML round-trip      | accepts smljson on PUT but lossy on POST (issue #5 root)    | unverified                                      | required form for POSTs to systems/procedures/deployments                         |

The publisher fleet currently encodes a single, OSH-shaped path through this matrix. Every server quirk the team has hit so far has resulted in either an in-line `if`-branch in `bootstrap_helpers.py` or a workaround like `limit=1000`. This does not scale to a third target and is the structural reason Phase 9's publisher integration was abandoned.

---

## 5. Proposed Architecture

### 5.1 ServerProfile (new module)

A `ServerProfile` is a versioned, declarative description of *one* CSAPI server's quirks. The publisher fleet consumes a profile through the existing `Node` abstraction; there are no per-publisher conditionals.

**Proposed location:** `src/oshconnect/profiles/`.

**Proposed shape (sketch, not final API):**

```python
from __future__ import annotations  # lets the sketch name types not yet defined

from dataclasses import dataclass


# AuthStrategy, EndpointMap, ContentNegotiation, PayloadShapeMap and
# PaginationStrategy are placeholder names for types still to be designed.
@dataclass(frozen=True)
class ServerProfile:
    name: str                                # "osh" | "csapi-go" | "pygeoapi-live" | ...
    base_url_pattern: str                    # for matching/auto-detection
    auth: AuthStrategy                       # BasicAuth | NoAuth | BearerToken | ApiKey
    endpoints: EndpointMap                   # canonical kind -> path (handles /controlstreams casing)
    content_negotiation: ContentNegotiation  # Accept-honored vs ?f= override
    payload_shapes: PayloadShapeMap          # per-resource POST/PUT body shape
    pagination: PaginationStrategy           # next-link walker | limit=1000 | offset
    # Tuples rather than lists, so the frozen profile is truly immutable.
    conformance_required: tuple[str, ...]    # used to fail-fast on unfit targets
    known_quirks: tuple[str, ...]            # human-readable; logged on connect
```

Three profiles ship with Phase 5: `osh`, `csapi-go`, `pygeoapi-live`. They are loaded by name from a YAML registry (stretch goal: share the registry with `ogc-csapi-explorer/docs/governance/known-server-quirks.md` so quirks are single-sourced).
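
An endpoint map can absorb the `controlstreams` casing and the three different `commands` layouts listed in the §4 matrix. The sketch below shows one way to do it. The dictionary layout, `resolve`, and `UnsupportedResourceError` are hypothetical, and the paths come from the matrix, not from each server's API documentation.

```python
from typing import Optional

# Hypothetical endpoint maps (canonical kind -> path template), transcribed
# from the section 4 matrix. None marks a kind the server does not expose.
ENDPOINT_MAPS = {
    "osh": {
        "systems": "/systems",
        "controlstreams": "/controlstreams",          # lowercase only
        "commands": "/controlstreams/{id}/commands",  # no top-level /commands
    },
    "csapi-go": {
        "systems": "/systems",
        "controlstreams": "/controlStreams",          # camelCase
        "commands": "/commands",
    },
    "pygeoapi-live": {
        "systems": "/systems",
        "controlstreams": None,                       # not implemented
        "commands": None,
    },
}


class UnsupportedResourceError(LookupError):
    """The active profile says the server does not expose this resource kind."""


def resolve(base_url: str, profile: str, kind: str, **ids: str) -> str:
    """Build a full URL for a canonical resource kind under one profile."""
    template: Optional[str] = ENDPOINT_MAPS[profile].get(kind)
    if template is None:
        raise UnsupportedResourceError(f"{profile!r} does not expose {kind!r}")
    return base_url.rstrip("/") + template.format(**ids)
```

Absence is encoded as `None`. A publisher that asks `pygeoapi-live` for control streams therefore fails at once with a named error, not with a 404 partway through a bootstrap.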

### 5.2 Node opens with a profile and probes /conformance

On `Node.connect()` the client:

1. GETs `/` and `/conformance`.
2. Compares the advertised conformance classes to the profile's `conformance_required`.
3. Logs a single "server profile loaded" line listing the active quirks.
4. If a required class is absent, raises `ConformanceError` *before* the first publish — Phase 9's pygeoapi blocker would have surfaced here on connect rather than after four failed payload-rewrite attempts.
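
Steps 2 to 4 come down to a set comparison over the `conformsTo` array that OGC API servers return from `/conformance`. This is a minimal sketch that assumes the caller has already fetched and parsed that document. `check_conformance`, `log_profile_loaded`, and the logger name are hypothetical. `CSAPIError` and `ConformanceError` follow the names proposed in §5.5.

```python
import logging

log = logging.getLogger("oshconnect.profiles")  # hypothetical logger name


class CSAPIError(Exception):
    """Root of the typed exception hierarchy proposed in section 5.5."""


class ConformanceError(CSAPIError):
    """The server does not advertise a class the active profile requires."""


def check_conformance(conformance_doc: dict, required: list[str]) -> set[str]:
    """Step 2: compare a parsed /conformance response with the profile's list.

    Returns the advertised classes. Raises ConformanceError (step 4) before
    anything is published if a required class is missing.
    """
    advertised = set(conformance_doc.get("conformsTo", []))
    missing = [uri for uri in required if uri not in advertised]
    if missing:
        raise ConformanceError(f"required conformance classes not advertised: {missing}")
    return advertised


def log_profile_loaded(name: str, quirks: list[str]) -> None:
    """Step 3: emit one summary line listing the active quirks."""
    log.info("server profile loaded: %s (quirks: %s)", name, "; ".join(quirks) or "none")
```

Take a response that lists only `http://www.opengis.net/spec/ogcapi-common-1/1.0/conf/core`, as §4 reports for pygeoapi-live. It fails `check_conformance` for any profile that requires a CSAPI class, and the failure names the missing URIs.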

### 5.3 PayloadShape adapters

Three concrete adapters cover today's matrix:

- `CSAPIFeatureShape` — current OSH/Go default.
- `SensorMLJSONShape` — pygeoapi's required form for `/systems`, `/procedures`, `/deployments`.
- `StrippedJSONShape` — pygeoapi's `/systems` workaround for the `AttrDict` crash.

`bootstrap_helpers.py` calls `node.profile.shape_for("procedures").build(model)` instead of inlining the body. The existing SensorML round-trip work on the `fix/sml-content-type-and-shape` branch becomes the Pydantic v2 *source* model that the adapters serialize from — fixing #5 by construction rather than by patching the existing code path.
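
Plain dataclasses stand in for the proposed Pydantic v2 models in the sketch below, which illustrates the adapter split. `ResourceModel`, the member names in each body, the free function `shape_for`, and the `SHAPES` table are all hypothetical. Real CSAPI and SensorML JSON bodies carry many more members. The sketch shows one source model yielding three wire shapes, and the `deployedSystems` `KeyError` quirk handled as a profile-level `drop` setting instead of an `if`-branch.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ResourceModel:
    """Hypothetical stand-in for a Pydantic v2 SensorML source model."""

    uid: str
    name: str
    sml_type: str  # e.g. "PhysicalSystem"; illustrative only
    definition: str = ""
    extra: dict[str, Any] = field(default_factory=dict)


class CSAPIFeatureShape:
    """GeoJSON Feature envelope: the current OSH / Go default."""

    def build(self, m: ResourceModel) -> dict[str, Any]:
        props = {"uid": m.uid, "name": m.name, **m.extra}
        return {"type": "Feature", "geometry": None, "properties": props}


class StrippedJSONShape:
    """Bare members, no envelope: the pygeoapi /systems AttrDict workaround."""

    def build(self, m: ResourceModel) -> dict[str, Any]:
        return {"uid": m.uid, "name": m.name, **m.extra}


class SensorMLJSONShape:
    """SensorML-JSON-style body; `drop` strips members a server rejects."""

    def __init__(self, drop: tuple[str, ...] = ()) -> None:
        self.drop = drop

    def build(self, m: ResourceModel) -> dict[str, Any]:
        body = {"type": m.sml_type, "id": m.uid, "label": m.name,
                "definition": m.definition, **m.extra}
        for key in self.drop:
            body.pop(key, None)
        return body


# Hypothetical per-profile tables: resource kind -> adapter.
SHAPES = {
    "osh": {"systems": CSAPIFeatureShape(), "procedures": CSAPIFeatureShape(),
            "deployments": CSAPIFeatureShape()},
    "pygeoapi-live": {
        "systems": StrippedJSONShape(),
        "procedures": SensorMLJSONShape(),
        # deployedSystems triggers a server-side KeyError on this build
        "deployments": SensorMLJSONShape(drop=("deployedSystems",)),
    },
}


def shape_for(profile: str, kind: str):
    """Look up the adapter for one resource kind under one profile."""
    return SHAPES[profile][kind]
```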

### 5.4 Pagination iterator

A single `paginate(node, url, params=None) -> Iterator[T]` replaces every `limit=1000` call site. It honors `links: rel=next` when present and falls back to offset-paging when absent. Closes #4 across `find_by_uid`, `find_datastream`, `_discover_system_ds`, and the `bootstrap_helpers` siblings.
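
Here is a minimal sketch of such an iterator. It assumes responses expose their items under `items` or `features` and their links as an array of `{rel, href}` objects. The injected `fetch` callable, which replaces the `node` argument above, and the `page_size`/`max_pages` parameters are assumptions that let the function run against a stub. The offset fallback also assumes the server honours an `offset` query parameter.

```python
from typing import Any, Callable, Iterator, Optional

# fetch(url, params) -> parsed JSON page; injected so tests can use a stub.
Fetch = Callable[[str, Optional[dict]], dict]


def paginate(fetch: Fetch, url: str, params: Optional[dict] = None,
             page_size: int = 100, max_pages: int = 10_000) -> Iterator[Any]:
    """Yield every item of a collection across pages.

    Follows links[rel="next"] when the server provides one; otherwise falls
    back to offset paging until a short page comes back.
    """
    query: Optional[dict] = {"limit": page_size, **(params or {})}
    offset = 0
    for _ in range(max_pages):
        page = fetch(url, query)
        items = page.get("items") or page.get("features") or []
        yield from items
        next_href = next((link["href"] for link in page.get("links", [])
                          if link.get("rel") == "next"), None)
        if next_href:
            # Server-driven paging: the next href already carries its own query.
            url, query = next_href, None
            continue
        if query is None or len(items) < int(query.get("limit", page_size)):
            return  # last page reached
        # Offset fallback (assumes the server honours ?offset=).
        offset += len(items)
        query = {**query, "offset": offset}
    # Guard against servers that ignore offset and return the same page forever.
    raise RuntimeError(f"pagination did not finish within {max_pages} pages")
```

A `find_by_uid`-style lookup can then be written as `next((s for s in paginate(fetch, url) if matches(s, uid)), None)`, where `matches` is a hypothetical predicate. Fetching stops at the first match, and the lookup no longer depends on a single `limit=1000` page.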

### 5.5 HTTP resilience layer

`HTTPHelper` is wrapped (or replaced) with a layer that adds:

- per-request timeout (configurable, default 30s).
- `tenacity`-style retry on 429/503/connection errors with exponential backoff.
- bounded concurrency (Phase 4 replay engine already needs this; today it's ad-hoc).
- a typed exception hierarchy: `CSAPIError → ServerProfileError | ConformanceError | PayloadShapeError | RateLimitError | TransportError`.
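
The retry and exception bullets fit together as shown in this minimal sketch. It hand-rolls the backoff instead of calling the `tenacity` library, so it depends on no third-party API. The class names follow the hierarchy above. `with_retry`, its parameters, the logger name, and the one-line meaning given to each class are assumptions.

```python
import logging
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("oshconnect.http")  # hypothetical logger name


class CSAPIError(Exception):
    """Root of the typed hierarchy; callers can catch this alone."""


class ServerProfileError(CSAPIError):
    """Profile missing, malformed, or inconsistent with the server."""


class ConformanceError(CSAPIError):
    """A conformance class the profile requires is not advertised."""


class PayloadShapeError(CSAPIError):
    """The server rejected a body built by a PayloadShape adapter."""


class RateLimitError(CSAPIError):
    """HTTP 429/503: the server asked the client to back off."""


class TransportError(CSAPIError):
    """Connection-level failure: refused, reset, or timed out."""


# Only these are treated as transient; everything else propagates at once.
RETRYABLE = (RateLimitError, TransportError)


def with_retry(call: Callable[[], T], attempts: int = 5, base_delay: float = 0.5,
               max_delay: float = 30.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying retryable errors with jittered exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except RETRYABLE as exc:
            if attempt == attempts:
                raise  # bounded: give up after the last attempt
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay *= random.uniform(0.5, 1.0)  # jitter spreads out retries
            log.warning("retry %d/%d after %s: sleeping %.2fs",
                        attempt, attempts - 1, type(exc).__name__, delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

Only `RateLimitError` and `TransportError` are retried. A `PayloadShapeError` propagates on the first attempt, so the resilience layer cannot mask real server bugs (see the corresponding risk in §9).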

### 5.6 Full PUT and DELETE coverage

Phase 5 closes the CRUD matrix for the resource types the fleet actively uses. Per the OS4CSAPI library audit (`ogc-client-CSAPI_2/docs/research/requirements/csapi-oshconnect-python-analysis.md` §3.3), the original library implements only CREATE and READ for most resources. Reconciliation (delete-and-republish) is therefore impossible without falling back to raw HTTP, yet that is exactly the workflow a publisher needs when its source data corrects itself.
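
The sketch below shows how a profile could guard reconciliation before any request leaves the client. It assumes each profile records its read-only and absent resource kinds. `check_allowed`, `republish`, `UnsupportedOperationError`, and the injected `send` callable are hypothetical. Only the `samplingFeatures` and `controlstreams` facts come from the §4 matrix.

```python
from typing import Any, Callable, Optional

# Hypothetical capability facts for pygeoapi-live, transcribed from the
# section 4 matrix; kinds not listed are assumed to allow every method.
READ_ONLY_KINDS = {"pygeoapi-live": {"samplingFeatures"}}
ABSENT_KINDS = {"pygeoapi-live": {"controlstreams", "commands"}}
WRITE_METHODS = {"POST", "PUT", "DELETE"}

# send(method, url, body) -> response; injected so the guard runs offline.
Send = Callable[[str, str, Optional[dict]], Any]


class UnsupportedOperationError(RuntimeError):
    """The active profile says this server cannot perform the request."""


def check_allowed(profile: str, kind: str, method: str) -> None:
    """Raise before sending if the profile marks the operation unsupported."""
    method = method.upper()
    if kind in ABSENT_KINDS.get(profile, set()):
        raise UnsupportedOperationError(f"{profile}: /{kind} is not implemented")
    if method in WRITE_METHODS and kind in READ_ONLY_KINDS.get(profile, set()):
        raise UnsupportedOperationError(f"{profile}: {method} /{kind} is not allowed")


def republish(send: Send, profile: str, kind: str, collection_url: str,
              item_url: str, body: dict) -> Any:
    """Delete-and-republish one resource.

    Both steps are checked first, so a forbidden step never leaves the
    resource deleted but not re-created.
    """
    check_allowed(profile, kind, "DELETE")
    check_allowed(profile, kind, "POST")
    send("DELETE", item_url, None)
    return send("POST", collection_url, body)
```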

### 5.7 Smoke-test parity with the explorer

Add `python -m oshconnect.smoke_test --profile <name>` that runs the same CRUD matrix the explorer's Smoke Test page runs. Reuses `ServerProfile` and `PayloadShape`, and emits a result table identical in structure to the explorer's, so a publisher engineer and a UI engineer are reading the same dashboard when they ask "is this server fit to publish to?"
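
Below is a skeleton of the CLI entry point. It assumes the per-case CRUD logic arrives as a `run_case` callable. Argument parsing uses the standard library `argparse`. The table layout, the `pass`/`fail`/`skip` vocabulary, and the exit-code convention are assumptions that have not been checked against the explorer's actual Smoke Test output.

```python
import argparse
from typing import Callable, Optional, Sequence

PROFILES = ("osh", "csapi-go", "pygeoapi-live")
KINDS = ("systems", "procedures", "deployments",
         "datastreams", "controlstreams", "samplingFeatures")
OPS = ("CREATE", "READ", "UPDATE", "DELETE")

# run_case(profile, kind, op) -> "pass" | "fail" | "skip"; hypothetical hook.
RunCase = Callable[[str, str, str], str]


def render_table(results: dict) -> str:
    """Render a resource x operation result matrix as a pipe table."""
    lines = ["| resource | " + " | ".join(OPS) + " |",
             "|" + "---|" * (len(OPS) + 1)]
    for kind in KINDS:
        cells = [results.get((kind, op), "n/a") for op in OPS]
        lines.append(f"| {kind} | " + " | ".join(cells) + " |")
    return "\n".join(lines)


def main(run_case: RunCase, argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="python -m oshconnect.smoke_test")
    parser.add_argument("--profile", required=True, choices=PROFILES)
    args = parser.parse_args(argv)
    results = {(kind, op): run_case(args.profile, kind, op)
               for kind in KINDS for op in OPS}
    print(render_table(results))
    # Non-zero exit when any case failed, so CI can gate on the result.
    return 1 if "fail" in results.values() else 0
```

Keeping `run_case` injectable lets the same runner drive both the recorded-response unit tests and the live integration runs.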

---

## 6. Scope

### In scope

- New `oshconnect.profiles` module with `osh`, `csapi-go`, `pygeoapi-live` profiles.
- Conformance probe on connect.
- PayloadShape adapters covering the three target servers.
- Pagination iterator (closes #4).
- SensorML Pydantic v2 models + round-trip POST (closes #5, supersedes the in-flight `fix/sml-content-type-and-shape` branch).
- HTTP resilience layer with typed exceptions.
- PUT and DELETE for `systems`, `procedures`, `deployments`, `datastreams`, `controlstreams`, `samplingFeatures`, guarded by the active profile (not all servers support all paths).
- `oshconnect.smoke_test` CLI parity with the explorer.
- One existing publisher (`usgs_water`, the smallest with full bootstrap coverage) ported end-to-end onto the profile abstraction as the reference port.

### Out of scope

- Reworking the Phase 4 NDJSON Replay Engine — it consumes the new resilience layer, but its architecture is unchanged.
- Streaming (WebSocket / MQTT) auth strategies — Basic only is fine for Phase 5; bearer/OAuth is a Phase 6 concern.
- Upstreaming any of this to `Botts-Innovative-Research/OSHConnect-Python`. The fork is a standalone project.
- A pygeoapi-side fix for the `AttrDict` crash, the `samplingFeatures` 405, or the missing CSAPI conformance classes — those are upstream-server bugs, tracked in `52North/connected-systems-pygeoapi` and the Phase 9 deployment doc.
- Sharing the YAML quirks registry with the explorer in this phase — listed as a stretch goal in §5.1.

---

## 7. Deliverables

1. `src/oshconnect/profiles/{__init__.py, base.py, osh.py, csapi_go.py, pygeoapi_live.py, registry.yaml}`.
2. `src/oshconnect/payload_shapes/{__init__.py, csapi_feature.py, sensorml_json.py, stripped_json.py}`.
3. `src/oshconnect/pagination.py` — single `paginate()` iterator.
4. `src/oshconnect/http/{client.py, retry.py, exceptions.py}` — replaces / wraps `HTTPHelper`.
5. `src/oshconnect/sensorml/` — Pydantic v2 models for SystemSML, ProcedureSML, DeploymentSML; lossless round-trip tests against the seed data captured in `ogc-client-CSAPI_2/docs/research/phase-9/captures/oracle-pygeoapi/`.
6. `src/oshconnect/smoke_test.py` — CLI runner.
7. `publishers/usgs_water/` — ported as the reference profile-driven publisher; existing OSH-targeted behavior preserved by selecting `--profile osh`.
8. Tests:
   - Unit tests for each profile + payload shape (using `respx` against recorded responses).
   - Integration tests against OSH, Go, and pygeoapi-live (gated by `OSHCONNECT_LIVE=1`).
   - Round-trip SensorML fidelity tests fed by the Phase 9 captures in `ogc-client-CSAPI_2/docs/research/phase-9/captures/oracle-pygeoapi/`.
9. `docs/research/Phase5_Results_Report.md` — published at phase close, mirroring the format of `Phase1_Bootstrap_Results.md` and `Phase4_Replay_Engine_Results.md`.
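
The standard library's `unittest.skipUnless` can express the `OSHCONNECT_LIVE=1` gate, so offline runs skip live tests instead of failing on network access. This is a minimal sketch. The `OSHCONNECT_BASE_URL` variable is hypothetical. Its default is the pygeoapi-live base URL from §3, because that server honours `Accept` and needs no auth.

```python
import json
import os
import unittest
import urllib.request

# Live tests run only when explicitly enabled, per the OSHCONNECT_LIVE=1 gate.
LIVE = os.environ.get("OSHCONNECT_LIVE") == "1"
# Hypothetical override; defaults to the pygeoapi-live base URL from section 3.
BASE_URL = os.environ.get(
    "OSHCONNECT_BASE_URL", "https://129-80-248-53.sslip.io/csapi-pygeoapi")


@unittest.skipUnless(LIVE, "set OSHCONNECT_LIVE=1 to run live integration tests")
class LiveConformanceTest(unittest.TestCase):
    def test_conformance_document_has_conforms_to(self) -> None:
        request = urllib.request.Request(
            BASE_URL.rstrip("/") + "/conformance",
            headers={"Accept": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            document = json.load(response)
        self.assertIsInstance(document.get("conformsTo"), list)
```

Run it with `OSHCONNECT_LIVE=1 python -m unittest <module>`. Without the variable, the whole class is reported as skipped.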

---

## 8. Verification Matrix (acceptance criteria)

| # | Criterion                                                                                            | How verified |
| - | ---------------------------------------------------------------------------------------------------- | ------------ |
| 1 | `Node.connect(profile="pygeoapi-live")` succeeds and logs the active quirks                          | Live integration test |
| 2 | `usgs_water` publisher runs end-to-end against OSH, Go server, and pygeoapi-live without code change | Three live runs, captures committed |
| 3 | `ensure_procedure` / `ensure_deployment` round-trip SensorML metadata losslessly (closes #5)         | Round-trip test against seed captures |
| 4 | All `find_by_uid`, `find_datastream`, `_discover_system_ds` call sites use `paginate()` (closes #4)  | `grep` audit + unit test on multi-page server stub |
| 5 | A publisher run that loses connectivity for 30s recovers without manual intervention                 | Toxiproxy / fault-injection test |
| 6 | `python -m oshconnect.smoke_test --profile pygeoapi-live` produces a results table comparable to the explorer's CRUD Smoke Test | Side-by-side capture |
| 7 | PUT and DELETE coverage exists for the six target resource types on at least one profile             | Unit tests + one live profile per resource |
| 8 | Removing the OSH `limit=1000` workarounds does not regress bootstrap idempotency                     | Existing bootstrap idempotency test passes |

---

## 9. Risks and Mitigations

| Risk                                                                              | Mitigation |
| --------------------------------------------------------------------------------- | ---------- |
| pygeoapi-live's quirks shift between releases (e.g. `samplingFeatures` regains POST) | Profiles are versioned; `Node.connect` re-probes `/conformance` on every run and warns on drift |
| Profile abstraction balloons into "yet another framework"                         | Hard cap: 3 profiles ship in Phase 5; no plugin loader, no DSL, no decorators. YAML registry only |
| Resilience layer hides real server bugs                                           | Retries are bounded and emit a structured warning per retry; integration tests explicitly assert `RateLimitError` / `TransportError` rather than treating all errors as transient |
| SensorML model layer drifts from real server payloads                             | Round-trip tests are seeded *from* live captures, not from hand-written fixtures |
| Reference port (`usgs_water`) succeeds but other publishers reveal hidden OSH-isms | After the reference port lands, run all 9 publishers in `--profile osh` mode against staging OSH; treat any difference from baseline as a bug |
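
The drift warning in the first row can be sketched as a set difference. One side is the conformance classes recorded with a profile version. The other is the set the server advertises today. `check_drift` and the logger name are hypothetical.

```python
import logging

log = logging.getLogger("oshconnect.profiles")  # hypothetical logger name


def check_drift(profile: str, recorded: frozenset[str], advertised: set[str]) -> bool:
    """Warn, without failing, when /conformance differs from the profile's snapshot.

    `recorded` is the class list stored with the profile version. Returns True
    if anything was gained or lost, so callers can surface it in run summaries.
    """
    gained = sorted(advertised - recorded)
    lost = sorted(recorded - advertised)
    for uri in gained:
        log.warning("profile %s: server now advertises %s", profile, uri)
    for uri in lost:
        log.warning("profile %s: server no longer advertises %s", profile, uri)
    return bool(gained or lost)
```

This sketch has a limit: `/conformance` lists classes, not per-endpoint methods. A method-level change such as `samplingFeatures` gaining POST may therefore only be caught by the smoke test.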

---

## 10. Sequencing

A suggested sequence; the step labels are milestone numbers, not references to the §5 subsections. Adjust as new issues are discovered:

1. **5.1** — `ServerProfile` skeleton, three profiles, `/conformance` probe, exception hierarchy. No payload changes yet.
2. **5.2** — Pagination iterator + retire `limit=1000` workarounds (closes #4 mid-phase).
3. **5.3** — PayloadShape adapters + SensorML Pydantic models + round-trip tests (closes #5 by construction; supersedes the `fix/sml-content-type-and-shape` branch).
4. **5.4** — HTTP resilience layer + typed exceptions.
5. **5.5** — PUT/DELETE coverage.
6. **5.6** — `oshconnect.smoke_test` CLI.
7. **5.7** — Reference port: `usgs_water` driven by profile; then full-fleet `--profile osh` regression run.
8. **5.8** — First successful `usgs_water` run against `pygeoapi-live` (Phase 5 acceptance gate).

---

## 11. What Phase 5 explicitly does *not* attempt

- It does **not** introduce a streaming auth refactor (OAuth2 / API-Key).
- It does **not** change the Phase 4 replay engine's architecture.
- It does **not** try to patch the live pygeoapi server. Its 405/AttrDict/`deployedSystems`-KeyError quirks are accepted as facts of life and encoded in the `pygeoapi-live` profile.
- It does **not** introduce a plugin system. Profiles are concrete classes plus a YAML file. Adding a fourth profile in a future phase is a code change.
- It does **not** depend on the TypeScript explorer or the `ogc-client-CSAPI_2` library at runtime. Cross-references in §2 are *informational only*; this repo remains a standalone Python project.

---

## 12. Open Questions

1. Should the YAML profile registry be vendored into this repo, or pulled at install time from `ogc-csapi-explorer/docs/governance/known-server-quirks.md`? Phase 5 vendors. Phase 6 may reconsider.
2. Do we want a single `Node` per profile, or should a `Node` accept profile *overrides* per call (e.g. forcing `Accept: application/sml+json` for one query)? Phase 5 ships per-Node only; per-call overrides are deferred.
3. Where does the OSH-specific `?f=` query rewriter live: in the profile, or in `HTTPHelper`? Leaning toward the profile, because it's a quirk fact, not a transport fact.
4. The `fix/sml-content-type-and-shape` branch contains a partial fix to issue #5. Should it be merged before Phase 5 starts, or absorbed into 5.3? Recommend absorbing — the Pydantic model layer is a more durable fix than the patch series on that branch.
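
For open question 3, the OSH `?f=` rewrite could live in the profile as a pure function of the URL and the requested media type. The sketch below assumes a boolean `accept_honored` flag derived from the profile's `ContentNegotiation`. The `F_TOKENS` values are hypothetical and need to be checked against each server's format names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical media type -> ?f= token table for servers that ignore Accept.
F_TOKENS = {
    "application/json": "json",
    "application/geo+json": "geojson",
    "application/sml+json": "smljson",
}


def negotiate(url: str, media_type: str, accept_honored: bool) -> tuple[str, dict[str, str]]:
    """Return (url, headers) for a request that wants `media_type`.

    Servers that honour Accept get a header. Servers that ignore it (OSH, per
    section 4) get an f= query parameter instead, replacing any existing one.
    """
    if accept_honored:
        return url, {"Accept": media_type}
    if media_type not in F_TOKENS:
        raise ValueError(f"no ?f= token known for {media_type!r}")
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "f"]
    query.append(("f", F_TOKENS[media_type]))
    return urlunsplit(parts._replace(query=urlencode(query))), {}
```

Because the function never touches the transport, `HTTPHelper` stays unaware of OSH's behaviour. That fits the leaning in question 3.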
