Commit cd70929
feat(waterdata): Add multi-value GET-parameter chunker for OGC API
For multi-value waterdata queries (e.g. monitoring_location_id with
~300+ sites), the GET URL produced by PR #233 blows past the server's
~8 KB nginx buffer and the API returns HTTP 414. This PR adds a
chunker that transparently splits long list params across sub-requests
so each URL fits the byte budget.
The chunker is a decorator applied to ``_fetch_once`` outside the
existing ``@filters.chunked`` (CQL chunker), so list-chunking is the
outer loop and filter-chunking is the inner loop:
    @chunking.multi_value_chunked(build_request=_construct_api_requests)
    @filters.chunked(build_request=_construct_api_requests)
    def _fetch_once(args): ...
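To illustrate the loop nesting that this decorator order produces, here is a minimal sketch. The two decorators are hypothetical stand-ins, not the real ``multi_value_chunked`` or ``filters.chunked``: the outer one just splits a site list in half, and the inner one issues one call per filter clause.

```python
import functools

calls = []  # records the order in which the innermost function runs


def outer_list_chunker(func):
    """Hypothetical stand-in for the list chunker: split the sites in two."""
    @functools.wraps(func)
    def wrapper(sites, clauses):
        mid = max(1, len(sites) // 2)
        results = []
        for part in (sites[:mid], sites[mid:]):
            if part:
                results.extend(func(part, clauses))
        return results
    return wrapper


def inner_filter_chunker(func):
    """Hypothetical stand-in for the filter chunker: one call per clause."""
    @functools.wraps(func)
    def wrapper(sites, clauses):
        results = []
        for clause in clauses:
            results.extend(func(sites, [clause]))
        return results
    return wrapper


@outer_list_chunker      # applied last, so it runs first (outer loop)
@inner_filter_chunker    # applied first, so it runs inside (inner loop)
def fetch_once(sites, clauses):
    calls.append((tuple(sites), tuple(clauses)))
    return [(site, clause) for site in sites for clause in clauses]


rows = fetch_once(["A", "B", "C", "D"], ["x=1", "y=2"])
```

Each site chunk is paired with every filter chunk before the next site chunk starts, which is the outer/inner ordering described above.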
Key design points:
- ``_plan_chunks`` greedy-halves the largest chunk across all
dimensions until the worst-case sub-request fits ``url_limit``
(URL + body, via ``_request_bytes``, so POST routes are sized
correctly). Cartesian product of per-dim partitions becomes the
sub-request set; capped at ``max_chunks=1000``.
- ``_filter_aware_probe_args`` coordinates with ``filters.chunked``:
the planner probes URL length using a synthetic clause that matches
the inner filter chunker's bail-floor size (longest single clause,
scaled by worst-case URL encoding ratio). Without this coordination,
the outer planner would raise ``RequestTooLarge`` on combinations
the stacked chunkers can actually handle.
- ``QuotaExhausted`` mid-call guard reads ``x-ratelimit-remaining``
after each sub-request; if it drops below ``quota_safety_floor=50``,
the wrapper raises with the partial frame, completed-chunk offset,
and last observed remaining quota. Callers can then salvage the
partial result or resume after the rate-limit window resets, instead
of hitting an HTTP 429 partway through pagination.
- ``RequestTooLarge`` is raised when the smallest reducible plan
still exceeds ``url_limit`` (every multi-value param at a singleton
chunk and any chunkable filter at the inner chunker's bail floor)
or when the cartesian product exceeds ``max_chunks``.
- All defaults (``url_limit``, ``max_chunks``, ``quota_safety_floor``)
resolve at call time, so monkey-patching
``filters._WATERDATA_URL_BYTE_LIMIT`` for tests / non-default quotas
affects the decorator uniformly.
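The greedy-halving idea from the first design point can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual ``_plan_chunks``: the function name ``plan_chunks``, the ``request_bytes`` callback, the example URL, and the use of ``ValueError`` are all hypothetical, and filter-chunker coordination is left out.

```python
from itertools import product
from math import prod


def chunk_len(chunk):
    # Length of the comma-joined chunk, used to rank chunks by size.
    return len(",".join(chunk))


def plan_chunks(params, request_bytes, url_limit, max_chunks=1000):
    """Halve the largest chunk until the worst-case sub-request fits."""
    # Start with one chunk per dimension holding the full value list.
    partitions = {name: [list(vals)] for name, vals in params.items()}
    while True:
        # Worst case: the largest chunk of every dimension combined.
        worst = {n: max(cs, key=chunk_len) for n, cs in partitions.items()}
        if request_bytes(worst) <= url_limit:
            break
        # Only chunks with more than one value can still be split.
        splittable = [
            (chunk_len(c), name, i)
            for name, chunks in partitions.items()
            for i, c in enumerate(chunks)
            if len(c) > 1
        ]
        if not splittable:
            raise ValueError("request too large even at singleton chunks")
        _, name, i = max(splittable)
        chunk = partitions[name][i]
        mid = len(chunk) // 2
        partitions[name][i:i + 1] = [chunk[:mid], chunk[mid:]]
        if prod(len(cs) for cs in partitions.values()) > max_chunks:
            raise ValueError("plan exceeds max_chunks")
    # The cartesian product of per-dimension chunks is the sub-request set.
    names = list(partitions)
    return [dict(zip(names, combo))
            for combo in product(*(partitions[n] for n in names))]


def request_bytes(sub):
    # Hypothetical size estimator: byte length of a GET URL for one sub-request.
    query = "&".join(f"{k}={','.join(v)}" for k, v in sub.items())
    return len(("https://example.invalid/items?" + query).encode("utf-8"))


sites = [f"S{i:03d}" for i in range(8)]
plan = plan_chunks({"site": sites}, request_bytes, url_limit=55)
```

Here the full eight-site URL is 74 bytes, so a single halving into two four-site chunks (54 bytes each) is enough to fit the 55-byte budget.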
Public additions:
- ``dataretrieval.waterdata.chunking.multi_value_chunked``
- ``dataretrieval.waterdata.chunking.RequestTooLarge``
- ``dataretrieval.waterdata.chunking.QuotaExhausted`` (carries
``partial_frame``, ``partial_response``, ``completed_chunks``,
``total_chunks``, ``remaining``)
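A minimal sketch of the mid-call quota guard, to show how the partial result travels with the exception. Everything here is a stand-in: ``QuotaExhaustedSketch``, ``run_chunks``, and the ``fetch`` callback are hypothetical names, results are plain lists rather than frames, and only some of the attributes listed above are modelled.

```python
class QuotaExhaustedSketch(RuntimeError):
    """Hypothetical stand-in carrying the partial result and progress."""

    def __init__(self, partial, completed_chunks, total_chunks, remaining):
        super().__init__(
            f"quota low ({remaining} left) after "
            f"{completed_chunks}/{total_chunks} chunks"
        )
        self.partial = partial
        self.completed_chunks = completed_chunks
        self.total_chunks = total_chunks
        self.remaining = remaining


def parse_remaining(headers):
    # Return the remaining quota as an int, or None if absent or malformed.
    try:
        return int(headers["x-ratelimit-remaining"])
    except (KeyError, TypeError, ValueError):
        return None


def run_chunks(chunks, fetch, quota_safety_floor=50):
    """Fetch every chunk, aborting early if the quota drops below the floor."""
    results = []
    total = len(chunks)
    for index, chunk in enumerate(chunks, start=1):
        rows, headers = fetch(chunk)
        results.extend(rows)
        remaining = parse_remaining(headers)
        # No abort after the last chunk (nothing left to protect),
        # and a floor of zero disables the guard entirely.
        if (quota_safety_floor > 0 and index < total
                and remaining is not None
                and remaining < quota_safety_floor):
            raise QuotaExhaustedSketch(results, index, total, remaining)
    return results


def make_fake_fetch(quota_values):
    # Fake transport: returns one row per chunk and a scripted quota header.
    values = iter(quota_values)

    def fetch(chunk):
        return [f"row-{chunk}"], {"x-ratelimit-remaining": str(next(values))}
    return fetch
```

A caller could catch the exception, keep ``partial``, and restart from ``completed_chunks`` once the rate-limit window resets.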
Tests (30 new):
- ``_filter_aware_probe_args`` worst-case-clause modelling
- ``_plan_chunks`` greedy halving, RequestTooLarge floor, filter-
chunker coordination, ``max_chunks`` cap, lazy-default reads
- ``multi_value_chunked`` pass-through, cartesian-product shape,
end-to-end with stacked filter chunker
- ``QuotaExhausted`` header parsing, mid-call abort, last-chunk no-
abort, zero-floor disable
- ``RequestTooLarge`` message contents and triggering conditions
End-to-end correctness verified against the live API: identical
per-site cell-for-cell output between unchunked (single call) and
chunked (forced fan-out via patched limit) paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 4a65fb1 commit cd70929
6 files changed
Lines changed: 1146 additions & 22 deletions