feat(http): Migrate from requests to httpx#289
Conversation
thodson-usgs
left a comment
There was a problem hiding this comment.
Please address my comments.
| # Shared kwargs for every ``httpx`` call site. ``requests`` defaulted to | ||
| # following redirects on GET; ``httpx`` does not. ``timeout=None`` | ||
| # preserves the no-timeout behavior the chunker docs promise. |
There was a problem hiding this comment.
Omit this comment. Just determine the correct default behavior
| "Request URL too long. Modify your query to use fewer sites. " | ||
| + f"API response reason: {_reason}. Pseudo-code example of how to " | ||
| + f"split your query: \n {_example}" | ||
| f"API response reason: {response.reason_phrase}. Pseudo-code " | ||
| f"example of how to split your query: \n {_URL_TOO_LONG_EXAMPLE}" |
There was a problem hiding this comment.
Why are we duplicating this error message? form lines 316 to 319?
| Quota: each completed sub-request's ``x-ratelimit-remaining`` is | ||
| checked against the still-pending count, and | ||
| ``RequestExceedsQuota`` fires before the next sub-request if the | ||
| window can't cover the rest. Set ``API_USGS_LIMIT=0`` to skip this | ||
| pre-emptive check and attempt the full plan anyway. |
There was a problem hiding this comment.
I think we can remove this error. We don't really have a good idea of how many requests will be needed due to pagination.
|
|
||
| @contextmanager | ||
| def _publish_session(session: requests.Session) -> Iterator[None]: | ||
| def _publish_session(client: httpx.Client) -> Iterator[None]: |
There was a problem hiding this comment.
many private functions using the term "session". Should these be updated to "client" as per httpx terminology?
| reflects the full query. Set to the original-input URL when | ||
| chunking is needed; falls back to a worst-case sub-request | ||
| URL when even the original URL trips httpx's 64 KB cap. | ||
| ``None`` on the nothing-to-chunk passthrough path | ||
| (``fetch_once``'s response already carries the canonical URL) | ||
| and when no buildable URL exists. |
There was a problem hiding this comment.
This doc is too long. Can you reduce canonical_url doc to 3-4 lines.
| When the rate-limit window can't cover the still-pending | ||
| plan (checked after every non-final sub-request). |
There was a problem hiding this comment.
Omit this. We're not going to check. We'll just let it fail once the quota is reached.
6cad874 to
d9b1c6f
Compare
Swap the package's HTTP client: `requests.Session` → `httpx.Client`,
`requests.Response` → `httpx.Response`. Same call patterns, same
response shape; only the underlying transport changes. Headers,
timeout, and redirect defaults are centralized in
`dataretrieval.utils.HTTPX_DEFAULTS` (`httpx.Timeout(60.0,
connect=10.0)`, `follow_redirects=True`).
The chunker, paginated-loop helpers, and OGC waterdata fetchers
route through one `httpx.Client` that the chunker publishes on the
`_chunked_client` ContextVar so paginated sub-requests reuse the
connection pool across a single chunked call.
Three httpx behavior diffs handled defensively:
* `httpx.InvalidURL` is raised client-side when a URL component
exceeds httpx's 64 KB cap. Caught by `_safe_request_bytes` (treats
"too big to construct" as "doesn't fit", so the planner's halving
loop keeps shrinking) AND by `_issue` / `_classify_chunk_error`
(treats a runtime InvalidURL as `ServiceInterrupted` so partial
state remains recoverable via `.call.resume()`). Note that
`httpx.InvalidURL` does NOT inherit from `httpx.HTTPError` — it
needs an explicit catch.
* `httpx.Response.elapsed` is only populated once the response is
closed; `pytest-httpx` mock responses don't populate it. The new
`_safe_elapsed` helper falls back to `timedelta(0)`.
* `httpx.Response.url` is a read-only property. The new
`_set_response_url` helper rewrites it by reseating the bound
request, with a fallback path for `Mock`-shaped test responses.
Tests migrate from `requests_mock` to native `pytest-httpx`. The
new `tests/conftest.py` is ~30 lines configuring pytest-httpx
strict-mode relaxations.
Backwards-compat:
* `BaseMetadata.header` is now `httpx.Headers` instead of
`requests.structures.CaseInsensitiveDict`. Case-insensitive
`.get(...)` still works; literal dict equality
(`md.header == {"k": "v"}`) no longer holds because
`httpx.Headers` carries auto-added entries.
* `BaseMetadata.url` is coerced to `str`.
* `RequestExceedsQuota` and `API_USGS_LIMIT` are removed — the
chunker no longer pre-empts on `x-ratelimit-remaining`. A natural
429 still surfaces as `QuotaExhausted` via `_classify_chunk_error`,
carrying partial state for `.call.resume()`.
* The CI flaky-rerun regex now matches `httpx.ConnectError` as well
as the legacy `ConnectionError` string.
Test count: 404 mocked tests passing, 2 skipped, ruff clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
d9b1c6f to
74ed873
Compare
Summary
requests.Session→httpx.Client,requests.Response→httpx.Response. Same call shape; same response shape; only the underlying transport changes.dataretrieval.utils.HTTPX_DEFAULTS(httpx.Timeout(60.0, connect=10.0),follow_redirects=True).httpx.Clientthat the chunker publishes on the_chunked_clientContextVar so paginated sub-requests reuse the connection pool across a single chunked call.requests_mockto nativepytest-httpx; the old_RequestsMockShimis gone.Why httpx
httpxships sync and async clients on a unified API, so the same request shape can power both the existing synchronous getters and a follow-up async parallel chunker (see #285, stacked on this PR).requestsis still actively maintained but has no first-class async story — adding one for #285 underrequestswould have been aThreadPoolExecutorbolt-on, one-off, and not reusable.Other minor wins: end-to-end type hints, HTTP/2 support if we ever want it, smaller wheel footprint vs
requests + chardet.Backwards-compat surface
BaseMetadata.headeris nowhttpx.Headersinstead ofrequests.structures.CaseInsensitiveDict. Case-insensitive.get(...)still works; literal dict equality (md.header == {"k": "v"}) no longer holds becausehttpx.Headerscarries auto-added entries.BaseMetadata.urlis coerced tostr(was alreadystr-ish; now explicitstr(httpx.Response.url)).RequestExceedsQuotaandAPI_USGS_LIMITare removed (per review). The chunker no longer pre-empts onx-ratelimit-remaining; a natural 429 still surfaces asQuotaExhaustedcarrying.callfor.call.resume()recovery.Three httpx behavior diffs handled defensively
httpx.InvalidURLis raised client-side when a URL component > 64 KB. Caught by_safe_request_bytes(treats "too big to construct" as "doesn't fit", so the planner's halving loop keeps shrinking) AND by_issue/_classify_chunk_error(treats a runtime InvalidURL asServiceInterruptedso partial state remains recoverable via.call.resume()). Note:httpx.InvalidURLdoes NOT inherit fromhttpx.HTTPError— needs an explicit catch.httpx.Response.elapsedis only populated when the response is closed, andpytest-httpxmock responses don't populate it. The new_safe_elapsedhelper falls back totimedelta(0).httpx.Response.urlis a read-only property. The new_set_response_urlhelper rewrites it by reseating the bound request, with a fallback path forMock-shaped test responses.Test plan
pytest-httpx. The newtests/conftest.pyis ~30 lines configuring pytest-httpx strict-mode relaxations.test_connection_error_wrapped_as_service_interrupted—httpx.ConnectErrormid-fetch surfaces as resumableServiceInterruptedtest_invalid_url_wrapped_as_service_interrupted—httpx.InvalidURLmid-fetch (e.g., oversize cursor URL) surfaces as resumableServiceInterruptedtest_query_nldi_non_200_surfaces_reason_phrase— locks in the.reason→.reason_phrasemigration missruff checkandruff format --checkpass.Stacked PR
The async parallel chunker (#285) sits on top of this branch and uses the same
httpx.AsyncClient. After this merges, #285 rebases ontomainfor a clean diff.🤖 Generated with Claude Code