Skip to content

feat(http): Migrate from requests to httpx#289

Open
thodson-usgs wants to merge 1 commit into
DOI-USGS:mainfrom
thodson-usgs:httpx-upgrade
Open

feat(http): Migrate from requests to httpx#289
thodson-usgs wants to merge 1 commit into
DOI-USGS:mainfrom
thodson-usgs:httpx-upgrade

Conversation

@thodson-usgs
Copy link
Copy Markdown
Collaborator

@thodson-usgs thodson-usgs commented May 25, 2026

Summary

  • Swaps the HTTP client across the package: requests.Sessionhttpx.Client, requests.Responsehttpx.Response. Same call shape; same response shape; only the underlying transport changes.
  • Centralizes header / timeout / redirect defaults in dataretrieval.utils.HTTPX_DEFAULTS (httpx.Timeout(60.0, connect=10.0), follow_redirects=True).
  • Routes the chunker, paginated-loop helpers, and OGC waterdata fetchers through one httpx.Client that the chunker publishes on the _chunked_client ContextVar so paginated sub-requests reuse the connection pool across a single chunked call.
  • Migrates the test suite from requests_mock to native pytest-httpx; the old _RequestsMockShim is gone.

Why httpx

httpx ships sync and async clients on a unified API, so the same request shape can power both the existing synchronous getters and a follow-up async parallel chunker (see #285, stacked on this PR). requests is still actively maintained but has no first-class async story — adding one for #285 under requests would have been a ThreadPoolExecutor bolt-on, one-off, and not reusable.

Other minor wins: end-to-end type hints, HTTP/2 support if we ever want it, smaller wheel footprint vs requests + chardet.

Backwards-compat surface

  • BaseMetadata.header is now httpx.Headers instead of requests.structures.CaseInsensitiveDict. Case-insensitive .get(...) still works; literal dict equality (md.header == {"k": "v"}) no longer holds because httpx.Headers carries auto-added entries.
  • BaseMetadata.url is coerced to str (was already str-ish; now explicit str(httpx.Response.url)).
  • RequestExceedsQuota and API_USGS_LIMIT are removed (per review). The chunker no longer pre-empts on x-ratelimit-remaining; a natural 429 still surfaces as QuotaExhausted carrying .call for .call.resume() recovery.

Three httpx behavior diffs handled defensively

  • httpx.InvalidURL is raised client-side when a URL component > 64 KB. Caught by _safe_request_bytes (treats "too big to construct" as "doesn't fit", so the planner's halving loop keeps shrinking) AND by _issue / _classify_chunk_error (treats a runtime InvalidURL as ServiceInterrupted so partial state remains recoverable via .call.resume()). Note: httpx.InvalidURL does NOT inherit from httpx.HTTPError — needs an explicit catch.
  • httpx.Response.elapsed is only populated when the response is closed, and pytest-httpx mock responses don't populate it. The new _safe_elapsed helper falls back to timedelta(0).
  • httpx.Response.url is a read-only property. The new _set_response_url helper rewrites it by reseating the bound request, with a fallback path for Mock-shaped test responses.

Test plan

  • 404 mocked tests pass after migrating to pytest-httpx. The new tests/conftest.py is ~30 lines configuring pytest-httpx strict-mode relaxations.
  • New regression tests:
    • test_connection_error_wrapped_as_service_interruptedhttpx.ConnectError mid-fetch surfaces as resumable ServiceInterrupted
    • test_invalid_url_wrapped_as_service_interruptedhttpx.InvalidURL mid-fetch (e.g., oversize cursor URL) surfaces as resumable ServiceInterrupted
    • test_query_nldi_non_200_surfaces_reason_phrase — locks in the .reason.reason_phrase migration miss
  • ruff check and ruff format --check pass.
  • CI green (lint, docs, ubuntu/windows × 3.9/3.13/3.14).

Stacked PR

The async parallel chunker (#285) sits on top of this branch and uses the same httpx.AsyncClient. After this merges, #285 rebases onto main for a clean diff.

🤖 Generated with Claude Code

Copy link
Copy Markdown
Collaborator Author

@thodson-usgs thodson-usgs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address my comments.

Comment thread dataretrieval/utils.py Outdated
Comment on lines +14 to +16
# Shared kwargs for every ``httpx`` call site. ``requests`` defaulted to
# following redirects on GET; ``httpx`` does not. ``timeout=None``
# preserves the no-timeout behavior the chunker docs promise.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omit this comment. Just determine the correct default behavior

Comment thread dataretrieval/utils.py Outdated
Comment on lines +332 to +334
"Request URL too long. Modify your query to use fewer sites. "
+ f"API response reason: {_reason}. Pseudo-code example of how to "
+ f"split your query: \n {_example}"
f"API response reason: {response.reason_phrase}. Pseudo-code "
f"example of how to split your query: \n {_URL_TOO_LONG_EXAMPLE}"
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we duplicating this error message? form lines 316 to 319?

Comment thread dataretrieval/waterdata/chunking.py Outdated
Comment on lines +12 to +16
Quota: each completed sub-request's ``x-ratelimit-remaining`` is
checked against the still-pending count, and
``RequestExceedsQuota`` fires before the next sub-request if the
window can't cover the rest. Set ``API_USGS_LIMIT=0`` to skip this
pre-emptive check and attempt the full plan anyway.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove this error. We don't really have a good idea of how many requests will be needed due to pagination.

Comment thread dataretrieval/waterdata/chunking.py Outdated

@contextmanager
def _publish_session(session: requests.Session) -> Iterator[None]:
def _publish_session(client: httpx.Client) -> Iterator[None]:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

many private functions using the term "session". Should these be updated to "client" as per httpx terminology?

Comment thread dataretrieval/waterdata/chunking.py Outdated
Comment on lines +680 to +685
reflects the full query. Set to the original-input URL when
chunking is needed; falls back to a worst-case sub-request
URL when even the original URL trips httpx's 64 KB cap.
``None`` on the nothing-to-chunk passthrough path
(``fetch_once``'s response already carries the canonical URL)
and when no buildable URL exists.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc is too long. Can you reduce canonical_url doc to 3-4 lines.

Comment thread dataretrieval/waterdata/chunking.py Outdated
Comment on lines +1251 to +1252
When the rate-limit window can't cover the still-pending
plan (checked after every non-final sub-request).
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omit this. We're not going to check. We'll just let it fail once the quota is reached.

@thodson-usgs thodson-usgs force-pushed the httpx-upgrade branch 2 times, most recently from 6cad874 to d9b1c6f Compare May 25, 2026 21:38
Swap the package's HTTP client: `requests.Session` → `httpx.Client`,
`requests.Response` → `httpx.Response`. Same call patterns, same
response shape; only the underlying transport changes. Headers,
timeout, and redirect defaults are centralized in
`dataretrieval.utils.HTTPX_DEFAULTS` (`httpx.Timeout(60.0,
connect=10.0)`, `follow_redirects=True`).

The chunker, paginated-loop helpers, and OGC waterdata fetchers
route through one `httpx.Client` that the chunker publishes on the
`_chunked_client` ContextVar so paginated sub-requests reuse the
connection pool across a single chunked call.

Three httpx behavior diffs handled defensively:

* `httpx.InvalidURL` is raised client-side when a URL component
  exceeds httpx's 64 KB cap. Caught by `_safe_request_bytes` (treats
  "too big to construct" as "doesn't fit", so the planner's halving
  loop keeps shrinking) AND by `_issue` / `_classify_chunk_error`
  (treats a runtime InvalidURL as `ServiceInterrupted` so partial
  state remains recoverable via `.call.resume()`). Note that
  `httpx.InvalidURL` does NOT inherit from `httpx.HTTPError` — it
  needs an explicit catch.
* `httpx.Response.elapsed` is only populated once the response is
  closed; `pytest-httpx` mock responses don't populate it. The new
  `_safe_elapsed` helper falls back to `timedelta(0)`.
* `httpx.Response.url` is a read-only property. The new
  `_set_response_url` helper rewrites it by reseating the bound
  request, with a fallback path for `Mock`-shaped test responses.

Tests migrate from `requests_mock` to native `pytest-httpx`. The
new `tests/conftest.py` is ~30 lines configuring pytest-httpx
strict-mode relaxations.

Backwards-compat:

* `BaseMetadata.header` is now `httpx.Headers` instead of
  `requests.structures.CaseInsensitiveDict`. Case-insensitive
  `.get(...)` still works; literal dict equality
  (`md.header == {"k": "v"}`) no longer holds because
  `httpx.Headers` carries auto-added entries.
* `BaseMetadata.url` is coerced to `str`.
* `RequestExceedsQuota` and `API_USGS_LIMIT` are removed — the
  chunker no longer pre-empts on `x-ratelimit-remaining`. A natural
  429 still surfaces as `QuotaExhausted` via `_classify_chunk_error`,
  carrying partial state for `.call.resume()`.
* The CI flaky-rerun regex now matches `httpx.ConnectError` as well
  as the legacy `ConnectionError` string.

Test count: 404 mocked tests passing, 2 skipped, ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant