Companion to #160 (sync streaming download), which documents the memory problem and POC evidence in detail.
Problem
The async client has no working get_daily_extract_file method: per #151 it is listed only as a NotImplementedError stub. When it is implemented, it should not repeat the sync client's mistake of buffering the full response in memory.
httpx, the HTTP library used by the async client, supports streaming via client.stream(...), which is used as an async context manager. The async equivalent must use this pattern instead of await response.aread() or response.content, both of which hold the entire body in memory.
Proposed Solution
Add get_daily_extract_file_stream(date, filename, chunk_size=8192) to the async client as an async method that:
Uses httpx.AsyncClient.stream("GET", url, ...) as an async context manager
Yields chunks via response.aiter_bytes(chunk_size) (async generator)
Allows the caller to control where bytes land (file, S3, GCS, etc.) without any intermediate buffering
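A minimal sketch of the proposed method and its intended usage, under stated assumptions: the real method would be defined on the async client and would build the URL from date and filename, whereas here the client and url are passed in explicitly so the sketch stays self-contained. The client is assumed to be an httpx.AsyncClient; download_to_file is a hypothetical helper, not part of the library.

```python
from typing import Any, AsyncIterator


async def get_daily_extract_file_stream(
    client: Any,  # assumed: an httpx.AsyncClient (or anything with a compatible .stream())
    url: str,     # hypothetical: the real method would derive this from date and filename
    chunk_size: int = 8192,
) -> AsyncIterator[bytes]:
    """Yield the response body chunk by chunk without buffering it."""
    # The connection stays open only while this block is active.
    async with client.stream("GET", url) as response:
        response.raise_for_status()
        async for chunk in response.aiter_bytes(chunk_size):
            yield chunk


async def download_to_file(client: Any, url: str, path: str) -> int:
    """Hypothetical caller: write streamed chunks to path, return the byte count."""
    written = 0
    with open(path, "wb") as fh:
        async for chunk in get_daily_extract_file_stream(client, url):
            fh.write(chunk)  # only one chunk is held in memory at a time
            written += len(chunk)
    return written
```

The same loop could hand each chunk to another sink instead of a local file, for example the parts of an S3 multipart upload, after grouping chunks into whatever part size the sink requires.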
Impact
Daily extract ZIPs can be 50+ MB. In Lambda (128–512 MB) or any async microservice, buffering the full file wastes memory and risks OOM. The sync POC showed peak memory dropping from ~108 MB to ~55 KB when streaming (see #160 for details). The async path should achieve the same.
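The buffering-versus-streaming gap can be illustrated in miniature with tracemalloc. This is only a sketch: fake_body is a made-up stand-in for response.aiter_bytes(), no network or httpx is involved, and it does not reproduce the POC figures from #160.

```python
import asyncio
import os
import tracemalloc
from typing import AsyncIterator, Awaitable, Callable

CHUNK = 8192
TOTAL = 8 * 1024 * 1024  # 8 MiB of simulated payload


async def fake_body() -> AsyncIterator[bytes]:
    """Stand-in for response.aiter_bytes(): yields TOTAL bytes in CHUNK pieces."""
    for _ in range(TOTAL // CHUNK):
        yield bytes(CHUNK)


async def buffered() -> int:
    # Mimics response.content: the whole body is held in memory at once.
    data = b"".join([chunk async for chunk in fake_body()])
    return len(data)


async def streamed() -> int:
    # Mimics the streaming path: each chunk is written out and then dropped.
    written = 0
    with open(os.devnull, "wb") as sink:
        async for chunk in fake_body():
            sink.write(chunk)
            written += len(chunk)
    return written


async def peak_bytes(consume: Callable[[], Awaitable[int]]) -> int:
    """Run consume() and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        await consume()
        return tracemalloc.get_traced_memory()[1]
    finally:
        tracemalloc.stop()


def measure() -> tuple[int, int]:
    """Return (buffered peak, streamed peak) in bytes."""
    return asyncio.run(peak_bytes(buffered)), asyncio.run(peak_bytes(streamed))


if __name__ == "__main__":
    buf, stream = measure()
    print(f"buffered peak: {buf} bytes, streamed peak: {stream} bytes")
```

The buffered path must hold at least the full payload, while the streamed path only needs to hold roughly one chunk plus bookkeeping.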
Implementation Notes
httpx streaming requires the request to stay open for the duration of iteration; the method should be an async context manager (or return one) so connections are properly closed.
The wrap_return decorator used by sync methods likely does not support async streaming; this method may need to bypass it, similarly to the sync counterpart in #160 (feat: add streaming download support for daily extract files).
Content-Length is present in OFS responses, enabling progress tracking if desired.
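One way the context-manager note and the Content-Length note could fit together is sketched below. The names open_daily_extract_stream and ExtractStream are hypothetical, the client is assumed to be an httpx.AsyncClient, and the url is passed in directly rather than built from date and filename.

```python
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import Any, AsyncIterator, Optional


@dataclass
class ExtractStream:
    """Hypothetical handle returned to the caller while the response is open."""
    total_bytes: Optional[int]    # parsed from Content-Length, if present
    chunks: AsyncIterator[bytes]  # valid only inside the async with block


@asynccontextmanager
async def open_daily_extract_stream(
    client: Any,  # assumed: an httpx.AsyncClient (or compatible)
    url: str,     # hypothetical: the real method would build this itself
    chunk_size: int = 8192,
) -> AsyncIterator[ExtractStream]:
    # Leaving the outer async with closes the response even if the caller
    # stops iterating early or raises.
    async with client.stream("GET", url) as response:
        response.raise_for_status()
        length = response.headers.get("Content-Length")
        total = int(length) if length is not None else None
        yield ExtractStream(total_bytes=total, chunks=response.aiter_bytes(chunk_size))
```

A caller would write `async with open_daily_extract_stream(client, url) as stream:` and then iterate `stream.chunks`, comparing bytes received against `stream.total_bytes` for progress reporting. Compared with a bare async generator, this form ties cleanup to the with block rather than to exhausting or explicitly closing the generator.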