feat: add async streaming download support for daily extract files #161

@be-ant

Related

Companion to #160 (sync streaming download), which documents the memory problem and POC evidence in detail.

Problem

The async client has no working get_daily_extract_file method: it is currently a NotImplementedError stub (see #151). When it is implemented, it should not repeat the sync client's mistake of buffering the full response in memory.

httpx, the HTTP library used by the async client, supports streaming via client.stream(...) as an async context manager. The async equivalent must use this pattern instead of await response.aread() or response.content, both of which load the entire body into memory.

Proposed Solution

Add get_daily_extract_file_stream(date, filename, chunk_size=8192) to the async client, exposed as an async context manager (or a method returning one) that:

  1. Uses httpx.AsyncClient.stream("GET", url, ...) as an async context manager
  2. Yields chunks via response.aiter_bytes(chunk_size) (async generator)
  3. Allows the caller to control where bytes land (file, S3, GCS, etc.) without any intermediate buffering
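
The three steps above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it is written as a free function, and the `client` and `url` parameters are assumptions standing in for whatever the async client uses internally to hold its httpx.AsyncClient and build the download URL from date and filename. Only `stream`, `raise_for_status`, and `aiter_bytes` are called on the httpx objects.

```python
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator


@asynccontextmanager
async def get_daily_extract_file_stream(
    client: Any,  # assumption: an httpx.AsyncClient owned by the async client
    url: str,  # assumption: download URL already built from date + filename
    chunk_size: int = 8192,
) -> AsyncIterator[AsyncIterator[bytes]]:
    """Open a streaming GET and hand the caller an async iterator of chunks."""
    # client.stream(...) keeps the response open only inside this block, so
    # the connection is released when the caller leaves its `async with`.
    async with client.stream("GET", url) as response:
        # Fail before the caller starts writing bytes anywhere.
        response.raise_for_status()
        # aiter_bytes yields the body piece by piece instead of buffering it.
        yield response.aiter_bytes(chunk_size=chunk_size)
```

Wrapping the generator with asynccontextmanager is one way to get the `async with ... as stream` shape shown in the usage examples while guaranteeing the response is closed even if the caller stops iterating early.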

Example usage (desired API)

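import aiofiles  # third-party; used here for non-blocking file writes

# Inside a coroutine, with ofsc being an async client instance:
async with ofsc.get_daily_extract_file_stream(date="2024-01-15", filename="extract.zip") as stream:
    async with aiofiles.open("extract.zip", "wb") as f:
        async for chunk in stream:
            await f.write(chunk)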

Or for direct streaming to S3 via a multipart upload:

# upload_part is a placeholder for the caller's own upload coroutine
async with ofsc.get_daily_extract_file_stream(date="2024-01-15", filename="extract.zip") as stream:
    async for chunk in stream:
        await upload_part(chunk)
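
One caveat for the S3 case: multipart uploads require every part except the last to be at least 5 MiB, so 8192-byte chunks cannot each be sent as their own part. A caller would likely regroup the stream first. The helper below is a hypothetical sketch of that regrouping (`iter_parts` is not part of any existing API); it holds at most one part plus one chunk in memory at a time.

```python
from typing import AsyncIterator

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last


async def iter_parts(
    chunks: AsyncIterator[bytes],
    part_size: int = MIN_PART_SIZE,
) -> AsyncIterator[bytes]:
    """Regroup small chunks into parts of exactly part_size bytes.

    The final part carries whatever is left over and may be smaller.
    """
    buffer = bytearray()
    async for chunk in chunks:
        buffer.extend(chunk)
        # Emit full parts as soon as enough bytes have accumulated.
        while len(buffer) >= part_size:
            yield bytes(buffer[:part_size])
            del buffer[:part_size]
    if buffer:
        yield bytes(buffer)
```

With this in place, the loop above would become `async for part in iter_parts(stream): await upload_part(part)`, keeping peak memory near the part size rather than the file size.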

Impact

Daily extract ZIPs can be 50+ MB. In Lambda (128–512 MB) or any async microservice, buffering the full file wastes memory and risks OOM. The sync POC showed peak memory dropped from ~108 MB → ~55 KB when streaming (see #160 for details). The async path should achieve the same.
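
To check that the async path shows a comparable drop, a measurement along these lines could be used. This is only a sketch with a synthetic source: `fake_download` is a hypothetical stand-in for the real stream, and the numbers it produces say nothing about the actual client. It compares the peak traced memory of accumulating every chunk against handling one chunk at a time.

```python
import asyncio
import tracemalloc
from typing import AsyncIterator, Awaitable, Callable


async def fake_download(total: int, chunk_size: int = 8192) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for the streamed response body."""
    sent = 0
    while sent < total:
        n = min(chunk_size, total - sent)
        yield bytes(n)
        sent += n


async def buffered(total: int) -> int:
    # Mimics response.content: the whole body ends up in one object.
    data = bytearray()
    async for chunk in fake_download(total):
        data.extend(chunk)
    return len(data)


async def streamed(total: int) -> int:
    # Mimics writing each chunk out and then dropping it.
    written = 0
    async for chunk in fake_download(total):
        written += len(chunk)  # stand-in for `await f.write(chunk)`
    return written


def peak_bytes(fn: Callable[[int], Awaitable[int]], total: int) -> int:
    """Run fn(total) and return the peak memory traced by tracemalloc."""
    tracemalloc.start()
    try:
        asyncio.run(fn(total))
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak
```

Calling `peak_bytes(buffered, size)` and `peak_bytes(streamed, size)` with the same size gives two figures to compare; swapping `fake_download` for the real stream would turn this into an actual regression check.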

Implementation Notes

  • httpx streaming requires the request to stay open for the duration of iteration; the method should be an async context manager (or return one) so connections are properly closed
  • The wrap_return decorator used by sync methods likely does not support async streaming, so this method may need to bypass it, as the sync counterpart in #160 does
  • Content-Length is present in OFS responses, enabling progress tracking if desired
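
If progress tracking is wanted, one possible shape is a thin wrapper around the chunk iterator. The sketch below is hypothetical: `parse_content_length` and `with_progress` are illustrative names, and it assumes the method exposes the response headers (or the parsed length) to the caller in some way, which the proposal does not yet specify.

```python
from typing import AsyncIterator, Callable, Mapping, Optional


def parse_content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None if missing or malformed."""
    value = headers.get("Content-Length")
    return int(value) if value is not None and value.isdigit() else None


async def with_progress(
    chunks: AsyncIterator[bytes],
    total: Optional[int],
    on_progress: Callable[[int, Optional[int]], None],
) -> AsyncIterator[bytes]:
    """Pass chunks through unchanged, reporting cumulative bytes seen."""
    done = 0
    async for chunk in chunks:
        done += len(chunk)
        on_progress(done, total)
        yield chunk
```

Note that aiter_bytes yields decoded bytes, so if the server applies a Content-Encoding the running count will not line up with Content-Length; httpx exposes response.num_bytes_downloaded for the raw count in that case.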
