Problem
get_daily_extract_file in core.py:526-534 calls requests.get() without stream=True. The wrap_return decorator's FILE_RESPONSE path (common.py:64-67) then returns response.content, loading the entire file into memory before returning it.
Impact
Daily extract ZIPs can be 50+ MB. In memory-constrained environments (AWS Lambda at 128–512 MB), this wastes memory and risks OOM errors — especially when multiple extracts are processed concurrently.
POC Evidence
A streaming download of a 53.77 MB daily extract file was benchmarked against the current buffered approach:
| Approach | Peak Memory |
|---|---|
| Current (response.content) | ~108 MB |
| Streaming (stream=True + iter_content) | ~55 KB |
Both requests (stream=True) and httpx streaming produced identical byte counts. The OFS endpoint includes a Content-Length header, so progress tracking is also possible.
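For anyone wanting to reproduce the shape of this comparison without network access, here is a minimal sketch using tracemalloc. It is not the POC script: fake_chunks is a hypothetical stand-in for response.iter_content, and buffered only imitates reading response.content by joining every chunk before use. Absolute numbers will differ from the table above.

```python
import os
import tracemalloc


def fake_chunks(total_bytes, chunk_size=8192):
    """Hypothetical stand-in for response.iter_content(): zero-filled chunks."""
    remaining = total_bytes
    while remaining > 0:
        n = min(chunk_size, remaining)
        yield bytes(n)
        remaining -= n


def peak_bytes(func):
    """Run func() and return the peak traced Python allocation in bytes."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def buffered(total_bytes):
    # Collect the whole body before using it, as a buffered read would.
    data = b"".join(fake_chunks(total_bytes))
    return len(data)


def streamed(total_bytes):
    # Hand each chunk to the sink as soon as it arrives.
    written = 0
    with open(os.devnull, "wb") as sink:
        for chunk in fake_chunks(total_bytes):
            written += sink.write(chunk)
    return written


if __name__ == "__main__":
    size = 8 * 1024 * 1024
    print("buffered peak:", peak_bytes(lambda: buffered(size)))
    print("streamed peak:", peak_bytes(lambda: streamed(size)))
```

tracemalloc only sees allocations made through Python's allocator, so treat the output as a relative comparison, not a substitute for process-level measurements on Lambda.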
Proposed Solution
Add a new method get_daily_extract_file_stream(date, filename, chunk_size=8192) that:
- Calls requests.get(..., stream=True) (bypassing or extending the wrap_return decorator)
- Returns an iterator (or the response object as a context manager) yielding chunks via iter_content(chunk_size)
- May require a new response type (e.g., STREAM_RESPONSE) in the wrap_return decorator, or can bypass it entirely for this method
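As a rough sketch of the iterator variant, bypassing wrap_return entirely: the helper below is hypothetical (iter_extract_chunks is not an existing method), and it assumes session behaves like requests.Session, i.e. get(url, stream=True) returns an object with raise_for_status(), iter_content() and close(). How the URL and auth headers are built inside core.py is not shown here.

```python
from contextlib import closing


def iter_extract_chunks(session, url, chunk_size=8192, **request_kwargs):
    """Yield the response body in chunks instead of buffering it.

    Assumption: `session` is requests.Session-like. Extra keyword
    arguments (headers, auth, timeout, ...) are passed through to get().
    """
    response = session.get(url, stream=True, **request_kwargs)
    # closing() releases the connection even if the consumer stops early
    # or an exception is raised mid-download.
    with closing(response):
        response.raise_for_status()
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                yield chunk
```

Because this is a generator, the request is only sent once the caller starts iterating, and HTTP errors surface at that point and not at call time. If failing fast matters, returning the response object as a context manager (the other option above) may be the better fit.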
Example usage (desired API)
with ofsc.get_daily_extract_file_stream(date="2024-01-15", filename="extract.zip") as stream:
    with open("extract.zip", "wb") as f:
        for chunk in stream.iter_content(chunk_size=8192):
            f.write(chunk)
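Since the endpoint sends Content-Length, the same write loop could also report progress. A minimal sketch, assuming stream is a requests.Response-like object exposing headers and iter_content; save_with_progress and the on_progress callback are hypothetical names, not part of the proposed API:

```python
def save_with_progress(response, path, chunk_size=8192, on_progress=None):
    """Write a streamed response to `path`, reporting bytes written so far.

    `on_progress(written, total)` is called after each chunk; `total` is
    None when the server did not send a Content-Length header.
    """
    raw_total = response.headers.get("Content-Length")
    total = int(raw_total) if raw_total is not None else None
    written = 0
    with open(path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            written += len(chunk)
            if on_progress is not None:
                on_progress(written, total)
    return written
```

One caveat: iter_content yields decoded bytes, so if a response were served with a Content-Encoding such as gzip, the written count could exceed Content-Length. That is unlikely to matter for ZIP payloads, but worth checking against the real endpoint.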
Bonus: additionalHeaders Bug
While reviewing the implementation, a bug was found in _base.py:220-223:
# line 221 merges headers correctly...
headers = {**self.headers, **additionalHeaders}
# ...but line 223 overwrites `headers` with self.headers, dropping the merge
headers = self.headers # BUG: discards additionalHeaders
This means additionalHeaders passed by callers is silently ignored. Worth fixing alongside the streaming work.
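One possible shape for the fix, as a standalone sketch. The surrounding code in _base.py is not reproduced here, so merge_headers is a hypothetical helper and the None handling is an assumption about how callers omit additionalHeaders:

```python
def merge_headers(base_headers, additional_headers=None):
    """Return a new dict of headers; caller-supplied values win on conflict.

    Neither input is mutated, so the shared base headers stay intact
    between requests.
    """
    return {**base_headers, **(additional_headers or {})}
```

In the method itself, this would amount to keeping the merged dict from line 221 and dropping the reassignment on line 223.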