fix: disable HTTP response caching to prevent unbounded memory growth #952
devin-ai-integration[bot] wants to merge 3 commits into main
Hardcode _use_cache = False in HttpClient to prevent the requests_cache SQLite backend from accumulating cached HTTP responses in memory, which causes container memory to grow without bound during long syncs. Skip cache-related tests that expect caching to be active.
Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
Testing This CDK Version
You can test this version of the CDK using the following:
# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1773351139-disable-http-cache#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1773351139-disable-http-cache
PR Slash Commands
Airbyte Maintainers can execute slash commands on this PR.
PyTest Results (Fast): 3 934 tests (±0), 3 915 ✅ (-7), 7m 12s ⏱️ (+3s). Results for commit fd49a15. ± Comparison against base commit 0e57414. This pull request skips 7 tests.
PyTest Results (Full): 3 937 tests (±0), 3 918 ✅ (-7), 11m 18s ⏱️ (+4s). Results for commit fd49a15. ± Comparison against base commit 0e57414. This pull request skips 7 tests.
/prerelease
Summary
Hardcodes _use_cache = False in HttpClient.__init__ to disable the requests_cache SQLite-backed HTTP response cache entirely. This prevents the unbounded memory growth observed during long-running syncs, where the SQLite cache accumulates responses in memory.
Root cause investigation: During sandbox testing of source-twilio, container memory grew from ~300 MB to 8 GB over a few hours. After systematically ruling out Python heap leaks (memray showed constant ~300 MB), malloc fragmentation (jemalloc didn't help), and OS page cache (in-memory SQLite eliminated the RSS/working-set gap but RSS itself still grew), the cause was traced to requests_cache's SQLite backend accumulating cached HTTP responses. Even with TTL expiration and active purging, the cache grew unboundedly for connectors making thousands of unique paginated API calls. Disabling cache entirely confirmed stable memory at ~300 MB.
Scope: This is a global change: all connectors using HttpClient with use_cache=True will now have caching silently disabled. The use_cache constructor parameter is still accepted but ignored.
6 cache-related tests are skipped across 3 test files since they assert caching behavior that is now disabled.
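The "accepted but ignored" behavior described in the summary can be illustrated with a minimal sketch. SketchHttpClient, requested_use_cache, and session_kind are hypothetical names invented for illustration; this is not the CDK's actual HttpClient code, only the shape of the pattern.

```python
class SketchHttpClient:
    """Hypothetical stand-in for HttpClient; not the CDK's actual code."""

    def __init__(self, name: str, use_cache: bool = False) -> None:
        self.name = name
        # Assumption for this sketch: remember what the caller asked for,
        # purely so the override is visible when inspecting the object.
        self.requested_use_cache = use_cache
        # Hardcoded: caching is off no matter what the caller passed.
        self._use_cache = False

    def session_kind(self) -> str:
        # A real client would pick a caching or a plain session here.
        return "cached" if self._use_cache else "plain"


client = SketchHttpClient(name="example", use_cache=True)
print(client.session_kind())
```

Callers that pass use_cache=True get no error and no warning in this sketch, which is why the change is described as silent.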
Review & Testing Checklist for Human
[...] use_cache=True to deduplicate API calls (e.g., parent streams read by multiple child streams) will now make redundant HTTP requests. Evaluate whether any connectors depend on caching for correctness or have rate-limit-sensitive APIs where extra calls could cause failures.
[...] TEMPORARY. Decide on a proper fix: options include bounded LRU cache, stream-scoped cache with size limits, or removing the caching feature entirely and updating the use_cache API surface.
Notes
The use_cache parameter on HttpClient.__init__ is now dead code: it's accepted but always overridden to False. Consider whether to keep the parameter (for future re-enablement) or remove it to avoid confusion.
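As a rough illustration of the "bounded LRU cache" option named in the checklist, the sketch below caps the number of cached entries and evicts the least recently used one when the cap is exceeded. BoundedLRUCache and its methods are hypothetical names; this is not part of requests_cache or the CDK, and it bounds entry count rather than bytes, so a real fix would likely also need a size limit.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class BoundedLRUCache:
    """Hypothetical response cache holding at most max_entries items."""

    def __init__(self, max_entries: int) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._entries: "OrderedDict[Hashable, bytes]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[bytes]:
        if key not in self._entries:
            return None
        # Mark the entry as most recently used.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key: Hashable, value: bytes) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        # Evict from the least recently used end until within bounds.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)

    def __len__(self) -> int:
        return len(self._entries)


cache = BoundedLRUCache(max_entries=2)
cache.put("GET /items?page=1", b"page one")
cache.put("GET /items?page=2", b"page two")
cache.get("GET /items?page=1")            # page 1 becomes most recent
cache.put("GET /items?page=3", b"page three")  # evicts page 2
```

With thousands of unique paginated requests, a cache like this stays at max_entries items, at the cost of cache misses for pages that have been evicted.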