fix: disable HTTP response caching to prevent unbounded memory growth #952

Open
devin-ai-integration[bot] wants to merge 3 commits into main from
devin/1773351139-disable-http-cache

@devin-ai-integration devin-ai-integration bot commented Mar 12, 2026

fix: disable HTTP response caching to prevent unbounded memory growth

Summary

Hardcodes _use_cache = False in HttpClient.__init__, disabling the SQLite-backed requests_cache HTTP response cache entirely. This prevents the memory growth observed during long-running syncs, where the cache accumulates responses without bound.

Root cause investigation: During sandbox testing of source-twilio, container memory grew from ~300 MB to 8 GB over a few hours. The investigation ruled out, in turn, Python heap leaks (memray showed a constant ~300 MB), malloc fragmentation (jemalloc didn't help), and the OS page cache (in-memory SQLite eliminated the RSS/working-set gap, but RSS itself still grew). The cause was then traced to the requests_cache SQLite backend accumulating cached HTTP responses. Even with TTL expiration and active purging, the cache grew without bound for connectors making thousands of unique paginated API calls. With the cache disabled entirely, memory stayed stable at ~300 MB.
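
The following toy model is only a sketch of why a URL-keyed cache gains nothing from unique paginated requests: every page is a miss and adds one more stored response. It is not requests_cache, and it does not reproduce the growth seen despite purging; the class name, the URL shape, and the page size are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class ToyTtlCache:
    """Toy URL-keyed cache with a TTL. A model for illustration, not requests_cache."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, url: str, download: Callable[[str], bytes]) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]
        # Miss: download and store. Each new URL adds one entry.
        self.misses += 1
        body = download(url)
        self._entries[url] = (now, body)
        return body

    def stored_bytes(self) -> int:
        return sum(len(body) for _, body in self._entries.values())


def simulate_paginated_sync(pages: int, page_size_bytes: int) -> ToyTtlCache:
    """Request `pages` distinct URLs once each, as a paginated read would."""
    cache = ToyTtlCache(ttl_seconds=3600)
    for page in range(pages):
        # Hypothetical URL shape; real connectors use cursors or page tokens.
        url = f"https://api.example.com/items?page={page}"
        cache.fetch(url, lambda _url: b"x" * page_size_bytes)
    return cache
```

In this model the hit count stays at zero while stored bytes grow linearly with the number of pages, which matches the access pattern described above but not necessarily the mechanism inside the real backend.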

Scope: This is a global change. Every connector that uses HttpClient with use_cache=True will now have caching silently disabled. The use_cache constructor parameter is still accepted but ignored.
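
As a rough illustration of the "accepted but ignored" behavior, here is a minimal sketch. SketchHttpClient and cache_enabled are hypothetical stand-ins, not the CDK's HttpClient; the real constructor takes more arguments and the actual diff may look different.

```python
class SketchHttpClient:
    """Hypothetical stand-in for HttpClient. Not the CDK implementation."""

    def __init__(self, name: str, use_cache: bool = False) -> None:
        self._name = name
        # TEMPORARY: the caller's use_cache value is accepted for API
        # compatibility but ignored, so caching is always off.
        self._use_cache = False

    @property
    def cache_enabled(self) -> bool:
        return self._use_cache


client = SketchHttpClient("example", use_cache=True)
print(client.cache_enabled)
```

Because the parameter is ignored without any warning, a caller passing use_cache=True has no signal that caching is off, which is the concern raised under global scope.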

6 cache-related tests (7 test cases once parametrization is counted) are skipped across 3 test files, since they assert caching behavior that is now disabled.
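
A skipped test with a recorded reason might look like the sketch below. It uses the standard library's unittest so that it is self-contained; the repository's tests use pytest, where pytest.mark.skip(reason=...) plays the same role. The reason string and the test body are assumptions, not the PR's actual text.

```python
import unittest

# Hypothetical wording; the PR's actual skip reason may differ.
SKIP_REASON = "HTTP response caching is temporarily disabled"


class CacheBehaviorTest(unittest.TestCase):
    @unittest.skip(SKIP_REASON)
    def test_that_response_was_cached(self) -> None:
        # Never runs while skipped; would assert a cache hit once caching returns.
        self.fail("caching assertions go here")


def run_suite() -> unittest.TestResult:
    """Run the test case and return the result so skips can be inspected."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CacheBehaviorTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Keeping the reason in one shared constant makes the skipped tests easy to find again when caching is reintroduced or removed.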

Review & Testing Checklist for Human

  • Confirm global scope is acceptable: This disables caching for ALL connectors, not just Twilio. Connectors that rely on use_cache=True to deduplicate API calls (e.g., parent streams read by multiple child streams) will now make redundant HTTP requests. Evaluate whether any connectors depend on caching for correctness or have rate-limit-sensitive APIs where extra calls could cause failures.
  • Evaluate impact on sync performance: Connectors using caching to avoid re-fetching parent stream data will now re-fetch on every access. Check if this causes meaningful slowdowns or API quota issues for high-volume connectors.
  • Plan for permanent fix: The change is marked TEMPORARY. Decide on a proper fix. Options include a bounded LRU cache, a stream-scoped cache with size limits, or removing the caching feature entirely and updating the use_cache API surface.
  • Verify skipped tests are tracked: 6 tests are skipped. Ensure there's a tracking issue to either re-enable them when caching is reintroduced or remove them if caching is permanently dropped.
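
One of the permanent-fix options named in the checklist, a bounded LRU cache, could be sketched as follows. This is an in-memory illustration under assumed names (BoundedResponseCache, max_entries), keyed by a plain string; it is not a proposal for the CDK's actual design and ignores TTLs, headers, and persistence.

```python
from collections import OrderedDict
from typing import Optional


class BoundedResponseCache:
    """Hypothetical LRU cache capped at a fixed number of entries."""

    def __init__(self, max_entries: int) -> None:
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._entries:
            return None
        # Mark as most recently used.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key: str, body: bytes) -> None:
        self._entries[key] = body
        self._entries.move_to_end(key)
        # Evict least recently used entries until the cap is respected.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)

    def __len__(self) -> int:
        return len(self._entries)
```

A cap on entry count bounds the number of stored responses but not their total size; a byte-based limit would be needed if individual responses vary widely.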

Hardcode _use_cache = False in HttpClient to prevent requests_cache
SQLite backend from accumulating cached HTTP responses in memory,
which causes container memory to grow unboundedly during long syncs.

Skip cache-related tests that expect caching to be active.

Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

@github-actions

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1773351139-disable-http-cache#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1773351139-disable-http-cache

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

devin-ai-integration bot and others added 2 commits March 12, 2026 21:34
Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>

@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

@github-actions

PyTest Results (Fast)

3 934 tests (±0): 3 915 ✅ passed (-7), 19 💤 skipped (+7), 0 ❌ failed (±0)
1 suite (±0), 1 file (±0), duration 7m 12s ⏱️ (+3s)

Results for commit fd49a15. ± Comparison against base commit 0e57414.

This pull request skips 7 tests.
unit_tests.sources.streams.http.test_http ‑ test_that_response_was_cached
unit_tests.sources.streams.http.test_http ‑ test_using_cache
unit_tests.sources.streams.http.test_http_client ‑ test_given_different_headers_then_response_is_not_cached
unit_tests.sources.streams.http.test_http_client ‑ test_request_session_returns_valid_session[False-LimiterSession]
unit_tests.sources.streams.http.test_http_client ‑ test_request_session_returns_valid_session[True-CachedLimiterSession]
unit_tests.sources.streams.http.test_http_client ‑ test_that_response_was_cached
unit_tests.sources.streams.test_call_rate.TestHttpStreamIntegration ‑ test_with_cache

@github-actions

PyTest Results (Full)

3 937 tests (±0): 3 918 ✅ passed (-7), 19 💤 skipped (+7), 0 ❌ failed (±0)
1 suite (±0), 1 file (±0), duration 11m 18s ⏱️ (+4s)

Results for commit fd49a15. ± Comparison against base commit 0e57414.

This pull request skips 7 tests.
unit_tests.sources.streams.http.test_http ‑ test_that_response_was_cached
unit_tests.sources.streams.http.test_http ‑ test_using_cache
unit_tests.sources.streams.http.test_http_client ‑ test_given_different_headers_then_response_is_not_cached
unit_tests.sources.streams.http.test_http_client ‑ test_request_session_returns_valid_session[False-LimiterSession]
unit_tests.sources.streams.http.test_http_client ‑ test_request_session_returns_valid_session[True-CachedLimiterSession]
unit_tests.sources.streams.http.test_http_client ‑ test_that_response_was_cached
unit_tests.sources.streams.test_call_rate.TestHttpStreamIntegration ‑ test_with_cache

Anatolii Yatsuk (tolik0) commented Mar 12, 2026

/prerelease

Prerelease Job Info

This job triggers the publish workflow with default arguments to create a prerelease.

Prerelease job started... Check job output.

✅ Prerelease workflow triggered successfully.

View the publish workflow run: https://github.com/airbytehq/airbyte-python-cdk/actions/runs/23025648211
