feat: add configurable timeout for remote GETs to prevent build hangs #39

Draft
bjoern-weidlich-anchorage wants to merge 1 commit into platacard:main from bjoern-weidlich-anchorage:bjoern/s3-read-timeout

@bjoern-weidlich-anchorage
Contributor

Problem

When a TCP connection goes half-dead during a remote GET (e.g. over cloud interconnects or unreliable network paths), handleGet blocks indefinitely. The S3 response body is streamed directly into local disk storage via io.Copy, which blocks forever waiting for data on a dead connection. This hangs the Go build process since handleGet is synchronous.

The circuit breaker doesn't help because it only counts completed errors — a hanging request never returns an error.

Solution

Add a CACHEPROG_REMOTE_GET_TIMEOUT environment variable (and a --remote-get-timeout flag) that wraps the remote GET path in handleGet with a context timeout. When the timeout fires:

  • The context is cancelled, aborting the S3 read
  • The response is reported as a cache miss (not an error), so Go compiles from source instead of failing the build
  • This applies to both the S3 fetch and the subsequent local write (which streams from the S3 body)
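
For illustration, the behavior described above could look roughly like the sketch below. It is not the code in this PR: `fetchFunc`, `getWithTimeout`, `hangingFetch`, and `runDemo` are hypothetical names, and the fetch is assumed to honor context cancellation (as a `net/http` request created with a context does for its body reads). `io.Copy` itself is not context-aware, so the abort has to come from the reader.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"time"
)

// fetchFunc is a hypothetical stand-in for "remote GET + stream into local storage".
type fetchFunc func(ctx context.Context, dst io.Writer) error

// getWithTimeout runs fetch under an optional deadline.
// A timeout of 0 leaves the parent context untouched (no timeout).
// hit=false with err=nil means "cache miss".
func getWithTimeout(parent context.Context, timeout time.Duration, dst io.Writer, fetch fetchFunc) (hit bool, err error) {
	ctx := parent
	if timeout > 0 {
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(parent, timeout)
		defer cancel()
	}

	err = fetch(ctx, dst)
	switch {
	case err == nil:
		return true, nil
	case errors.Is(err, context.DeadlineExceeded):
		// Report the timeout as a miss so the build falls back to compiling.
		return false, nil
	default:
		return false, err
	}
}

// hangingFetch simulates a half-dead connection: it never produces data and
// returns only once the context is cancelled.
func hangingFetch(ctx context.Context, _ io.Writer) error {
	<-ctx.Done()
	return ctx.Err()
}

// runDemo exercises the timeout path with a short deadline.
func runDemo() (bool, error) {
	return getWithTimeout(context.Background(), 50*time.Millisecond, io.Discard, hangingFetch)
}

func main() {
	hit, err := runDemo()
	fmt.Printf("hit=%v err=%v\n", hit, err)
}
```

Because the deadline covers the whole call, a partially written local entry would still need to be discarded by the caller; the sketch leaves that out.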

Testing

  • Added unit test verifying timeout returns a miss within the expected time window
  • Manual testing with a black hole TCP server (accepts connections, never responds):
    • Without timeout: build hangs indefinitely
    • With CACHEPROG_REMOTE_GET_TIMEOUT=5s: timeout fires, miss reported, circuit breaker trips, build completes successfully
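
A black hole server of the kind used in the manual test can be approximated in a few lines. This is a sketch, not the exact tool used for this PR; `startBlackHole`, `probe`, and `selfCheck` are hypothetical names, and the probe simply confirms that a read against the server hits its deadline instead of receiving data.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// startBlackHole listens on addr, accepts every connection, and then never
// reads from or writes to it. Closing the listener stops the accept loop.
func startBlackHole(addr string) (net.Listener, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}
	go func() {
		var held []net.Conn // keep connections open so clients see no EOF
		for {
			conn, err := ln.Accept()
			if err != nil {
				for _, c := range held {
					c.Close()
				}
				return
			}
			held = append(held, conn)
		}
	}()
	return ln, nil
}

// probe connects to addr and waits up to wait for a single byte.
// It returns true if the read ended because the deadline expired.
func probe(addr string, wait time.Duration) (bool, error) {
	conn, err := net.DialTimeout("tcp", addr, time.Second)
	if err != nil {
		return false, err
	}
	defer conn.Close()
	if err := conn.SetReadDeadline(time.Now().Add(wait)); err != nil {
		return false, err
	}
	_, err = conn.Read(make([]byte, 1))
	return errors.Is(err, os.ErrDeadlineExceeded), nil
}

// selfCheck starts a server on an ephemeral loopback port and probes it.
func selfCheck() (bool, error) {
	ln, err := startBlackHole("127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()
	return probe(ln.Addr().String(), 100*time.Millisecond)
}

func main() {
	timedOut, err := selfCheck()
	fmt.Printf("read timed out: %v (err=%v)\n", timedOut, err)
}
```

Pointing the remote endpoint at such a listener reproduces the hang, since the connection is established but no response bytes ever arrive.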

Usage

CACHEPROG_REMOTE_GET_TIMEOUT=2m

Setting it to 0 (the default) disables the timeout, preserving existing behavior.
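
As a rough sketch of the semantics above, the value could be parsed as follows. This assumes the setting uses Go's `time.ParseDuration` syntax (consistent with the `2m` and `5s` examples) and that an unset variable behaves like 0; `parseTimeout` is a hypothetical helper, not necessarily how the PR implements it.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// parseTimeout converts the raw setting into a duration.
// An empty string or "0" yields 0, which callers treat as "no timeout".
func parseTimeout(raw string) (time.Duration, error) {
	if raw == "" {
		return 0, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid timeout %q: %w", raw, err)
	}
	if d < 0 {
		return 0, fmt.Errorf("timeout must not be negative, got %s", d)
	}
	return d, nil
}

func main() {
	d, err := parseTimeout(os.Getenv("CACHEPROG_REMOTE_GET_TIMEOUT"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if d == 0 {
		fmt.Println("remote GET timeout disabled")
		return
	}
	fmt.Println("remote GET timeout:", d)
}
```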

@bjoern-weidlich-anchorage
Contributor Author

@xakep666 what do you think? Without this I'm currently not able to use it.

@xakep666
Collaborator

xakep666 commented Apr 6, 2026

Looks good. Maybe PUT timeouts should be added too. Timeout errors should also trigger the circuit breaker.
