feat: add configurable timeout for remote GETs to prevent build hangs #39
Draft
bjoern-weidlich-anchorage wants to merge 1 commit into platacard:main from
04fa1fe to 0ff83b5
Author (contributor):
@xakep666, what do you think? Without this, I'm currently unable to use it.
0ff83b5 to 1266033
Collaborator:
Looks good. Maybe timeouts should be added for PUTs too. Timeout errors should also trigger the circuit breaker.
Problem
When a TCP connection goes half-dead during a remote GET (e.g. over cloud interconnects or unreliable network paths), `handleGet` blocks indefinitely. The S3 response body is streamed directly into local disk storage via `io.Copy`, which blocks forever waiting for data on a dead connection. This hangs the Go build process, since `handleGet` is synchronous. The circuit breaker doesn't help because it only counts completed errors; a hanging request never returns an error.
Solution
Add `CACHEPROG_REMOTE_GET_TIMEOUT` (and the `--remote-get-timeout` flag), which wraps the remote GET path in `handleGet` with a context timeout. When the timeout fires:
Testing
`CACHEPROG_REMOTE_GET_TIMEOUT=5s`: timeout fires, miss reported, circuit breaker trips, build completes successfully.
Usage
Setting it to 0 (the default) disables the timeout, preserving the existing behavior.