feat(core): implement robust CLI timeouts for long-running operations#23703
feat(core): implement robust CLI timeouts for long-running operations#23703kampitojha wants to merge 2 commits intogoogle-gemini:mainfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses the issue of indefinite hangs during long-running API operations by introducing a comprehensive timeout mechanism. It ensures that all CLI requests are bounded by a configurable duration, providing users with clear feedback and preventing silent unresponsiveness. The changes integrate global timeouts, allow for user-defined timeout durations, and enhance the user interface to display detailed retry status and error messages. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a configurable requestTimeoutMs for API requests, allowing users to specify a maximum time via CLI arguments and general settings. The core GeminiChat and retryWithBackoff utilities are updated to implement this timeout using AbortController and setTimeout, and a new test case for timeout functionality is added. Review feedback highlights a critical issue with redundant timeout logic and incorrect signal propagation in geminiChat.ts, which could lead to resource leaks. Additionally, a high-severity bug was identified in retry.ts where the delay function was not using the correct overallSignal, potentially preventing proper timeout cancellation.
Note: Security Review did not run due to the size of the PR.
packages/core/src/core/geminiChat.ts
Outdated
| const requestTimeoutMs = | ||
| this.context.config.getRequestTimeoutMs() ?? 300_000; | ||
|
|
||
| const timeoutController = new AbortController(); | ||
| const timerId = setTimeout( | ||
| () => timeoutController.abort(), | ||
| requestTimeoutMs, | ||
| ); | ||
|
|
||
| let combinedSignal = abortSignal; | ||
| if (typeof AbortSignal.any === 'function') { | ||
| combinedSignal = AbortSignal.any([abortSignal, timeoutController.signal]); | ||
| } else { | ||
| // Fallback for older Node.js | ||
| abortSignal?.addEventListener('abort', () => timeoutController.abort(), { | ||
| once: true, | ||
| }); | ||
| combinedSignal = timeoutController.signal; | ||
| } |
There was a problem hiding this comment.
This block introduces redundant timeout logic. The retryWithBackoff function, as modified in this PR, is designed to handle the overallTimeoutMs itself. This includes creating an AbortController, setting a setTimeout, and combining signals.
The current implementation creates two separate timers for the same timeout, which is inefficient and can lead to subtle race conditions or unexpected behavior.
Furthermore, there's a deeper issue with how signals are propagated. retryWithBackoff creates a new overallSignal but doesn't pass it to the fn it executes. This means that even if retryWithBackoff's Promise.race times out, the underlying network request within apiCall will not be aborted because it's not using the correct signal. This can lead to resource leaks (dangling network requests).
Recommendation:
- Remove this timeout creation logic (lines 524-542) and the corresponding
try...finallyblock that clears the timer (lines 690 and 725-727). - Pass the original
abortSignalandrequestTimeoutMstoretryWithBackoff. - Refactor
retryWithBackoffto pass its internally managedoverallSignalto thefnfunction, so that the underlying operations can be properly cancelled.
References
- Asynchronous operations that can be cancelled by the user should accept and propagate an
AbortSignalto ensure cancellability and prevent dangling processes or network requests. - Asynchronous operations waiting for user input via the MessageBus should rely on the provided AbortSignal for cancellation, rather than implementing a separate timeout, to maintain consistency with existing patterns.
packages/core/src/utils/retry.ts
Outdated
| if (onRetry) { | ||
| onRetry(attempt, new Error('Invalid content'), delayWithJitter); | ||
| } | ||
| await delay(delayWithJitter, signal); |
There was a problem hiding this comment.
The delay function is called with the original signal here. It should be called with overallSignal to ensure that the overall timeout can interrupt the delay. All other delay calls in this function correctly use overallSignal.
| await delay(delayWithJitter, signal); | |
| await delay(delayWithJitter, overallSignal); |
References
- Asynchronous operations that can be cancelled by the user should accept and propagate an
AbortSignalto ensure cancellability and prevent dangling processes or network requests. - Asynchronous operations waiting for user input via the MessageBus should rely on the provided AbortSignal for cancellation, rather than implementing a separate timeout, to maintain consistency with existing patterns.
… signals to api calls
Summary
This PR implements robust CLI timeouts and explicit abort logic to resolve indefinite hangs during long-running API operations (Issue #23688). It ensures that requests are always bounded and the CLI provides actionable feedback instead of silent unresponsiveness.
Details
1. Robust Retry Timing & Global Timeout
@google/gemini-cli-coreto support anoverallTimeoutMsparameter.Promise.raceinside the retry loop to forcefully interrupt hanging API calls that do not natively handleAbortSignal.2. Configurable Durations
requestTimeoutMsto the global settings schema with a default of 5 minutes (300,000ms).--request-timeout <ms>for temporary overrides.try/finallyblocks and combinedAbortSignalmanagement.3. Improved UI Transparency
ETIMEDOUT,503 Service Unavailable) to the retry hint.Attempt 2/10), giving users better visibility into connection status.Related Issues
Fixes #23688
How to Validate
gemini --request-timeout 10000 "How does a transistor work?"while throttling your network.Trying to reach gemini-1.5-pro (Attempt 2/10): fetch failed.npm test -w @google/gemini-cli-core -- src/utils/retry.test.tsshould throw timeout error when overallTimeoutMs is exceededpasses.Pre-Merge Checklist