feat(core): implement robust CLI timeouts for long-running operations #23703

Open
kampitojha wants to merge 2 commits into google-gemini:main from kampitojha:feat/fix-indefinite-hangs

@kampitojha

Summary

This PR implements robust CLI timeouts and explicit abort logic to resolve indefinite hangs during long-running API operations (Issue #23688). It ensures that requests are always bounded and the CLI provides actionable feedback instead of silent unresponsiveness.

Details

1. Robust Retry Timing & Global Timeout

  • Enhanced the retryWithBackoff utility in @google/gemini-cli-core to support an overallTimeoutMs parameter.
  • Implemented a global timeout that spans the entire request lifecycle (including all retry attempts).
  • Introduced Promise.race inside the retry loop to forcefully interrupt hanging API calls that do not natively handle AbortSignal.
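
A minimal sketch of how one deadline can bound every attempt and every backoff wait with Promise.race. This is not the PR's retryWithBackoff: the function name, the options other than overallTimeoutMs, and the OverallTimeoutError class are assumptions made for illustration.

```typescript
// Hypothetical options shape; only overallTimeoutMs is named in the PR description.
interface RetryOptions {
  maxAttempts: number;
  initialDelayMs: number;
  overallTimeoutMs: number;
}

// Hypothetical error type used to tell the deadline apart from ordinary failures.
class OverallTimeoutError extends Error {}

async function retryWithOverallTimeout<T>(
  fn: () => Promise<T>,
  { maxAttempts, initialDelayMs, overallTimeoutMs }: RetryOptions,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  // A single deadline promise shared by all attempts and all backoff waits.
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new OverallTimeoutError(`Timed out after ${overallTimeoutMs}ms`)),
      overallTimeoutMs,
    );
  });

  try {
    let delayMs = initialDelayMs;
    for (let attempt = 1; ; attempt++) {
      try {
        // Stop waiting on fn() as soon as the deadline rejects.
        return await Promise.race([fn(), deadline]);
      } catch (error) {
        // Never retry past the deadline or past the attempt budget.
        if (error instanceof OverallTimeoutError || attempt >= maxAttempts) {
          throw error;
        }
      }
      // The backoff wait is bounded by the same deadline.
      await Promise.race([
        new Promise((resolve) => setTimeout(resolve, delayMs)),
        deadline,
      ]);
      delayMs *= 2;
    }
  } finally {
    // Always clear the deadline timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

Note that Promise.race only stops waiting for the call. The underlying operation keeps running unless it also receives and honors an AbortSignal.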

2. Configurable Durations

  • Added requestTimeoutMs to the global settings schema with a default of 5 minutes (300,000ms).
  • Exposed a new CLI flag --request-timeout <ms> for temporary overrides.
  • Updated GeminiChat to orchestrate proper cleanup of timeout timers using try/finally blocks and combined AbortSignal management.
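
The precedence described above (CLI flag over the settings value over the 5-minute default) could be resolved roughly as follows. The helper name and the validation rule are assumptions for illustration; the PR does not show how the flag is parsed.

```typescript
// Default taken from the PR description: 5 minutes.
const DEFAULT_REQUEST_TIMEOUT_MS = 300_000;

// Hypothetical helper: cliValue is the raw --request-timeout argument,
// settingsValue is requestTimeoutMs from the settings file.
function resolveRequestTimeoutMs(
  cliValue?: string,
  settingsValue?: number,
): number {
  if (cliValue !== undefined) {
    const parsed = Number(cliValue);
    // Assumed validation: reject anything that is not a positive number.
    if (!Number.isFinite(parsed) || parsed <= 0) {
      throw new Error(
        `--request-timeout expects a positive number of milliseconds, got "${cliValue}"`,
      );
    }
    return parsed;
  }
  // Fall back to the settings value, then to the default.
  return settingsValue ?? DEFAULT_REQUEST_TIMEOUT_MS;
}
```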

3. Improved UI Transparency

  • Updated the useLoadingIndicator hook to append the actual error message (e.g., ETIMEDOUT, 503 Service Unavailable) to the retry hint.
  • The UI now explicitly shows the current attempt count (e.g., Attempt 2/10), giving users better visibility into connection status.
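
As a rough illustration of the retry hint, the formatter below builds the message shape quoted in the validation steps of this description. The function name and signature are hypothetical and are not the useLoadingIndicator API.

```typescript
// Hypothetical formatter for the retry hint shown next to the loading indicator.
function formatRetryHint(
  model: string,
  attempt: number,
  maxAttempts: number,
  error?: unknown,
): string {
  const base = `Trying to reach ${model} (Attempt ${attempt}/${maxAttempts})`;
  // Prefer the Error message; fall back to a string form for other thrown values.
  const detail =
    error instanceof Error
      ? error.message
      : error !== undefined
        ? String(error)
        : '';
  return detail ? `${base}: ${detail}` : base;
}
```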

Related Issues

Fixes #23688

How to Validate

  1. Verify Default Timeout:
    • Run the CLI in an environment with high latency or simulated network hangs.
    • Confirm that the operation fails with a timeout message after about 5 minutes (the 300,000 ms default) rather than hanging indefinitely.
  2. Test Custom Timeout Flag:
    • Run: gemini --request-timeout 10000 "How does a transistor work?" while throttling your network.
    • Observe that the CLI correctly aborts and returns an error within ~10 seconds.
  3. Verify Retry Visibility:
    • Trigger a retryable failure (e.g., by temporarily disconnecting internet).
    • Observe the loading indicator showing the specific error: Trying to reach gemini-1.5-pro (Attempt 2/10): fetch failed.
  4. Automated Tests:
    • Run: npm test -w @google/gemini-cli-core -- src/utils/retry.test.ts
    • Ensure the new test case "should throw timeout error when overallTimeoutMs is exceeded" passes.

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • MacOS
      • npm run
      • npx
      • Docker
      • Podman
      • Seatbelt
    • Windows
      • npm run
      • npx
      • Docker
    • Linux
      • npm run
      • npx
      • Docker

@kampitojha kampitojha requested a review from a team as a code owner March 24, 2026 20:06
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses the issue of indefinite hangs during long-running API operations by introducing a comprehensive timeout mechanism. It ensures that all CLI requests are bounded by a configurable duration, providing users with clear feedback and preventing silent unresponsiveness. The changes integrate global timeouts, allow for user-defined timeout durations, and enhance the user interface to display detailed retry status and error messages.

Highlights

  • Robust Retry Timing & Global Timeout: The retryWithBackoff utility was enhanced to support an overallTimeoutMs parameter, implementing a global timeout across the entire request lifecycle, including retries. Promise.race is used to forcefully interrupt hanging API calls that do not natively handle AbortSignal.
  • Configurable Durations: A requestTimeoutMs setting was added to the global settings schema with a default of 5 minutes (300,000ms). A new CLI flag --request-timeout was exposed for temporary overrides.
  • Improved UI Transparency: The useLoadingIndicator hook was updated to append the actual error message (e.g., ETIMEDOUT, 503 Service Unavailable) to the retry hint, and the UI now explicitly shows the current attempt count (e.g., Attempt 2/10).

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a configurable requestTimeoutMs for API requests, allowing users to specify a maximum time via CLI arguments and general settings. The core GeminiChat and retryWithBackoff utilities are updated to implement this timeout using AbortController and setTimeout, and a new test case for timeout functionality is added. Review feedback highlights a critical issue with redundant timeout logic and incorrect signal propagation in geminiChat.ts, which could lead to resource leaks. Additionally, a high-severity bug was identified in retry.ts where the delay function was not using the correct overallSignal, potentially preventing proper timeout cancellation.

Note: Security Review did not run due to the size of the PR.

Comment on lines +524 to +542
const requestTimeoutMs =
  this.context.config.getRequestTimeoutMs() ?? 300_000;

const timeoutController = new AbortController();
const timerId = setTimeout(
  () => timeoutController.abort(),
  requestTimeoutMs,
);

let combinedSignal = abortSignal;
if (typeof AbortSignal.any === 'function') {
  combinedSignal = AbortSignal.any([abortSignal, timeoutController.signal]);
} else {
  // Fallback for older Node.js
  abortSignal?.addEventListener('abort', () => timeoutController.abort(), {
    once: true,
  });
  combinedSignal = timeoutController.signal;
}

critical

This block introduces redundant timeout logic. The retryWithBackoff function, as modified in this PR, is designed to handle the overallTimeoutMs itself. This includes creating an AbortController, setting a setTimeout, and combining signals.

The current implementation creates two separate timers for the same timeout, which is inefficient and can lead to subtle race conditions or unexpected behavior.

Furthermore, there's a deeper issue with how signals are propagated. retryWithBackoff creates a new overallSignal but doesn't pass it to the fn it executes. This means that even if retryWithBackoff's Promise.race times out, the underlying network request within apiCall will not be aborted because it's not using the correct signal. This can lead to resource leaks (dangling network requests).

Recommendation:

  1. Remove this timeout creation logic (lines 524-542) and the corresponding try...finally block that clears the timer (lines 690 and 725-727).
  2. Pass the original abortSignal and requestTimeoutMs to retryWithBackoff.
  3. Refactor retryWithBackoff to pass its internally managed overallSignal to the fn function, so that the underlying operations can be properly cancelled.
References
  1. Asynchronous operations that can be cancelled by the user should accept and propagate an AbortSignal to ensure cancellability and prevent dangling processes or network requests.
  2. Asynchronous operations waiting for user input via the MessageBus should rely on the provided AbortSignal for cancellation, rather than implementing a separate timeout, to maintain consistency with existing patterns.
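
One possible shape for the recommendation above, sketched under assumptions: the retry helper owns the only timeout timer, merges it with the caller's signal, and hands the resulting overallSignal to fn. The name retryWithSignal and its options are hypothetical, and backoff delays are left out for brevity.

```typescript
// Hypothetical options; signal is the caller's original abortSignal.
interface RetrySignalOptions {
  maxAttempts: number;
  overallTimeoutMs: number;
  signal?: AbortSignal;
}

async function retryWithSignal<T>(
  fn: (signal: AbortSignal) => Promise<T>,
  options: RetrySignalOptions,
): Promise<T> {
  const { maxAttempts, overallTimeoutMs, signal } = options;
  const controller = new AbortController();
  const overallSignal = controller.signal;
  const abortOverall = () => controller.abort();

  // The only timer for this request: it aborts overallSignal on expiry.
  const timer = setTimeout(abortOverall, overallTimeoutMs);
  // Forward a caller-initiated abort to the same signal.
  if (signal?.aborted) abortOverall();
  else signal?.addEventListener('abort', abortOverall, { once: true });

  try {
    let lastError: unknown = new Error('No attempts were made');
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      if (overallSignal.aborted) {
        throw new Error('Request aborted or timed out');
      }
      try {
        // fn receives overallSignal, so the underlying call can be cancelled.
        return await fn(overallSignal);
      } catch (error) {
        // Do not retry once the overall signal has fired.
        if (overallSignal.aborted) throw error;
        lastError = error;
      }
    }
    throw lastError;
  } finally {
    // Clean up the timer and listener on every exit path.
    clearTimeout(timer);
    signal?.removeEventListener('abort', abortOverall);
  }
}
```

With this shape the caller passes its original abortSignal and the timeout value, and no second timer is needed at the call site.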

if (onRetry) {
  onRetry(attempt, new Error('Invalid content'), delayWithJitter);
}
await delay(delayWithJitter, signal);

high

The delay function is called with the original signal here. It should be called with overallSignal to ensure that the overall timeout can interrupt the delay. All other delay calls in this function correctly use overallSignal.

Suggested change
- await delay(delayWithJitter, signal);
+ await delay(delayWithJitter, overallSignal);
References
  1. Asynchronous operations that can be cancelled by the user should accept and propagate an AbortSignal to ensure cancellability and prevent dangling processes or network requests.
  2. Asynchronous operations waiting for user input via the MessageBus should rely on the provided AbortSignal for cancellation, rather than implementing a separate timeout, to maintain consistency with existing patterns.
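
To see why the choice of signal matters, consider an abortable delay along the lines of the sketch below. It is an assumed implementation, not the repository's delay: the wait can only be cut short by the signal it is given, so passing the original signal leaves the backoff wait out of reach of the overall timeout.

```typescript
// Hypothetical abortable delay: resolves after ms, or rejects as soon as
// the given signal aborts.
function delay(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error('Delay aborted'));
      return;
    }
    const onAbort = () => {
      // Stop the pending timer so it does not keep the process alive.
      clearTimeout(timer);
      reject(new Error('Delay aborted'));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    signal?.addEventListener('abort', onAbort, { once: true });
  });
}
```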

@gemini-cli gemini-cli bot added the area/security Issues related to security label Mar 24, 2026

Development

Successfully merging this pull request may close these issues.

Login with google account fails with "Could not find operation..."

1 participant