
fix(retry): cap jittered delay to max_delay#2676

Open
Dhakshin2007 wants to merge 3 commits into openai:main from Dhakshin2007:main

Conversation


@Dhakshin2007 Dhakshin2007 commented Mar 15, 2026

Summary

This PR fixes a bug introduced in the "feat: add opt-in model retry policies" commit (3a52673), inside src/agents/run_internal/model_retry.py.

Per reviewer feedback from @seratch, the Retry-After OverflowError handling has been dropped as it is unnecessary in practice. This PR now focuses solely on the jitter cap fix.


Bug: Jitter can produce delay > max_delay (silent backoff violation)

Location: _default_retry_delay(), lines 287–290

Root cause:
The original code caps the exponential delay with min(..., max_delay) before applying jitter:

# BEFORE (buggy)
delay = min(initial_delay * (multiplier ** max(attempt - 1, 0)), max_delay)
if not use_jitter:
    return delay
return max(delay * (0.875 + random.random() * 0.25), 0.0)

Because jitter multiplies the already-capped value by up to 1.125, the final result can exceed max_delay by ~12.5%.
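The overshoot is easy to reproduce by hand. The snippet below uses placeholder parameter values (initial_delay=1.0, multiplier=2.0, max_delay=10.0 are illustrative, not the SDK's actual defaults) and substitutes the worst-case jitter factor (1.125) for the random draw so the result is deterministic:

```python
# Placeholder values for illustration; the real defaults live in model_retry.py.
initial_delay = 1.0
multiplier = 2.0
max_delay = 10.0
attempt = 6  # exponential base would be 32s, well past the cap

# The buggy code caps the base, then multiplies by a jitter factor up to 1.125.
base = min(initial_delay * (multiplier ** max(attempt - 1, 0)), max_delay)
worst_case = base * (0.875 + 1.0 * 0.25)  # random.random() at its maximum

print(base)        # 10.0  -> capped correctly
print(worst_case)  # 11.25 -> exceeds max_delay by 12.5%
```

Any jitter factor above 1.0 produces the same violation once the base saturates at max_delay.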

Fix:

# AFTER (fixed)
base = min(initial_delay * (multiplier ** max(attempt - 1, 0)), max_delay)
if not use_jitter:
    return base
return min(max(base * (0.875 + random.random() * 0.25), 0.0), max_delay)

Apply min(..., max_delay) after jitter so the final delay is always bounded by max_delay.
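The fixed formula can be exercised as a standalone sketch. The function below is a re-creation for demonstration, not the actual _default_retry_delay() signature, and the default arguments are placeholders:

```python
import random

def retry_delay(attempt, initial_delay=1.0, multiplier=2.0,
                max_delay=10.0, use_jitter=True):
    """Sketch of the fixed helper; defaults are illustrative placeholders."""
    base = min(initial_delay * (multiplier ** max(attempt - 1, 0)), max_delay)
    if not use_jitter:
        return base
    # Cap AFTER jitter so the final delay is always bounded by max_delay.
    return min(max(base * (0.875 + random.random() * 0.25), 0.0), max_delay)

# Property check: no attempt ever yields a delay above max_delay.
assert all(retry_delay(a) <= 10.0 for a in range(1, 20) for _ in range(1000))
```

Running the same check against the BEFORE version fails at high attempt counts, which is the bug this PR fixes.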


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0161128362


@Dhakshin2007 Dhakshin2007 (Author) commented:

Thanks for the P2 feedback, @chatgpt-codex-connector!

You are correct: applying jitter to the uncapped exponential (delay * jitter_factor) and then clamping makes the jitter ineffective once the base grows well past max_delay (attempt 5 or later with defaults). At that point even the minimum jitter factor (0.875) keeps the result above max_delay, so every call returns exactly max_delay.

The second commit addresses this by clamping the base first, then applying jitter to the clamped value, and finally capping the result:

# Clamp base first so jitter stays effective even at saturation
base = min(initial_delay * (multiplier ** max(attempt - 1, 0)), max_delay)
if not use_jitter:
    return base
return min(max(base * (0.875 + random.random() * 0.25), 0.0), max_delay)

With this approach, at saturation (base = max_delay) jitter produces delays in [0.875 * max_delay, max_delay], preserving the thundering-herd spread that Codex correctly identified was lost.
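That saturation range can be checked empirically. The expression below mirrors the fixed return line; max_delay = 10.0 is again a placeholder value:

```python
import random

random.seed(0)  # reproducible sample
max_delay = 10.0
base = max_delay  # saturated: the exponential base already exceeds the cap

# Same expression as the fixed return line, sampled many times.
samples = [min(max(base * (0.875 + random.random() * 0.25), 0.0), max_delay)
           for _ in range(10_000)]

print(min(samples) >= 0.875 * max_delay)  # True: spread is preserved
print(max(samples) <= max_delay)          # True: cap is respected
```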

@seratch seratch (Member) left a comment


I think the Retry-After overflow handling is not necessary in practice. That header value is supplied by the model provider, and an extremely large numeric value there is a very unlikely scenario. Also, the current overflow change does not materially improve the common path enough to justify carrying the extra logic.

The jitter cap fix, on the other hand, is useful on its own. It preserves randomness while ensuring the computed retry delay never exceeds max_delay, which is a concrete improvement to the backoff behavior.

Could you please update this PR to keep only the jitter improvement and drop the Retry-After overflow handling changes?

@seratch seratch marked this pull request as draft March 16, 2026 03:08
Refactor exception handling to only catch ValueError.

Per review by seratch: remove the Retry-After OverflowError handling as it is unnecessary in practice (the header value is supplied by the model provider and an extremely large numeric value is very unlikely). Keep only the jitter cap fix.
@Dhakshin2007 Dhakshin2007 changed the title fix(retry): cap jittered delay to max_delay and catch OverflowError in Retry-After parsing fix(retry): cap jittered delay to max_delay Mar 16, 2026
@Dhakshin2007 Dhakshin2007 (Author) commented:

@seratch Thank you for the review! I've addressed your feedback:

  • Dropped the Retry-After OverflowError handling as requested
  • The PR now contains only the jitter cap fix (capping the final jittered delay to max_delay)

Updated the PR title and description accordingly.

@Dhakshin2007 Dhakshin2007 marked this pull request as ready for review March 16, 2026 05:34
@Dhakshin2007 Dhakshin2007 requested a review from seratch March 16, 2026 05:35