A small, generic Go library for retrying fallible operations with exponential backoff and pluggable jitter strategies.
- Generic — works with any return type via `Do[T]`
- Exponential backoff with pluggable jitter — Full Jitter (default) or Equal Jitter
- Permanent errors — stop retrying immediately for non-recoverable failures
- `IsPermanent(err)` — inspect whether an error originated from a permanent failure
- Per-error budgets — `WithAttemptsForError(n, err)` caps retries for a specific error independently of the global limit
- Per-attempt timeout — `WithTimeout(d)` cancels a single slow attempt without affecting the overall retry budget
- Error aggregation — `WithAllErrors()` collects every attempt error; inspect the full history via `errors.Is` / `errors.As`
- Custom delay function — `WithDelayFunc` replaces the built-in backoff with any schedule: fixed, linear, or error-dependent
- Jitter window cap — `WithMaxJitter(d)` constrains spread independently of the backoff cap
- `RetryAfterer` interface — errors can specify their own wait duration (e.g. HTTP 429)
- Custom predicates — decide per-error whether to retry
- Testable — injectable `Clock` interface for time-travel in unit tests
- Infinite retry — `WithInfiniteRetry()` retries until success or context cancellation
- Context-aware — honours cancellation and deadline at every wait point
```sh
go get github.com/nodivbyzero/try
```

```go
val, err := try.Do(ctx, func(ctx context.Context) (string, error) {
	return callExternalAPI(ctx)
})
```

`Do` retries up to 5 times by default, with exponential backoff capped at 30 seconds.
Default retry behaviour:
`Do` retries on every error except `context.Canceled`, `context.DeadlineExceeded`, and errors wrapped with `Permanent`. This means validation errors, auth failures, and malformed-payload errors will be retried unless you opt out. For production use, always supply a `WithRetryIf` predicate to avoid wasting attempts on non-transient failures.
Use `WithInfiniteRetry` to retry until the function succeeds, a `Permanent` error is returned, or the context is cancelled. Always pair it with a context deadline:
```go
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()

val, err := try.Do(ctx, fetchUser,
	try.WithInfiniteRetry(),
	try.WithOnRetry(func(info try.RetryInfo) {
		slog.Warn("retrying", "attempt", info.Attempt, "error", info.Err)
	}),
)
// err will be context.DeadlineExceeded (wrapping the last op error) if
// the function never succeeds within the timeout.
```

Each option is a functional setter for a field on `Config`:
| Option | Config Field | Default | Description |
|---|---|---|---|
| `WithAttempts(n int)` | `MaxAttempts` | 5 | Total attempts including the first call |
| `WithInfiniteRetry()` | `MaxAttempts` | — | Retry until success, `Permanent` error, or context cancellation |
| `WithInitialDelay(d time.Duration)` | `InitialDelay` | 200ms | Starting backoff; doubles each attempt up to `MaxDelay` |
| `WithMaxDelay(d time.Duration)` | `MaxDelay` | 30s | Upper bound on any single wait regardless of backoff growth |
| `WithJitter(s JitterStrategy)` | `Jitter` | `FullJitter` | Jitter strategy: `FullJitter` or `EqualJitter` |
| `WithDelayFunc(fn func(int, error) time.Duration)` | `DelayFunc` | nil | Replace built-in backoff entirely; `RetryAfterer` still takes precedence |
| `WithMaxJitter(d time.Duration)` | `MaxJitter` | disabled | Cap jitter window independently of backoff cap |
| `WithRetryIf(fn func(error) bool)` | `Predicate` | retry all | Return false to stop retrying for a given error |
| `WithAttemptsForError(n int, err error)` | `ErrorBudgets` | — | Cap retries for a specific error; multiple calls accumulate |
| `WithTimeout(d time.Duration)` | `AttemptTimeout` | disabled | Per-attempt deadline; cancelled attempts are retried |
| `WithAllErrors()` | `AllErrors` | false | Aggregate all attempt errors into `*AttemptErrors` |
| `WithOnRetry(fn func(RetryInfo))` | `OnRetry` | nil | Callback fired before each wait — use for logging or metrics |
| `WithClock(clk Clock)` | `Clock` | `time.After` | Injectable clock for time-travel in tests |
```go
val, err := try.Do(ctx, fetchUser,
	try.WithAttempts(10),                       // Config.MaxAttempts = 10
	try.WithInitialDelay(500*time.Millisecond), // Config.InitialDelay = 500ms
	try.WithMaxDelay(2*time.Minute),            // Config.MaxDelay = 2m
	try.WithJitter(try.EqualJitter),            // Config.Jitter = EqualJitter
	try.WithRetryIf(func(err error) bool {
		return isTransient(err) // Config.Predicate
	}),
	try.WithOnRetry(func(info try.RetryInfo) {
		slog.Warn("retrying",
			"attempt", info.Attempt,
			"delay", info.Delay,
			"error", info.Err,
		)
	}),
)
```

Wrap an error with `try.Permanent` to stop the retry loop without waiting for remaining attempts:
```go
val, err := try.Do(ctx, func(ctx context.Context) (*User, error) {
	u, err := db.Find(ctx, id)
	if errors.Is(err, sql.ErrNoRows) {
		return nil, try.Permanent(err) // no point retrying
	}
	return u, err
})
```

The underlying error is unwrapped, so `errors.Is` / `errors.As` work normally on the returned error.
Use `IsPermanent` to check whether an error came from a permanent failure at any call site — without unwrapping manually:
```go
val, err := try.Do(ctx, fn)
if try.IsPermanent(err) {
	// non-recoverable — do not retry at a higher level
	return err
}
```

`IsPermanent` works through additional wrapping layers, so `fmt.Errorf("%w", permanentErr)` is correctly detected.
If your error type knows how long the caller should wait (e.g. a rate-limit response), implement the `RetryAfterer` interface and `try` will use that duration instead of the computed backoff:
```go
type RateLimitError struct {
	RetryIn time.Duration
}

func (e RateLimitError) Error() string             { return "rate limited" }
func (e RateLimitError) RetryAfter() time.Duration { return e.RetryIn }
```

The duration is still capped at `MaxDelay`.
The exponential cap for attempt n is `min(MaxDelay, InitialDelay × 2^(n−1))`. The jitter strategy then derives the actual wait from that cap:
| Strategy | Formula | Behaviour |
|---|---|---|
| `FullJitter` (default) | `rand[0, cap)` | Maximally spreads retriers; may produce very short waits |
| `EqualJitter` | `cap/2 + rand[0, cap/2)` | Guarantees at least half the backoff; softer lower bound |
Both strategies enforce a 1ms minimum floor. The Full Jitter approach is recommended by AWS for avoiding thundering herd; Equal Jitter is preferable when a minimum wait time matters.
By default the jitter window equals the full backoff cap, so `FullJitter` with a 30s cap draws delays from the entire [0, 30s) range. Use `WithMaxJitter` to constrain the spread independently — useful for services where you want long base delays but minimal pile-up variance:
```go
// Base delay grows to 30s, but jitter is capped at 500ms.
// Delays will be in [0, 500ms) regardless of how large the backoff has grown.
try.Do(ctx, fn,
	try.WithInitialDelay(1*time.Second),
	try.WithMaxDelay(30*time.Second),
	try.WithMaxJitter(500*time.Millisecond),
)
```

When `MaxJitter` is larger than the current backoff cap it has no effect — the backoff cap is always the effective ceiling.
`WithDelayFunc` replaces the built-in exponential backoff with any schedule you need. The function receives the 1-based attempt number and the error that caused the failure:

```go
// Fixed delay — no backoff at all.
try.WithDelayFunc(func(attempt int, err error) time.Duration {
	return 500 * time.Millisecond
})

// Linear backoff: 1s, 2s, 3s, …
try.WithDelayFunc(func(attempt int, err error) time.Duration {
	return time.Duration(attempt) * time.Second
})

// Error-dependent: long wait for throttle errors, short for others.
try.WithDelayFunc(func(attempt int, err error) time.Duration {
	if errors.Is(err, ErrThrottled) {
		return 10 * time.Second
	}
	return time.Duration(attempt) * 200 * time.Millisecond
})
```

`WithDelayFunc` takes precedence over `WithInitialDelay`, `WithMaxDelay`, and `WithJitter`. `RetryAfterer` on the error still takes precedence over `WithDelayFunc`.
By default `Do` returns only the last attempt's error. Use `WithAllErrors` to collect every attempt error into `*AttemptErrors`, which implements `Unwrap() []error` for Go 1.20+ multi-error unwrapping. This lets you inspect the full failure history with `errors.Is` and `errors.As`:

```go
_, err := try.Do(ctx, fn,
	try.WithAttempts(3),
	try.WithAllErrors(),
)

var ae *try.AttemptErrors
if errors.As(err, &ae) {
	for i, e := range ae.Unwrap() {
		slog.Warn("attempt failed", "attempt", i+1, "error", e)
	}
}

// errors.Is traverses all attempt errors, not just the last one.
if errors.Is(err, ErrRateLimit) {
	// at least one attempt was rate-limited
}
```

`WithAllErrors` is opt-in — the default behaviour (last error only) is unchanged and has no allocation overhead.
`WithTimeout` sets a deadline on each individual call to `fn`, distinct from the parent context deadline which governs the entire retry operation. If `fn` blocks longer than the timeout its context is cancelled and the attempt is retried:

```go
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) // overall budget
defer cancel()

val, err := try.Do(ctx, callSlowService,
	try.WithAttempts(5),
	try.WithTimeout(2*time.Second), // each attempt gets 2s
	try.WithRetryIf(func(err error) bool {
		// Retry per-attempt timeouts; stop on other errors.
		return errors.Is(err, context.DeadlineExceeded)
	}),
)
```

The parent context deadline still governs the overall operation — if the parent is cancelled mid-retry the loop stops immediately.
`WithAttemptsForError` sets an independent retry cap for a specific error value. When that error is returned and its budget is exhausted, the loop stops immediately — even if the global `WithAttempts` budget has remaining attempts.

```go
var ErrRateLimit = errors.New("rate limited")
var ErrUnavailable = errors.New("service unavailable")

val, err := try.Do(ctx, fn,
	try.WithAttempts(10),
	try.WithAttemptsForError(2, ErrRateLimit),   // stop after 2 rate-limit hits
	try.WithAttemptsForError(3, ErrUnavailable), // stop after 3 unavailable hits
)
```

Multiple `WithAttemptsForError` calls accumulate independent budgets. Matching uses `errors.Is`, so wrapped errors are detected correctly:

```go
// This will match ErrRateLimit even through fmt.Errorf wrapping.
return 0, fmt.Errorf("upstream: %w", ErrRateLimit)
```

Because `Do` retries all errors by default, use `WithRetryIf` to restrict retries to genuinely transient failures in production code:
```go
// HTTP example: only retry on 5xx or network errors, never on 4xx.
val, err := try.Do(ctx, fetchUser,
	try.WithRetryIf(func(err error) bool {
		var httpErr *HTTPError
		if errors.As(err, &httpErr) {
			return httpErr.StatusCode >= 500
		}
		return true // retry network/timeout errors
	}),
)
```

```go
// gRPC example: retry on Unavailable and ResourceExhausted, not on
// InvalidArgument, NotFound, PermissionDenied, etc.
try.WithRetryIf(func(err error) bool {
	switch status.Code(err) {
	case codes.Unavailable, codes.ResourceExhausted:
		return true
	default:
		return false
	}
})
```

Errors that should never be retried: validation failures, authentication errors, not-found responses, and any error that will produce the same result on every attempt. Wrap these with `Permanent` or filter them out via `WithRetryIf` to fail fast and avoid unnecessary load on downstream services.
Runnable examples for all major features are in `example_test.go` and render on pkg.go.dev. They cover:

- `ExampleDo` — minimal zero-config usage
- `ExampleDo_transientFailure` — flaky call with `WithRetryIf` predicate
- `ExampleDo_permanentError` — early exit with `Permanent`
- `ExampleDo_onRetry` — structured logging via `WithOnRetry`
- `ExampleDo_retryAfter` — honouring `RetryAfterer` on rate-limit errors
- `ExampleDo_equalJitter` — `EqualJitter` with `WithMaxDelay`
- `ExamplePermanent` — `errors.Is` through the `Permanent` wrapper
Pass a test clock via `WithClock` to control time without real sleeps:

```go
type testClock struct {
	ch chan time.Time
}

func (c *testClock) After(d time.Duration) <-chan time.Time { return c.ch }
func (c *testClock) Now() time.Time                         { return time.Now() }

clk := &testClock{ch: make(chan time.Time)}
go func() {
	_, _ = try.Do(ctx, alwaysFails, try.WithClock(clk), try.WithAttempts(3))
}()
clk.ch <- time.Now() // advance past first wait instantly
```

MIT