feat: introduce the recoverable crate #18

martintmk · 2025-09-29T13:21:45Z

Add a new recoverable crate that provides standardized types for classifying
error conditions as recoverable or non-recoverable, enabling consistent retry
behavior across different error types and resilience middleware.

Core features:

RecoveryInfo type for classifying errors with recovery metadata
Recoverable trait for types that can determine their recoverability
RecoveryKind enum distinguishing between retry, outage, never, and unknown
Support for explicit retry delays via delay() method
Service outage detection with optional recovery hints
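The sketch below shows one way the pieces in the feature list could fit together. It is minimal and self-contained. The names `RecoveryInfo`, `RecoveryKind` (with its four kinds), `Recoverable`, `retry()`, `never()`, `kind()` and `delay()` come from this PR. Everything else is an assumption for illustration and not the crate's actual code: the field layout, the `outage()`/`unknown()` constructors, the `get_delay()` getter, and the hypothetical `HttpError` type with its status-code mapping.

```rust
use std::time::Duration;

/// The four recovery kinds listed in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryKind {
    Retry,
    Outage,
    Never,
    Unknown,
}

/// Sketch of `RecoveryInfo` as a small bag of properties (assumed layout).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RecoveryInfo {
    kind: RecoveryKind,
    delay: Option<Duration>,
}

impl RecoveryInfo {
    fn with_kind(kind: RecoveryKind) -> Self {
        Self { kind, delay: None }
    }
    pub fn retry() -> Self { Self::with_kind(RecoveryKind::Retry) }
    pub fn outage() -> Self { Self::with_kind(RecoveryKind::Outage) }
    pub fn never() -> Self { Self::with_kind(RecoveryKind::Never) }
    pub fn unknown() -> Self { Self::with_kind(RecoveryKind::Unknown) }

    /// Setter: returns the same info with an explicit retry delay.
    pub fn delay(mut self, delay: Duration) -> Self {
        self.delay = Some(delay);
        self
    }
    pub fn kind(&self) -> RecoveryKind {
        self.kind
    }
    /// Getter (assumed name, following the `get_` prefix convention for getters).
    pub fn get_delay(&self) -> Option<Duration> {
        self.delay
    }
}

/// Types that can report whether recovering from them might help.
pub trait Recoverable {
    fn recovery(&self) -> RecoveryInfo;
}

/// Hypothetical error type carrying an HTTP status code.
pub struct HttpError {
    pub status: u16,
    pub retry_after: Option<Duration>,
}

impl Recoverable for HttpError {
    fn recovery(&self) -> RecoveryInfo {
        // Illustrative mapping only; 429 is matched before the 4xx range.
        let info = match self.status {
            429 => RecoveryInfo::retry(),
            503 => RecoveryInfo::outage(),
            400..=499 => RecoveryInfo::never(),
            _ => RecoveryInfo::unknown(),
        };
        // Attach an explicit delay only when the server supplied one.
        match self.retry_after {
            Some(d) => info.delay(d),
            None => info,
        }
    }
}
```

A consumer would typically inspect `kind()` first and consult `get_delay()` only afterwards. In this sketch, a server-supplied `retry_after` is attached regardless of the kind.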

codecov · 2025-09-29T13:22:35Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.0%. Comparing base (04bc1b7) to head (fd326dd).
⚠️ Report is 1 commit behind head on main.

@@           Coverage Diff           @@
##             main      #18   +/-   ##
=======================================
  Coverage   100.0%   100.0%           
=======================================
  Files          90       91    +1     
  Lines        6497     6554   +57     
=======================================
+ Hits         6497     6554   +57

Copilot

Pull Request Overview

This PR introduces a new recoverable crate that provides standardized types for error classification and recovery behavior in resilience patterns. The crate enables consistent determination of whether conditions are recoverable (transient) or non-recoverable (permanent/successful).

Core recovery classification system with Recovery, RecoveryKind, and Recover trait
Support for retry timing hints through delay metadata
Service outage detection capabilities for widespread failures
Comprehensive documentation and examples for proper usage patterns

Reviewed Changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

File	Description
crates/recoverable/src/lib.rs	Main implementation with Recovery struct, RecoveryKind enum, Recover trait, and comprehensive test suite
crates/recoverable/Cargo.toml	Package configuration for the new recoverable crate
crates/recoverable/README.md	Documentation and usage examples for the crate
crates/recoverable/CHANGELOG.md	Empty changelog file for future version tracking
crates/recoverable/logo.png	Git LFS tracked logo image for the crate
Cargo.toml	Workspace updates to include recoverable crate and static_assertions dependency
README.md	Updated main README to reference the new recoverable crate
CHANGELOG.md	Updated main changelog to reference recoverable's changelog

crates/recoverable/src/lib.rs

geeknoid · 2025-10-01T16:29:14Z

I think the names of the types aren't quite right English-wise. I would recommend the following:

RecoveryMetadata - type for classifying errors with recovery metadata
RecoveryAware - trait for types that can determine their recoverability
RecoveryKind enum distinguishing between retry, outage, never, and unknown

As a first step. And then, I'm still not sure why the retry delay can't be incorporated directly into the RecoveryKind enum which would eliminate a separate type. You could add a delay function for this enum which would return an updated RecoveryKind with the given delay, so it would match the semantics that the current delay function has. So what scenario does this complicate?
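The alternative described here can be sketched by moving the delay into the enum variants. `RecoveryKind` alone then carries everything, and `delay()` returns an updated kind. This is a hypothetical design, not code from the PR. Two choices in it are assumptions: which variants may carry a delay (here `Retry` and `Outage`), and what happens to `Never`/`Unknown` (they are returned unchanged).

```rust
use std::time::Duration;

/// Hypothetical alternative: the retry delay lives inside the enum,
/// so no separate info type is needed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryKind {
    Retry { delay: Option<Duration> },
    Outage { delay: Option<Duration> },
    Never,
    Unknown,
}

impl RecoveryKind {
    /// Returns an updated kind carrying the given delay, mirroring the
    /// semantics of the existing `delay()` setter. Kinds that cannot
    /// carry a delay are returned unchanged.
    pub fn delay(self, delay: Duration) -> Self {
        match self {
            RecoveryKind::Retry { .. } => RecoveryKind::Retry { delay: Some(delay) },
            RecoveryKind::Outage { .. } => RecoveryKind::Outage { delay: Some(delay) },
            other => other,
        }
    }
}
```

One cost of this shape: any additional cross-cutting property, such as a retry reason, would have to be added to every variant that can carry it, rather than once on a shared type.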

martintmk · 2025-10-02T07:28:39Z

> I think the names of the types aren't quite right English-wise. I would recommend the following:
>
> RecoveryMetadata - type for classifying errors with recovery metadata
> RecoveryAware - trait for types that can determine their recoverability
> RecoveryKind enum distinguishing between retry, outage, never, and unknown

Regarding the type names, I brainstormed these a while back with @ralfbiedert. I feel that RecoveryMetadata and RecoveryAware break the M-CONCISE-NAMES guideline.

> As a first step. And then, I'm still not sure why the retry delay can't be incorporated directly into the RecoveryKind enum which would eliminate a separate type. You could add a delay function for this enum which would return an updated RecoveryKind with the given delay, so it would match the semantics that the current delay function has. So what scenario does this complicate?

This is what I had in the previous iteration, and it introduced friction. For example, one component evaluates the recovery kind, while another extracts the Retry-After header and updates the existing metadata.

let mut recovery = detect_recovery(...);
// ...
// a little later
if let Some(delay) = extract_delay(...) {
    recovery = recovery.delay(delay);
}

In addition, I plan to introduce a reason property, so the recovery metadata cannot simply be expressed as an enum: many additional properties can apply across all variants. That's why I started thinking of the metadata as a simple bag of properties, where the producer can fill in any combination of them.

An important part, which most consumers won't care about, is the evaluation of the recovery metadata. This will mostly be done once, by the individual resilience middleware. There, the inspection is flattened out and the middleware looks at each property individually: it checks RecoveryKind first; if that evaluates to a retry, it looks at the delay property, then the reason if necessary, and so on.

I tried the current simplified model in an internal project, and it does simplify how the recovery metadata is consumed. Of course, this can still change as we go through more usage patterns, but for now I would like to try this latest approach.
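The sketch below follows the flow described in this comment: one component sets the kind, another attaches the delay later, and the middleware inspects the properties one at a time. `RecoveryInfo`, `RecoveryKind`, `retry()`, `never()`, `kind()` and `delay()` come from the PR. The `get_delay()` getter, the `Decision` type, the `decide` function and its default-backoff fallback are hypothetical.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryKind { Retry, Outage, Never, Unknown }

/// Hypothetical bag of recovery properties (kind plus optional delay).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RecoveryInfo {
    kind: RecoveryKind,
    delay: Option<Duration>,
}

impl RecoveryInfo {
    pub fn retry() -> Self { Self { kind: RecoveryKind::Retry, delay: None } }
    pub fn never() -> Self { Self { kind: RecoveryKind::Never, delay: None } }
    pub fn delay(mut self, delay: Duration) -> Self { self.delay = Some(delay); self }
    pub fn kind(&self) -> RecoveryKind { self.kind }
    pub fn get_delay(&self) -> Option<Duration> { self.delay }
}

/// What a hypothetical retry middleware decides from the metadata.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    RetryAfter(Duration),
    GiveUp,
}

/// Flattened inspection: look at the kind first, and consult the delay
/// property only when the kind says a retry is worthwhile. This sketch
/// simply gives up on every other kind.
pub fn decide(info: &RecoveryInfo, default_backoff: Duration) -> Decision {
    match info.kind() {
        RecoveryKind::Retry => Decision::RetryAfter(info.get_delay().unwrap_or(default_backoff)),
        _ => Decision::GiveUp,
    }
}
```

Because the delay is a separate property, the component that parses Retry-After can call `delay()` on an existing `RecoveryInfo` without knowing which kind another component already chose.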

geeknoid · 2025-10-02T15:05:54Z

Well, even if you think my suggested names are too long, the existing names aren't right.

"Recover" is a verb, it's not appropriate as a trait name.
"Recovery" is an event and is not meaningful as a "carrier of state".

martintmk · 2025-10-03T08:32:23Z

What about:

Recover -> Recoverable (declares capability)
Recovery -> RecoveryInfo (cleaner, as a bag of properties, feels nicer than RecoveryMetadata)

geeknoid · 2025-10-03T10:44:32Z

> Recover -> Recoverable (declares capability)
> Recovery -> RecoveryInfo (cleaner, as a bag of properties, feels nicer than RecoveryMetadata)

Yeah, that works :-)

ralfbiedert · 2025-10-05T11:30:08Z

> "Recover" is a verb, it's not appropriate as a trait name.

That's not true; there are several lenses for looking at trait naming. Compare the API Guidelines:

Linguistically:

Imperatives like io::Write, fmt::Debug, clone::Clone
Agent nouns like iter::Iterator, hash::Hasher
Nouns like fmt::Binary, ops::Fn
Adjectives like marker::Sized, panic::UnwindSafe
Prepositions like convert::Into, borrow::ToOwned

Functionally:

If the trait has a single self-explanatory method (or a set of nearly identical methods), name it after the method: Clone, Hash, Default, Into, Write, ToOwned, AsRef, Extend.

If the trait has no methods, name it after what ability (Send, Sync, Copy) or property (UnwindSafe) its implementors have.

If the trait has a broader set of methods or a single method that is not self-explanatory, the name should describe either what its implementors are (Iterator, Hasher, Fn, Error, Future, Termination) or what ability/property they have (AsciiExt, fmt::Debug, fmt::Binary).

However, as a universal convention, trait names are generally short unless compound (IntoIter, UnwindSafe). Many of our trait names have unfortunately deviated from that and leaned more toward C#-style interface naming, and as we move here I'd like to revert that.

With that said, I agree that Recover, while being an imperative, does not match the "single method" convention. Given that the method is called recovery, I'd like the trait to be named Recovery as well, since that matches the "noun + single method" lens (and removes the -able suffix, which doesn't feel idiomatic).

About RecoveryInfo I don't have strong feelings.

ralfbiedert · 2025-10-05T11:30:57Z

crates/recoverable/examples/recoverable_error.rs

+    ServiceUnavailable { retry_after: Option<Duration> },
+}
+
+impl Recoverable for NetworkError {


Should be Recovery.

@geeknoid Would you agree with the rename?

No, I don't think it's right.

This doesn't do anything about recovery; it merely says that an error can be recovered from. When I see "recovery", I think "action", as in: this IS the recovery from the problem.

This trait only indicates that a particular error or result is recoverable; it does not, and cannot, perform the recovery action.

It's up to the caller to perform that action based on the conditions.

For example, a caller might decide ahead of time that a particular action can be retried, in which case it needs to set aside all the information and parameters required for that recovery action. This trait only conveys that such recovery is possible.

Our names should reflect that. With this context in mind, I find Recoverable more appropriate than Recovery (capability vs. action).
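The sketch below illustrates the capability-vs-action distinction using the `NetworkError::ServiceUnavailable { retry_after }` variant from the example under review. The error only reports its `RecoveryInfo`. The caller owns the retry loop and keeps the request it needs to repeat the call. The `InvalidRequest` variant, the `call_with_retry` function and the attempt limit are hypothetical. The sketch also records delays instead of sleeping, to keep it fast.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryKind { Retry, Never }

pub struct RecoveryInfo { kind: RecoveryKind, delay: Option<Duration> }

impl RecoveryInfo {
    pub fn retry() -> Self { Self { kind: RecoveryKind::Retry, delay: None } }
    pub fn never() -> Self { Self { kind: RecoveryKind::Never, delay: None } }
    pub fn delay(mut self, delay: Duration) -> Self { self.delay = Some(delay); self }
    pub fn kind(&self) -> RecoveryKind { self.kind }
    pub fn get_delay(&self) -> Option<Duration> { self.delay }
}

/// Capability: an implementor only describes whether recovery is possible.
pub trait Recoverable {
    fn recovery(&self) -> RecoveryInfo;
}

#[derive(Debug, PartialEq)]
pub enum NetworkError {
    ServiceUnavailable { retry_after: Option<Duration> },
    InvalidRequest, // hypothetical permanent failure
}

impl Recoverable for NetworkError {
    fn recovery(&self) -> RecoveryInfo {
        match self {
            NetworkError::ServiceUnavailable { retry_after: Some(d) } => RecoveryInfo::retry().delay(*d),
            NetworkError::ServiceUnavailable { retry_after: None } => RecoveryInfo::retry(),
            NetworkError::InvalidRequest => RecoveryInfo::never(),
        }
    }
}

/// Action: the caller owns the retry loop and keeps `request` around so it can
/// repeat the call. Returns the final result plus the delays it would have waited.
pub fn call_with_retry<F>(
    request: &str,
    max_attempts: u32,
    mut send: F,
) -> (Result<String, NetworkError>, Vec<Duration>)
where
    F: FnMut(&str) -> Result<String, NetworkError>,
{
    let mut waits = Vec::new();
    let mut attempt = 1;
    loop {
        match send(request) {
            Ok(body) => return (Ok(body), waits),
            Err(err) => {
                let info = err.recovery();
                if info.kind() != RecoveryKind::Retry || attempt >= max_attempts {
                    return (Err(err), waits);
                }
                // A real caller would sleep here; the sketch only records the delay.
                waits.push(info.get_delay().unwrap_or(Duration::ZERO));
                attempt += 1;
            }
        }
    }
}
```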

github-actions · 2025-12-30T14:22:02Z

⚠️ Breaking Changes Detected

error: failed to retrieve local crate data from git revision

Caused by:
    0: failed to retrieve manifest file from git revision source
    1: possibly due to errors: [
         failed when reading /home/runner/work/oxidizer/oxidizer/target/semver-checks/git-origin_main/c115bd2b3bb60b57a849f06eafbecc79bf0c7aee/scripts/crate-template/Cargo.toml: TOML parse error at line 9, column 26
         |
       9 | keywords = ["oxidizer", {{CRATE_KEYWORDS}}]
         |                          ^
       missing key for inline table element, expected key
       : TOML parse error at line 9, column 26
         |
       9 | keywords = ["oxidizer", {{CRATE_KEYWORDS}}]
         |                          ^
       missing key for inline table element, expected key
       ,
         failed to parse /home/runner/work/oxidizer/oxidizer/target/semver-checks/git-origin_main/c115bd2b3bb60b57a849f06eafbecc79bf0c7aee/Cargo.toml: no `package` table,
       ]
    2: package `recoverable` not found in /home/runner/work/oxidizer/oxidizer/target/semver-checks/git-origin_main/c115bd2b3bb60b57a849f06eafbecc79bf0c7aee

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
   1: cargo_semver_checks::rustdoc_gen::RustdocFromProjectRoot::get_crate_source
   2: cargo_semver_checks::rustdoc_gen::StatefulRustdocGenerator<cargo_semver_checks::rustdoc_gen::CoupledState>::prepare_generator
   3: cargo_semver_checks::Check::check_release::{{closure}}
   4: cargo_semver_checks::Check::check_release
   5: cargo_semver_checks::exit_on_error
   6: cargo_semver_checks::main
   7: std::sys::backtrace::__rust_begin_short_backtrace
   8: main

If the breaking changes are intentional then everything is fine - this message is merely informative.

Remember to apply a version number bump with the correct severity when publishing a version with breaking changes (1.x.x -> 2.x.x or 0.1.x -> 0.2.x).

geeknoid · 2025-12-30T17:31:14Z

crates/recoverable/src/lib.rs

+//!
+//! The recovery information describes whether recovering from an operation might help, not whether
+//! the operation succeeded or failed. Both successful operations and permanent failures
+//! should use [`RecoveryInfo::never`][RecoveryInfo::never] since recovery won't change the outcome.


Suggested change

//! should use [`RecoveryInfo::never`][RecoveryInfo::never] since recovery won't change the outcome.

//! should use [`RecoveryInfo::never`] since recovery is not necessary or desirable.
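A blanket implementation for `Result` shows the doc comment's point: success and permanent failure both map to `never`. This thread does not say whether the crate provides such an impl. The `StorageError` type and the `impl Recoverable for Result<T, E>` below are assumptions for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryKind { Retry, Never }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RecoveryInfo { kind: RecoveryKind }

impl RecoveryInfo {
    pub fn retry() -> Self { Self { kind: RecoveryKind::Retry } }
    pub fn never() -> Self { Self { kind: RecoveryKind::Never } }
    pub fn kind(&self) -> RecoveryKind { self.kind }
}

pub trait Recoverable {
    fn recovery(&self) -> RecoveryInfo;
}

/// Hypothetical error type.
#[derive(Debug)]
pub enum StorageError { Throttled, NotFound }

impl Recoverable for StorageError {
    fn recovery(&self) -> RecoveryInfo {
        match self {
            StorageError::Throttled => RecoveryInfo::retry(),
            // Permanent failure: retrying will not change the outcome.
            StorageError::NotFound => RecoveryInfo::never(),
        }
    }
}

/// Hypothetical blanket impl: success needs no recovery, errors delegate.
impl<T, E: Recoverable> Recoverable for Result<T, E> {
    fn recovery(&self) -> RecoveryInfo {
        match self {
            // Recovery is neither necessary nor desirable for a success.
            Ok(_) => RecoveryInfo::never(),
            Err(e) => e.recovery(),
        }
    }
}
```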

geeknoid · 2025-12-30T17:34:28Z

crates/recoverable/src/lib.rs

+// conventions because setters are used much more frequently than getters in typical usage patterns.
+// The `get_` prefix on getters helps distinguish them from their corresponding setters.
+
+/// Represents the recovery information associated with an operation or condition.


Suggested change

/// Represents the recovery information associated with an operation or condition.

/// The recovery information associated with an operation or condition.

geeknoid · 2025-12-30T17:36:20Z

crates/recoverable/src/lib.rs

+/// let recovery = RecoveryInfo::retry();
+/// assert_eq!(recovery.kind(), RecoveryKind::Retry);
+/// ```
+#[derive(Debug, PartialEq, Clone, Eq)]


Would you ever want a hash table such as HashMap<RecoveryInfo, xxx>? That would require deriving the Hash trait as well.
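Supporting `HashMap<RecoveryInfo, _>` only requires adding `Hash` next to the existing `Eq` derive, because `Duration` already implements `Hash`. The two-field layout and the `tally` helper below are hypothetical and only show the derive in use.

```rust
use std::collections::HashMap;
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum RecoveryKind { Retry, Never }

/// `Hash` is derived from the same fields as `Eq`, so the two stay consistent.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct RecoveryInfo { kind: RecoveryKind, delay: Option<Duration> }

impl RecoveryInfo {
    pub fn retry() -> Self { Self { kind: RecoveryKind::Retry, delay: None } }
    pub fn never() -> Self { Self { kind: RecoveryKind::Never, delay: None } }
    pub fn delay(mut self, delay: Duration) -> Self { self.delay = Some(delay); self }
}

/// Hypothetical use case: count how often each classification was observed.
pub fn tally(observed: &[RecoveryInfo]) -> HashMap<RecoveryInfo, usize> {
    let mut counts = HashMap::new();
    for info in observed {
        *counts.entry(info.clone()).or_insert(0) += 1;
    }
    counts
}
```

A retry with a delay is a distinct key from a plain retry, because the delay participates in both `Eq` and `Hash`.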

geeknoid · 2025-12-30T17:38:23Z

crates/recoverable/src/lib.rs

+///
+/// To retrieve the recovery kind from a `RecoveryInfo` instance, use the [`RecoveryInfo::kind`] method.
+///
+/// # Examples


I think we should have a recommendation for what code is expected to do in general when it receives each of these kinds. In particular, what should code do or assume when it gets Unknown? And what should it do if it gets a kind that is not currently in the enum (given the non_exhaustive attribute)?
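One possible answer, sketched as a consumer-side policy, is to act on `Retry` and `Outage` and to treat `Never`, `Unknown` and any future variant conservatively by stopping. This policy is an assumption, not something the PR specifies; a middleware could equally allow a bounded number of retries for `Unknown`. The `Action` type and `action_for` function are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[non_exhaustive]
pub enum RecoveryKind { Retry, Outage, Never, Unknown }

/// Hypothetical actions a resilience middleware might take.
#[derive(Debug, PartialEq, Eq)]
pub enum Action { Retry, WaitForOutage, Stop }

/// Fail-safe policy: anything not explicitly recognized as recoverable stops,
/// so a newly added variant can never silently turn into extra retries.
pub fn action_for(kind: RecoveryKind) -> Action {
    match kind {
        RecoveryKind::Retry => Action::Retry,
        RecoveryKind::Outage => Action::WaitForOutage,
        // Covers `Never`, `Unknown`, and future variants. Downstream crates
        // must include a wildcard arm because the enum is `#[non_exhaustive]`.
        _ => Action::Stop,
    }
}
```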

geeknoid · 2025-12-30T17:42:24Z

crates/recoverable/src/lib.rs

+#![doc(html_logo_url = "https://media.githubusercontent.com/media/microsoft/oxidizer/refs/heads/main/crates/recoverable/logo.png")]
+#![doc(html_favicon_url = "https://media.githubusercontent.com/media/microsoft/oxidizer/refs/heads/main/crates/recoverable/favicon.ico")]
+
+//! Recovery information and classification for resilience patterns.


There are some instances where a service fails and provides a hint to an alternate endpoint to connect to instead. Just like delay(), would it make sense to have a fallback(url) hint?
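A `fallback` hint could follow the same builder style as `delay()`. The field, the `fallback()`/`get_fallback()` methods and the `next_endpoint` helper below are hypothetical. The URL is kept as a plain `String` to avoid pulling in a URL crate.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryKind { Retry, Never }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RecoveryInfo {
    kind: RecoveryKind,
    delay: Option<Duration>,
    /// Hypothetical hint: an alternate endpoint suggested by the failing service.
    fallback: Option<String>,
}

impl RecoveryInfo {
    pub fn retry() -> Self {
        Self { kind: RecoveryKind::Retry, delay: None, fallback: None }
    }
    pub fn kind(&self) -> RecoveryKind { self.kind }
    pub fn delay(mut self, delay: Duration) -> Self { self.delay = Some(delay); self }
    pub fn get_delay(&self) -> Option<Duration> { self.delay }
    /// Hypothetical setter in the same style as `delay()`.
    pub fn fallback(mut self, url: impl Into<String>) -> Self {
        self.fallback = Some(url.into());
        self
    }
    pub fn get_fallback(&self) -> Option<&str> { self.fallback.as_deref() }
}

/// Hypothetical consumer: prefer the hinted endpoint when retrying.
pub fn next_endpoint<'a>(info: &'a RecoveryInfo, current: &'a str) -> &'a str {
    info.get_fallback().unwrap_or(current)
}
```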

geeknoid · 2025-12-30T17:51:45Z

crates/recoverable/src/lib.rs

+/// assert_eq!(recovery.kind(), RecoveryKind::Retry);
+/// ```
+#[derive(Debug, PartialEq, Clone, Eq)]
+#[non_exhaustive]


Why do we need non_exhaustive here? It has no effect given the private fields.