-
Notifications
You must be signed in to change notification settings - Fork 1.6k
RR: Config knobs and validation #12375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -99,6 +99,28 @@ impl core::hash::Hash for ModuleVersionStrategy { | |
| } | ||
| } | ||
|
|
||
| impl ModuleVersionStrategy { | ||
| /// Get the string-encoding version of the module. | ||
| pub fn as_str(&self) -> &str { | ||
| match &self { | ||
| Self::WasmtimeVersion => env!("CARGO_PKG_VERSION_MAJOR"), | ||
| Self::Custom(c) => c, | ||
| Self::None => "", | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /// Configuration for record/replay | ||
| #[derive(Clone)] | ||
| pub enum RRConfig { | ||
| /// Recording on store is enabled | ||
| Recording, | ||
| /// Replaying on store is enabled | ||
| Replaying, | ||
| /// No record/replay is enabled | ||
| None, | ||
| } | ||
|
|
||
| /// Global configuration options used to create an [`Engine`](crate::Engine) | ||
| /// and customize its behavior. | ||
| /// | ||
|
|
@@ -165,6 +187,8 @@ pub struct Config { | |
| pub(crate) detect_host_feature: Option<fn(&str) -> Option<bool>>, | ||
| pub(crate) x86_float_abi_ok: Option<bool>, | ||
| pub(crate) shared_memory: bool, | ||
| #[cfg(feature = "rr")] | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We've got more documentation online about this, but in the interest of shuffling things around and minimizing the impact of this one thought I might have here is:
That would mean that when the Cargo features is disabled
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We would still need the
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Assuming you mean unconditionally and that's a typo, true! |
||
| pub(crate) rr_config: RRConfig, | ||
| } | ||
|
|
||
| /// User-provided configuration for the compiler. | ||
|
|
@@ -275,6 +299,8 @@ impl Config { | |
| detect_host_feature: None, | ||
| x86_float_abi_ok: None, | ||
| shared_memory: false, | ||
| #[cfg(feature = "rr")] | ||
| rr_config: RRConfig::None, | ||
| }; | ||
| ret.wasm_backtrace_details(WasmBacktraceDetails::Environment); | ||
| ret | ||
|
|
@@ -2407,7 +2433,7 @@ impl Config { | |
| target_lexicon::Triple::host() | ||
| } | ||
|
|
||
| pub(crate) fn validate(&self) -> Result<(Tunables, WasmFeatures)> { | ||
| pub(crate) fn validate(&mut self) -> Result<(Tunables, WasmFeatures)> { | ||
| let features = self.features(); | ||
|
|
||
| // First validate that the selected compiler backend and configuration | ||
|
|
@@ -2448,6 +2474,15 @@ impl Config { | |
| bail!("exceptions support requires garbage collection (GC) to be enabled in the build"); | ||
| } | ||
|
|
||
| #[cfg(feature = "rr")] | ||
| match &self.rr_config { | ||
| RRConfig::Recording | RRConfig::Replaying => { | ||
| self.validate_determinism_conflicts()?; | ||
| self.enforce_determinism(); | ||
|
Comment on lines
+2480
to
+2481
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What would you think of ignoring the determinism options here instead of validating/enabling them? Historically as various config options have been tied together it's caused problems and made configuration more confusing due to trying to understand how everything interacts with each other. In some sense producing a recording is entirely orthogonal to deterministic simd/nans. In another sense I also understand how such a recording runs the risk of not being too useful. For the engine-level configuration I'd argue, however, that this is best kept as a separate concern where we'd document in the RR configuration options that users probably also want to turn on deterministic things, but it wouldn't be a requirement. Such a change would also have the nice benefit of keeping
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think I'd gently push back against this: determinism is a fundamental part of record/replay, not an optional add-on, and without it the replay may not only be incorrect but be incorrect in interesting ways that violate internal invariants (finding the wrong kind of event as we read the trace alongside canonical ABI steps, necessitating some sort of fallback that, I don't know, returns an error? aborts halfway through allocating something? forces a trap halfway through the marshalling code?). I'd personally see this in the same light as, e.g., the need for bounds checks when memory is configured a certain way: it's Just How We Compile Things.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can't comment much on the intended direction of
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To be clear I'm not saying that things should be deterministic, I'm saying that I think it would make sense to avoid unconditionally coupling them here. The two options handled here, NaN bits and relaxed-simd, are pretty low down on the list of "things that practically cause nondeterminism" in wasm with resource exhaustion (stacks or memory) being much higher on that list. RR cannot control resource exhaustion during replay in the sense that it can't necessarily predict the stack consumption (maybe memory? that seems relatively advanced) Basically I would expect that divergence of a replay from the original recording is something that's going to need to be handled no matter what. I think it'd be reasonable, for example, for the CLI to automatically set these options but at the Put another way, I would be surprised if we could actually achieve absolutely perfect determinism during a record and replay. Inevitably it seems like we'll forget events, have bugs that prevent this, etc. Assuming perfect determinism to me sounds like it's going to introduce more subtle bugs than not and be a pretty steep uphill battle
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, I think it's an interesting challenge to think about permitting partial divergence and recovering. However I also think, having seen and thought through a lot of the challenges here, that is enormously complex and opens a huge new Pandora's box of issues. For example, with reversible debugging, which builds on top of replay, the whole algorithm depends on determinism; we'll back ourselves into fundamentally unsolvable corners if we don't have it. See also e.g. how rr (the Mozilla project) panics with internal asserts if trace replay mismatches. I think that's the only really reasonable way to go here: we'll have asserts when we have mismatches. In other words, yep there may be bugs; let's treat them as bugs and catch and fix them. Resource exhaustion is of course a different category: early termination because a memory.grow failed on replay is reasonable to propagate through and we already have the error paths for that. The kind of nondeterminism that is impossible to deal with is the kind that keeps running but with a poisoned machine state.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh I definitely don't meant to give the impression that we should somehow recover a diverging replay. I would also expect a panic, error, or something like that. My assumption is that such infrastructure is a given, and with that whether or not NaN and relaxed-simd is deterministic is not a critical property requires for correctness, but instead a strong recommendation and what tooling should enable by default.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another possible option here, which we've done elsewhere, is to change defaults depending on configuration options. For example the default set of enabled wasm features is different if you use Winch than if you use Cranelift. One way to slice this problem, without mutating |
||
| } | ||
| _ => {} | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Rust-idiom-wise we try to prefer exhaustive matches where possible unless it's unnecessarily onerous. In this case this'd be a good place to change to |
||
| }; | ||
|
|
||
| let mut tunables = Tunables::default_for_target(&self.compiler_target())?; | ||
|
|
||
| // If no target is explicitly specified then further refine `tunables` | ||
|
|
@@ -3008,6 +3043,53 @@ impl Config { | |
| self.shared_memory = enable; | ||
| self | ||
| } | ||
|
|
||
| /// Enforce deterministic execution configurations. Currently, this means the following: | ||
| /// * Enabling NaN canonicalization with [`Config::cranelift_nan_canonicalization`]. | ||
| /// * Enabling deterministic relaxed SIMD with [`Config::relaxed_simd_deterministic`]. | ||
| #[inline] | ||
| pub fn enforce_determinism(&mut self) -> &mut Self { | ||
| #[cfg(any(feature = "cranelift", feature = "winch"))] | ||
| self.cranelift_nan_canonicalization(true); | ||
| self.relaxed_simd_deterministic(true); | ||
| self | ||
| } | ||
|
Comment on lines
+3047
to
+3056
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Personally I'm a bit wary of having this configuration method/convenience. We've got some documented known sources of non-determinisim and these two options are only a subset of the possible sources of non-determinism. Given that, and coupled with my thoughts about validation of a config, I think this would be served well as documentation of the |
||
|
|
||
| /// Validate if the current configuration has conflicting overrides that prevent | ||
| /// execution determinism. Returns an error if a conflict exists. | ||
| /// | ||
| /// Note: Keep this in sync with [`Config::enforce_determinism`]. | ||
| #[inline] | ||
| #[cfg(feature = "rr")] | ||
| pub(crate) fn validate_determinism_conflicts(&self) -> Result<()> { | ||
| if let Some(v) = self.tunables.relaxed_simd_deterministic { | ||
| if v == false { | ||
| bail!("Relaxed deterministic SIMD cannot be disabled when determinism is enforced"); | ||
| } | ||
| } | ||
| #[cfg(any(feature = "cranelift", feature = "winch"))] | ||
| if let Some(v) = self | ||
| .compiler_config | ||
| .as_ref() | ||
| .and_then(|c| c.settings.get("enable_nan_canonicalization")) | ||
| { | ||
| if v != "true" { | ||
| bail!("NaN canonicalization cannot be disabled when determinism is enforced"); | ||
| } | ||
| } | ||
| Ok(()) | ||
| } | ||
|
|
||
| /// Enable execution trace recording or replaying to the configuration. | ||
| /// | ||
| /// When either recording/replaying are enabled, determinism is implicitly | ||
| /// enforced (see [`Config::enforce_determinism`] for details). | ||
| #[cfg(feature = "rr")] | ||
| #[inline] | ||
| pub fn rr(&mut self, cfg: RRConfig) -> &mut Self { | ||
| self.rr_config = cfg; | ||
| self | ||
| } | ||
| } | ||
|
|
||
| impl Default for Config { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,6 @@ | ||
| use crate::Config; | ||
| #[cfg(feature = "rr")] | ||
| use crate::RRConfig; | ||
| use crate::prelude::*; | ||
| #[cfg(feature = "runtime")] | ||
| pub use crate::runtime::code_memory::CustomCodeMemory; | ||
|
|
@@ -95,7 +97,7 @@ impl Engine { | |
| /// the compiler setting `unwind_info` to `true`, but explicitly | ||
| /// disable these two compiler settings will cause errors. | ||
| pub fn new(config: &Config) -> Result<Engine> { | ||
| let config = config.clone(); | ||
| let mut config = config.clone(); | ||
| let (mut tunables, features) = config.validate()?; | ||
|
|
||
| #[cfg(feature = "runtime")] | ||
|
|
@@ -258,6 +260,26 @@ impl Engine { | |
| self.config().async_support | ||
| } | ||
|
|
||
| /// Returns whether the engine is configured to support execution recording | ||
| #[cfg(feature = "rr")] | ||
| #[inline] | ||
| pub fn is_recording(&self) -> bool { | ||
| match self.config().rr_config { | ||
| RRConfig::Recording => true, | ||
| _ => false, | ||
| } | ||
| } | ||
|
|
||
| /// Returns whether the engine is configured to support execution replaying | ||
| #[cfg(feature = "rr")] | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Personally I'd say it's fine to. unconditionally define these methods and have them just return |
||
| #[inline] | ||
| pub fn is_replaying(&self) -> bool { | ||
| match self.config().rr_config { | ||
| RRConfig::Replaying => true, | ||
| _ => false, | ||
| } | ||
| } | ||
|
|
||
| /// Detects whether the bytes provided are a precompiled object produced by | ||
| /// Wasmtime. | ||
| /// | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops forgot from the previous review -- mind adding handling of these options in this crate such that if the option is specified an error is returned? A future PR would then change it to calling a
Configmethod or such as appropriate, but it'll avoid a state where we have these options but they don't do anything