Relax an eager sled agent assert#10401

Open
jmpesp wants to merge 1 commit into oxidecomputer:main from jmpesp:relax_sled_agent_assert

Contributor

@jmpesp jmpesp commented May 7, 2026

`omicron-stress` was able to trigger a sled-agent panic. After asking a former, very esteemed colleague for some pointers, we determined that this assert was overly eager: there was always an unwrap that would enforce the same condition. Remove this assert and add a comment.

Fixes #10369

@jmpesp jmpesp requested a review from hawkw May 7, 2026 14:19
Member

@hawkw hawkw left a comment

This seems reasonable to me at a glance. I wonder if we might be able to put together a test for sled-agent that reproduces the omicron-stress behavior? It would be nice to be able to have a regression test ensuring we're not just going to end up panicking later at the unwrap...

// If `propolis_request` is Some leaving the above match group, there
// must be a Some `self.running_state`, i.e. if there is a request to
// send to the propolis process, there must be a propolis process.
// `propolis_state_put` unwraps `self.running_state` and will panic the
Member

does the unwrap have a useful error message, and if not, should it be changed to an expect() with the message from the old assertion?

Contributor Author

yeah:

        let res = self
            .running_state
            .as_ref()
            .expect("Propolis client should be initialized before usage")

Member

great, looks good to me then!
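
For readers following the thread, here is a minimal sketch of the distinction being discussed. It is not the sled-agent code: `Instance`, `RunningState`, `apply_eager`, and `apply_relaxed` are hypothetical stand-ins, and it assumes the removed assert checked `running_state` regardless of whether a Propolis request was pending. Only the `expect` message and the panic message from the linked issue are taken from this page.

```rust
// Hypothetical stand-in for the state held while a Propolis process exists.
struct RunningState {
    sent: Vec<String>,
}

// Hypothetical stand-in for the sled-agent instance; not the real type.
struct Instance {
    running_state: Option<RunningState>,
}

impl Instance {
    // Mirrors the quoted snippet: panics with a descriptive message only
    // when a request is actually sent without a running state.
    fn propolis_state_put(&mut self, request: &str) {
        let state = self
            .running_state
            .as_mut()
            .expect("Propolis client should be initialized before usage");
        state.sent.push(request.to_string());
    }

    // Assumed shape of the old behavior: the assert fires even when there
    // is no request to send.
    fn apply_eager(&mut self, propolis_request: Option<&str>) {
        assert!(
            self.running_state.is_some(),
            "should have an active Propolis zone by now"
        );
        if let Some(request) = propolis_request {
            self.propolis_state_put(request);
        }
    }

    // Relaxed behavior: the invariant is enforced only on the path that
    // needs it, by the `expect` inside `propolis_state_put`.
    fn apply_relaxed(&mut self, propolis_request: Option<&str>) {
        if let Some(request) = propolis_request {
            self.propolis_state_put(request);
        }
    }
}
```

Under these assumptions, `apply_relaxed(None)` on an instance with no running state returns normally, while `apply_eager(None)` panics. Calling `apply_relaxed(Some(..))` with no running state still panics at the `expect`, which is the condition a regression test would need to cover.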

Member

hawkw commented May 7, 2026

The helios / deploy failure looks like the switch zone failing to come up, and I can't easily imagine how that might be related to this change...

Development

Successfully merging this pull request may close these issues.

omicron-stress can trigger sled-agent panic: "should have an active Propolis zone by now"
