Add e2e test for kernel recovery after external kill #13694
Verifies that Positron detects an unexpected kernel death (simulating OOM kill via SIGKILL) and allows the user to restart and resume work. Closes #12869
|
I wonder if this might be too much of an edge case right now for an e2e test? Anyone else have thoughts? |
I would agree. Calling kill from a test just seems rather extreme. |
|
The test tries to mimic a realistic failure in which the kernel is terminated unexpectedly, e.g., because it was OOM-killed (for notebooks, that could happen due to large datasets, long-running processes, etc.). I was looking for the simplest way to test Positron’s recovery behavior in that situation; calling kill is not the test itself, just a means of triggering the recovery behavior. The question this test tries to answer is: does Positron detect that the kernel died, leave the notebook in a clear state, and allow the user to restart and continue working? I thought this was an important notebook coverage gap, especially for users working with large datasets or long-running notebook sessions. Do we currently cover unexpected kernel termination and recovery anywhere else? |
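For readers following the thread, the trigger-and-recover flow described above can be illustrated with a minimal, generic sketch. This is not the test in this PR and uses no Positron or Kallichore code: the "kernel" is a stand-in sleeping Python process, the helper names (`start_worker`, `died_unexpectedly`, `kill_and_recover`) are hypothetical, and the signal handling assumes a POSIX system.

```python
import signal
import subprocess
import sys


def start_worker() -> subprocess.Popen:
    """Start a stand-in 'kernel': a child Python process that just sleeps."""
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])


def died_unexpectedly(proc: subprocess.Popen) -> bool:
    """True if the child was ended by a signal.

    On POSIX, subprocess reports death-by-signal as a negative returncode.
    """
    return proc.returncode is not None and proc.returncode < 0


def kill_and_recover():
    """Kill the worker from outside, confirm detection, then start a fresh one."""
    worker = start_worker()
    worker.send_signal(signal.SIGKILL)  # roughly what an OOM kill looks like to the parent
    worker.wait(timeout=10)
    exit_code = worker.returncode
    detected = died_unexpectedly(worker)

    # "Restart": launch a replacement and check that it is running.
    replacement = start_worker()
    restarted = replacement.poll() is None

    # Clean up the replacement so the sketch leaves no stray process behind.
    replacement.kill()
    replacement.wait(timeout=10)
    return exit_code, detected, restarted
```

The kill is only the trigger here; the values worth asserting on are whether the death was detected and whether a replacement came up.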
I guess it is technically not covered yet. If you feel strongly that it is worthwhile, please run a very complete full suite run (covering all OSs and types) and make sure it is not an issue. |
|
Thanks for digging into this, Rodrigo! I do want to push back on the placement of this test and suggest a better home. Some of the concerns with this here are:
Instead, we could take a different approach! The behavior actually being validated here is that Kallichore detects an unexpected child-process death and propagates A Kallichore-side test that:
would cover the same intent at a fraction of the runtime, with deterministic event-based assertions instead of UI text parsing, and on the layer where a regression would actually originate. It would also fit naturally next to For the remote SSH angle from the original issue, that genuinely does want an integration-level test, but it'd need to involve a forwarded port and either a real or stubbed SSH transport, which is a different scope from this PR. How about we set up some plans for a more appropriate test approach here? |
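To make "deterministic event-based assertions" concrete, here is a small sketch of a supervisor that reports child exits as events. It is an assumption-laden illustration in Python, not Kallichore's implementation or API: the `Supervisor` class, the event names, and `observe_external_kill` are all hypothetical, and the signal behavior assumes POSIX.

```python
import os
import queue
import signal
import subprocess
import sys
import threading


class Supervisor:
    """Hypothetical supervisor: owns one child process and reports how it exited."""

    def __init__(self):
        self.events = queue.Queue()  # holds (kind, returncode) tuples
        self._stop_requested = False
        self.proc = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(60)"]
        )
        # A watcher thread blocks on the child and publishes a single exit event.
        threading.Thread(target=self._watch, daemon=True).start()

    def _watch(self):
        code = self.proc.wait()
        kind = "stopped" if self._stop_requested else "exited_unexpectedly"
        self.events.put((kind, code))

    def stop(self):
        """Requested shutdown: flag it first so the watcher does not report a crash."""
        self._stop_requested = True
        self.proc.terminate()


def observe_external_kill():
    """Kill the child from outside the supervisor and return the event it emits."""
    sup = Supervisor()
    os.kill(sup.proc.pid, signal.SIGKILL)
    return sup.events.get(timeout=10)
```

A test at this layer can block on the event queue with a timeout and assert on the event kind, with no UI text to parse.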
@:positron-notebooks