feat: halt system on significant child process failure#485
feat: halt system on significant child process failure#485Draggu wants to merge 3 commits intoelixir-lang:mainfrom
Conversation
apps/engine/mix.exs
Outdated
| {:forge, path: "../forge"}, | ||
| {:gen_lsp, "~> 0.11.3"}, | ||
| {:gen_lsp, | ||
| git: "https://github.com/elixir-tools/gen_lsp", |
There was a problem hiding this comment.
Tip: you can use the github option and branch option together, a little more ergonomic and easier to update when you push changes.
| receive do | ||
| {:DOWN, ^ref, :process, _, _} -> | ||
| Logger.error("Expert application has terminated unexpectedly") | ||
| # Signifficant process we cant recover (like the GenLSP) has crashed |
There was a problem hiding this comment.
typos
| # Signifficant process we cant recover (like the GenLSP) has crashed | |
| # A significant process we can't recover from (like the GenLSP) has crashed. |
|
testing this today |
mhanberg
left a comment
There was a problem hiding this comment.
In testing this, I may have found a bug in the neovim lsp client. when you close it, it seems to send the shutdown request multiple times (which we handle gracefully, but was pretty confusing when encountering).
| {:ok, pid} -> | ||
| spawn(fn -> | ||
| ref = Process.monitor(pid) | ||
|
|
||
| receive do | ||
| {:DOWN, ^ref, :process, _, _} -> | ||
| Logger.error("Expert application has terminated unexpectedly") | ||
| # Signifficant process we cant recover (like the GenLSP) has crashed | ||
| :timer.apply_after(50, System, :halt, [1]) | ||
| end | ||
| end) |
There was a problem hiding this comment.
I don't think this is the right approach. When i was playing aroudn with this locally, i was never seeing the log hit the system, probably because it was not getting handled before we halted the system.
I think the better option is to implement the c:stop callback on the application and halt the system from there when the application shuts down (which it will when we shutdown the supervisor, which shuts down from the significant process dying).
there is one issue tho, from the perspective of that callback, we can't really know if the application was terminated normally or due to a crash. This same callback is executed when the client sends the exit notification, since we call System.stop there, which gracefully shuts down the system. I think perhaps what we want to do is set a piece of global state (persistent_term maybe) if we exit on purpose, and check that in the callback. if its present, then we can just let it continue to shut down, else we can halt with exit code 1.
This method uses existing otp conventions more or less, whereas the approach in this pr i have never seen before (which is not to say its incorrect, it just is possibly non-idiomatic).
It would ideally be nice if we could list an application as significant, and if it shutdown, the whole beam should terminate. I had sort of assumed that was possible, but doesn't seem like it.
| {:engine, path: "../engine", only: [:test]}, | ||
| {:forge, path: "../forge"}, | ||
| {:gen_lsp, "~> 0.11.3"}, | ||
| {:gen_lsp, github: "Draggu/gen_lsp", branch: "restart"}, |
There was a problem hiding this comment.
instead of making changes to gen_lsp, lets just override the child spec in the application child list. it might be better in the long run, but we can make that change if this seems to work
gen_lsppart elixir-tools/gen_lsp#80Close #428, I tried to fix the source problem here (writing to stdout), but I'm not sure if case I found is the only one. I don't think there exists a valid restart option since clients can't recover from gibberish written to output.