Make containers come back at host restart #127
Yeah, seems reasonable, though for the hl7-reader we had wanted it to stay dead rather than endlessly restarting if it hits an unexpected error while processing an HL7 message (as it will keep trying to process that same message again and again). That way the informus dashboard will find that the hl7-reader is down and notify us that it's not processing.
Not too strongly held an opinion, as it's very rare now, so happy to see what happens with this.
Perhaps we should use on-failure:5 to avoid that particular problem, even if it doesn't solve the reboot problem either way.
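A minimal sketch of what the on-failure:5 suggestion could look like in a compose file. The service key and image name are placeholders, not taken from this repo; only the restart value reflects the suggestion above.

```yaml
services:
  hl7-reader:                    # hypothetical service key
    image: example/hl7-reader    # placeholder image, not the real one
    # Restart only after a non-zero exit, and give up after 5 attempts,
    # so a message that always fails cannot cause an endless restart loop.
    restart: on-failure:5
```

With this, a clean exit (code 0) is left alone, and after repeated failures the container ends up stopped, so a "reader is down" check on the dashboard should still be able to fire.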
I think in an ideal world we'd use a different restart policy for hl7-reader/hoover depending on whether they're operating in indefinite mode or not. And for monitoring we'd use the lack of progress and/or the presence of an error in hl7-reader instead of it being down to signal an error.
Tweak container restart policies to increase their chances of coming back if the host restarts or the container crashes.
When our docker host last rebooted, only cassandra, core and glowroot came back for emap-dev. In particular, the waveform-reader didn't come back, and unbeknown to me there was some data being directed to it that we wanted to keep.
So this is an attempt to prevent that happening again.
I can't explain why core came back but rabbitmq didn't, since they both had "on-failure" before this change. And more containers depend on rabbitmq than core, so I don't think that's the cause.
I'm aware that some containers need to be able to exit cleanly without being restarted (hl7-reader and hoover), so nothing more aggressive than "on-failure" can be used in that case. This could be a problem because the docs say:
We certainly want hl7-reader to come back following a docker daemon (or host) restart, so this may require further work.
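For illustration, one way the trade-off described above might be expressed in a compose file. This is a sketch under assumptions, not the actual diff: the service keys and image names are placeholders, and the policy given to each service here is only one possible split.

```yaml
services:
  rabbitmq:
    image: example/rabbitmq          # placeholder
    # Restarted after any exit and when the daemon starts, unless the
    # container had been explicitly stopped beforehand.
    restart: unless-stopped

  waveform-reader:
    image: example/waveform-reader   # placeholder
    restart: unless-stopped

  hl7-reader:
    image: example/hl7-reader        # placeholder
    # Must be able to exit cleanly without being restarted, so nothing
    # stronger than on-failure: "always" and "unless-stopped" would also
    # restart it after a clean exit (code 0).
    restart: on-failure
```

Whether on-failure is enough to bring hl7-reader back after a daemon or host restart is exactly the open question raised above, so this fragment does not settle it.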