Survive minor redis connection drops #457
Conversation
Force-pushed from feb876b to ee9b565
Force-pushed from c5e9e01 to 8859c92
If an exception was raised when trying to acquire the lock, the process would end up in a half-dead state where it would stay "up", but would not attempt to acquire the lock again when the connection came back up.
Force-pushed from 8859c92 to c7ca2aa
waiiit, what? In a test where there's only one orchestrator running?

Guess a SIGTERM wasn't enough
ofedoren left a comment
Thanks, @adamruzicka, LGTM. Just one probably unrelated question, but since we're in redis/sidekiq world...
I've just noticed that dynflow uses gitlab-sidekiq-fetcher, which seems to be no longer maintained for anyone outside GitLab: https://gitlab.com/gitlab-org/ruby/gems/sidekiq-reliable-fetch#this-gem-is-no-longer-updated. Does it affect us in any way beyond using a gem that is no longer maintained? Note that this gem locks the sidekiq version to ~> 6.1, so it might need some refactoring if we end up using the latest sidekiq/redis versions, since we don't seem to have any version locks on these.
It is holding us back from moving to newer sidekiq, but nothing else apart from that. I have some ideas how we could get out of this, but resolving this situation isn't an immediate priority right now.
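For reference, the pin mentioned above is a gemspec dependency constraint roughly like this (illustrative, not the gem's actual file):

    # '~> 6.1' allows any sidekiq release from 6.1 up to, but not including,
    # 7.0 - so moving to a newer sidekiq major version needs this pin changed
    spec.add_dependency 'sidekiq', '~> 6.1'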
Thank you @ofedoren!
If an exception was raised when trying to acquire the lock, the process would end up in a half-dead state where it would stay "up", but would not attempt to acquire the lock again when the connection came back up.
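A minimal sketch of the behaviour the fix implies, in illustrative Ruby (class, key and timing names are made up here, not dynflow's actual code):

    require 'redis'

    # The point: a connection error while trying to acquire the lock is
    # rescued and the loop keeps polling, instead of the exception escaping
    # and leaving the process "up" but permanently out of the lock race.
    class OrchestratorLock
      LOCK_KEY   = 'orchestrator-lock'
      TTL        = 60 # seconds, matching the "up to 1 minute" expiry below
      POLL_EVERY = 5  # seconds between attempts

      def initialize(redis = Redis.new)
        @redis = redis
      end

      def wait_until_acquired
        loop do
          begin
            # SET NX EX either grabs the lock or leaves it with its current holder
            return if @redis.set(LOCK_KEY, Process.pid, nx: true, ex: TTL)
          rescue Redis::BaseConnectionError => e
            # redis is briefly unreachable - log and keep trying once it is back
            warn "could not acquire lock (#{e.class}), will retry"
          end
          sleep POLL_EVERY
        end
      end
    end

Treating a connection error the same as "somebody else holds the lock" (back off and retry) is what lets the orchestrator survive a short redis outage instead of ending up half-dead.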
How to test this?
Setup
    podman run -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=dynflow postgres
    podman run -p 6379:6379 redis:6

The actual test
Expected results:
Things may appear stuck until orchestrator 1's lock expires (up to 1 minute); after that, orchestrator 2 should kick in.
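The concrete test steps aren't captured above; the expected results suggest a flow roughly like the following (the container name, PIDs and timings are placeholders, not taken from the PR):

    # with two orchestrators running against the redis instance from the setup step
    podman stop <redis-container>     # both orchestrators lose the connection
    sleep 30
    podman start <redis-container>    # connection comes back, both should recover
    kill -TERM <orchestrator-1-pid>   # once its lock expires, orchestrator 2 takes over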
Includes #462