-
-
Notifications
You must be signed in to change notification settings - Fork 80
fix(worker): fix pause/resume functionality #651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(worker): fix pause/resume functionality #651
Conversation
…ext` but does not have any pending tasks
The `is_shutting_down()` method incorrectly returned true for any non-Running state, including Paused workers. This caused: - `resume()` to fail with ShuttingDown error for paused workers - Paused workers with no in-flight tasks to exit immediately (zombie state) Now `is_shutting_down()` only returns true when: - State is explicitly Stopped (stop() was called) - OR shutdown signal has been triggered Added comprehensive tests for shutdown behavior across all worker states.
ReadinessService::poll_ready() only checked is_shutting_down() but not is_paused(), so paused workers would continue processing new jobs. Now poll_ready() returns Poll::Pending when the worker is paused, blocking new job processing until resume() is called. Added unit tests for ReadinessService verifying: - Running workers are ready to process jobs - Paused workers block job processing (return Pending) - Shutting down workers block job processing - Resumed workers are ready again Also fixed clippy warnings for format string inlining.
geofmureithi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- See minor changes in the API
- Rethink both tests including
readiness_service_tests - Add a changelog entry it will be needed for workflows to be successful
|
@Himmelschmidt Let me know if you need some help |
|
I should be able to do it, seems minor enough, I just have to find the time to actually do it, might take a day or two |
- Add wake() call in resume() to re-poll the worker after unpausing - Store waker in ReadinessService::poll_ready when paused so resume can wake it - Add public is_stopped() method for explicit stopped state checking - Refactor is_shutting_down() to use is_stopped() - Rewrite tests to use real workers with run_with_ctx
geofmureithi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please refactor the tests as requested. One worker test should be enough.
Also elaborate why you are changing the waker. That should only be done in WorkerContext::poll unless I am missing something.
|
FYI: Sorry for the strict approach on this one, it affects core parts of the project. |
geofmureithi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some few minor things, and a question about replacing the waker.
geofmureithi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks for this contribution!
Summary
Fixes #650
Root cause:
is_shutting_down()returned true for paused workers (checked!self.is_running()), causing:resume()to fail withShuttingDownerror when in-flight tasks existedFix:
is_shutting_down()now only returns true when state isStoppedor shutdown signal is triggeredReadinessService::poll_ready()now explicitly checksis_paused()to block new job processingAlso suppresses spurious error log when dropping
WorkerContextwith no pending tasks.