Skip to content

Conversation

@Himmelschmidt
Copy link
Contributor

@Himmelschmidt Himmelschmidt commented Dec 19, 2025

Summary

Fixes #650

Root cause: is_shutting_down() returned true for paused workers (checked !self.is_running()), causing:

  • resume() to fail with ShuttingDown error when in-flight tasks existed
  • Workers with no in-flight tasks to exit entirely

Fix:

  • is_shutting_down() now only returns true when state is Stopped or shutdown signal is triggered
  • ReadinessService::poll_ready() now explicitly checks is_paused() to block new job processing

Also suppresses spurious error log when dropping WorkerContext with no pending tasks.

The `is_shutting_down()` method incorrectly returned true for any
non-Running state, including Paused workers. This caused:
- `resume()` to fail with ShuttingDown error for paused workers
- Paused workers with no in-flight tasks to exit immediately (zombie state)

Now `is_shutting_down()` only returns true when:
- State is explicitly Stopped (stop() was called)
- OR shutdown signal has been triggered

Added comprehensive tests for shutdown behavior across all worker states.
ReadinessService::poll_ready() only checked is_shutting_down() but not
is_paused(), so paused workers would continue processing new jobs.

Now poll_ready() returns Poll::Pending when the worker is paused,
blocking new job processing until resume() is called.

Added unit tests for ReadinessService verifying:
- Running workers are ready to process jobs
- Paused workers block job processing (return Pending)
- Shutting down workers block job processing
- Resumed workers are ready again

Also fixed clippy warnings for format string inlining.
Copy link
Member

@geofmureithi geofmureithi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. See minor changes in the API
  2. Rethink both tests including readiness_service_tests
  3. Add a changelog entry it will be needed for workflows to be successful

@geofmureithi
Copy link
Member

@Himmelschmidt Let me know if you need some help

@Himmelschmidt
Copy link
Contributor Author

I should be able to do it, seems minor enough, I just have to find the time to actually do it, might take a day or two

- Add wake() call in resume() to re-poll the worker after unpausing
- Store waker in ReadinessService::poll_ready when paused so resume can wake it
- Add public is_stopped() method for explicit stopped state checking
- Refactor is_shutting_down() to use is_stopped()
- Rewrite tests to use real workers with run_with_ctx
Copy link
Member

@geofmureithi geofmureithi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refactor the tests as requested. One worker test should be enough.
Also elaborate why you are changing the waker. That should only be done in WorkerContext::poll unless I am missing something.

@geofmureithi
Copy link
Member

FYI: Sorry for the strict approach on this one, it affects core parts of the project.

Copy link
Member

@geofmureithi geofmureithi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some few minor things, and a question about replacing the waker.

Copy link
Member

@geofmureithi geofmureithi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Thanks for this contribution!

@geofmureithi geofmureithi merged commit 7cb8856 into apalis-dev:main Dec 22, 2025
8 of 9 checks passed
@Himmelschmidt Himmelschmidt deleted the fix/pause-stops-processing branch December 22, 2025 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Handle is_shutting_down() and pause correctly

2 participants