Skip to content

fix: use pop() instead of del for job_tasks cleanup in run_job#521

Open
trionnis wants to merge 1 commit intopython-arq:mainfrom
trionnis:patch-1
Open

fix: use pop() instead of del for job_tasks cleanup in run_job#521
trionnis wants to merge 1 commit intopython-arq:mainfrom
trionnis:patch-1

Conversation

@trionnis
Copy link

@trionnis trionnis commented Mar 9, 2026

The finally block in Worker.run_job() uses del self.job_tasks[job_id] (line 604) which raises KeyError if the entry was already removed. This crashes the entire worker process, killing all concurrent in-flight jobs.

The bug

# worker.py, run_job():
try:
    self.job_tasks[job_id] = task = self.loop.create_task(...)
    try:
        result = await asyncio.wait_for(task, timeout_s)
    except (Exception, asyncio.CancelledError) as e:
        ...
        raise
    else:
        ...
    finally:
        del self.job_tasks[job_id]  # ← KeyError here crashes the worker

except (Exception, asyncio.CancelledError) as e:
    ...

The finally block always runs — even when the outer except catches an exception. If the job_id key was already removed from self.job_tasks
(e.g. during cancellation handling), the del raises an unhandled KeyError that propagates up and kills the worker process. With max_jobs > 1, this takes down all concurrent tasks on that worker, not just the one that triggered the error.

Traceback

File "arq/worker.py", line 604, in run_job
    del self.job_tasks[job_id]
KeyError: 'metadata:01e23ac5-164f-4fd0-89da-2269220d839c:62983e7d'

Fix

Replace del self.job_tasks[job_id] with self.job_tasks.pop(job_id, None). This is a no-op if the key is already gone, and identical behavior otherwise. Cleanup code in finally blocks should be idempotent.

@codecov-commenter
Copy link

codecov-commenter commented Mar 9, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@            Coverage Diff             @@
##             main     #521      +/-   ##
==========================================
- Coverage   96.27%   96.11%   -0.17%     
==========================================
  Files          11       11              
  Lines        1074     1081       +7     
  Branches      209      144      -65     
==========================================
+ Hits         1034     1039       +5     
- Misses         19       21       +2     
  Partials       21       21              
Files with missing lines Coverage Δ
arq/worker.py 97.37% <100.00%> (+0.20%) ⬆️

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fda407c...d215f22. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants