Fix callback server death on task overage #108

MrCreosote · 2025-08-08T23:48:35Z

If a caller attempts to run too many tasks on the callback server at once when run standalone, the JobRunner._watch loop will exit and the callback server will hang forever. This duplicates (for now) the job counting logic in the callback server itself so it doesn't submit jobs guaranteed to fail to the _watch loop. Eventually all this should likely be refactored.

Also catches and reports bad module name errors.

If a caller attempts to run too many tasks on the callback server at once when run standalone, the JobRunner._watch loop will exit and the callback server will hang forever. This duplicates (for now) the job counting logic in the callback server itself so it doesn't submit jobs guaranteed to fail to the _watch loop. Eventually all this should likely be refactored. Also catches and reports bad module name errors.

codecov · 2025-08-08T23:52:16Z

Codecov Report

❌ Patch coverage is 76.92308% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.55%. Comparing base (95f9e56) to head (13ca137).
⚠️ Report is 6 commits behind head on main.

Files with missing lines	Patch %	Lines
JobRunner/callback_server.py	70.00%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #108      +/-   ##
==========================================
+ Coverage   80.43%   80.55%   +0.11%     
==========================================
  Files          13       13              
  Lines        1140     1147       +7     
==========================================
+ Hits          917      924       +7     
  Misses        223      223

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

MrCreosote · 2025-08-09T00:03:48Z

the uncovered stuff is again due to the callback server running in a different process

MrCreosote · 2025-08-09T00:07:09Z

codacy is complaining about asserts in tests again

Copilot

Pull Request Overview

This PR fixes a critical issue where the callback server would hang forever when too many jobs were submitted, causing the JobRunner._watch loop to exit. The fix introduces job counting logic in the callback server to prevent submitting jobs that are guaranteed to fail.

Key changes:

Adds configurable max_tasks parameter to prevent job submission when at capacity
Improves error handling for bad module names and method parsing
Enhances test coverage for job limits and error conditions

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
JobRunner/callback_server.py	Implements job counting logic and improved error handling to prevent server hangs
JobRunner/config.py	Adds max_tasks parameter to Config class constructor
JobRunner/Callback.py	Passes max_tasks parameter through Callback class
JobRunner/JobRunner.py	Updates callback server initialization to include max_tasks
test/test_callback_server_integration.py	Adds comprehensive tests for job limits and error conditions
test/test_callback_server.py	Updates test configuration to include new jobcount and maxjobs parameters
pyproject.toml	Adds ipython development dependency

Copilot · 2025-08-11T15:18:11Z

test/test_callback_server_integration.py

+    # the ip, which doesn't work if you're behind a NAT.
+    # All of this needs a rethink / refactor
+    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+    s.connect(("gmail.com", 80))


Using gmail.com as a target for socket connection is unreliable and creates an external dependency in tests. Consider using a more stable service or implementing a local network discovery method.

Suggested change

s.connect(("gmail.com", 80))

# Use a non-routable IP address to determine local IP without external dependency

s.connect(("10.255.255.255", 1))

If gmail.com isn't stable nothing is... also this is how it's done in other places in the codebase, it was just copied from there

Copilot · 2025-08-11T15:18:11Z

JobRunner/callback_server.py

    _check_rpc_token(app, token)
-    job_id = str(uuid.uuid1())
-    data["method"] = "%s.%s" % (module, method[1:-7])
+    job_id = str(uuid.uuid4())


The change from uuid.uuid1() to uuid.uuid4() should be consistent throughout the codebase. However, there's still a uuid.uuid1() call on line 93 in the test file that wasn't updated.

Line 93 of the test file is a switch from uuid1 to uuid4

Copilot · 2025-08-11T15:18:12Z

JobRunner/callback_server.py

                "error": exception_message,
                "code": "123",
                "message": exception_message,
+                "name": "CallbackServerError", 


Missing space before the comma - should be 'CallbackServerError', not 'CallbackServerError', (note the trailing space).

Suggested change

"name": "CallbackServerError",

"name": "CallbackServerError",

Fixed locally, it'll show up in a future PR

Copilot · 2025-08-11T15:18:12Z

test/test_callback_server_integration.py

+
+def _restore_env(envvar, old_val):
+    if old_val is None:
+        os.environ.pop(envvar)


Using os.environ.pop() without a default value will raise KeyError if the environment variable doesn't exist. This could cause test failures in certain environments.

Suggested change

os.environ.pop(envvar)

os.environ.pop(envvar, None)

In this case it's guaranteed the env var is present.

MrCreosote requested a review from bio-boris August 8, 2025 23:48

Fix trailing whitespace

62f2b11

More codacy whitespace complaints

13ca137

bio-boris requested a review from Copilot August 11, 2025 15:17

Copilot AI reviewed Aug 11, 2025

View reviewed changes

bio-boris approved these changes Aug 11, 2025

View reviewed changes

MrCreosote merged commit 31a1753 into main Aug 11, 2025
10 of 12 checks passed

bio-boris mentioned this pull request Aug 15, 2025

Update release notes #115

Merged

	s.connect(("gmail.com", 80))
	# Use a non-routable IP address to determine local IP without external dependency
	s.connect(("10.255.255.255", 1))

	"name": "CallbackServerError",
	"name": "CallbackServerError",

Fix callback server death on task overage #108

Fix callback server death on task overage #108

Uh oh!

Conversation

MrCreosote commented Aug 8, 2025

Uh oh!

codecov bot commented Aug 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

MrCreosote commented Aug 9, 2025

Uh oh!

MrCreosote commented Aug 9, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

Uh oh!

MrCreosote Aug 11, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

Uh oh!

MrCreosote Aug 11, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

Uh oh!

MrCreosote Aug 11, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

Uh oh!

MrCreosote Aug 11, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov bot commented Aug 8, 2025 •

edited

Loading