-
Notifications
You must be signed in to change notification settings - Fork 5
Fix callback server dying when an error occurs in JobRunner._submit #110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Currently, when running in callback server mode, if the JobRunner._submit mode throws an error, the _watch loop will exit and the callback server will hang forever as the queues are no longer being serviced. This handles any _submit errors by putting an error back on the queue instead when running in callback server mode.
|
codacy is complaining about assert usage in tests |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #110 +/- ##
==========================================
+ Coverage 80.55% 80.95% +0.39%
==========================================
Files 13 13
Lines 1147 1155 +8
==========================================
+ Hits 924 935 +11
+ Misses 223 220 -3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Codacy is complaining about assert use in tests |
| """ | ||
|
|
||
| def __init__(self, config: Config, port=None): | ||
| def __init__(self, config: Config, port=None, server_mode: bool = False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add info about what "Server Mode" means in the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a critical issue where the callback server would hang forever when an error occurred in the JobRunner._submit method. The fix ensures the callback server remains functional by handling submit errors gracefully when running in server mode.
Key changes:
- Added error handling around the _submit method call to catch exceptions in callback server mode
- Modified JobRunner constructor to accept a server_mode parameter to distinguish callback server operation
- Added comprehensive test coverage for the error handling scenario
Reviewed Changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| JobRunner/JobRunner.py | Added server_mode parameter and try-catch block around _submit calls to handle errors gracefully |
| JobRunner/Callback.py | Updated to pass server_mode=True when creating JobRunner instances |
| test/test_callback_server_integration.py | Added new test for submit failures and updated existing test parameters |
JobRunner/JobRunner.py
Outdated
| inputs: config dictionary, EE2 URL, Job id, Token, Admin Token | ||
| Create the job runner. | ||
| cnnfig - the job runner config. |
Copilot
AI
Aug 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a typo in the parameter documentation. 'cnnfig' should be 'config'.
| cnnfig - the job runner config. | |
| config - the job runner config. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| j = resp.json() | ||
| assert "JobRunner/JobRunner/DockerRunner.py" in j["error"]["error"] | ||
| del j["error"]["error"] | ||
| assert j == { |
Copilot
AI
Aug 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an extra space between '==' and '{' in the assertion.
| assert j == { | |
| assert j == { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Currently, when running in callback server mode, if the JobRunner._submit method throws an error, the _watch loop will exit and the callback server will hang forever as the queues are no longer being serviced. This handles any _submit errors by putting an error back on the queue instead when running in callback server mode.