Skip to content

Conversation

@MrCreosote
Copy link
Member

Updates various lower level code for job canceling, as well as some minor cleanup.

  • condor client:

    • adds cancel method
    • updates stats calculations to allow for canceled jobs missing fields in the ClassAd
    • explicitly specifies graceful shutdown when canceling jobs
  • container runner & executor

    • Updates log streaming to use threads for both streams to prevent blocking the main loop and thus signal handling.
    • Shuts down the docker container on getting a sigterm and exits.
    • Adds a simple script for manually running the container runner
  • Adds an argument to job_state.get_job to allow getting the admin details for a job, but only if the user owns the job.

  • Updates the run_job script to exec the python code so signals propagate.

Updates various lower level code for job canceling, as well as some
minor cleanup.

* condor client:
  * adds cancel method
  * updates stats calculations to allow for canceled jobs missing fields
in the ClassAd
  * explicitly specifies graceful shutdown when canceling jobs

* container runner & executor
  * Updates log streaming to use threads for both streams to prevent
blocking the main loop and thus signal handling.
  * Shuts down the docker container on getting a sigterm and exits.
  * Adds a simple script for manually running the container runner
* Adds an argument to job_state.get_job to allow getting the admin
details for a job, but only if the user owns the job.
* Updates the run_job script to exec the python code so signals
propagate.
@MrCreosote MrCreosote requested a review from Tianhao-Gu January 6, 2026 20:07
@codecov
Copy link

codecov bot commented Jan 6, 2026

Codecov Report

❌ Patch coverage is 10.81081% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.23%. Comparing base (1023f2d) to head (5473bc6).
⚠️ Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
...mtaskservice/externalexecution/container_runner.py 8.33% 22 Missing ⚠️
cdmtaskservice/condor/client.py 12.50% 7 Missing ⚠️
cdmtaskservice/app_state.py 0.00% 3 Missing ⚠️
cdmtaskservice/job_state.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #512      +/-   ##
==========================================
- Coverage   54.33%   54.23%   -0.10%     
==========================================
  Files          68       68              
  Lines        5632     5646      +14     
==========================================
+ Hits         3060     3062       +2     
- Misses       2572     2584      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines +102 to +103
signal.signal(signal.SIGTERM, cleanup)
signal.signal(signal.SIGINT, cleanup)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to restore signal in the finally block?

Copy link
Member Author

@MrCreosote MrCreosote Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean remove the signal handler? If so, I don't think so, it's a no op on the container and the sys.exit() call is effectively what would happen if the signal wasn't caught, so after the container has exited it shouldn't change the behavior unless I'm missing something

@MrCreosote MrCreosote merged commit 0729c16 into main Jan 7, 2026
9 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants