Description
I'm using dsub on the All of Us Researcher Workbench, and generally it has been great! I'm running a number of analyses at the same time on preemptible instances, which cuts down on cloud compute costs. Most of my jobs are submitted using a tasks file to run one task per chromosome (23 tasks in total). Because the instances are preemptible, some of the tasks fail with the "worker was terminated" error. Many of them take hours to run, so using dsub --wait is not practical. I'm looking for an easy way to resubmit these preempted tasks.
After reading through the documentation, I saw that the --skip parameter is an option, but my current tasks file sends the output from every task to the same directory with --recursive-output. Because some tasks have succeeded and their output already exists in that directory, adding --skip when I call the dsub command again skips all tasks, including the failed ones. For future jobs I will probably give each task its own output location so that I can use --skip.
It also looks like most of the parameters for a job are saved in the output of dstat --full. Is there a way to use that output to resubmit a job without manually reconstructing the dsub command myself? I'm thinking of something like dsub --retry <job_id>, which would retry all failed tasks under that job ID.
Apologies if this already exists and I missed it in the documentation!
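In the meantime, here is a rough sketch of the workaround I have in mind: build a new tasks file containing only the rows whose tasks failed, then pass it to dsub with --tasks as usual. It rests on several assumptions I have not verified: that dstat --full can emit JSON records with "task-id" and "status" keys, that failed tasks report the status FAILURE, and that task IDs are 1-based row positions in the tasks file. The function names are my own, not part of dsub.

```python
import csv
import io
import json


def failed_task_ids(dstat_json, failed_statuses=("FAILURE",)):
    """Return the task IDs (as strings) whose status counts as failed.

    ASSUMPTION: each record has "task-id" and "status" keys; check this
    against your own dstat output before relying on it.
    """
    records = json.loads(dstat_json)
    return {
        str(record.get("task-id"))
        for record in records
        if record.get("status") in failed_statuses
    }


def filter_tasks(tasks_tsv, failed_ids):
    """Keep the header row plus the rows belonging to failed tasks.

    ASSUMPTION: task N corresponds to the Nth data row (1-based) of the
    tab-separated tasks file.
    """
    rows = list(csv.reader(io.StringIO(tasks_tsv), delimiter="\t"))
    header, body = rows[0], rows[1:]
    kept = [row for i, row in enumerate(body, start=1) if str(i) in failed_ids]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(kept)
    return out.getvalue()
```

The string returned by filter_tasks could be written to a file such as retry_tasks.tsv (a hypothetical name) and submitted with the same dsub options as the original run. This still means keeping the original command around, which is the part a built-in retry option would avoid.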