Feature request: re-submit jobs using information stored in dstat --full #308

@amstilp

I'm using dsub on the All of Us Researcher Workbench, and it has generally been great! I'm running a number of analyses at the same time on preemptible instances, which cuts down on cloud compute costs. Most of my jobs are submitted with a tasks file that runs one task per chromosome (23 tasks in total). Because the instances are preemptible, some of the tasks fail with the "worker was terminated" error. Many of them take hours to run, so using dsub --wait is not practical. I'm looking for an easy way to resubmit these preempted tasks.

After reading through the documentation, I saw that --skip is an option, but my current tasks file sends the output from every task to the same directory with --output-recursive. Because some tasks have already succeeded, output already exists in that directory, so adding --skip when I call dsub again skips all tasks, including the ones that failed. I will probably change this for future jobs so that I can use --skip.
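For anyone hitting the same problem, here is a minimal sketch of the change I have in mind: generate the tasks file so that each chromosome writes to its own output directory, which should let --skip tell finished tasks apart from failed ones. The bucket path, the CHROM and OUT_DIR variable names, and the chromosome labels (1-22 plus X) are placeholders I made up for illustration, not anything dsub requires.

```python
def build_tasks_tsv(output_root, chromosomes):
    """Return tasks-file text with one row and one output directory per chromosome."""
    # Header row: one column per task parameter, in dsub tasks-file style.
    header = "--env CHROM\t--output-recursive OUT_DIR"
    root = output_root.rstrip("/")
    # Each task gets its own subdirectory, so outputs from one task
    # never make another task look finished.
    rows = [f"{chrom}\t{root}/chr{chrom}/" for chrom in chromosomes]
    return "\n".join([header] + rows) + "\n"


# Hypothetical labels for the 23 tasks and a hypothetical bucket path.
chromosomes = [str(n) for n in range(1, 23)] + ["X"]
tasks_tsv = build_tasks_tsv("gs://my-bucket/analysis", chromosomes)
```

Writing tasks_tsv to a file and passing that file to dsub --tasks together with --skip would then, as far as I understand the documentation, only resubmit chromosomes whose output directory is still empty.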

It also looks like most of the parameters for a job are saved in the output of dstat --full. Is there a way to use that output to resubmit a job without manually reconstructing the dsub command myself? I'm thinking of something like dsub --retry <job_id>, which would retry all failed tasks with that job ID.
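To make the request concrete, here is a rough sketch of the manual workaround I would like to avoid: take the dstat --full records (parsed from JSON), keep only the failed tasks, and rebuild a tasks file from their parameters. The record keys (status, envs, inputs, outputs, input-recursives, output-recursives) and the FAILURE status value are my assumptions about what dstat reports and should be checked against real output. The function name is hypothetical.

```python
import csv
import io

# Assumed mapping from dstat record keys to tasks-file column flags.
SECTION_FLAGS = {
    "envs": "--env",
    "inputs": "--input",
    "input-recursives": "--input-recursive",
    "outputs": "--output",
    "output-recursives": "--output-recursive",
}


def failed_tasks_to_tsv(records, failed_statuses=("FAILURE",)):
    """Build tasks-file text containing only the tasks whose status marks a failure."""
    failed = [r for r in records if r.get("status") in failed_statuses]
    if not failed:
        return ""

    # Collect (section, variable name) columns in first-seen order.
    columns = []
    for rec in failed:
        for section in SECTION_FLAGS:
            for name in rec.get(section) or {}:
                if (section, name) not in columns:
                    columns.append((section, name))

    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Header row, e.g. "--env CHROM" or "--output-recursive OUT_DIR".
    writer.writerow([f"{SECTION_FLAGS[s]} {n}" for s, n in columns])
    # One row per failed task; a missing value becomes an empty cell.
    for rec in failed:
        writer.writerow([(rec.get(s) or {}).get(n, "") for s, n in columns])
    return buf.getvalue()


# Made-up records standing in for parsed dstat output.
example_records = [
    {"task-id": "1", "status": "SUCCESS", "envs": {"CHROM": "1"},
     "output-recursives": {"OUT_DIR": "gs://my-bucket/out/"}},
    {"task-id": "2", "status": "FAILURE", "envs": {"CHROM": "2"},
     "output-recursives": {"OUT_DIR": "gs://my-bucket/out/"}},
]
retry_tsv = failed_tasks_to_tsv(example_records)
```

Even with something like this, the job-level settings (image, command or script, logging path, and so on) would still have to be copied over by hand, which is why a built-in option would be much nicer.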

Apologies if this already exists and I missed it in the documentation!
