
Toil-work directory rerunning the same jobs iteratively #1902

@nicolasmalexandre

Hello,

I have been running the commands below with a BED file that has around 379,000 intervals. I set my own Toil directories because my only scratch space is in a mounted directory, ~/Desktop, which is where I run the commands. Inside the container, /data is therefore ~/Desktop.

docker run --rm -it \
  --cpus=50 \
  --memory=200g \
  -v "$PWD":/data \
  -w /data \
  -e TMPDIR=/data/tmp \
  -e PARALLEL="--jobs 50" \
  quay.io/comparative-genomics-toolkit/cactus:v3.0.1

mkdir -p /data/toil-work /data/toil-coord /data/tmp

cactus-hal2maf \
  /data/FormGenomes/Cactus/Halper/RERConverge/Liftover_Step4/jobstore \
  /data/FormGenomes/Cactus/363/363-avian-2020.hal \
  /data/FormGenomes/Cactus/Halper/RERConverge/Liftover_Step4/Merged.Consensus.maf.gz \
  --refGenome Gallus_gallus_wlh \
  --bedRanges /data/FormGenomes/Cactus/Halper/RERConverge/Liftover_Step4/Merged.Consensus.3col.bed \
  --noAncestors \
  --workDir /data/toil-work \
  --coordinationDir /data/toil-coord \
  --onlyOrthologs \
  --filterGapCausingDupes \
  --outType single \
  --chunkSize 500000 \
  --maxMemory 200G \
  --defaultCores 50 \
  --maxDisk 1100G \
  --defaultDisk 1100G \
  --cleanWorkDir onSuccess

I'm not sure why, but the same maf.gz files seem to be generated one after another in new toil-work subdirectories. The workflow has been running continuously for a few days. For example, the same chunk shows up in two different job directories, written almost three days apart:

-rw-r--r-- 1 nicolasalexandre nicolasalexandre 121K Feb 21 04:37 /home/nicolasalexandre/Desktop/toil-work/toilwf-e827e8a6c7955ce3beec31734a65b3d1/0cb4/job/tmptbkrbgbs/Merged.Consensus_chunk_2239.single.maf.gz

-rw-r--r-- 1 root root 121K Feb 23 22:58 /home/nicolasalexandre/Desktop/toil-work/toilwf-e827e8a6c7955ce3beec31734a65b3d1/5be4/job/tmpaam04jdt/Merged.Consensus_chunk_2239.single.maf.gz
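
To see how many outputs are affected rather than spot-checking one chunk, a rough sketch like the following could count output file names that occur in more than one job directory. The function name `find_repeated_outputs` and the default `*.single.maf.gz` pattern are my own choices for illustration, not part of Cactus or Toil, and the sketch assumes file paths contain no newlines.

```shell
# Hypothetical helper (not a Cactus or Toil command): list output file names
# that appear more than once anywhere under a Toil work directory.
# Usage: find_repeated_outputs WORKDIR [PATTERN]
find_repeated_outputs() {
    fro_workdir=$1
    fro_pattern=${2:-*.single.maf.gz}   # assumed naming, taken from the listing above
    find "$fro_workdir" -type f -name "$fro_pattern" \
        | awk -F/ '
            { count[$NF]++ }                      # $NF is the file name without its directory
            END {
                for (name in count)
                    if (count[name] > 1)
                        print count[name], name   # "<copies> <file name>"
            }' \
        | sort -k2
}
```

Run as `find_repeated_outputs ~/Desktop/toil-work`, each output line would give the number of copies followed by the file name. An empty result would suggest the duplicate above is an isolated case.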

The commands seem to be the same as well:

head ./5be4/job/tmpaam04jdt/taf_cmds.txt
set -eo pipefail && gzip -dc Merged.Consensus_chunk_0.maf.gz | taffy view | /usr/bin/time -vp taffy norm -a 363-avian-2020.hal -k -d 2> 0.tn.time | taffy view | taffy sort -n genome.list | taffy view -m | mafDuplicateFilter -m - -k | bgzip > Merged.Consensus_chunk_0.single.maf.gz

head ./0cb4/job/tmptbkrbgbs/taf_cmds.txt
set -eo pipefail && gzip -dc Merged.Consensus_chunk_0.maf.gz | taffy view | /usr/bin/time -vp taffy norm -a 363-avian-2020.hal -k -d 2> 0.tn.time | taffy view | taffy sort -n genome.list | taffy view -m | mafDuplicateFilter -m - -k | bgzip > Merged.Consensus_chunk_0.single.maf.gz
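
Since `head` only shows the first lines, a whole-file comparison would be a stronger check that the two jobs received the same command list. A minimal sketch, where `compare_cmd_files` is a hypothetical helper name and the only assumption is that each job directory contains a `taf_cmds.txt` as shown above:

```shell
# Hypothetical helper: report whether two job directories hold
# byte-identical taf_cmds.txt files.
# Usage: compare_cmd_files JOBDIR_A JOBDIR_B
compare_cmd_files() {
    # cmp -s is silent and signals the result through its exit status
    if cmp -s "$1/taf_cmds.txt" "$2/taf_cmds.txt"; then
        echo "identical"
    else
        echo "different"
    fi
}
```

For the two directories above this would be `compare_cmd_files ./5be4/job/tmpaam04jdt ./0cb4/job/tmptbkrbgbs`.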

The job has been running continuously for 4 days and seems to just be repeating the same work.
