Description
Hello,
I have been running the following with a BED file that has around 379,000 intervals. I set my own Toil directories because my only scratch space is a mounted directory, "~/Desktop", which is where I run these commands. /data inside the container therefore maps to ~/Desktop.
docker run --rm -it \
  --cpus=50 \
  --memory=200g \
  -v "$PWD":/data \
  -w /data \
  -e TMPDIR=/data/tmp \
  -e PARALLEL="--jobs 50" \
  quay.io/comparative-genomics-toolkit/cactus:v3.0.1
mkdir -p /data/toil-work /data/toil-coord /data/tmp
cactus-hal2maf \
  /data/FormGenomes/Cactus/Halper/RERConverge/Liftover_Step4/jobstore \
  /data/FormGenomes/Cactus/363/363-avian-2020.hal \
  /data/FormGenomes/Cactus/Halper/RERConverge/Liftover_Step4/Merged.Consensus.maf.gz \
  --refGenome Gallus_gallus_wlh \
  --bedRanges /data/FormGenomes/Cactus/Halper/RERConverge/Liftover_Step4/Merged.Consensus.3col.bed \
  --noAncestors \
  --workDir /data/toil-work \
  --coordinationDir /data/toil-coord \
  --onlyOrthologs \
  --filterGapCausingDupes \
  --outType single \
  --chunkSize 500000 \
  --maxMemory 200G \
  --defaultCores 50 \
  --maxDisk 1100G \
  --defaultDisk 1100G \
  --cleanWorkDir onSuccess
I'm not sure why, but the same maf.gz files seem to be generated over and over in new toil-work subdirectories, and the run has been going continuously for a few days. For example, the same chunk appears in two different job directories with timestamps days apart:
-rw-r--r-- 1 nicolasalexandre nicolasalexandre 121K Feb 21 04:37 /home/nicolasalexandre/Desktop/toil-work/toilwf-e827e8a6c7955ce3beec31734a65b3d1/0cb4/job/tmptbkrbgbs/Merged.Consensus_chunk_2239.single.maf.gz
-rw-r--r-- 1 root root 121K Feb 23 22:58 /home/nicolasalexandre/Desktop/toil-work/toilwf-e827e8a6c7955ce3beec31734a65b3d1/5be4/job/tmpaam04jdt/Merged.Consensus_chunk_2239.single.maf.gz
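To check how widespread the repetition is, something like the following could count how many job directories contain each output file. This is only a sketch: `find_repeated_chunks` is a hypothetical helper name, not part of Cactus or Toil, and it assumes the outputs all end in `.single.maf.gz` as in the listing above.

```shell
# Hypothetical helper: list output file names that occur in more than one
# job directory under the given Toil work directory.
# Prints "<count> <file name>" for every name found at least twice.
find_repeated_chunks() {
    find "$1" -type f -name '*.single.maf.gz' \
        | awk -F/ '{ count[$NF]++ }
                   END { for (name in count) if (count[name] > 1) print count[name], name }' \
        | sort -k2
}
```

It would be run on the host as `find_repeated_chunks ~/Desktop/toil-work`. An empty result would mean that no chunk output currently exists in more than one job directory.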
The commands seem to be the same as well:
head ./5be4/job/tmpaam04jdt/taf_cmds.txt
set -eo pipefail && gzip -dc Merged.Consensus_chunk_0.maf.gz | taffy view | /usr/bin/time -vp taffy norm -a 363-avian-2020.hal -k -d 2> 0.tn.time | taffy view | taffy sort -n genome.list | taffy view -m | mafDuplicateFilter -m - -k | bgzip > Merged.Consensus_chunk_0.single.maf.gz
head ./0cb4/job/tmptbkrbgbs/taf_cmds.txt
set -eo pipefail && gzip -dc Merged.Consensus_chunk_0.maf.gz | taffy view | /usr/bin/time -vp taffy norm -a 363-avian-2020.hal -k -d 2> 0.tn.time | taffy view | taffy sort -n genome.list | taffy view -m | mafDuplicateFilter -m - -k | bgzip > Merged.Consensus_chunk_0.single.maf.gz
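Since `head` only shows the first lines, a byte-for-byte comparison would confirm whether the two command files match in full. A minimal sketch, where `same_file` is a hypothetical wrapper around the standard `cmp` utility:

```shell
# Hypothetical helper: report whether two files are byte-for-byte identical.
same_file() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}
```

For the two directories above this would be `same_file ./0cb4/job/tmptbkrbgbs/taf_cmds.txt ./5be4/job/tmpaam04jdt/taf_cmds.txt`.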
The job has now been running continuously for 4 days and appears to be repeating the same work rather than making progress.