Misc Run Setup: Add freecycle, adjust default errorStrategy, and fix resume/cleanup #60
…o finish instead of terminate
… run script to address problems with incorrect cleanup after external cancellation. Also moved test queue settings to `compute.config`.
A couple of specific questions for discussion, @dtm2451 @amadeovezz @AlaaALatif: what are your thoughts on having the default partition be:
single_cell_RNAseq/README.md
Of note, it can be useful to adjust these values in this file before running, so that they apply to all runs. This is especially helpful with tissue data, where comparing libraries and batches with consistent cutoff values is a way to assess relative quality across the entire project.

### c4 partitions, additional configuration
We have the following partitions available on the c4 cluster for DSCoLab members: `krummellab` and `common` (the default partitions, set in the environment variable `$SBATCH_PARTITION`), and `freecycle`. Freecycle lets us run jobs on any compute that is not in use (24h job maximum), with the caveat that a job may be cancelled if a partition owner needs those resources. We have set the default partition for `test` jobs to `freecycle`, and for standard jobs to `freecycle,krummellab,common`, which means SLURM may schedule them on any of these three partitions, starting each job in whichever listed partition can run it earliest. You may want to adjust this and other cluster options, such as the `errorStrategy` (what happens to the whole pipeline when it errors). See comments in `nextflow.config` describing what we have set as defaults, and what you may want to adjust.
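For reference, the defaults described above might look roughly like the following in `nextflow.config`. This is a minimal sketch, not the pipeline's actual file: it assumes the SLURM executor, the `test` profile name is hypothetical (the PR keeps its test queue settings in `compute.config`), and `'finish'` reflects the PR's change of the default `errorStrategy` from terminate to finish.

```groovy
// nextflow.config (sketch; assumes SLURM is the executor)
process {
    executor      = 'slurm'
    // Comma-separated partition list, passed to SLURM's partition option
    queue         = 'freecycle,krummellab,common'
    // 'finish': after an error, submit no new tasks but wait for already-submitted ones
    errorStrategy = 'finish'
}

profiles {
    // Hypothetical profile name for test runs
    test {
        process.queue = 'freecycle'
    }
}
```

With `'terminate'` instead, Nextflow stops the run as soon as a task fails and kills pending jobs.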
After "when it errors)." and before "See comments in nextflow.config", perhaps add:
If you do not have access to the krummellab partition on c4, you will need to change the `process.queue` parameter in your `nextflow.config` file.
Thankfully, it just defaults to whichever ones you have access to. So if I submit to costellolab,freecycle, it just goes to freecycle. Similarly, if the resources are too high for freecycle (e.g. cellranger requests > 24h), it goes to common,krummellab.
Added the note to the README too, thanks!
Just a note: I wonder what Nextflow does if a job gets cancelled by the scheduler. We might want to build in some sort of catch-and-retry if possible... But since we've rarely, if ever, seen freecycle jobs actually get cancelled due to primary-lab prioritization, it's probably fine not to build this until needed.
I think we should definitely do this! There is a way to set it to retry on specific exit codes, e.g.:
`errorStrategy { task.exitStatus == 140 ? 'retry' : 'terminate' }`
Let me see if I can figure out which exit code is returned when jobs are cancelled by freecycle, and then I can add this. Let me know if you have examples.
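A sketch of that idea as a `nextflow.config` fragment follows. It assumes exit status 140 (the example above) is the code a preempted freecycle job reports, which still needs to be confirmed, and it falls back to `'finish'` rather than `'terminate'` for every other failure; `maxRetries = 3` is an arbitrary illustrative value.

```groovy
// nextflow.config (sketch): retry only tasks that exit with a specific code
process {
    // ASSUMPTION: 140 is a placeholder until the freecycle preemption code is confirmed
    errorStrategy = { task.exitStatus == 140 ? 'retry' : 'finish' }
    // Illustrative cap so a repeatedly preempted task eventually stops retrying
    maxRetries    = 3
}
```

In `nextflow.config`, a dynamic directive is assigned a closure with `=`; inside a process definition the `errorStrategy { ... }` form shown above is used instead.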
> Thankfully, it just defaults to whichever ones you have access to. So if I submit to costellolab,freecycle, it just goes to freecycle. Similarly, if the resources are too high for freecycle (e.g. cellranger requests > 24h), it goes to common,krummellab.
Oh nice! Then... maybe the note I had you add isn't needed. Or at least as is. Cuz it works out fine if they don't change it!
A default of 60 jobs seems pretty high to me. 60 cellranger jobs would take A LOT of resources.
Noted, lowering to 20 for the single cell pipeline. I wish there were a way to set a max number of CPUs (grrr), because that is closer to what we want. I'll ask on the Nextflow Slack.
We should all discuss this, @AlaaALatif (60 is what is set for bulk), and also bring up at a group meeting what folks want for this and other defaults.
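Assuming the "number of jobs" discussed here is Nextflow's `executor.queueSize` (how many tasks the executor handles in parallel), lowering it to 20 would be a one-line config change, sketched below. Note that it limits concurrent tasks, not total CPUs.

```groovy
// nextflow.config (sketch): at most 20 tasks submitted/running at once
executor {
    queueSize = 20
}
```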
> Oh nice! Then... maybe the note I had you add isn't needed. Or at least as is. Cuz it works out fine if they don't change it!
I think it's good to note that they might want to change it, happy to keep this :)
Looks good as you rephrased, other than a likely typo in "you will may to change".
Added suggestions from @dtm2451
Now contains the "[ ]" lists introduced by Amadeo's update. Also mentions that the data types must include one of "CITE" or "GEX" (not both), optionally plus any other modality, and provides examples.
@amadeovezz when you're reviewing, can you also double check the structure described in the updated
Yep they match! Minor note:
Helpful feedback from the Nextflow Slack on how to set max.cpus and max.memory with SLURM:
nextflow-io/nextflow#640
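One option in newer Nextflow releases (24.04 and later) is the `resourceLimits` process directive, which caps what any single task can request; it is not necessarily what the linked issue suggests, and the values below are placeholders rather than c4 node sizes. Combined with `queueSize`, it bounds total usage: with 20 concurrent tasks capped at 16 CPUs each, the pipeline holds at most 20 × 16 = 320 CPUs at a time.

```groovy
// nextflow.config (sketch; requires Nextflow 24.04+). Values are placeholders.
process {
    // Task requests above these limits are reduced to the limit before submission
    resourceLimits = [cpus: 16, memory: 128.GB]
}

executor {
    // At most 20 concurrent tasks, so at most 20 x 16 = 320 CPUs in use at once
    queueSize = 20
}
```

`time` is left out of the limits so long-running tasks keep their full requested walltime.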
Lingering to-dos:
…ocessing_pipelines into ef/partition_config
Weird error that I have now patched: @dtm2451 and I observed that cellranger is much slower (days instead of hours) when run from the Nextflow pipeline. It appears to be the difference between writing to …; I have now changed this, but tagging @amadeovezz @AlaaALatif: if you have slow steps, they should write to … We probably want to switch to doing this for all steps that take more than a few minutes? I think Nextflow's `scratch` directive should do it, but I need to look into it more: https://www.nextflow.io/docs/latest/process.html#scratch
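As a sketch of the `scratch` directive linked above: the `CELLRANGER_COUNT` process name is hypothetical, and `/scratch` is assumed to be the path the PR description mentions switching to.

```groovy
// nextflow.config (sketch): run a slow process in a scratch directory
process {
    // Hypothetical process name; replace with the pipeline's cellranger process
    withName: 'CELLRANGER_COUNT' {
        // Task runs in a new temporary directory under /scratch; declared outputs
        // are copied back to the work directory when it finishes
        scratch = '/scratch'
    }
}
```

Setting `scratch = true` instead runs the task under the node's `$TMPDIR`.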
This PR addresses several feature requests and modifications in the run script and configuration for the single cell pipeline.
Specifically:
- … `workflow.onComplete` directive from Nextflow.
- … `/c4/scratch/` by switching to `/scratch/` (now hours instead of days), but future work should implement this more cleanly.
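A minimal sketch of gating cleanup on success with `workflow.onComplete`, so that a failed or externally cancelled run keeps its work directory for `-resume` instead of cleaning it up. The cleanup step is a placeholder log line, not the PR's actual run-script logic.

```groovy
// main.nf (sketch): handler runs when the workflow finishes
workflow.onComplete {
    if (workflow.success) {
        // Placeholder: cleanup steps would go here, since every task succeeded
        log.info "Run succeeded; safe to clean up ${workflow.workDir}"
    } else {
        // Keep intermediate files so `nextflow run ... -resume` can reuse cached tasks
        log.info "Run did not succeed (exit status ${workflow.exitStatus}); keeping ${workflow.workDir} for -resume"
    }
}
```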