Various scripts for ktp27 survival on CSD3. A clone exists at /rfs/project/rfs-iCNyzSAaucw/ktp27/csd3-scripts.
`pyranger.py` generates `sbatch`able wrappers for running Cellranger on various CSCI data.

- `getfastq.sh` is used to pull in the requisite fastqs
- `stashimage.sh` copies over provided Visium HD images into `sample_images` on RFS
- `makelinks.sh` creates links to the mapping folder from `libraries` on RFS

`pysub` is a wrapper for preparing things for `sbatch`; it is detailed in the HPC intro. It uses `postjob_sacct.sh` to generate reports with usage info.
- `getfastq.sh` takes a library ID and retrieves all existing fastqs for it, copying them one pool-flowcell combination at a time and using `crukci_to_illumina.py` to rename them to an Illumina-compatible nomenclature (as needed by e.g. Cellranger). This is done one pool-flowcell at a time because the CRUK renaming script is built for that; the `S` counter is incremented between each one (starting at 2, to leave 1 for the staging files from the current renaming).
- `regenerate_library_fastqs.sh` takes a library ID and checks on all its CRAMs, submitting `cramfastq.sh` jobs to turn them back to fastqs if absent.
- `remove_regenerated_library_fastqs.sh` takes a library ID and checks on all its CRAMs, removing any fastqs regenerated via `regenerate_library_fastqs.sh`. These are differentiated from primary CRUK fastqs (which are left intact) by the presence of a `.fqsuccess` file left behind by `cramfastq.sh`.
- `cramfastq.sh` takes a CRAM file name (assumed to be in the same folder) and converts it back to fastqs, making an informed decision on I1/I2 generation based on the formatting of the `BC:` tag. Upon success, leaves behind a `.fqsuccess` file.
- `crukci_to_illumina.py` is a local (slightly outdated) copy of the official CRUK-to-Illumina renaming script, which renames all `.fq.gz` files present within `.` from CRUK to Illumina nomenclature.
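The `S` counter convention described for `getfastq.sh` can be illustrated with a short Python sketch. This is not the script itself: the function name `assign_s_numbers` and the pool-flowcell labels are hypothetical, and only the start-at-2 rule is taken from the description above.

```python
def assign_s_numbers(pool_flowcells, start=2):
    """Map each pool-flowcell combination to its own S number.

    Numbering starts at 2 because S1 is left for the staging files
    of the renaming currently in progress.
    """
    # dict.fromkeys drops duplicates while preserving first-seen order
    unique = list(dict.fromkeys(pool_flowcells))
    return {combo: f"S{i}" for i, combo in enumerate(unique, start=start)}


# Hypothetical pool-flowcell labels, for illustration only
mapping = assign_s_numbers(
    ["poolA_flowcell1", "poolA_flowcell2", "poolB_flowcell1"]
)
```

Keeping one `S` number per pool-flowcell combination means fastqs from different combinations do not collide on file name after renaming.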
- `add_publication_info.sh` takes a library ID and the name of a file with publication information, and appends its contents to `publications.tsv` in the RFS library folder. The publication information is to be a four-column TSV containing an identifying author, manuscript title, repository where the data is deposited (e.g. ArrayExpress/EGA, not the actual accession), and the manuscript/preprint DOI if possible (if not, state "in progress").
- `makelinks.sh` takes a library ID and the name of the subfolder in the library structure to link the current folder to, and then creates a symlink to `.` within the specified subfolder for the library.
- `rmlink.sh` takes a library ID and the name of the subfolder in the library structure and undoes the action of `makelinks.sh`, removing the existing symlink to `.` within the specified subfolder of the library.
- `stashimage.sh` takes a library ID and the path to an image, and rsyncs it to the library's subfolder within `sample_images` on RFS if not already present. The RFS path has spaces replaced with underscores. Yields the RFS path as output, for easy use within `pyranger`.
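Before handing a file to `add_publication_info.sh`, it may help to check that it matches the four-column layout described above. The following is a minimal sketch of such a check, not part of the repository: the function name `check_publication_rows` and the example row are hypothetical.

```python
EXPECTED_COLUMNS = 4  # author, title, repository, DOI (or "in progress")


def check_publication_rows(tsv_text):
    """Return a list of problems found in publication-info TSV text."""
    problems = []
    for line_no, line in enumerate(tsv_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != EXPECTED_COLUMNS:
            problems.append(
                f"line {line_no}: expected {EXPECTED_COLUMNS} columns, "
                f"got {len(fields)}"
            )
        elif any(not field.strip() for field in fields):
            problems.append(f"line {line_no}: empty field")
    return problems


# Hypothetical row, for illustration only
example = "Smith\tAn example manuscript title\tArrayExpress\tin progress\n"
problems = check_publication_rows(example)
```

An empty list means every non-blank row has four non-empty tab-separated fields; the check does not verify that the DOI itself is valid.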
- `pull_rcs.sh` accepts the full path to a file on RCS, and submits an rsync of it being copied to `.` as a `nohup &` process that will persist in the background after logging out of the head node.
- `tarball.sh` accepts the path to a subdirectory within a user folder in RDS `sharedData`, and mirrors it in a `.tar.gz` state to `sharedData_CSCI_tarballs`. Best run within a screen due to possible long run times.
- `postjob_sacct.sh` is used by `pysub` to append usage stats to the stdout of a job. Accepts the job ID on input.
- `slurm_partition_scan.sh` is a script written by Theo Nelson to monitor load across the various CSD3 partitions, as detailed within HPC intro.