dnaseq-nextflow

Under construction — not used in production.

Nextflow DSL2 pipeline for DNA sequencing analysis. It processes per-strain FASTQ files through alignment, variant calling, copy number variation (CNV) estimation, and coverage analysis (processSingleExperiment), then merges the results across strains for downstream database loading (mergeExperiments).

Quick Start

# Per-strain analysis (default)
nextflow run main.nf -profile processSingleExperiment

# Multi-strain merge
nextflow run main.nf -entry mergeExperiments -profile mergeExperiments

# Tests
nextflow run main.nf -entry runTests -profile tests

Docker is enabled by default in all profiles.

processSingleExperiment

Runs once per strain: takes paired-end or single-end FASTQs listed in an nf-core samplesheet and produces a consensus FASTA, VCFs, an indel TSV, coverage tracks, a ploidy estimate, and gene CNV estimates.
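
To make the samplesheet input concrete, here is a minimal Python sketch (not part of the pipeline) that reads a CSV with the `sample`, `fastq_1`, and `fastq_2` columns and reports whether each sample is paired-end or single-end. It assumes the common nf-core convention that single-end rows leave `fastq_2` empty; the sample and file names are hypothetical.

```python
import csv
import io

def classify_samples(csv_text):
    """Map each sample name to 'paired' or 'single' based on fastq_2."""
    layouts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: an empty fastq_2 column marks a single-end sample.
        fastq_2 = (row.get("fastq_2") or "").strip()
        layouts[row["sample"]] = "paired" if fastq_2 else "single"
    return layouts

# Hypothetical samplesheet contents, for illustration only.
example = """sample,fastq_1,fastq_2
strainA,strainA_R1.fastq.gz,strainA_R2.fastq.gz
strainB,strainB.fastq.gz,
"""

print(classify_samples(example))
```

A check like this can catch a malformed samplesheet before a run is launched.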

┌──────────────────────────────────────────────────────────┐
│                    QC & Preprocessing                    │
│                                                          │
│   FASTQ ──► FastQC ──► FastQC check ──► Trimmomatic      │
└────────────────────────────┬─────────────────────────────┘
                             │
┌────────────────────────────▼─────────────────────────────┐
│                        Alignment                         │
│                                                          │
│   BWA-MEM ──► Picard (dedup) ──► GATK (indel realign)    │
└──────────────────┬──────────────────────┬────────────────┘
                   │                      │
┌──────────────────▼──────┐  ┌────────────▼───────────────┐
│     Variant Calling     │  │       CNV / Coverage        │
│                         │  │                             │
│  FreeBayes              │  │  genomecov                  │
│  filterAndSplitVcf      │  │    └──► coverage bigwig     │
│    ├── SNPs VCF         │  │                             │
│    ├── Indels VCF       │  │  htseqCount ──► TPM         │
│    └── Consensus VCF    │  │    └──► ploidy & gene CNVs  │
│  makeIndelTSV           │  │                             │
│  makeCoverageBed        │  │  bedtoolsWindowed            │
│  consensus FASTA        │  │    └──► norm. cov. bigwig   │
│    (coverage-masked)    │  │                             │
└──────────────┬──────────┘  └─────────────────────────────┘
               │
┌──────────────▼──────────────────────────────────────────┐
│                    Density Tracks                        │
│                                                          │
│  SNP density bigwigs                                     │
│  Het SNP density bigwigs  (ploidy > 1 only)              │
└──────────────────────────────────────────────────────────┘

  Alignment stats (samtools + bedtools genomecov) ──► merged TSV
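
The htseqCount ──► TPM ──► ploidy & gene CNVs branch can be illustrated with a short Python sketch. The TPM formula is the standard one; the copy number step is only an assumption made for illustration (a gene at the median TPM is taken to be present at `ploidy` copies) and may differ from what the pipeline actually computes. Gene names, counts, and lengths are hypothetical.

```python
from statistics import median

def tpm(counts, lengths):
    """Transcripts-per-million from raw read counts and gene lengths (bp)."""
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}

def gene_copy_numbers(tpm_values, ploidy):
    """Scale TPM so that the median gene sits at `ploidy` copies.

    Assumption: this normalisation is illustrative, not the pipeline's method.
    """
    med = median(tpm_values.values())
    return {gene: ploidy * value / med for gene, value in tpm_values.items()}

# Hypothetical read counts and gene lengths.
counts = {"geneA": 100, "geneB": 100, "geneC": 200}
lengths = {"geneA": 1000, "geneB": 1000, "geneC": 1000}

values = tpm(counts, lengths)
print(gene_copy_numbers(values, ploidy=2))
```

In this toy input, geneC has twice the normalised coverage of the other genes, so it comes out at roughly twice their copy number.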

Outputs

File	Description
`*_consensus.fa.gz`	Per-strain consensus FASTA, low-coverage positions masked
`result.vcf.gz`	Full FreeBayes VCF (complex variants)
`indels.tsv`	Indel table (homozygous indels only; heterozygous indels are excluded)
`coverage.bed.gz`	Per-position coverage BED
`*_Ploidy.txt`	Estimated ploidy per strain
`*_geneCNVs.txt`	Gene-level copy number estimates
`*.bw`	Coverage, normalised coverage, SNP density, het SNP density bigwigs
`alignment_stats.tsv`	Merged samtools + bedtools coverage stats across all samples
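
As an illustration of what "low-coverage positions masked" means for the consensus FASTA, the following Python sketch replaces every base whose depth falls below a threshold. It is a simplified stand-in, not the pipeline's code: the mask character `N` and the inclusive threshold are assumptions.

```python
def mask_low_coverage(sequence, depths, min_coverage, mask_char="N"):
    """Replace bases whose depth is below min_coverage with mask_char.

    Assumptions: one depth value per base, and positions with
    depth == min_coverage are kept.
    """
    if len(sequence) != len(depths):
        raise ValueError("sequence and depths must have the same length")
    return "".join(
        base if depth >= min_coverage else mask_char
        for base, depth in zip(sequence, depths)
    )

# Hypothetical 6 bp consensus and per-position depths.
print(mask_low_coverage("ACGTAC", [10, 2, 8, 0, 5, 5], min_coverage=5))
# prints "ANGNAC"
```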

Parameters

Parameter	Description
`samplesheet`	nf-core CSV (`sample`, `fastq_1`, `fastq_2`)
`genomeFastaFile`	Reference genome FASTA
`gtfFile`	Gene annotation GTF
`footprintFile`	Gene footprints file for CNV
`geneSourceIdOrthologFile`	Gene source ID / ortholog mapping TSV
`chrsForCalcFile`	Chromosomes to include in ploidy calculation
`minCoverage`	Minimum depth for variant calling and consensus masking
`ploidy`	Expected ploidy (heterozygous SNP tracks are skipped when set to `1`)
`winLen`	Window size (bp) for density and windowed coverage tracks
`bwaThreads`	Threads for BWA-MEM
`outputDir`	Output directory
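
To show how `winLen` shapes the density tracks, here is a minimal Python sketch that counts SNPs per window along one chromosome. It assumes consecutive, non-overlapping windows and 1-based positions as in VCF; the pipeline's own windowing may differ, and the numbers are hypothetical.

```python
def snp_density(positions, chrom_length, win_len):
    """Count SNPs in consecutive, non-overlapping windows of win_len bp."""
    if win_len <= 0:
        raise ValueError("win_len must be positive")
    n_windows = -(-chrom_length // win_len)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if not 1 <= pos <= chrom_length:
            raise ValueError(f"position {pos} is outside the chromosome")
        # 1-based position -> 0-based window index
        counts[(pos - 1) // win_len] += 1
    return counts

# Hypothetical 30 bp chromosome with five SNPs and 10 bp windows.
print(snp_density([1, 5, 10, 11, 25], chrom_length=30, win_len=10))
# prints [3, 1, 1]
```

A larger `winLen` gives smoother tracks with fewer windows; a smaller one gives finer resolution at the cost of noisier counts.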

mergeExperiments

Takes the per-strain outputs from one or more processSingleExperiment runs and merges them across strains. Annotates variants against SQLite coding-sequence and indel databases, then runs SnpEff for functional annotation. Outputs are intended for loading into a GUS/VEuPathDB database.

Steps

1. Combine indels: collect per-strain indels.tsv files and build a genomic indel SQLite database (makeGenomicIndelDb).
2. Merge VCFs: run bcftools merge across all per-strain VCFs; single-strain inputs skip the merge step.
3. Merge coverage: concatenate per-strain coverage BED files into a single TSV (mergeCoverageBeds).
4. Build coding data: derive coding-sequence and coding-indel SQLite databases from the consensus FASTAs, the GTF, and the reference genome (makeCodingData).
5. Annotate variants: processSeqVars runs bin/processSequenceVariations.jl against the SQLite databases and produces an annotated VCF plus variation, allele, and product DAT files.
6. SnpEff: functional annotation of the merged VCF.
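
The single-strain shortcut in the VCF merge step can be sketched in Python as follows. This is an illustration rather than the workflow's actual logic: the function only builds a `bcftools merge` argument list (using the real `--output-type` and `--output` options) and returns nothing to run when there is a single input, since `bcftools merge` by default expects at least two files. The output file name is hypothetical.

```python
def merge_command(vcf_paths, output="merged.vcf.gz"):
    """Return a bcftools argv list, or None when no merge is needed."""
    if not vcf_paths:
        raise ValueError("at least one VCF is required")
    if len(vcf_paths) == 1:
        # A single strain has nothing to merge with; use its VCF as-is.
        return None
    return [
        "bcftools", "merge",
        "--output-type", "z",   # bgzip-compressed VCF
        "--output", output,
        *sorted(vcf_paths),     # stable input order across runs
    ]

print(merge_command(["strainB/result.vcf.gz", "strainA/result.vcf.gz"]))
print(merge_command(["strainA/result.vcf.gz"]))
```

Note that `bcftools merge` needs bgzip-compressed, indexed inputs, so the per-strain VCFs would have to be indexed before a command like this is run.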

Inputs (from processSingleExperiment)

Parameter	Description
`relativeConsensusFilePattern`	Glob for per-strain `*_consensus.fa.gz` files
`vcfFiles`	Glob for per-strain `result.vcf.gz` files
`indelsFiles`	Glob for per-strain `indels.tsv` files
`coverageFiles`	Glob for per-strain `*.coverage.bed.gz` files
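
The glob parameters above select files by name. As a rough illustration, this Python sketch filters a list of paths with the consensus pattern and derives a strain name from each match. The assumption that the strain name is the file-name prefix before `_consensus.fa.gz` is ours, and the paths are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

def strains_from_consensus(paths, pattern="*_consensus.fa.gz"):
    """Return sorted strain names for paths whose file name matches pattern.

    Assumption: the strain name is whatever the leading '*' matched.
    """
    suffix = pattern.lstrip("*")
    strains = []
    for path in paths:
        name = PurePosixPath(path).name
        if fnmatchcase(name, pattern):
            strains.append(name[: -len(suffix)])
    return sorted(strains)

# Hypothetical output layout from two processSingleExperiment runs.
paths = [
    "results/strainB_consensus.fa.gz",
    "results/strainA_consensus.fa.gz",
    "results/strainA/result.vcf.gz",
]
print(strains_from_consensus(paths))
```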

Additional Parameters

Parameter	Description
`genomeFastaFile`	Reference genome FASTA
`gtfFile`	Gene annotation GTF
`vcfCacheFile`	VCF cache from a previous run (avoids re-annotating known variants)
`undoneStrains`	Strains to exclude from annotation
`reference_strain`	Reference strain name
`outputDir`	Output directory
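
One way to keep these settings together is a JSON file passed with Nextflow's `-params-file` option. The Python sketch below generates such a file's contents; every path in it is a placeholder, the directory layout is assumed, and parameters whose expected format is not described here (`vcfCacheFile`, `undoneStrains`, `reference_strain`) are left out.

```python
import json

def build_merge_params(results_dir, reference_dir, output_dir):
    """Return JSON text with mergeExperiments parameters (placeholder paths)."""
    params = {
        # Assumption: all per-strain outputs sit directly under results_dir.
        "relativeConsensusFilePattern": f"{results_dir}/*_consensus.fa.gz",
        "vcfFiles": f"{results_dir}/*/result.vcf.gz",
        "indelsFiles": f"{results_dir}/*/indels.tsv",
        "coverageFiles": f"{results_dir}/*.coverage.bed.gz",
        "genomeFastaFile": f"{reference_dir}/genome.fa",
        "gtfFile": f"{reference_dir}/genes.gtf",
        "outputDir": output_dir,
    }
    return json.dumps(params, indent=2)

print(build_merge_params("results", "reference", "merged"))
```

Saved as, say, merge_params.json, the file could then be supplied with `nextflow run main.nf -entry mergeExperiments -profile mergeExperiments -params-file merge_params.json`.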

Containers

Image	Used by
`veupathdb/shortreadaligner:1.0.0`	BWA, samtools, Picard, GATK3, FreeBayes, bcftools, bedtools, Julia 1.10, Perl/BioPerl, SnpEff
`veupathdb/dnaseqanalysis:1.0.0`	Trimmomatic, htseq-count

Testing

nextflow run main.nf -entry runTests -profile tests

Tests live in testing/t/ and use Perl's Test2::V0 framework, run via prove.
