feat: accept gzip-compressed fasta input by tpall · Pull Request #505 · WrightonLabCSU/DRAM

tpall · 2026-05-08T18:54:05Z

Summary

Accept gzip-compressed fasta inputs (*.fa.gz / *.fna.gz / *.fasta.gz) without requiring users to decompress first. Plain fastas keep working unchanged.

This change was originally bundled into the now-closed #472. It is split out here as a standalone PR, following the maintainer-friendly approach agreed when #503 / #504 were refiled.

What changed

f397b4a3 feat: accept gzip-compressed fasta input

+38 / -6 lines, three files:

modules/local/rename/decompress_fasta.nf (new, 20 lines) — wraps reformat.sh from the existing bbmap container (no new dependencies). Tagged process_tiny.
workflows/dram.nf — channel branch on .gz suffix, decompress only the gz branch, mix both back. Sample-name stripping is unified so sample.fa and sample.fa.gz yield identical downstream names.
nextflow_schema.json — input_fasta and fasta_fmt descriptions updated to mention gz support.

How it works

ch_fasta_named = ch_fasta_raw.map { f ->
    def name = f.name.replaceAll(/\.gz$/, '').replaceAll(/\.(fa|fna|fasta)$/, '')
    tuple(name, f)
}

ch_fasta_branched = ch_fasta_named.branch { entry ->
    gz:    entry[1].name.endsWith('.gz')
    plain: true
}

DECOMPRESS_FASTA( ch_fasta_branched.gz )
ch_fasta = DECOMPRESS_FASTA.out.decompressed_fasta.mix( ch_fasta_branched.plain )
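
For readers who want to check the naming and branching logic outside Nextflow, here is a minimal Python sketch of the same two steps. It is not part of the PR. The helper names `sample_name` and `branch` are hypothetical, and the regular expressions are copied from the two `replaceAll` calls above.

```python
import re


def sample_name(filename: str) -> str:
    """Strip a trailing .gz, then one of .fa/.fna/.fasta (mirrors the two replaceAll calls)."""
    without_gz = re.sub(r"\.gz$", "", filename)
    return re.sub(r"\.(fa|fna|fasta)$", "", without_gz)


def branch(filenames):
    """Build (name, filename) tuples and split them into gz and plain groups."""
    named = [(sample_name(f), f) for f in filenames]
    gz = [entry for entry in named if entry[1].endswith(".gz")]
    plain = [entry for entry in named if not entry[1].endswith(".gz")]
    return gz, plain


# Hypothetical file names, chosen only to exercise each supported extension.
gz, plain = branch(["sample.fa", "sample.fa.gz", "bin1.fasta.gz", "bin2.fna"])
print("gz branch:   ", gz)
print("plain branch:", plain)
```

Only the entries in the `gz` group would be sent through decompression. Both groups carry the same normalised name for `sample.fa` and `sample.fa.gz`.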

The default --fasta_fmt '*.f*' already matches both plain and .gz files, so users with a mixed directory don't need to change their launch.
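
The glob claim can be sanity-checked with a short Python sketch. This is an approximation: Nextflow resolves globs with Java's matcher, not Python's `fnmatch`, although the two should agree for a simple pattern like this one. The candidate file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Default value of --fasta_fmt as quoted in the PR description.
PATTERN = "*.f*"

# Hypothetical directory listing with plain, gzipped, and unrelated files.
candidates = [
    "sample.fa",
    "sample.fna",
    "sample.fasta",
    "sample.fa.gz",
    "sample.fasta.gz",
    "notes.txt",
    "reads.gz",
]

# Case-sensitive match, independent of the host operating system.
matched = [name for name in candidates if fnmatchcase(name, PATTERN)]
print(matched)
```

The gzipped names match because the pattern only requires a `.f` somewhere after the leading wildcard, and the trailing `*` absorbs the `.gz` suffix.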

Test plan

nextflow inspect parses cleanly.
JSON schema valid.
HPC run with a directory containing both *.fa and *.fa.gz: confirm both end up annotated identically and DECOMPRESS_FASTA only fires on the gz branch.
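
A small fixture for the mixed-directory check could be built along these lines. This is a sketch under stated assumptions, not the test used in the PR: the file names, the sequences, and the helpers `make_mixed_dir` and `read_fasta` are hypothetical, and the comparison uses Python's `gzip` module, not `reformat.sh`.

```python
import gzip
import tempfile
from pathlib import Path

# Toy FASTA content, invented for this fixture.
FASTA = ">contig_1\nACGTACGTACGT\n>contig_2\nTTGACCA\n"


def make_mixed_dir(root: Path) -> None:
    """Write the same records once as plain fasta and once gzip-compressed."""
    (root / "plain_sample.fa").write_text(FASTA)
    with gzip.open(root / "gz_sample.fa.gz", "wt") as handle:
        handle.write(FASTA)


def read_fasta(path: Path) -> str:
    """Read a fasta file as text, decompressing only when the name ends in .gz."""
    opener = gzip.open if path.name.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    make_mixed_dir(root)
    contents = {p.name: read_fasta(p) for p in sorted(root.iterdir())}

# Both inputs should carry identical records once decompressed.
print(contents["plain_sample.fa"] == contents["gz_sample.fa.gz"])
```

A directory built this way could then be passed to the pipeline to confirm that both samples are annotated identically.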

🤖 Generated with Claude Code

Adds a small DECOMPRESS_FASTA module (`reformat.sh` from the bbmap container that other modules already use) and routes only `.gz` inputs through it via a channel branch on the `.gz` suffix. Plain fastas pass through unchanged.

Sample-name normalisation strips both the trailing `.gz` (if present) and one of `.fa`/`.fna`/`.fasta` so `sample.fa` and `sample.fa.gz` yield the same downstream name. Outputs are identical regardless of input compression.

Default `--fasta_fmt '*.f*'` already matches both plain and `.gz` files; schema description updated to mention this explicitly.

Files:
modules/local/rename/decompress_fasta.nf (new, 20 lines)
workflows/dram.nf (channel branch + mix)
nextflow_schema.json (description updates)

github-project-automation Bot added this to DRAM May 8, 2026

github-project-automation Bot moved this to To Sort in DRAM May 8, 2026

tpall wants to merge 1 commit into WrightonLabCSU:dev from tpall:gzip-fasta-input
