Fix file extension for module PBMARKDUP #9784

sainsachiko · 2026-01-27T11:23:49Z

Fix the file extension handling to:

- Get the right extension for .gz files.
- Follow the tool's convention: the tool can take multiple input types but writes to a single output, so the code is modified to suit.

PR checklist

Closes #XXX

Copilot

Pull request overview

This PR refines the PBMARKDUP module’s handling of input file extensions so that outputs follow the tool’s convention when given multiple input types, including compressed FASTA/FASTQ, and updates the associated tests and snapshots accordingly.

Changes:

- Update PBMARKDUP process logic to derive the output suffix from the actual input filenames, including correctly preserving compound extensions such as .fastq.gz and .fasta.gz.
- Adjust nf-test specifications for pbmarkdup to cover multiple input patterns (single FASTA, multiple BAMs, and mixed FASTA/FASTQ inputs with dupfile+log options).
- Regenerate the nf-test snapshot file with new test names, expected outputs (including dupfile and log channels), and newer nf-test and Nextflow version metadata.
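
To illustrate why compound extensions need special handling, here is a minimal Groovy sketch on plain strings. The file names are made up and the two helper closures (`lastExtension`, `compoundExtension`) are hypothetical names for illustration, not the module's code.

```groovy
// Hypothetical helpers operating on plain file-name strings.
// lastExtension: naive approach, keeps only the text after the final dot.
def lastExtension = { String name -> name.tokenize('.').last() }

// compoundExtension: keeps the last two dot-separated parts for .gz files,
// otherwise only the last part.
def compoundExtension = { String name ->
    name.tokenize('.').takeRight(name.endsWith('.gz') ? 2 : 1).join('.')
}

assert lastExtension('sample.fastq.gz') == 'gz'          // compression suffix only
assert compoundExtension('sample.fastq.gz') == 'fastq.gz' // format and compression
assert compoundExtension('sample.fasta') == 'fasta'
```

With the naive approach, an output named after a `.fastq.gz` input would lose the format part of its extension.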

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `modules/nf-core/pbmarkdup/main.nf` | Changes suffix calculation to search the input list for FASTA/FASTQ filenames (including `.gz`), derive the appropriate extension fragment, and fall back to the first input’s extension, ensuring the module’s single-output filename matches the primary input type. |
| `modules/nf-core/pbmarkdup/tests/main.nf.test` | Renames and restructures tests to exercise multiple input types and the dupfile/log behaviour, including a new mixed FASTA/FASTQ test, while keeping stub and single-input FASTA coverage. |
| `modules/nf-core/pbmarkdup/tests/main.nf.test.snap` | Updates snapshot keys and expected outputs to align with the new test names and behaviours (including `null.dup.bam` dupfile outputs and `test.pbmarkdup.log` logs), and records the newer nf-test/Nextflow versions and timestamps. |

Copilot · 2026-01-27T11:40:50Z

modules/nf-core/pbmarkdup/tests/main.nf.test

+                pbmarkdup_args = "--clobber --dup-file ${prefix}.dup.bam --log-level INFO"
            }


Here pbmarkdup_args uses ${prefix} inside a Groovy double-quoted string, but prefix is not defined in this nf-test parameter scope, so it interpolates to the literal string null (reflected by the null.dup.bam filenames in the snapshot). To avoid this surprising behaviour and better reflect the intention of tying the duplicate file name to the module’s output prefix, consider either using an explicit fixed dup file name here or passing the desired prefix via task.ext.prefix instead of relying on ${prefix} in the params string.
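
The interpolation effect can be reproduced in a plain Groovy script. This is only a sketch: `prefix` is set to `null` explicitly to mimic an unresolved value (an assumption about the nf-test params scope, inferred from the `null.dup.bam` names in the snapshot), and `test.dup.bam` is a hypothetical fixed name.

```groovy
// A null value interpolated into a double-quoted Groovy string becomes the text "null".
def prefix = null  // stand-in for a variable that resolves to nothing in this scope
def interpolated = "--clobber --dup-file ${prefix}.dup.bam --log-level INFO"
assert interpolated.toString() == '--clobber --dup-file null.dup.bam --log-level INFO'

// One alternative named in the comment above: an explicit, fixed dup file name.
// 'test.dup.bam' is a hypothetical choice, not taken from the PR.
def fixed = '--clobber --dup-file test.dup.bam --log-level INFO'
assert !fixed.contains('null')
```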

Copilot · 2026-01-27T11:40:51Z

modules/nf-core/pbmarkdup/tests/main.nf.test

    }

-    test("acropora cervicornis - bam - multiple tests remove duplicates") {
+    test("homo sapiens - Multiple input types - with dupfile logfile and remove duplicates") {


The test description "homo sapiens - Multiple input types" doesn’t match the actual test data, which still uses genomics/eukaryotes/acropora_cervicornis/... paths, so the species label in the name is misleading. To keep tests self-describing and easier to interpret, consider either renaming the test to reference Acropora cervicornis or switching the input files to a Homo sapiens dataset.

Suggested change

-    test("homo sapiens - Multiple input types - with dupfile logfile and remove duplicates") {
+    test("acropora cervicornis - Multiple input types - with dupfile logfile and remove duplicates") {

Copilot · 2026-01-27T11:40:51Z

modules/nf-core/pbmarkdup/main.nf

+    // Allow multiple input types/files ((compressed) fasta, fastq, bam): determine the suffix from the input file names
+    suffix =
+        input.find { it.name ==~ /.*\.(fasta|fa|fna)(\.gz)?$/ }?.with { f ->
+            f.name.tokenize('.').takeRight(f.name.endsWith('.gz') ? 2 : 1).join('.')
+        } ?:
+        input.find { it.name ==~ /.*\.(fastq|fq)(\.gz)?$/ }?.with { f ->
+            f.name.tokenize('.').takeRight(f.name.endsWith('.gz') ? 2 : 1).join('.')
+        } ?:
+        input[0].extension


The new suffix resolution logic adds explicit handling for compressed FASTA/FASTQ inputs (e.g. .fasta.gz, .fastq.gz), but the updated pbmarkdup tests only cover uncompressed fasta/fastq and BAM inputs, so the .gz branch isn’t exercised. Given this module already has nf-test coverage, it would be worthwhile to add at least one test using a .fasta.gz or .fastq.gz input to verify that the output file name preserves the full compressed extension as intended.
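
Before adding an nf-test case with real compressed test data, the expected behaviour of the `.gz` branch can be checked in plain Groovy. The sketch below mirrors the suffix logic on strings rather than Nextflow paths; `resolveSuffix` is a hypothetical helper name, the file names are made up, and the fallback uses the last dot-separated token as a stand-in for `input[0].extension`.

```groovy
// Sketch of the suffix resolution on plain strings instead of Nextflow paths.
// resolveSuffix is a hypothetical helper, not part of the module.
def resolveSuffix(List<String> names) {
    // Keep the last two dot-separated parts for .gz files, otherwise the last one
    def tail = { String n -> n.tokenize('.').takeRight(n.endsWith('.gz') ? 2 : 1).join('.') }
    def fasta = names.find { it ==~ /.*\.(fasta|fa|fna)(\.gz)?/ }
    def fastq = names.find { it ==~ /.*\.(fastq|fq)(\.gz)?/ }
    // FASTA takes precedence over FASTQ; otherwise fall back to the first file's extension
    return fasta ? tail(fasta) : fastq ? tail(fastq) : names[0].tokenize('.').last()
}

assert resolveSuffix(['a.fasta.gz']) == 'fasta.gz'       // compressed FASTA keeps both parts
assert resolveSuffix(['a.fq.gz', 'b.bam']) == 'fq.gz'    // FASTQ found anywhere in the list
assert resolveSuffix(['a.fastq', 'b.fasta']) == 'fasta'  // FASTA wins in mixed input
assert resolveSuffix(['a.bam', 'b.bam']) == 'bam'        // fallback to the first input
```

A `.fasta.gz` or `.fastq.gz` test in the module would then only need to confirm that the output file name ends with the same compound suffix.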

Fix file extension

644d805

sainsachiko requested a review from Copilot January 27, 2026 11:23

Copilot started reviewing on behalf of sainsachiko January 27, 2026 11:24

Copilot AI reviewed Jan 27, 2026
