Skip to content

Conversation

@sainsachiko
Copy link
Contributor

Fix file extension to:

  • Get right extension for .gz file
  • As the tool can take multiple input types and write to 1 single output, modify code to suit with tool's convention:
image

PR checklist

Closes #XXX

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions - See version_topics
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refines the PBMARKDUP module’s handling of input file extensions so that outputs follow the tool’s convention when given multiple input types, including compressed FASTA/FASTQ, and updates the associated tests and snapshots accordingly.

Changes:

  • Update PBMARKDUP process logic to derive the output suffix from the actual input filenames, including correctly preserving compound extensions such as .fastq.gz and .fasta.gz.
  • Adjust nf-test specifications for pbmarkdup to cover multiple input patterns (single FASTA, multiple BAMs, and mixed FASTA/FASTQ inputs with dupfile+log options).
  • Regenerate the nf-test snapshot file with new test names, expected outputs (including dupfile and log channels), and newer nf-test and Nextflow version metadata.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
modules/nf-core/pbmarkdup/main.nf Changes suffix calculation to search the input list for FASTA/FASTQ filenames (including .gz), derive the appropriate extension fragment, and fall back to the first input’s extension, ensuring the module’s single-output filename matches the primary input type.
modules/nf-core/pbmarkdup/tests/main.nf.test Renames and restructures tests to exercise multiple input types and the dupfile/log behaviour, including a new mixed FASTA/FASTQ test, while keeping stub and single-input FASTA coverage.
modules/nf-core/pbmarkdup/tests/main.nf.test.snap Updates snapshot keys and expected outputs to align with the new test names and behaviours (including null.dup.bam dupfile outputs and test.pbmarkdup.log logs), and records the newer nf-test/Nextflow versions and timestamps.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +78 to 79
pbmarkdup_args = "--clobber --dup-file ${prefix}.dup.bam --log-level INFO"
}
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here pbmarkdup_args uses ${prefix} inside a Groovy double-quoted string, but prefix is not defined in this nf-test parameter scope, so it interpolates to the literal string null (reflected by the null.dup.bam filenames in the snapshot). To avoid this surprising behaviour and better reflect the intention of tying the duplicate file name to the module’s output prefix, consider either using an explicit fixed dup file name here or passing the desired prefix via task.ext.prefix instead of relying on ${prefix} in the params string.

Copilot uses AI. Check for mistakes.
}

test("acropora cervicornis - bam - multiple tests remove duplicates") {
test("homo sapiens - Multiple input types - with dupfile logfile and remove duplicates") {
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test description "homo sapiens - Multiple input types" doesn’t match the actual test data, which still uses genomics/eukaryotes/acropora_cervicornis/... paths, so the species label in the name is misleading. To keep tests self-describing and easier to interpret, consider either renaming the test to reference Acropora cervicornis or switching the input files to a Homo sapiens dataset.

Suggested change
test("homo sapiens - Multiple input types - with dupfile logfile and remove duplicates") {
test("acropora cervicornis - Multiple input types - with dupfile logfile and remove duplicates") {

Copilot uses AI. Check for mistakes.
Comment on lines +25 to +34
// To allow multiple input types/files: (compressed) fasta, fastq, bam; Determine suffix from input file names
suffix =
input.find {
it.name ==~ /.*\.(fasta|fa|fna)(\.gz)?$/ }?.with { f ->
f.name.tokenize('.').takeRight(f.name.endsWith('.gz') ? 2 : 1).join('.')
} ?:
input.find { it.name ==~ /.*\.(fastq|fq)(\.gz)?$/ }?.with { f ->
f.name.tokenize('.').takeRight(f.name.endsWith('.gz') ? 2 : 1).join('.')
} ?:
input[0].extension
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new suffix resolution logic adds explicit handling for compressed FASTA/FASTQ inputs (e.g. .fasta.gz, .fastq.gz), but the updated pbmarkdup tests only cover uncompressed fasta/fastq and BAM inputs, so the .gz branch isn’t exercised. Given this module already has nf-test coverage, it would be worthwhile to add at least one test using a .fasta.gz or .fastq.gz input to verify that the output file name preserves the full compressed extension as intended.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant