Skip to content

Add module: dorado/basecaller#11122

Open
sahuno wants to merge 9 commits intonf-core:masterfrom
sahuno:add-dorado-basecaller
Open

Add module: dorado/basecaller#11122
sahuno wants to merge 9 commits intonf-core:masterfrom
sahuno:add-dorado-basecaller

Conversation

@sahuno
Copy link
Copy Markdown

@sahuno sahuno commented Apr 6, 2026

Summary

  • Adds DORADO_BASECALLER process for Oxford Nanopore basecalling with dorado
  • Supports automatic model selection, modified base calling (5mCG_5hmCG, 5mC, m6A), and optional integrated alignment
  • GPU-accelerated; follows the same container pattern as nf-core/parabricks modules

Key design decisions

Container — no bioconda/BioContainers (ONTPL licence)

dorado is not on bioconda due to its Oxford Nanopore commercial licence. The module uses the official nanoporetech/dorado Docker Hub image directly (SHA-pinned to v1.4.0), the same pattern used by parabricks modules (nvcr.io/nvidia/...). conda null is set with a comment. ONT does not publish semver Docker tags yet — tracked in nanoporetech/dorado#1584; the SHA will be replaced with a version tag when available.

GPU label — process_gpu

The module requires a CUDA GPU and uses label 'process_gpu'. This label is defined in nf-core/configs but is flagged as non-standard by lint. Happy to discuss the right approach on the PR.

CI tests — stub only (GPU not available on GitHub Actions)

  • 2 stub tests run in CI: unaligned BAM and aligned BAM (reference input path)
  • 2 real GPU tests tagged gpu are excluded from CI but pass locally on a SLURM GPU node
  • Precedent: parabricks modules also skip GPU CI tests

Test data

Lint results

  • 48 tests passed, 0 failures, 4 warnings (all explained above: quay.io 403, container version mismatch, process_gpu label)

PR checklist

  • This comment contains a description of changes (with reason)
  • Added tests
  • Followed module conventions (contribution docs)
  • Test data included
  • No TODO statements
  • Version broadcast via topic: versions using eval() pattern
  • Naming conventions followed (DORADO_BASECALLER)
  • Resource label present (process_gpu)
  • conda null — bioconda not possible (ONTPL licence), documented
  • nf-core modules test dorado/basecaller --profile docker — pending (requires Docker daemon + GPU)
  • nf-core modules test dorado/basecaller --profile singularity — stub tests pass locally; real GPU tests require compute node

🤖 Generated with Claude Code

sahuno and others added 8 commits April 5, 2026 21:16
Oxford Nanopore dorado basecaller module with automatic model selection,
modified base calling (5mCG_5hmCG via combined model syntax), and optional
alignment to a reference genome. Uses local SIF for Track 1 (lab use);
Track 2 upstream blocked pending dorado bioconda package.

Tests: 2 stub (CPU) + 2 real GPU tests (NVIDIA L40S, componc_gpu_batch).
Uses --profile singularity,gpu to expose GPU via --nv flag.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a stub test using the public HG002 GIAB pod5 from nf-core/test-datasets
(PR nf-core/test-datasets#1968). Test references the file via
modules_testdata_base_path so it resolves once that PR merges.

Also commits the 2.2 MB 10-read pod5 subset locally for development.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thin wrapper over nanoporetech/dorado:sha... adding version labels.
Used by `wave freeze --dockerfile Dockerfile` to obtain a stable
community.wave.seqera.io URI for nf-core upstream submission.
See sandbox/TRACK2_wave_freeze.md for the full runbook.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nf-core/parabricks modules use nvcr.io/nvidia/... directly without bioconda.
Following same pattern: use nanoporetech/dorado:sha... from Docker Hub.
Removes the singularity/docker ternary — single container field only.
Local SIF override kept in tests/nextflow.config for MSKCC HPC testing.
Tracking semantic version tags: nanoporetech/dorado#1584.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Container: sahuno/dorado:1.4.0 (wraps nanoporetech/dorado v1.4.0 + samtools)
- meta.yml: document that dorado outputs SO:unknown; advise SAMTOOLS_SORT +
  SAMTOOLS_INDEX downstream (confirmed with GIAB HG002 10-read test on A100)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Correct model input YAML structure (remove extra nesting level)
- Add topics: section for version broadcast
- Fix licence list format
- Add edam ontology for summary TSV output

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Switch from sahuno/dorado:1.4.0 wrapper to the upstream
nanoporetech/dorado:shac8f356489fa8b44b31beba841b84d2879de2088e (v1.4.0).
Same approach as parabricks modules using nvcr.io/nvidia directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- tests/nextflow.config: remove hardcoded local SIF container override,
  local models_dir, and /data1/greenbab reference paths
- tests/main.nf.test: stub test 2 uses [[], [], []] for reference
  (stub block doesn't use reference; real aligned test is GPU-tagged);
  GPU test with reference uses params.modules_testdata_base_path
- Remove GIAB Track 2 stub test (pending nf-core/test-datasets#1968)
- Update snapshots

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sahuno sahuno requested review from a team, edmundmiller and maxulysse as code owners April 6, 2026 02:09
ext.device is not an allowed ext key in nf-core modules (only ext.args,
ext.prefix, ext.when are permitted). Hardcode --device cuda:all directly
in the script block. Users needing a different device string can pass it
via ext.args (e.g. --device cpu for testing).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@dialvarezs dialvarezs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for your contribution.

In general this looks fine to me.
An improvement would be requesting to upload the container to the nfcore org.

I have some comments.
And also, you have to remove the test data.

Comment on lines +51 to +55

cat <<-END_VERSIONS > versions.yml
"${task.process}":
dorado: 1.4.0
END_VERSIONS
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cat <<-END_VERSIONS > versions.yml
"${task.process}":
dorado: 1.4.0
END_VERSIONS

output:
tuple val(meta), path("*.bam") , emit: bam
tuple val(meta), path("*_summary.tsv"), emit: summary , optional: true
tuple val(meta), path("*.log") , emit: log , optional: true
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dorado basecaller doesn't have any option to write the log in a file, so it's not possible to get this output.

Comment on lines +6 to +7
// Docker Hub image directly — same pattern as nf-core/parabricks modules
// (nvcr.io/nvidia/...). SHA tag pins to v1.4.0; a semver tag is tracked in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Docker Hub image directly — same pattern as nf-core/parabricks modules
// (nvcr.io/nvidia/...). SHA tag pins to v1.4.0; a semver tag is tracked in
// Docker Hub image directly. SHA tag pins to v1.4.0; a semver tag is tracked in

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This config is not used. You're not loading it in main.nf.test.
So, if it's not necessary, then I would remove it.

// (nvcr.io/nvidia/...). SHA tag pins to v1.4.0; a semver tag is tracked in
// nanoporetech/dorado#1584.
conda null
container "nanoporetech/dorado:shac8f356489fa8b44b31beba841b84d2879de2088e"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
container "nanoporetech/dorado:shac8f356489fa8b44b31beba841b84d2879de2088e"
container "docker.io/nanoporetech/dorado:shac8f356489fa8b44b31beba841b84d2879de2088e"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't modify this. It shouldn't be necessary.

@sahuno
Copy link
Copy Markdown
Author

sahuno commented Apr 6, 2026

Thanks for the review @dialvarezs! All points addressed in 8779494:

  • Container: switched from sahuno/dorado:1.4.0 to docker.io/nanoporetech/dorado:shac8f356489fa8b44b31beba841b84d2879de2088e (official upstream image with docker.io/ prefix)
  • Comment wording: updated as suggested
  • *.log output: removed from main.nf, meta.yml, and stub — confirmed dorado basecaller has no --log-file flag
  • Test data: removed both pod5 files from tests/data/ — the GIAB HG002 pod5 is now hosted in nf-core/test-datasets (merged in Added fq/lint module #1968) and referenced via params.modules_testdata_base_path
  • tests/nextflow.config: removed — it was not loaded by main.nf.test and contained local paths
  • .gitignore: no changes made to this file

Regarding the container upload to the nfcore org — happy to pursue that. Should I open a request with ONT to mirror nanoporetech/dorado under docker.io/nfcore/dorado, or is there an existing process for this in nf-core?

Lint: 51 passed, 0 failures (4 expected warnings for Docker Hub / SHA tag / process_gpu label).
Tests: 4/4 stub tests pass.

@sahuno sahuno requested a review from dialvarezs April 6, 2026 21:39
@dialvarezs
Copy link
Copy Markdown
Member

It seems that you forgot to push the changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants