Add module: modkit/localize #11120

Open
sahuno wants to merge 7 commits into nf-core:master from sahuno:add-modkit-localize

sahuno commented Apr 5, 2026

New module: modkit/localize

Tool

modkit — A bioinformatics tool for working with modified bases from Oxford Nanopore sequencing data.

What this module does

MODKIT_LOCALIZE aggregates bedMethyl pileup counts around genomic features of interest (e.g. CpG islands, gene bodies, repeat elements), producing:

  • A TSV of percent-modification vs. offset from feature midpoints
  • An optional interactive HTML chart

This makes it possible to examine modification enrichment or depletion relative to genomic landmarks, similar to an aggregate methylation profile (meta-profile) across a set of features.
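The aggregation described above can be illustrated with a minimal Python sketch. It is not modkit's implementation. It assumes bedMethyl rows have already been parsed into hypothetical `(chrom, position, n_mod, n_valid)` tuples and regions into `(chrom, start, end)` tuples. It also ignores feature strand and scans every site against every midpoint, which a real tool would avoid by querying the tabix index per region.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shapes for illustration only (not modkit's internal types):
#   site   = (chrom, position, n_mod, n_valid)   # one parsed bedMethyl row
#   region = (chrom, start, end)                 # one half-open BED interval
Site = Tuple[str, int, int, int]
Region = Tuple[str, int, int]


def localize(sites: Iterable[Site], regions: Iterable[Region], window: int) -> Dict[int, float]:
    """Pool counts by offset from each region midpoint; return percent modified per offset."""
    # Group region midpoints by contig so sites are only compared within the same contig.
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start + end) // 2)

    n_mod: Dict[int, int] = defaultdict(int)
    n_valid: Dict[int, int] = defaultdict(int)
    for chrom, pos, mod, valid in sites:
        for mid in by_chrom.get(chrom, []):
            offset = pos - mid
            if -window <= offset <= window:
                n_mod[offset] += mod
                n_valid[offset] += valid

    # Percent modification = 100 * pooled modified calls / pooled valid calls at each offset.
    return {
        off: 100.0 * n_mod[off] / n_valid[off]
        for off in sorted(n_valid)
        if n_valid[off] > 0
    }
```

For example, with regions `("chr1", 90, 110)` and `("chr1", 190, 210)` (midpoints 100 and 200) and a window of 10, a site at position 95 contributes to offset -5 and a site at 205 to offset +5. Counts are pooled before dividing, so offsets with deep coverage weigh more than a mean of per-site percentages would allow.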

Inputs

  • bgzip-compressed + tabix-indexed bedMethyl file (from modkit pileup)
  • Genome sizes file (e.g. .fai or .sizes)
  • BED file of regions to localize around
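Either a `.fai` or a `.sizes` file works as the genome sizes input because both start with the contig name and its length. The hypothetical helpers below show that reading only the first two tab-separated columns covers both formats. They also show a simple sanity check that BED regions fall inside known contigs. This is an illustrative sketch, not part of the module or of modkit.

```python
from typing import Dict, Iterable, List, Tuple


def read_genome_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Parse a .fai or .sizes file; only the first two tab-separated columns are used."""
    sizes: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # .fai rows have 5 columns (name, length, offset, linebases, linewidth);
        # .sizes rows have 2 (name, length). Both begin with name and length.
        sizes[fields[0]] = int(fields[1])
    return sizes


def out_of_bounds(regions: Iterable[Tuple[str, int, int]], sizes: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return BED intervals on unknown contigs, empty intervals, or ones past the contig end."""
    bad = []
    for chrom, start, end in regions:
        # BED is half-open, so end == contig length is still valid.
        if chrom not in sizes or start < 0 or start >= end or end > sizes[chrom]:
            bad.append((chrom, start, end))
    return bad
```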

Outputs

  • *.tsv — percent-modification vs. offset table
  • *.html — interactive chart
  • *.log — debug log

Tests

Tests chain MODKIT_PILEUP -> TABIX_TABIX -> MODKIT_LOCALIZE using existing nf-core test-datasets (homo_sapiens nanopore BAM + genome.fasta.fai). The regions BED is created inline, so no new test data upload is required.

  • nf-test passes locally (singularity profile)
  • nf-test passes on SLURM compute node (job 17362918, 2 tests PASSED in 13s)
  • nf-core modules lint passes: 48/48 tests, 0 warnings, 0 failures

Notes

  • --regions is required by the modkit 0.6.1 CLI (enforced at the tool level)
  • Container spec matches the existing modkit/pileup module
  • A companion modkit/localize/plot submodule for composite multi-sample visualization is planned as a follow-up PR

Adds MODKIT_LOCALIZE process to aggregate bedMethyl pileup counts
localized around genomic features of interest (e.g. CpG islands),
producing a TSV of percent-modification vs. offset from feature midpoints
and an optional interactive HTML chart.

Tests chain MODKIT_PILEUP -> TABIX_TABIX -> MODKIT_LOCALIZE using
existing nf-core test-datasets (homo_sapiens nanopore BAM + genome.fasta.fai).
Regions BED created inline; no new test data upload required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sahuno requested review from a team as code owners April 5, 2026 01:47
sahuno and others added 6 commits April 5, 2026 14:27
Oxford Nanopore dorado basecaller module with automatic model selection,
modified base calling (5mCG_5hmCG via combined model syntax), and optional
alignment to a reference genome. Uses local SIF for Track 1 (lab use);
Track 2 upstream blocked pending dorado bioconda package.

Tests: 2 stub (CPU) + 2 real GPU tests (NVIDIA L40S, componc_gpu_batch).
Uses --profile singularity,gpu to expose GPU via --nv flag.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a stub test using the public HG002 GIAB pod5 from nf-core/test-datasets
(PR nf-core/test-datasets#1968). Test references the file via
modules_testdata_base_path so it resolves once that PR merges.

Also commits the 2.2 MB 10-read pod5 subset locally for development.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thin wrapper over nanoporetech/dorado:sha... adding version labels.
Used by `wave freeze --dockerfile Dockerfile` to obtain a stable
community.wave.seqera.io URI for nf-core upstream submission.
See sandbox/TRACK2_wave_freeze.md for the full runbook.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nf-core/parabricks modules use nvcr.io/nvidia/... directly without bioconda.
Following same pattern: use nanoporetech/dorado:sha... from Docker Hub.
Removes the singularity/docker ternary — single container field only.
Local SIF override kept in tests/nextflow.config for MSKCC HPC testing.
Tracking semantic version tags: nanoporetech/dorado#1584.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Container: sahuno/dorado:1.4.0 (wraps nanoporetech/dorado v1.4.0 + samtools)
- meta.yml: document that dorado outputs SO:unknown; advise SAMTOOLS_SORT +
  SAMTOOLS_INDEX downstream (confirmed with GIAB HG002 10-read test on A100)
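The SO:unknown note refers to the sort-order tag on the SAM `@HD` header line. Per the SAM specification, `unknown` is also the default when the tag is absent. The hypothetical Python helpers below read that tag from header text, for example the output of `samtools view -H`, and flag anything not declared `coordinate`. Indexing needs coordinate-sorted input, so such files would go through SAMTOOLS_SORT first. The header only declares the order and does not prove it, so this is a quick check rather than a validation.

```python
def sort_order(header_text: str) -> str:
    """Return the SO tag from the @HD line of a SAM header (SAM spec default: 'unknown')."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                tag, _, value = field.partition(":")
                if tag == "SO":
                    return value
    return "unknown"


def needs_sorting(header_text: str) -> bool:
    # Anything other than declared coordinate order should be sorted
    # before indexing, since indexing requires coordinate-sorted input.
    return sort_order(header_text) != "coordinate"
```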

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Composite visualization process for modkit localize results.
Overlays multiple samples colored by condition with 3 smoothing options
(loess, rolling_mean, binned). Produces PNG/PDF/SVG figures and combined TSVs.
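Two of the three smoothing options named above, rolling_mean and binned, can be sketched in Python to show what each does to a percent-modification vs. offset profile. This is not the R template's code, which per this commit uses ggplot2, data.table and zoo. The edge handling (truncated windows), the bin keying (left edge) and the function names are assumptions. loess is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def rolling_mean(values: Sequence[float], width: int) -> List[float]:
    """Centered rolling mean; windows are truncated at the ends rather than padded."""
    half = width // 2
    out: List[float] = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


def binned(offsets: Sequence[int], values: Sequence[float], bin_size: int) -> List[Tuple[int, float]]:
    """Average values within fixed-width offset bins, each keyed by its left edge."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for off, val in zip(offsets, values):
        left = (off // bin_size) * bin_size  # floor division puts negative offsets in the correct bin
        sums[left] = sums.get(left, 0.0) + val
        counts[left] = counts.get(left, 0) + 1
    return [(left, sums[left] / counts[left]) for left in sorted(sums)]
```

A rolling mean keeps one point per offset and only smooths noise. Binning reduces the number of points, which can make multi-sample overlays easier to read at the cost of resolution.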

- main.nf: MODKIT_LOCALIZE_PLOT process using Seqera Wave R container
- templates/plot_localize_composite.R: full R template with ggplot2
- environment.yml: r-base, r-data.table, r-ggplot2, r-zoo via conda-forge
- meta.yml: documents inputs/outputs per nf-core convention
- tests/: 4 nf-test cases (1 stub + 3 real: loess, rolling_mean, binned)
- tests/data/: synthetic 2-sample TSVs + samplesheet for testing

All 4 nf-tests pass with --profile conda.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>