Skip to content

pileUp and dmr showing significant results for m6A even with very low amount of mapped reads from Dorado's modBAM #618

@eltonjrv

Description

@eltonjrv

Dear modkit developers,

I've been facing this curious issue with m6A pileup) and differential methylation (dmr), and would like to hear your opinion on whether or not to trust on results:

I'm analysing m6A on a viral RNAs in a timeseries designh (0h, 24h, and 48h, of infection). On the 0h timepoint, expectedly, I've got very few reads aligned to the virus ref genome ( ~3k mapped reads only).
I know that modkit pileup recommends an N sampling of 10-50k reads for performing that "pass" threshold calculation (and I use -n 50000 on my runs), but it does still seem to run well even for these 0h samples like 0h_rep1 (~2.1k reads mapped) and 0h_rep3 (~4.8k reads mapped), finding a confidence 'pass' threshold over 80%.
Then, modkit dmr also seems to run well (on both 0h-24h and 0h-vs-48h), even though warning on a great amount of failed sites due to "missing-in-one-condition", which is the 0h one.

After filtering for pval < 0.05 (column 15 from the dmr bedMethyl), I get tens of differentially modified m6A sites, with an effect size (column) bias towards the 0h sample group. All sites are highly differentially modified in the 0h group.

Shall I trust these results? Or it's just a statistical bias from the little input size (%mapped reads) of the 0h sample.

JFYI: 24h and 48h have several hundreds thousands (200-700k) mapped reds against the same genome. And it's dmr bedMethyl file displays a good balance of effect sizes (some more modified in the 24h, and some more in the 48h).

Thanks,
Best,
Elton

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions