Best practices for multi-experiment isoform quantification and resolving barcode collisions in single-cell Nanopore datasets

Hi IsoQuant Team,

I am seeking guidance on the best practices for quantifying isoform-level expression in a single-cell Nanopore experiment consisting of two distinct groups (Group_A and Group_B), each containing a single sample.

Data Preprocessing with scNanoGPS 2.0:
The input BAM files were pre-processed with the scNanoGPS 2.0 pipeline, which has two consequences for the identifiers:

Barcodes and UMIs are appended to the read ID and written to BAM tags (e.g., RG:Z). These may be processed identifiers defined by scNanoGPS 2.0 rather than raw sequences.

Each group has its own barcode whitelist, but 40 barcodes appear in both whitelists (collisions), out of ~12,000 detected cells in total.
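For anyone who wants to reproduce the overlap count, here is a minimal sketch of how shared barcodes can be listed. It assumes each whitelist is a plain-text file with one barcode per line; the function names and the file layout are illustrative assumptions, not scNanoGPS 2.0 output conventions.

```python
import sys
from pathlib import Path


def load_whitelist(path):
    """Read one barcode per line, skipping blank lines and trimming whitespace."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def shared_barcodes(whitelist_a, whitelist_b):
    """Return the barcodes present in both whitelists, sorted for stable output."""
    return sorted(set(whitelist_a) & set(whitelist_b))


# Hypothetical usage: python shared_barcodes.py whitelist_A.txt whitelist_B.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    first, second = (load_whitelist(p) for p in sys.argv[1:3])
    for barcode in shared_barcodes(first, second):
        print(barcode)
```

Any barcode printed by this script is ambiguous between the two groups unless the sample of origin is tracked separately.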

Supporting Evidence (BAM Format):
For context, here are the first few alignment records from one of my curated BAM files:

```
2eb5b75b-7bfb-4855-974f-235dda9d9e81_TAGAGATTCGTA   272     chr1    10477   0       33S165M1D49M565N2I315M69S       * 0       0       * * NM:i:20 ms:i:471   AS:i:434 nn:i:0 ts:A:-  tp:A:S  cm:i:108        s1:i:425        de:f:0.0358     rl:i:73 RG:Z:TCCGGGACACCTGCAG
dc9228ce-98c1-4d62-8655-fa24f2ea8c09_AAGATCGTAGTC   2064    chr1    10544   0       446H86M2D21M4H  * 0       0       AAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGC * NM:i:4  ms:i:97 AS:i:97 nn:i:0  tp:A:P  cm:i:17 s1:i:90 s2:i:96 de:f:0.0278     SA:Z:chr1,168131,+,108S445M12615D4S,22,64;      rl:i:0  RG:Z:CTCACTGAGCCTGTGC
7fcaf83d-7984-4a2d-8b27-9efdc989a26e_GTAATCCTCCTA   272     chr1    10569   0       1S73M1D49M171084N2I237M1I5M2D13M1I41M1D19M78S   * 0       0       * * NM:i:21     ms:i:379        AS:i:341        nn:i:0  ts:A:-  tp:A:S  cm:i:90 s1:i:338        de:f:0.0429     rl:i:84 RG:Z:AAAGTCCGTTGTACGT
```
Current Implementation:
I attempted to run IsoQuant on both samples simultaneously using the following command:

```
isoquant --reference reference.fa \
         --genedb annotation.gtf \
         --bam ./group_A_curated.bam ./group_B_curated.bam \
         --illumina_bam ./group_A_NGS.bam ./group_B_NGS.bam \
         --read_group tag:RG \
         --data_type nanopore \
         --complete_genedb \
         --sqanti_output \
         --threads 40 \
         --labels "Group_A" "Group_B" \
         --report_novel_unspliced true \
         -o ./isoquant_output
```
Observed Problem:
The resulting count matrix and barcodes.tsv do not distinguish between the two groups (e.g., no sample-specific prefixes for barcodes). This makes it impossible to accurately assign cells to their respective experimental conditions, especially for the overlapping barcodes.
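One workaround I am considering is to make the barcodes unique before running IsoQuant by prefixing the RG:Z value in each BAM with its group label. Below is a minimal sketch that operates on SAM text. It assumes the barcode lives in a string-typed `RG:Z:` tag, as in the records above, and that these values are not declared as `@RG` header entries; the script name `prefix_tag.py` and the `label_barcode` naming scheme are my own choices.

```python
import sys


def prefix_tag(sam_line, label, tag="RG"):
    """Prefix the value of a string-typed tag (e.g. RG:Z:...) with a sample label."""
    if sam_line.startswith("@"):
        return sam_line  # header lines pass through unchanged
    fields = sam_line.rstrip("\n").split("\t")
    key = f"{tag}:Z:"
    # The 11 mandatory SAM columns come first; optional tags start at index 11.
    for i in range(11, len(fields)):
        if fields[i].startswith(key):
            fields[i] = f"{key}{label}_{fields[i][len(key):]}"
    return "\t".join(fields) + "\n"


# Reads SAM from stdin and writes the modified SAM to stdout.
if __name__ == "__main__" and len(sys.argv) == 2:
    sample_label = sys.argv[1]
    for line in sys.stdin:
        sys.stdout.write(prefix_tag(line, sample_label))
```

It could be run per sample as `samtools view -h group_A_curated.bam | python prefix_tag.py Group_A | samtools view -b -o group_A_prefixed.bam -`, so that the tag `RG:Z:TCCGGGACACCTGCAG` becomes `RG:Z:Group_A_TCCGGGACACCTGCAG`. I have not confirmed that IsoQuant handles tag values modified in this way, so I would appreciate your opinion on this approach.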

Questions for the Developers:
To ensure a robust downstream differential isoform usage analysis, which of the following workflows do you recommend?

Workflow 1 (Joint discovery, then split): Run IsoQuant once on both BAMs to generate a comprehensive transcript_models.gtf. Then re-run IsoQuant on each sample independently, passing this GTF as --genedb, so that each sample gets its own barcode set and count matrix.
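To make Workflow 1 concrete, here is a sketch that only builds the three command lines, without running them. It reuses flags from my command above and nothing else; the output directory names and the location of the joint GTF are placeholders. Whether IsoQuant accepts its own transcript_models.gtf as --genedb without additional flags is part of what I am asking.

```python
def isoquant_cmd(bams, labels, genedb, out_dir, threads=40):
    """Build an IsoQuant argument list using only flags from the command above."""
    return [
        "isoquant",
        "--reference", "reference.fa",
        "--genedb", genedb,
        "--bam", *bams,
        "--read_group", "tag:RG",
        "--data_type", "nanopore",
        "--threads", str(threads),
        "--labels", *labels,
        "-o", out_dir,
    ]


def workflow_1(samples, joint_gtf):
    """Return the joint-discovery command followed by one command per sample.

    samples maps a label to its BAM path; joint_gtf is the (placeholder) path
    of the transcript models GTF produced by the joint run.
    """
    labels = list(samples)
    joint = isoquant_cmd(
        [samples[label] for label in labels], labels,
        "annotation.gtf", "./isoquant_joint",
    )
    per_sample = [
        isoquant_cmd([samples[label]], [label], joint_gtf, f"./isoquant_{label}")
        for label in labels
    ]
    return [joint, *per_sample]
```

Each returned list could be passed to `subprocess.run` once the joint GTF path is known.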

Workflow 2 (Single-run multi-grouping): Run IsoQuant once on both BAMs. Is there a parameter configuration that adds sample-specific prefixes to the barcodes in the final matrix?

Given the scNanoGPS 2.0 output shown above, what is the most reliable way for IsoQuant to interpret these identifiers while preserving the sample metadata in a single-cell context?

Thank you for your assistance and for this excellent tool!

Best practices for multi-experiment isoform quantification and resolving barcode collisions in single-cell Nanopore datasets #384

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Best practices for multi-experiment isoform quantification and resolving barcode collisions in single-cell Nanopore datasets #384

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions