Hi IsoQuant Team,
I am seeking guidance on the best practices for quantifying isoform-level expression in a single-cell Nanopore experiment consisting of two distinct groups (Group_A and Group_B), each containing a single sample.
Data Preprocessing with scNanoGPS 2.0:
The input BAM files were pre-processed with the scNanoGPS 2.0 pipeline. As far as I understand its curation logic:
Barcodes and UMIs are appended to the read ID and also written to BAM tags (e.g., RG:Z). These values may be processed identifiers defined by scNanoGPS 2.0 rather than raw sequences.
Each group used its own barcode whitelist. However, 40 barcodes appear in both whitelists (a collision), out of ~12,000 detected cells in total.
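For reference, the overlap between the two whitelists can be checked directly. This is a minimal sketch, assuming each whitelist is a plain-text file with one barcode per line; the helper names are hypothetical and not part of scNanoGPS or IsoQuant.

```python
def read_whitelist(path):
    """Read one barcode per line into a set, skipping blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def shared_barcodes(barcodes_a, barcodes_b):
    """Return the sorted list of barcodes present in both collections."""
    return sorted(set(barcodes_a) & set(barcodes_b))
```

Calling `shared_barcodes(read_whitelist(path_a), read_whitelist(path_b))` on the two whitelist files would list the colliding barcodes, which could then be inspected or excluded downstream.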
Supporting Evidence (BAM Format):
To provide more context, here are the first few lines of my curated BAM file:
2eb5b75b-7bfb-4855-974f-235dda9d9e81_TAGAGATTCGTA 272 chr1 10477 0 33S165M1D49M565N2I315M69S * 0 0 * * NM:i:20 ms:i:471 AS:i:434 nn:i:0 ts:A:- tp:A:S cm:i:108 s1:i:425 de:f:0.0358 rl:i:73 RG:Z:TCCGGGACACCTGCAG
dc9228ce-98c1-4d62-8655-fa24f2ea8c09_AAGATCGTAGTC 2064 chr1 10544 0 446H86M2D21M4H * 0 0 AAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGGCGC * NM:i:4 ms:i:97 AS:i:97 nn:i:0 tp:A:P cm:i:17 s1:i:90 s2:i:96 de:f:0.0278 SA:Z:chr1,168131,+,108S445M12615D4S,22,64; rl:i:0 RG:Z:CTCACTGAGCCTGTGC
7fcaf83d-7984-4a2d-8b27-9efdc989a26e_GTAATCCTCCTA 272 chr1 10569 0 1S73M1D49M171084N2I237M1I5M2D13M1I41M1D19M78S * 0 0 * * NM:i:21 ms:i:379 AS:i:341 nn:i:0 ts:A:- tp:A:S cm:i:90 s1:i:338 de:f:0.0429 rl:i:84 RG:Z:AAAGTCCGTTGTACGT
Current Implementation:
I attempted to run IsoQuant on both samples simultaneously using the following command:
isoquant --reference reference.fa \
--genedb annotation.gtf \
--bam ./group_A_curated.bam ./group_B_curated.bam \
--illumina_bam ./group_A_NGS.bam ./group_B_NGS.bam \
--read_group tag:RG \
--data_type nanopore \
--complete_genedb \
--sqanti_output \
--threads 40 \
--labels "Group_A" "Group_B" \
--report_novel_unspliced true \
-o ./isoquant_output
Observed Problem:
The resulting count matrix and barcodes.tsv do not distinguish between the two groups (e.g., no sample-specific prefixes for barcodes). This makes it impossible to accurately assign cells to their respective experimental conditions, especially for the overlapping barcodes.
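One workaround I am considering, independent of any IsoQuant option, is to prefix the barcode tag with a sample label before running the tool. Below is a minimal sketch, assuming the records are handled as tab-separated SAM text (for example streamed from `samtools view -h`) and that the barcode sits in the RG tag as in the records above. `prefix_tag` is a hypothetical helper, not part of IsoQuant or scNanoGPS.

```python
def prefix_tag(sam_line, label, tag="RG"):
    """Prefix the value of a string-typed SAM tag with a sample label.

    The line is returned without its trailing newline. Header lines
    (starting with '@') and records lacking the tag are otherwise unchanged.
    """
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line
    fields = line.split("\t")
    marker = f"{tag}:Z:"
    # Optional tags start at the 12th column of a SAM record.
    for i in range(11, len(fields)):
        if fields[i].startswith(marker):
            barcode = fields[i][len(marker):]
            fields[i] = f"{marker}{label}_{barcode}"
    return "\t".join(fields)
```

With this, a barcode shared by both samples would become `Group_A_<barcode>` in one file and `Group_B_<barcode>` in the other. One caveat: if the BAM header contains @RG lines, their IDs would no longer match the rewritten values, so writing the prefixed barcode to a separate tag may be cleaner.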
Questions for the Developers:
To ensure a robust downstream differential isoform usage analysis, which of the following workflows do you recommend?
Workflow 1 (Consensus discovery then split): Run IsoQuant once with both BAMs to generate a comprehensive transcript_models.gtf (Joint Discovery), then re-run IsoQuant for each sample independently using this GTF as --genedb to ensure sample-specific barcode separation.
Workflow 2 (Single-run multi-grouping): Is there a parameter configuration to force sample-specific prefixes in the final matrix?
Given the scNanoGPS 2.0 output shown above, what is the most reliable way for IsoQuant to interpret these identifiers while preserving the sample metadata in a single-cell context?
Thank you for your assistance and for this excellent tool!