Hi,
We are testing Chopstitch with known transcripts.
We are using Homo_sapiens.GRCh38.ensembl_94.cdna, keeping only transcripts > 200 bp and removing redundancy at 98% identity (cd-hit-est). The genomic input is the NA12878 read set.
This is the output we get after running FindSubcomponents.py:
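To make the preprocessing concrete, the length filter can be sketched roughly as below. This is an illustration under assumptions, not our exact script: the function names `read_fasta` and `filter_by_length` and the demo records are hypothetical. The 98% redundancy threshold corresponds to the `-c 0.98` option of cd-hit-est.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def filter_by_length(records, min_len=200):
    """Keep records strictly longer than min_len bases (hypothetical helper)."""
    return [(h, s) for h, s in records if len(s) > min_len]

# Made-up demo input: one 4 bp record and one 240 bp record.
demo = io.StringIO(">tx_short\nACGT\n>tx_long\n" + "ACGT" * 60 + "\n")
kept = filter_by_length(read_fasta(demo))
print([h for h, _ in kept])
```

Transcripts of exactly 200 bp are dropped here, matching the "> 200 bp" cutoff.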
geneMap.tsv.gz
There is one huge cluster of 7429 transcripts grouped together. I attach the FASTA file with those sequences:
largeCluster.fasta.gz
This looks a bit odd. In nearly every Chopstitch run we get one very large cluster that groups together transcripts with very dissimilar sequences.
Any idea where this could come from?
Thanks,