Very large set of transcripts grouped in one splice graph

Hi,

We are testing ChopStitch with known transcripts.

The transcript set is _Homo_sapiens.GRCh38.ensembl_94.cdna_, restricted to transcripts > 200 bp, with redundancy removed at 98% (cd-hit-est). The genomic input is the NA12878 read set.
This is the output we get after running `FindSubcomponents.py`:
[geneMap.tsv.gz](https://github.com/bcgsc/ChopStitch/files/2721625/geneMap.tsv.gz) 

There is a huge cluster of 7429 transcripts grouped together. I attach the fasta file with those sequences:
[largeCluster.fasta.gz](https://github.com/bcgsc/ChopStitch/files/2721647/largeCluster.fasta.gz)
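
To make the cluster sizes easy to reproduce, here is a minimal sketch of how they can be tallied from a tab-separated mapping. It assumes one transcript per row with the cluster ID in the first column; that layout is an assumption for illustration and may not match the actual `geneMap.tsv` columns, so `cluster_col` may need adjusting. The helper names `cluster_sizes` and `read_tsv` are hypothetical and not part of ChopStitch.

```python
import csv
import gzip
from collections import Counter


def cluster_sizes(rows, cluster_col=0):
    """Count how many rows (transcripts) carry each cluster ID.

    `rows` is any iterable of lists of strings, one per transcript.
    `cluster_col` is the ASSUMED position of the cluster ID column.
    """
    sizes = Counter()
    for row in rows:
        # Skip blank lines and comment/header lines starting with '#'.
        if not row or row[0].startswith("#"):
            continue
        sizes[row[cluster_col]] += 1
    return sizes


def read_tsv(path):
    """Yield rows from a plain or gzip-compressed TSV file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        yield from csv.reader(handle, delimiter="\t")


# Toy data standing in for the real mapping (IDs are made up).
toy_rows = [
    ["cluster1", "transcriptA"],
    ["cluster1", "transcriptB"],
    ["cluster2", "transcriptC"],
]
sizes = cluster_sizes(toy_rows)
largest_id, largest_size = sizes.most_common(1)[0]
print(largest_id, largest_size)
```

On the real file this would be called as `cluster_sizes(read_tsv("geneMap.tsv.gz"))`, and `sizes.most_common(5)` would list the five largest clusters.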

This looks a bit odd. In almost every ChopStitch run we see one very large cluster that groups together transcripts with very dissimilar sequences.
Any clues where this could come from?
Thanks,
