Description
The appropriate workflow for multi-allelic variants is to decompose and normalize them so that each REF/ALT combination produces a distinct record. This preprocessing step is applied to the VCF before it is given to GQT. For example, consider the following multi-allelic record:
2 44101649 . G GC,GCC,GCCC,GCCCC 1/3:0,20,15,0,0:42:99:1528,741,642,703,0,777,1173,533,675,1117,1221,504,873,1158,1478
After decomposing and normalizing with vt, this will be split into four records:
2 44101649 . G GC 1/.:0,20,15,0,0:42:99:1528,741,642
2 44101649 . G GCC ./.:0,20,15,0,0:42:99:1528,703,777
2 44101649 . G GCCC ./1:0,20,15,0,0:42:99:1528,1173,1117
2 44101649 . G GCCCC ./.:0,20,15,0,0:42:99:1528,1221,1478
Notice that since the genotype in the original record is a heterozygote 1/3, the individual's genotype is a heterozygote in the first and third new records. GQT needs to recognize genotypes such as "1/." and "./1" as heterozygotes, not unknowns.
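To make the requested behavior concrete, here is a minimal Python sketch. It is not GQT or vt code: `split_genotype` and `classify` are hypothetical helpers that only mimic the GT mapping in the example above. How a reference allele (0) is carried through is an assumption, since the example does not show that case.

```python
def split_genotype(gt, n_alts):
    """Return one GT string per ALT allele for a multi-allelic GT.

    Mimics the example above: the allele matching the record's ALT becomes 1
    and every other ALT becomes '.'. Keeping 0 as 0 is an assumption.
    """
    sep = "|" if "|" in gt else "/"
    alleles = gt.split(sep)
    per_alt = []
    for k in range(1, n_alts + 1):
        recoded = []
        for allele in alleles:
            if allele == str(k):
                recoded.append("1")   # this record's ALT
            elif allele == "0":
                recoded.append("0")   # assumption: reference is unchanged
            else:
                recoded.append(".")   # a different ALT, or already missing
        per_alt.append(sep.join(recoded))
    return per_alt


def classify(gt):
    """Classify a decomposed GT, treating '1/.' and './1' as heterozygous."""
    alleles = gt.replace("|", "/").split("/")
    n_alt = alleles.count("1")
    if n_alt == len(alleles):
        return "hom_alt"
    if n_alt > 0:
        return "het"                  # covers 0/1, 1/. and ./1
    if all(allele == "0" for allele in alleles):
        return "hom_ref"
    return "unknown"                  # no ALT seen and at least one '.'


if __name__ == "__main__":
    for gt in split_genotype("1/3", 4):
        print(gt, classify(gt))
```

Counting the ALT allele first, before checking for missing alleles, is what keeps half-missing genotypes from falling through to the unknown case.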
Now that GEMINI handles decomposed VCFs (work from @brentp), adding this functionality will facilitate using GQT as the pre-processing engine underlying GEMINI.