Need to handle multiallelic variants that have been decomposed and normalized. #23

@arq5x

The appropriate workflow for multi-allelic variants is to decompose and normalize them so that each REF/ALT combination produces a distinct record. This is a preprocessing step that will be done to the VCF before it is given to GQT. For example, consider the following multi-allelic record:

2   44101649    .   G   GC,GCC,GCCC,GCCCC   1/3:0,20,15,0,0:42:99:1528,741,642,703,0,777,1173,533,675,1117,1221,504,873,1158,1478

After decomposing and normalizing with vt, this will be split into 4 records.

2   44101649    .   G   GC  1/.:0,20,15,0,0:42:99:1528,741,642
2   44101649    .   G   GCC ./.:0,20,15,0,0:42:99:1528,703,777
2   44101649    .   G   GCCC    ./1:0,20,15,0,0:42:99:1528,1173,1117
2   44101649    .   G   GCCCC   ./.:0,20,15,0,0:42:99:1528,1221,1478
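The GT recoding in the four records above can be sketched as follows. This is only an illustration that reproduces the example; it is not vt's code. The function name `split_genotype` is hypothetical, and passing a reference allele (0) through unchanged is an assumption, since the example contains no reference allele.

```python
def split_genotype(gt: str, n_alts: int) -> list[str]:
    """Return one GT string per ALT allele for a multi-allelic genotype.

    Hypothetical helper: mirrors the recoding shown in the example,
    where 1/3 over four ALT alleles becomes 1/., ./., ./1, ./.
    """
    sep = "|" if "|" in gt else "/"
    alleles = gt.split(sep)
    per_alt = []
    for k in range(1, n_alts + 1):
        recoded = []
        for allele in alleles:
            if allele == str(k):
                recoded.append("1")   # this record's ALT allele
            elif allele == "0":
                recoded.append("0")   # assumption: REF is kept as-is
            else:
                recoded.append(".")   # some other ALT, or already missing
        per_alt.append(sep.join(recoded))
    return per_alt


print(split_genotype("1/3", 4))
```

Only the GT field is handled here; the AD and PL subsetting visible in the records is left out.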

Notice that because the genotype in the original record is a 1/3 heterozygote, the individual is heterozygous in the first and third new records. GQT needs to recognize genotypes such as "1/." and "./1" as heterozygotes, not unknowns.
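A minimal sketch of the classification being asked for, assuming decomposed records so that the only allele codes are 0, 1 and ".". The function name `classify_genotype` and the string labels are hypothetical and say nothing about how GQT encodes genotypes internally. Treating "0/." as unknown is also an assumption; the issue only covers "1/." and "./1".

```python
def classify_genotype(gt: str) -> str:
    """Classify a GT string from a decomposed (biallelic) record."""
    alleles = gt.replace("|", "/").split("/")
    n_alt = alleles.count("1")
    if n_alt == len(alleles):
        return "hom_alt"      # e.g. 1/1
    if n_alt > 0:
        return "het"          # e.g. 0/1, 1/. and ./1
    if "." in alleles:
        return "unknown"      # e.g. ./. (and 0/. by assumption)
    return "hom_ref"          # e.g. 0/0


for gt in ["1/.", "./1", "0/1", "1/1", "0/0", "./."]:
    print(gt, classify_genotype(gt))
```

With this rule, the second and fourth records in the example ("./.") still come out as unknown, while the first and third are counted as heterozygotes.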

Now that GEMINI handles these decomposed VCFs (work from @brentp), adding this functionality will facilitate using GQT as the pre-processing engine underlying GEMINI.
