rawDNA2vcf

Scripts to convert raw DNA data (from 23AndMe, etc.) to VCF.

The generated support files (filter, map, and template) for 23AndMe v4 are included in this repo. The scripts here should also be able to convert AncestryDNA and FTdna data to 23AndMe format and generate the support files for those vendors, but the entire workflow has only been tested with 23AndMe v4.

Workflow:

1. Use make_filter.py to create a filter that lists all markers tested by a given vendor. It takes a list of raw data files in 23AndMe format; the point of using multiple data files is to make sure that nothing is missing. filter/23andme_v4.tsv contains a list of all markers tested by 23AndMe v4 and was generated by make_filter.py.
2. Use make_map.py to make a mapping file between the vendor's IDs and positions and those used in dbSNP. It compares the filter file from step 1 to a VCF that lists ALL markers in dbSNP. map/23andme_v4.map was generated using GRCh37p13 b150 (WARNING: 7GB download - ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/VCF/All_20170710.vcf.gz).
3. Use make_template.py to build a template that is used to convert 23AndMe-formatted data files to VCF. The template is essentially a master VCF filtered down to only the needed lines, so we only need to parse a ~30MB file instead of a ~7GB file. template/23andme_v4.vcf.gz was generated from GRCh37p13 b150.
4. Use 23andme_to_vcf.py to convert a 23AndMe-format raw data file to VCF with the help of the mapping and template files made in steps 2 and 3.

make_filter.py [Sample1] [Sample2] [Sample3] ... [SampleN]

ls genome*.txt | xargs -n 1 ./make_filter.py > filter/23andme_v4.tsv

Look at a bunch of raw data files and make a list of all variants that are reported. I'm not sure multiple files are actually needed; the thought was that some files might not list every tested variant (for example, if it wasn't called).
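The union step described above might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code in make_filter.py: it assumes the usual 23AndMe raw-data layout (tab-separated ID, chromosome, position and genotype columns, with "#" comment lines), and the three-column output it prints is a hypothetical filter format.

```python
import sys
from contextlib import ExitStack
from typing import Iterable, Iterator

Marker = tuple[str, str, str]  # (marker ID, chromosome, position)


def read_markers(lines: Iterable[str]) -> Iterator[Marker]:
    """Yield (marker ID, chromosome, position) from 23AndMe-format lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip the '#' comment header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        yield fields[0], fields[1], fields[2]


def union_markers(samples: Iterable[Iterable[str]]) -> list[Marker]:
    """Return every marker reported in any sample, once, in sorted order."""
    seen: set[Marker] = set()
    for lines in samples:
        # Rows are kept whatever their genotype, so a no-call ('--') in one
        # sample still registers the marker.
        seen.update(read_markers(lines))
    return sorted(seen)


if __name__ == "__main__":
    # Hypothetical CLI: pass all sample files at once, filter goes to stdout.
    with ExitStack() as stack:
        files = [stack.enter_context(open(path)) for path in sys.argv[1:]]
        for marker_id, chrom, pos in union_markers(files):
            print(marker_id, chrom, pos, sep="\t")
```

Because duplicates collapse in the set, passing all sample files in a single invocation yields exactly one line per marker.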

make_map.py [outputMapFile] [filterFile] [snpListVCF]

./make_map.py map/23andme_v4.map filter/23andme_v4.tsv All_20170710.vcf.gz

Make a file that maps between 23AndMe rsIDs and dbSNP rsIDs.
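A position-based version of this mapping could look like the sketch below. It is an assumption about how such a mapping can be built, not a description of make_map.py: it reads a hypothetical filter layout of vendor ID, chromosome and position per line, indexes it by (chromosome, position), and looks up each record of the dbSNP VCF. It relies on both files using the same reference build (GRCh37 here) and the same chromosome naming. The two-column output is also hypothetical.

```python
import gzip
import sys
from typing import Iterable, Iterator


def build_map(
    filter_rows: Iterable[tuple[str, str, str]], vcf_lines: Iterable[str]
) -> Iterator[tuple[str, str]]:
    """Yield (vendor_id, dbsnp_id) pairs for markers at the same position."""
    by_pos: dict[tuple[str, str], list[str]] = {}
    for vendor_id, chrom, pos in filter_rows:
        by_pos.setdefault((chrom, pos), []).append(vendor_id)
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # VCF meta-information and column header lines
        chrom, pos, db_id = line.split("\t", 3)[:3]
        # Several dbSNP records can share a position; each yields a pair.
        for vendor_id in by_pos.get((chrom, pos), ()):
            yield vendor_id, db_id


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical CLI mirroring: outputMapFile filterFile snpListVCF
    out_path, filter_path, vcf_path = sys.argv[1:]
    with open(filter_path) as f:
        rows = [tuple(line.rstrip("\r\n").split("\t")[:3]) for line in f if line.strip()]
    with gzip.open(vcf_path, "rt") as vcf, open(out_path, "w") as out:
        for vendor_id, db_id in build_map(rows, vcf):
            out.write(f"{vendor_id}\t{db_id}\n")
```

Matching on position rather than on ID lets internal vendor IDs pick up dbSNP rsIDs. One caveat: VCF anchors indels on the base before the event, so indel markers may not line up by position without extra handling.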

make_template.py [outputTemplateFile] [mapFile] [snpListVCF]

./make_template.py template/23andme_v4.vcf map/23andme_v4.map All_20170710.vcf.gz; bgzip template/23andme_v4.vcf; tabix template/23andme_v4.vcf.gz

Filter the giant dbSNP VCF into something that's easier to handle, keeping only the entries needed for 23AndMe.
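The filtering idea can be sketched as a single streaming pass over the compressed VCF. This is a hedged illustration, not make_template.py itself; it assumes a hypothetical two-column map file (vendor ID, then dbSNP ID, tab-separated) and keeps every header line plus any record whose ID column contains a wanted dbSNP ID.

```python
import gzip
import sys
from typing import Iterable, Iterator


def filter_vcf(vcf_lines: Iterable[str], wanted_ids: set[str]) -> Iterator[str]:
    """Keep all header lines plus the records whose ID is in wanted_ids."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # header lines are needed for a valid VCF
            continue
        ids = line.split("\t", 3)[2].split(";")  # VCF allows ';'-separated IDs
        if any(i in wanted_ids for i in ids):
            yield line


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical CLI mirroring: outputTemplateFile mapFile snpListVCF
    out_path, map_path, vcf_path = sys.argv[1:]
    with open(map_path) as f:
        # Assumes the hypothetical two-column map: vendor_id<TAB>dbsnp_id.
        wanted = {line.rstrip("\r\n").split("\t")[1] for line in f if line.strip()}
    with gzip.open(vcf_path, "rt") as vcf, open(out_path, "w") as out:
        out.writelines(filter_vcf(vcf, wanted))
```

Streaming line by line keeps memory use small even for the ~7GB input. Records are written in input order, so the output stays position-sorted as long as the dbSNP VCF is, which tabix indexing requires.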

23andme_to_vcf.py [23andmeInputFile] [outputSampleName] [mapFile] [templateFile] [VCFoutputFile]

./23andme_to_vcf.py genome_Philip_Baltar.txt Philip_Baltar map/23andme_v4.map template/23andme_v4.vcf.gz Philip_Baltar.vcf

This is the main script that does all the magic.
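One core piece of such a conversion is turning a 23AndMe genotype into a VCF GT field using the REF and ALT alleles of the matching template record. The function below is a sketch of that single step, not the code in 23andme_to_vcf.py, and the name genotype_to_gt is hypothetical. It does not attempt strand flips or the I/D codes 23AndMe uses for indels; those come out as missing alleles.

```python
def genotype_to_gt(genotype: str, ref: str, alts: list[str]) -> str:
    """Convert a 23AndMe genotype ('AG', 'T', '--') into a VCF GT string.

    Allele indices follow VCF: 0 is REF, 1..n are the ALT alleles in order.
    """
    if genotype == "--":
        return "./."  # 23AndMe no-call
    alleles = [ref] + alts
    # Any base not found among REF/ALT (e.g. an indel 'I'/'D' code or an
    # opposite-strand call) becomes a missing allele '.'.
    indices = [str(alleles.index(b)) if b in alleles else "." for b in genotype]
    # One character gives a haploid call (e.g. chrY or MT); two give diploid.
    return "/".join(indices)


if __name__ == "__main__":
    print(genotype_to_gt("AG", "A", ["G"]))  # 0/1
    print(genotype_to_gt("T", "A", ["G", "T"]))  # 2
```

The '/' separator marks genotypes as unphased, which fits array data since it carries no phase information.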

AncestryDNA_to_23andme.awk [inputfile]

./AncestryDNA_to_23andme.awk data.txt > AncestryDNA_23andme.txt

Script to convert AncestryDNA files to 23AndMe format
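The same kind of conversion might look like this in Python. It is not a port of AncestryDNA_to_23andme.awk; it assumes the commonly seen AncestryDNA layout (tab-separated rsid, chromosome, position, allele1, allele2, with "#" comments, a column-header row, numeric chromosome codes 23 to 26, and 0 for a no-call), all of which should be checked against a real file.

```python
import sys
from typing import Iterable, Iterator

# Assumed AncestryDNA chromosome codes; '25' (pseudoautosomal X/Y) is folded
# into X here, which is a simplification.
CHROM_NAMES = {"23": "X", "24": "Y", "25": "X", "26": "MT"}


def ancestry_to_23andme(lines: Iterable[str]) -> Iterator[str]:
    """Yield 23AndMe-style rows: rsid, chromosome, position, genotype."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # comment header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] == "rsid" or len(fields) < 5:
            continue  # column-header row or malformed line
        rsid, chrom, pos, a1, a2 = fields[:5]
        chrom = CHROM_NAMES.get(chrom, chrom)
        # Assumed: AncestryDNA marks a no-call with '0'; 23AndMe uses '--'.
        genotype = "--" if "0" in (a1, a2) else a1 + a2
        yield "\t".join((rsid, chrom, pos, genotype))


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as f:
        for row in ancestry_to_23andme(f):
            print(row)
```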

FTdna_to_23andme.awk [inputfile]

./FTdna_to_23andme.awk FTdna.csv > FTdna_23andme.txt

Script to convert FTdna files to 23AndMe format
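A Python equivalent could lean on the standard csv module to deal with quoting. This is a sketch, not a port of FTdna_to_23andme.awk; it assumes the FTdna file is a CSV with RSID, CHROMOSOME, POSITION and RESULT columns and double-quoted fields, and it passes RESULT through unchanged, so any no-call code is kept as-is.

```python
import csv
import sys
from typing import Iterable, Iterator


def ftdna_to_23andme(lines: Iterable[str]) -> Iterator[str]:
    """Yield 23AndMe-style rows from FTdna CSV lines."""
    # csv.reader strips the double quotes around each field.
    for row in csv.reader(lines):
        if len(row) < 4 or row[0].startswith("#") or row[0].upper() == "RSID":
            continue  # blank/short rows, comments, and the column header
        rsid, chrom, pos, result = row[:4]
        yield "\t".join((rsid, chrom, pos, result))


if __name__ == "__main__" and len(sys.argv) == 2:
    # newline="" is what the csv module recommends for file input.
    with open(sys.argv[1], newline="") as f:
        for row in ftdna_to_23andme(f):
            print(row)
```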

This work was inspired by Giulio Genovese (http://apol1.blogspot.com/2013/08/impute-apoe-and-apol1-with-23andme.html)

Repository: psbaltar/rawDNA2vcf