Cloufield/GWASTutorial

GWASTutorial

This tutorial provides hands-on training in Complex Trait Genomics for the course Basic Seminar II at The Laboratory of Complex Trait Genomics, University of Tokyo. See About for details. Questions or suggestions? Please use the Issue section.

What is GWAS?

A Genome-Wide Association Study (GWAS) is a research approach that investigates the association between genetic variants (typically SNPs) and traits across the entire genome to discover genetic factors that contribute to complex traits and diseases.
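To make the idea of a per-variant association test concrete, here is a minimal sketch in plain Python. It is an illustration only and not the method used in this tutorial, which runs logistic regression with PLINK. The sketch swaps in a simpler allelic chi-square test on a 2x2 table of allele counts, and the function name `allelic_chi2`, the variant IDs, and the counts are all hypothetical. The `5e-8` cutoff is the conventional genome-wide significance threshold.

```python
import math


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) on a 2x2 allele-count table.

    Returns (chi2, p_value). Illustrative only: no covariates,
    no correction for population structure.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_sums)

    # A monomorphic variant or an empty group carries no information.
    if 0 in row_sums or 0 in col_sums:
        return 0.0, 1.0

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
variants = {
    "variant_A": (30, 70, 10, 90),
    "variant_B": (10, 10, 10, 10),
}

GENOME_WIDE_THRESHOLD = 5e-8  # conventional cutoff

for variant_id, counts in variants.items():
    chi2, p = allelic_chi2(*counts)
    significant = p < GENOME_WIDE_THRESHOLD
    print(variant_id, round(chi2, 3), p, significant)
```

A real GWAS repeats a test like this for millions of variants, which is why a stringent threshold is applied and why dedicated tools such as PLINK are used instead of hand-written loops.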

Why Study GWAS and Statistical Genetics?

GWAS and statistical genetics are revolutionizing our understanding of human biology and medicine. These fields are fundamental to modern genetics research, enabling the discovery of genetic risk factors for common diseases, uncovering biological mechanisms, advancing personalized medicine through polygenic risk prediction, and identifying novel drug targets.

As genetic datasets grow exponentially and precision medicine gains widespread adoption, expertise in GWAS and statistical genetics is increasingly essential for researchers across genomics, medicine, public health, and biotechnology.

Study Aim

This tutorial aims to provide comprehensive, hands-on training in genome-wide association studies (GWAS) and complex trait genomics. Through practical exercises and detailed explanations, students will learn to:

  • Understand the fundamental concepts and methodologies of GWAS
  • Perform data quality control and association testing
  • Interpret and visualize GWAS results
  • Apply post-GWAS analyses including heritability estimation, fine-mapping, and polygenic risk scoring
  • Develop proficiency in the computational tools and statistical methods essential for modern genetic research

Contents

| Category | Topic | Description |
| --- | --- | --- |
| Introduction | Introduction | Essential background knowledge for understanding genome-wide association studies (GWAS) and complex trait genomics. |
| Command Line Tools - Linux | Linux command line basics | For those who haven't used the command line, we first introduce the basics of the Linux system and commonly used commands. |
| Pre-GWAS | 1000 Genomes Project | Comprehensive catalog of human genetic variation providing reference data for GWAS and imputation. |
| | Sample Dataset | Sample dataset of 504 East Asian individuals from the 1000 Genomes Project for tutorial exercises. |
| | Data formats | Before any analysis, the first step is to get familiar with your data. This section introduces the basic formats used to store sequence, genotype, and dosage data. |
| | Data QC | Raw genotype data is usually "dirty": it contains errors and invalid or missing values. This section covers quality control of raw genotype data using PLINK. |
| | Principal component analysis (PCA) | How to perform Principal Component Analysis (PCA) to analyze population structure. |
| | Phasing | Determining the haplotypes (parental chromosome origin) of genetic variants. |
| | Imputation | Predicting ungenotyped variants using reference panels and LD patterns. |
| GWAS | Association tests | After QC, we perform the first association tests for a simulated binary (case-control) trait with a logistic regression model using PLINK. |
| | Visualization | To visualize the summary statistics generated from association tests, we use a Python package called gwaslab to create Manhattan plots, quantile-quantile plots, and regional plots. |
| | Linear mixed model (LMM) | Statistical framework to account for population structure, cryptic relatedness, and confounding in GWAS. |
| | Whole genome regression by REGENIE | Computationally efficient whole-genome regression method for large-scale GWAS with multiple phenotypes. |
| | Rare variant association tests | Methods for testing associations of rare variants by aggregating information across variants in genes or regions. |
| | Saddlepoint approximation (SAIGE) | Accurate p-value calculation for binary traits with unbalanced case-control ratios using saddlepoint approximation. |
| Post-GWAS | Variant Annotation by ANNOVAR/VEP | Annotating genetic variants with functional information including gene location, consequence, and population frequency. |
| | SNP-Heritability estimation by GCTA-GREML | Estimating the proportion of phenotypic variance explained by all SNPs using linear mixed models. |
| | LD score regression (univariate, cross-trait and partitioned) by LDSC | Method to estimate heritability, genetic correlation, and cell-type specificity from GWAS summary statistics. |
| | Gene / Gene-set analysis by MAGMA | Testing associations at the gene and gene-set level by aggregating variant-level signals within genes. |
| | Fine-mapping by SuSiE | Identifying the most likely causal variant(s) within a genomic region showing significant association. |
| | Meta-analysis | Combining evidence from multiple GWAS to increase statistical power and improve effect size estimation. |
| | Polygenic risk scores | Calculating genetic risk scores by summing an individual's trait-associated allele counts, each weighted by its estimated effect size. |
| | Mendelian randomization | Using genetic variants as instrumental variables to infer causal relationships between exposures and outcomes. |
| | Conditional analysis | Identifying independent association signals within a locus by conditioning on lead variants. |
| | Colocalization | Testing whether two traits share the same causal variant in a genomic region to support causal inference. |
| | TWAS | Transcriptome-wide association study to identify genes whose expression is associated with traits using expression imputation. |
| Topics | Linkage disequilibrium (LD) | Non-random association of alleles at different loci, a fundamental concept for understanding GWAS results. |
| | Heritability Concepts | Understanding how much phenotypic variation can be explained by genetic variation (broad-sense and narrow-sense heritability). |
| | Power analysis for GWAS | Calculating statistical power to detect associations given sample size, effect size, allele frequency, and significance threshold. |
| | Winner's curse | Systematic overestimation of genetic effect sizes when variants are selected based on significance thresholds. |
| | Study design and phenotype definition | Study design principles for case/control selection, covariates, trait transformations, and phenotype QC. |
| | Relatedness and sample structure | Identifying related samples, handling duplicates, and choosing family-based vs population GWAS designs. |
| | Measure of effect | Understanding different measures of genetic effect including odds ratio, relative risk, and hazard ratio. |
| Others | Recommended reading | Curated list of textbooks, review articles, and topic-specific papers for further learning. |
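
As a small illustration of the polygenic risk score entry in the contents above, the following sketch computes a score as a weighted sum of allele dosages. It is a simplified example under stated assumptions, not the tutorial's pipeline: the function name `polygenic_score`, the variant IDs, and the effect sizes are hypothetical, and variants missing from an individual's data are simply treated as contributing zero.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages.

    dosages: dict mapping variant ID -> effect-allele dosage (0 to 2)
    weights: dict mapping variant ID -> estimated effect size (e.g. log odds ratio)

    Assumption: a variant absent from `dosages` contributes 0 to the score.
    """
    score = 0.0
    for variant_id, beta in weights.items():
        score += beta * dosages.get(variant_id, 0.0)
    return score


# Hypothetical effect sizes from a GWAS and one individual's dosages.
weights = {"variant_A": 0.2, "variant_B": -0.1, "variant_C": 0.05}
individual = {"variant_A": 2, "variant_B": 1, "variant_C": 0}

print(polygenic_score(individual, weights))
```

In practice the effect allele in the weights file must match the allele counted in the genotype data, and dedicated tools handle that alignment along with variant selection and LD.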

© 2022 - 2026 GWASTutorial

About

GWAS Tutorial for Beginners
