This tutorial provides hands-on training in Complex Trait Genomics for the course Basic Seminar II at The Laboratory of Complex Trait Genomics, University of Tokyo. See About for details. Questions or suggestions? Please use the Issue section.
A Genome-Wide Association Study (GWAS) is a research approach that investigates the association between genetic variants (typically SNPs) and traits across the entire genome to discover genetic factors that contribute to complex traits and diseases.
GWAS and statistical genetics are revolutionizing our understanding of human biology and medicine. These fields are fundamental to modern genetics research, enabling the discovery of genetic risk factors for common diseases, uncovering biological mechanisms, advancing personalized medicine through polygenic risk prediction, and identifying novel drug targets.
As genetic datasets grow rapidly in size and precision medicine becomes more widely adopted, expertise in GWAS and statistical genetics is increasingly essential for researchers across genomics, medicine, public health, and biotechnology.
This tutorial aims to provide comprehensive, hands-on training in genome-wide association studies (GWAS) and complex trait genomics. Through practical exercises and detailed explanations, students will learn to:
- Understand the fundamental concepts and methodologies of GWAS
- Perform data quality control and association testing
- Interpret and visualize GWAS results
- Apply post-GWAS analyses including heritability estimation, fine-mapping, and polygenic risk scoring
- Develop proficiency in the computational tools and statistical methods essential for modern genetic research
| Category | Topic | Description |
|---|---|---|
| Introduction | Introduction | Essential background knowledge for understanding genome-wide association studies (GWAS) and complex trait genomics. |
| Command Line Tools - Linux | Linux command line basics | For those who haven't used the command line, we will first introduce the basics of the Linux system and commonly used commands. |
| Pre-GWAS | 1000 Genomes Project | Comprehensive catalog of human genetic variation providing reference data for GWAS and imputation. |
| | Sample Dataset | Sample dataset of 504 East Asian individuals from the 1000 Genomes Project for tutorial exercises. |
| | Data formats | Before any analysis, the first step is always to get familiar with your data. In this section, we will introduce some basic formats used to store sequence, genotype and dosage data. |
| | Data QC | Raw genotype data is usually "dirty": it contains errors and invalid or missing values. In this section, we will learn how to perform quality control for the raw genotype data using PLINK. |
| | Principal component analysis (PCA) | In this section, we will cover how to perform Principal Component Analysis (PCA) to analyze population structure. |
| | Phasing | Determining haplotypes, i.e. which alleles were inherited together on the same parental chromosome. |
| | Imputation | Predicting ungenotyped variants using reference panels and LD patterns. |
| GWAS | Association tests | After QC, we will perform the very first association tests for a simulated binary trait (case-control trait) with a logistic regression model using PLINK. |
| | Visualization | To visualize the summary statistics generated from association tests, we will use a Python package called gwaslab to create Manhattan plots, Quantile-Quantile plots and Regional plots. |
| | Linear mixed model (LMM) | Statistical framework to account for population structure, cryptic relatedness, and confounding in GWAS. |
| | Whole genome regression by REGENIE | Computationally efficient whole-genome regression method for large-scale GWAS with multiple phenotypes. |
| | Rare variant association tests | Methods for testing associations of rare variants by aggregating information across variants in genes or regions. |
| | Saddlepoint approximation (SAIGE) | Accurate p-value calculation for binary traits with unbalanced case-control ratios using saddlepoint approximation. |
| Post-GWAS | Variant Annotation by ANNOVAR/VEP | Annotating genetic variants with functional information including gene location, consequence, and population frequency. |
| | SNP-Heritability estimation by GCTA-GREML | Estimating the proportion of phenotypic variance explained by all SNPs using linear mixed models. |
| | LD score regression (univariate, cross-trait and partitioned) by LDSC | Method to estimate heritability, genetic correlation, and cell-type specificity from GWAS summary statistics. |
| | Gene / Gene-set analysis by MAGMA | Testing associations at the gene and gene-set level by aggregating variant-level signals within genes. |
| | Fine-mapping by SUSIE | Identifying the most likely causal variant(s) within a genomic region showing significant association. |
| | Meta-analysis | Combining evidence from multiple GWAS to increase statistical power and improve effect size estimation. |
| | Polygenic risk scores | Calculating genetic risk scores as the sum of an individual's effect-allele counts across trait-associated variants, weighted by the estimated effect sizes. |
| | Mendelian randomization | Using genetic variants as instrumental variables to infer causal relationships between exposures and outcomes. |
| | Conditional analysis | Identifying independent association signals within a locus by conditioning on lead variants. |
| | Colocalization | Testing whether two traits share the same causal variant in a genomic region to support causal inference. |
| | TWAS | Transcriptome-wide association study to identify genes whose expression is associated with traits using expression imputation. |
| Topics | Linkage disequilibrium (LD) | Non-random association of alleles at different loci, fundamental concept for understanding GWAS results. |
| | Heritability Concepts | Understanding how much phenotypic variation can be explained by genetic variation (broad-sense and narrow-sense heritability). |
| | Power analysis for GWAS | Calculating statistical power to detect associations given sample size, effect size, allele frequency, and significance threshold. |
| | Winner's curse | Systematic overestimation of genetic effect sizes when variants are selected based on significance thresholds. |
| | Study design and phenotype definition | Study design principles for case/control selection, covariates, trait transformations, and phenotype QC. |
| | Relatedness and sample structure | Identifying related samples, handling duplicates, and choosing family-based vs population GWAS designs. |
| | Measure of effect | Understanding different measures of genetic effect including odds ratio, relative risk, and hazard ratio. |
| Others | Recommended reading | Curated list of textbooks, review articles, and topic-specific papers for further learning. |
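
As a small taste of the kind of computation covered in the Post-GWAS sections, the sketch below illustrates the basic idea behind a polygenic risk score: a weighted sum of effect-allele counts. It is only a minimal illustration, not the tutorial's actual workflow. The function name `polygenic_score` and the toy dosages and effect sizes are hypothetical and made up for this example.

```python
def polygenic_score(dosages, effect_sizes):
    """Weighted sum of effect-allele counts for one individual.

    dosages      -- effect-allele counts per variant (0, 1 or 2, or imputed dosages)
    effect_sizes -- per-variant effect estimates (e.g. betas or log odds ratios)
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


# Hypothetical toy data: 3 individuals genotyped at 4 variants.
toy_effect_sizes = [0.10, -0.05, 0.20, 0.30]
toy_dosages = [
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]

# One score per individual.
scores = [polygenic_score(row, toy_effect_sizes) for row in toy_dosages]
for index, score in enumerate(scores, start=1):
    print(f"individual {index}: score = {score:.2f}")
```

In practice, scores are computed with dedicated tools over many variants, and choosing which variants and weights to use is the hard part; this sketch only shows the final summation step.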
© 2022 - 2026 GWASTutorial

