This pipeline performs pairwise colocalization analysis between genetic signals from different GWAS summary statistics using the coloc R package. It identifies pairs of signals that are close together, extracts the relevant regions from summary statistics, and runs colocalization for each pair.
It is adapted from an earlier version written as a set of array jobs, which can be viewed under the `slurm/` directory.
The workflow is managed by Snakemake and is modular, allowing for scalable and reproducible analysis of many signal pairs.
- Input signals file: CSV file with columns:
  - `chromosome`: Chromosome number
  - `position`: Genomic position (base pair)
  - `gwas`: Name of the GWAS (should match summary stats file prefix)
  - (other columns allowed, but not required)
  - Example: `test/data/input_signals.csv`
- Summary statistics files: One file per GWAS, named `{gwas}.txt` and placed in the directory specified in `workflow/config.yaml` (default: `test/data/sumstats/`). Each file should be tab-delimited and contain at least:
  - `chromosome`
  - `position`
  - `rsid`
  - `beta`
  - `se`
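Before running the pipeline, it can help to confirm that each input file carries the required columns. The following is a minimal sketch, not part of the pipeline itself: the helper name `missing_columns` is hypothetical, and the in-memory sample rows are made up purely for illustration. It assumes the header row uses exactly the column names listed above.

```python
import csv
import io

# Required column names, taken from the input descriptions above.
SIGNAL_COLUMNS = {"chromosome", "position", "gwas"}
SUMSTATS_COLUMNS = {"chromosome", "position", "rsid", "beta", "se"}


def missing_columns(handle, required, delimiter):
    """Return the required column names absent from the header row, sorted."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])  # an empty file yields an empty header
    return sorted(required - {name.strip() for name in header})


# In-memory stand-ins for real files (values are invented for illustration).
signals_csv = io.StringIO("chromosome,position,gwas,note\n1,1000000,traitA,lead\n")
sumstats_txt = io.StringIO("chromosome\tposition\trsid\tbeta\n1\t1000000\trs1\t0.1\n")

print(missing_columns(signals_csv, SIGNAL_COLUMNS, ","))      # []
print(missing_columns(sumstats_txt, SUMSTATS_COLUMNS, "\t"))  # ['se']
```

For real files, pass an open file handle instead of the `io.StringIO` objects, for example `open("test/data/input_signals.csv", newline="")`.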
- `outputs/pairs_to_test.csv`: List of all signal pairs to be tested
- `outputs/extracted_regions/`: Extracted region files for each pair and GWAS
- `outputs/coloc_results/`: Colocalization results for each pair
- `outputs/coloc_credible_sets/`: Credible set (if applicable) for each result
- `outputs/joined_credible_sets/`: For colocalized signals, credible sets are joined
- Configure your input files
  - Place your input signals CSV in `test/data/input_signals.csv` (or update the path in `workflow/config.yaml`)
  - Place your summary statistics files in `test/data/sumstats/` (or update the path in `workflow/config.yaml`)
  - If you do not make any changes, the pipeline will run on the test dataset
- Edit configuration
  - Edit `workflow/config.yaml` to set the correct paths and distance threshold (in base pairs)
- Run the pipeline

      snakemake --cores 4 all

  (Adjust the number of cores as needed)
- View results
  - Colocalization results for each pair will be in `test/outputs/coloc_results/`
- find_pairs: Identifies all pairs of signals within the specified distance on the same chromosome, from different GWAS.
- extract_regions: For each pair, extracts the relevant region from each GWAS summary stats file.
- run_coloc: Runs the coloc R package on each pair of extracted regions and outputs the results.
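
The pairing logic described for find_pairs can be sketched in a few lines of Python. This is an illustrative approximation rather than the pipeline's actual rule: the function signature and the dictionary-based signal records are assumptions, as is treating the distance threshold as inclusive. The sample signals are invented.

```python
from itertools import combinations


def find_pairs(signals, max_distance):
    """Return pairs of signals on the same chromosome, from different GWAS,
    whose positions are at most max_distance base pairs apart.

    Assumption: the threshold is inclusive; the real rule may differ.
    """
    pairs = []
    for a, b in combinations(signals, 2):
        if a["chromosome"] != b["chromosome"]:
            continue  # different chromosomes are never paired
        if a["gwas"] == b["gwas"]:
            continue  # signals from the same GWAS are not compared
        if abs(a["position"] - b["position"]) <= max_distance:
            pairs.append((a, b))
    return pairs


# Invented example signals, using the column names of the input signals file.
signals = [
    {"chromosome": "1", "position": 1_000_000, "gwas": "traitA"},
    {"chromosome": "1", "position": 1_200_000, "gwas": "traitB"},
    {"chromosome": "1", "position": 1_100_000, "gwas": "traitA"},
    {"chromosome": "2", "position": 1_050_000, "gwas": "traitB"},
]

for a, b in find_pairs(signals, max_distance=250_000):
    print(a["gwas"], a["position"], "<->", b["gwas"], b["position"])
```

Comparing every combination is quadratic in the number of signals; for large signal lists, sorting by chromosome and position and scanning a sliding window would scale better.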