simpsondl/pairwise-colocalization-smk

pairwise-colocalization-smk

Snakemake License: MIT

Purpose

This pipeline performs pairwise colocalization analysis between genetic signals from different GWAS summary statistics using the coloc R package. It identifies pairs of signals that are close together, extracts the relevant regions from summary statistics, and runs colocalization for each pair.

This pipeline is a Snakemake adaptation of an earlier version that was written as a set of array jobs. The original version can be viewed under the slurm/ directory.

Overview

The workflow is managed by Snakemake and is modular, allowing for scalable and reproducible analysis of many signal pairs.

Inputs

  • Input signals file: CSV file with columns:

    • chromosome: Chromosome number
    • position: Genomic position (base pair)
    • gwas: Name of the GWAS (should match the {gwas} prefix of the corresponding summary statistics file)
    • (other columns allowed, but not required)
    • Example: test/data/input_signals.csv
  • Summary statistics files: One file per GWAS, named {gwas}.txt and placed in the directory specified in workflow/config.yaml (default: test/data/sumstats/). Each file should be tab-delimited and contain at least:

    • chromosome
    • position
    • rsid
    • beta
    • se
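The column requirements above can be checked before launching the workflow. The following is a minimal sketch, not part of the pipeline itself: the helper name `missing_columns` and the in-memory example rows are hypothetical, and it assumes the column names appear in a header row exactly as listed.

```python
import csv
import io

# Required columns as listed in the Inputs section.
SIGNAL_COLUMNS = {"chromosome", "position", "gwas"}
SUMSTATS_COLUMNS = {"chromosome", "position", "rsid", "beta", "se"}


def missing_columns(handle, required, delimiter):
    """Return the required column names absent from the header row of `handle`."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])
    return sorted(required - {name.strip() for name in header})


# Made-up in-memory examples standing in for real files.
signals_csv = io.StringIO("chromosome,position,gwas,note\n1,1000,traitA,lead\n")
sumstats_tsv = io.StringIO("chromosome\tposition\trsid\tbeta\n1\t1000\trs1\t0.1\n")

print(missing_columns(signals_csv, SIGNAL_COLUMNS, ","))     # extra columns are fine
print(missing_columns(sumstats_tsv, SUMSTATS_COLUMNS, "\t"))  # 'se' is missing here
```

For real files, pass an open file handle instead of the `io.StringIO` objects, using `","` for the signals CSV and `"\t"` for the summary statistics.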

Outputs

  • outputs/pairs_to_test.csv: List of all signal pairs to be tested
  • outputs/extracted_regions/: Extracted region files for each pair and GWAS
  • outputs/coloc_results/: Colocalization results for each pair
  • outputs/coloc_credible_sets/: Credible set (if applicable) for each result
  • outputs/joined_credible_sets/: For colocalized signals, credible sets are joined
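For readers unfamiliar with credible sets, the sketch below shows one common way to derive one from per-SNP posterior probabilities (coloc reports these as SNP.PP.H4 in its results). It is an illustration only: the 95% coverage value, the function name `credible_set`, and the example probabilities are assumptions, and the pipeline's own scripts may construct credible sets differently.

```python
def credible_set(snp_pp, coverage=0.95):
    """Return the smallest set of SNPs whose posterior probabilities sum to >= coverage.

    `snp_pp` maps rsid -> posterior probability of being the shared causal variant.
    """
    # Rank SNPs from most to least probable, then accumulate until coverage is reached.
    ranked = sorted(snp_pp.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for rsid, pp in ranked:
        chosen.append(rsid)
        total += pp
        if total >= coverage:
            break
    return chosen


# Made-up posterior probabilities for four SNPs.
example = {"rs1": 0.70, "rs2": 0.20, "rs3": 0.06, "rs4": 0.04}
print(credible_set(example))
```

With these example values the first three SNPs are needed to reach 95% coverage, so rs4 is left out of the set.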

Requirements

  • Snakemake (v6+ recommended)
  • R (v4+ recommended)
  • R packages: coloc, dplyr, readr, tidyr

Usage

  1. Configure your input files

    • Place your input signals CSV in test/data/input_signals.csv (or update the path in workflow/config.yaml)
    • Place your summary statistics files in test/data/sumstats/ (or update the path in workflow/config.yaml)
    • If you make no changes, the pipeline runs on the included test dataset
  2. Edit configuration

    • Edit workflow/config.yaml to set the correct paths and distance threshold (in base pairs)
  3. Run the pipeline

    snakemake --cores 4 all

    (Adjust the number of cores as needed)

  4. View results

    • Colocalization results for each pair will be in test/outputs/coloc_results/

Pipeline Steps

  1. find_pairs: Identifies all pairs of signals within the specified distance on the same chromosome, from different GWAS.
  2. extract_regions: For each pair, extracts the relevant region from each GWAS summary stats file.
  3. run_coloc: Runs the coloc R package on each pair of extracted regions and outputs the results.
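The logic of the first two steps can be sketched in a few lines of Python. This is an illustrative approximation rather than the pipeline's actual code: the function names, the inclusive distance comparison, and the `flank` padding around each pair are assumptions, and the example signals are made up.

```python
from itertools import combinations


def find_pairs(signals, max_distance):
    """Pairs of signals on the same chromosome, from different GWAS, within max_distance bp."""
    pairs = []
    for a, b in combinations(signals, 2):
        same_chrom = a["chromosome"] == b["chromosome"]
        different_gwas = a["gwas"] != b["gwas"]
        close = abs(a["position"] - b["position"]) <= max_distance  # inclusive: an assumption
        if same_chrom and different_gwas and close:
            pairs.append((a, b))
    return pairs


def region_bounds(pair, flank):
    """Window covering both signals, padded by `flank` bp on each side (hypothetical choice)."""
    a, b = pair
    start = max(0, min(a["position"], b["position"]) - flank)
    end = max(a["position"], b["position"]) + flank
    return a["chromosome"], start, end


def extract_region(rows, chromosome, start, end):
    """Keep summary statistics rows that fall inside the window."""
    return [r for r in rows
            if r["chromosome"] == chromosome and start <= r["position"] <= end]


# Made-up signals: only the first two qualify as a pair.
signals = [
    {"chromosome": "1", "position": 1_000_000, "gwas": "traitA"},
    {"chromosome": "1", "position": 1_200_000, "gwas": "traitB"},
    {"chromosome": "2", "position": 1_000_000, "gwas": "traitB"},
    {"chromosome": "1", "position": 5_000_000, "gwas": "traitB"},
]
pairs = find_pairs(signals, max_distance=500_000)
print(len(pairs))
print(region_bounds(pairs[0], flank=100_000))
```

Each extracted region would then be passed, one per GWAS in the pair, to the coloc step.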
