pairwise-colocalization-smk

Purpose

This pipeline performs pairwise colocalization analysis between genetic signals from different GWAS summary statistics using the coloc R package. It identifies pairs of signals that are close together, extracts the relevant regions from summary statistics, and runs colocalization for each pair.

This Snakemake pipeline is adapted from an earlier version written as a set of array jobs. The original version can be viewed under the slurm/ directory.

Overview

The workflow is managed by Snakemake and is modular, allowing for scalable and reproducible analysis of many signal pairs.

Inputs

Input signals file: CSV file with columns:
- chromosome: Chromosome number
- position: Genomic position (base pair)
- gwas: Name of the GWAS (must match the {gwas} prefix of the corresponding summary statistics file)
- Other columns are allowed but not required
- Example file: test/data/input_signals.csv
Summary statistics files: One file per GWAS, named {gwas}.txt and placed in the directory specified in workflow/config.yaml (default: test/data/sumstats/). Each file should be tab-delimited and contain at least:
- chromosome
- position
- rsid
- beta
- se
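As a quick pre-flight check, the required columns listed above can be verified before launching the workflow. The following is a minimal sketch and not part of the pipeline: the helper name missing_columns and the in-memory toy files are hypothetical, and it assumes the column names appear in the header row exactly as written above.

```python
import csv
import io

# Required columns as listed in the Inputs section.
SIGNAL_COLUMNS = {"chromosome", "position", "gwas"}
SUMSTATS_COLUMNS = {"chromosome", "position", "rsid", "beta", "se"}


def missing_columns(handle, required, delimiter):
    """Return the required column names that are absent from the header row."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])
    return sorted(required - {name.strip() for name in header})


# Toy in-memory files standing in for input_signals.csv and a {gwas}.txt file.
signals = io.StringIO("chromosome,position,gwas,note\n1,1000,traitA,lead\n")
sumstats = io.StringIO("chromosome\tposition\trsid\tbeta\n1\t1000\trs1\t0.1\n")

print(missing_columns(signals, SIGNAL_COLUMNS, ","))      # []
print(missing_columns(sumstats, SUMSTATS_COLUMNS, "\t"))  # ['se']
```

With real files, the same helper can be called on handles returned by open(path, newline="").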

Outputs

outputs/pairs_to_test.csv: List of all signal pairs to be tested
outputs/extracted_regions/: Extracted region files for each pair and GWAS
outputs/coloc_results/: Colocalization results for each pair
outputs/coloc_credible_sets/: Credible set for each result, where applicable
outputs/joined_credible_sets/: Joined credible sets for signal pairs that colocalize
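This README does not describe how the credible sets are built, so the following is only an illustration of the common approach: rank variants by their per-variant posterior probability (for example, the SNP.PP.H4 column that coloc reports) and keep the smallest set whose probabilities sum to a target coverage. The function name, the 95% coverage, and the toy probabilities are assumptions, not values taken from the pipeline.

```python
def credible_set(snp_pp, coverage=0.95):
    """Return the smallest set of variants whose posterior probabilities
    sum to at least `coverage`, taking the most probable variants first."""
    ranked = sorted(snp_pp.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for rsid, pp in ranked:
        chosen.append(rsid)
        total += pp
        if total >= coverage:
            break
    return chosen


# Toy per-variant posterior probabilities (hypothetical values).
pp = {"rs1": 0.70, "rs2": 0.20, "rs3": 0.06, "rs4": 0.04}
print(credible_set(pp))  # ['rs1', 'rs2', 'rs3']
```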

Requirements

Snakemake (v6+ recommended)
R (v4+ recommended)
R packages: coloc, dplyr, readr, tidyr

Usage

Configure your input files
- Place your input signals CSV in test/data/input_signals.csv (or update the path in workflow/config.yaml)
- Place your summary statistics files in test/data/sumstats/ (or update the path in workflow/config.yaml)
- If you make no changes, the pipeline runs on the test dataset
Edit configuration
- Edit workflow/config.yaml to set the correct paths and distance threshold (in base pairs)
Run the pipeline
```
snakemake --cores 4 all
```
(Adjust the number of cores as needed)
View results
- Colocalization results for each pair will be in test/outputs/coloc_results/

Pipeline Steps

find_pairs: Identifies all pairs of signals within the specified distance on the same chromosome, from different GWAS.
extract_regions: For each pair, extracts the relevant region from each GWAS summary stats file.
run_coloc: Runs the coloc R package on each pair of extracted regions and outputs the results.
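The logic of the first two steps can be sketched in a few lines of Python. This is a rough illustration, not the pipeline's actual code: the function names, the inclusive distance comparison, and the flank padding used to define the extracted region are all assumptions, since the README does not specify how the region boundaries are chosen.

```python
from itertools import combinations


def find_pairs(signals, max_distance):
    """Pair signals on the same chromosome, from different GWAS,
    whose positions are at most max_distance base pairs apart."""
    pairs = []
    for a, b in combinations(signals, 2):
        if (a["chromosome"] == b["chromosome"]
                and a["gwas"] != b["gwas"]
                and abs(a["position"] - b["position"]) <= max_distance):
            pairs.append((a, b))
    return pairs


def region_bounds(pair, flank):
    """Window covering both signals, padded by `flank` bp on each side
    (hypothetical rule; the pipeline may define regions differently)."""
    a, b = pair
    start = max(0, min(a["position"], b["position"]) - flank)
    end = max(a["position"], b["position"]) + flank
    return a["chromosome"], start, end


def extract_region(rows, chromosome, start, end):
    """Keep the summary statistics rows that fall inside the window."""
    return [r for r in rows
            if r["chromosome"] == chromosome and start <= r["position"] <= end]


# Toy signals (hypothetical values).
signals = [
    {"chromosome": 1, "position": 1_000_000, "gwas": "traitA"},
    {"chromosome": 1, "position": 1_200_000, "gwas": "traitB"},
    {"chromosome": 1, "position": 1_100_000, "gwas": "traitA"},
    {"chromosome": 2, "position": 1_050_000, "gwas": "traitB"},
]
pairs = find_pairs(signals, max_distance=250_000)
print(len(pairs))                        # 2
print(region_bounds(pairs[0], 100_000))  # (1, 900000, 1300000)
```

The all-pairs comparison is quadratic in the number of signals, which is fine for a sketch. For large signal lists, grouping by chromosome and sorting by position first would reduce the number of comparisons.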
