This repository contains the code used for the analyses presented in the ENCODE-rE2G ENCODE companion paper, as well as links to the benchmarking pipelines used.
To apply pre-trained ENCODE-rE2G models or train new models, go to the ENCODE-rE2G model repository found here: https://github.com/EngreitzLab/ENCODE_rE2G.
To obtain a local copy of the repository, including the crispr_analyses/CRISPR_comparison
submodule containing the CRISPR benchmarking pipeline version used for analyses in this paper, open
a terminal and run:
git clone --recurse-submodules git@github.com:EngreitzLab/ENCODE-rE2G-Paper.git
This should take less than a minute.
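If the repository was cloned without --recurse-submodules, the crispr_analyses/CRISPR_comparison directory will be empty. The following is a minimal sketch for fetching the submodule afterwards, assuming git is installed. The helper name init_submodules is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of the repository): fetch any submodules
# that are missing from an existing clone.
init_submodules() {
  repo_dir="${1:-.}"
  # A clone has a .git entry (directory or file) at its root.
  if [ ! -e "$repo_dir/.git" ]; then
    echo "not a git repository: $repo_dir" >&2
    return 1
  fi
  # Initialize and check out all submodules, including nested ones.
  git -C "$repo_dir" submodule update --init --recursive
}

# Usage: init_submodules path/to/ENCODE-rE2G-Paper
```

Running git submodule status in the repository afterwards lists the commit each submodule is checked out at.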
The following subdirectories contain code used to perform analyses shown in the paper:
Snakemake workflow for CRISPR benchmarking and related analyses, with results shown in Main Figures 1, 2, 4 and 5, Extended Data Figures 1, 2, 3 and 7, Supplementary Figures S1.1, S1.2, S7, S8, S9 and S10, and Supplementary Tables S2, S9, S10, S14, S15 and S16.
Code used to compute the GWAS metrics shown in the manuscript. GWAS benchmarks were performed using the GWAS benchmarking pipeline (see below).
Code to process DNase-seq metadata from the ENCODE portal to select input files to generate ENCODE-rE2G predictions for 1,458 human DNase-seq experiments.
This subdirectory contains code used in other analyses across the manuscript, including annotation of CRISPR data with chromatin categories, analysis of enhancer synergy, and analyses related to 3D contact, enhancer-gene correlation and promoter classes.
The following pipelines were used to benchmark enhancer-gene predictions against CRISPR, eQTL and GWAS data:
A copy of the CRISPR benchmarking pipeline used to intersect predictions with CRISPR enhancer perturbation results can be found in the crispr_analyses subdirectory, or in the original GitHub repository for general use: https://github.com/EngreitzLab/CRISPR_comparison.
All eQTL benchmarking analyses, including figures shown in the paper, were computed using the eQTL benchmarking pipeline available here: https://github.com/EngreitzLab/eQTLEnrichment.
The GWAS benchmarks were performed using the GWAS benchmarking pipeline available here: https://github.com/Deylab999MSKCC/e2g-benchmarking. Additional analysis code for GWAS-related analyses is available in the gwas_analyses subdirectory.