KumarLabJax/cfde-multiomics

LIF-Cachexia Multi-Omics Analysis Workshop

License: MIT

Workshop materials for the Methods for Multi-Omics Data Analysis Short Course at The Jackson Laboratory.

This course teaches a transferable analytical strategy for integrating public multi-omics data to generate mechanistic hypotheses. The anchor example is leukemia inhibitory factor (LIF) and its relationship to cachexia — but the approach applies to any disease-focused multi-omics question.

What's included

  • Cross-study integration demo — a complete pipeline from SRA download through RNA-seq GSEA and metabolomics enrichment, using the KPC pancreatic cancer mouse model (PRJNA773714 RNA-seq + ST003927 metabolomics)
  • GTEx live-coded session — hands-on Spearman correlation, GSEA, and biological interpretation using pre-processed GTEx expression data
  • Interpretation guides — GSEA, GO enrichment, Spearman correlation, and validation methodology
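As a rough illustration of the Spearman step in the GTEx session, the sketch below computes Spearman's ρ as a Pearson correlation on ranks and sorts genes by it. The workshop's own code is in R; this Python version is only a self-contained illustration. The expression values, the gene names (`GENE_A` etc.), and the helper names are made up, and the code assumes neither input vector is constant.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Toy log-expression values for five samples (made up, not GTEx data).
lif = [1.2, 2.5, 3.1, 4.8, 5.0]
other_genes = {
    "GENE_A": [0.9, 2.0, 2.2, 4.1, 6.3],  # rises with LIF
    "GENE_B": [7.0, 6.1, 5.5, 3.2, 1.0],  # falls as LIF rises
    "GENE_C": [3.0, 1.0, 4.0, 2.0, 5.0],  # no clean monotonic trend
}

# Sort genes from most positively to most negatively correlated with LIF.
# A ranked list of this kind can feed a pre-ranked GSEA; whether the workshop
# ranks by rho in exactly this way is an assumption.
ranked = sorted(
    ((gene, spearman_rho(lif, values)) for gene, values in other_genes.items()),
    key=lambda item: item[1],
    reverse=True,
)
for gene, rho in ranked:
    print(f"{gene}\t{rho:+.2f}")
```

Because the statistic depends only on ranks, it is insensitive to monotonic transformations such as taking logs, which is one reason it is a common choice for expression data.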

Repository layout

vignettes/               # Quarto source files (rendered to docs/ for GitHub Pages)
  introduction/          # Workshop overview and learning path
  setup/                 # Environment setup guide
  gtex/                  # GTEx expression access, QC, and LIF analysis
  kpc-cross-study/       # KPC RNA-seq + metabolomics integration pipeline
  integration/           # Cross-dataset NES concordance and validation
  guides/                # GSEA, GO enrichment, Spearman, validation guides
  reference/             # Helper function reference

scripts/
  gtex/                  # GTEx metadata fetch, expression QC, GO enrichment
  sra/                   # SRA prefetch, tximport aggregation, Slurm submission
  install/               # Package installation (gtexr, MetaboAnalystR)
  dev/                   # Helper reference generation

config/
  gtex/                  # Gene list template and tissue preferences
  sra/                   # SRA accession lists and run tables
  analysis/              # GO keyword filters for pathway analysis

workshopr/               # R helper package shared across scripts and vignettes

Quickstart

# R packages (renv — first time only, ~20 min)
Rscript scripts/install/setup-renv.R

# Python packages
pip install -r requirements.txt

# Preview documentation
quarto preview vignettes/
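
The commands above rely on `Rscript`, `pip`, and `quarto` being on your PATH. A minimal pre-flight check, sketched in Python under that assumption (the `missing_tools` helper is hypothetical and not part of this repository), might look like:

```python
import shutil

# Command-line tools invoked by the Quickstart commands.
REQUIRED_TOOLS = ("Rscript", "pip", "quarto")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("Rscript, pip, and quarto are all available.")
```

This only confirms that the executables can be found; it does not check versions or installed packages.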

On subsequent clones or after renv.lock changes:

renv::restore()
# Then re-run the two patch scripts (renv cannot restore qs or MetaboAnalystR automatically):
source("scripts/install/install-qs-shim.R")
source("scripts/install/install-metaboanalystr.R")

Who this is for

  • Graduate students and postdocs new to multi-omics data analysis
  • Computational biologists wanting practical GSEA + metabolomics integration skills
  • Wet-lab researchers interested in generating mechanistic hypotheses from public data

Prerequisites: intermediate R (data frames, functions, pipes); basic gene expression concepts (counts, TPM, log transformation).
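
As a quick refresher on the expression concepts listed above, the sketch below converts raw counts to TPM and then applies a log2(TPM + 1) transformation. It is written in Python purely for illustration (the workshop code is R), the counts and transcript lengths are invented, and `counts_to_tpm` is a hypothetical helper. Quantifiers such as Salmon report TPM directly, so you would rarely compute it by hand.

```python
from math import log2


def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to TPM given transcript lengths in base pairs."""
    # Step 1: reads per kilobase, which removes the transcript-length effect.
    rpk = [count / (length / 1000) for count, length in zip(counts, lengths_bp)]
    # Step 2: rescale so that the values in one sample sum to one million.
    scale = sum(rpk) / 1_000_000
    return [value / scale for value in rpk]


# Three made-up transcripts from a single sample.
counts = [100, 300, 600]
lengths_bp = [1000, 3000, 2000]

tpm = counts_to_tpm(counts, lengths_bp)

# The +1 pseudocount keeps zero-expression transcripts defined after the log.
log_tpm = [log2(value + 1) for value in tpm]

for count, length, value, logged in zip(counts, lengths_bp, tpm, log_tpm):
    print(f"count={count} length={length} TPM={value:.0f} log2(TPM+1)={logged:.2f}")
```

The first two transcripts have different counts but the same reads per kilobase, so they end up with the same TPM.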

Data resources

  • GTEx v10 — Genotype-Tissue Expression project, healthy human transcriptome reference
  • BioProject PRJNA773714 — KPC mouse model RNA-seq (Dasgupta et al., J Exp Med 2025)
  • ST003927 — Metabolomics Workbench LC-MS/MS data, same experimental design
  • GSE133523 — Human skeletal muscle cachexia (GEO, pancreatic cancer vs. controls)
  • fgsea / msigdbr / MetaboAnalystR — open-source enrichment analysis tools

data/ directory layout

All files under data/ are gitignored — populate them by running the vignettes in order or the fetch scripts in scripts/.

data/
  gtex/
    expression-raw/        # .gct files downloaded from the GTEx portal
    metadata/              # Sample-level metadata (fetched via gtexr)
    expression-qc/         # QC tables and filtered expression matrices
    correlation/           # Spearman ρ outputs (LIF co-expression)
    enrichment/            # GO:BP GSEA results per tissue
    stats/                 # Summary statistics
    intermediary/          # Intermediate cached objects (.rds)

  geo/
    GSE133523/             # Human skeletal muscle cachexia raw data (GEO)

  metabolomics-workbench/
    mwtab_txt/             # Raw mwtab-format files (ST003927)
    intermediary/          # Parsed and normalised metabolomics objects
    plots/                 # QC and MSEA figures

  sra/
    PRJNA773714/           # KPC mouse RNA-seq (SRA prefetch output)
      salmon_quant/        # Per-sample Salmon quantification directories

  integrated/
    stats/                 # Cross-dataset NES concordance tables (.rds)
    intermediary/          # Merged multi-omics objects
    plots/                 # Integration figures

  reference/
    salmon_index_grcm39/   # Salmon index for GRCm39 (built once, reused)

  metaboanalystr/          # MetaboAnalystR session temp files (auto-generated)

To populate from scratch:

Data                   How to fetch
GTEx metadata          scripts/gtex/gtex-metadata-fetch.R
GTEx expression .gct   Download from gtexportal.org into data/gtex/expression-raw/
PRJNA773714 RNA-seq    scripts/sra/sra-prefetch.R / scripts/sra/slurm-sra-prefetch.sh (HPC) → scripts/sra/sra-tximport.R
ST003927 metabolomics  Fetched automatically by the metabolomics QC vignette
GSE133523              Fetched automatically by the GSE133523 GSEA vignette
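
To see which of these fetch steps are still outstanding, a small check along the following lines can help. It is a sketch in Python, not part of the repository: the `unpopulated_sources` helper is hypothetical, and the mapping from each data source to a directory is inferred from the data/ layout above rather than read from the scripts.

```python
from pathlib import Path

# Directory expected to hold each raw data source, relative to data/.
# Inferred from the layout listing in this README (an assumption).
RAW_DATA_DIRS = {
    "GTEx metadata": "gtex/metadata",
    "GTEx expression .gct": "gtex/expression-raw",
    "PRJNA773714 RNA-seq": "sra/PRJNA773714",
    "ST003927 metabolomics": "metabolomics-workbench/mwtab_txt",
    "GSE133523": "geo/GSE133523",
}


def unpopulated_sources(data_root="data"):
    """Return the data sources whose directory is missing or empty."""
    root = Path(data_root)
    pending = []
    for source, relative_dir in RAW_DATA_DIRS.items():
        target = root / relative_dir
        # A source counts as fetched once its directory holds at least one entry.
        if not target.is_dir() or not any(target.iterdir()):
            pending.append(source)
    return pending


for source in unpopulated_sources():
    print(f"still to fetch: {source}")
```

Run it from the repository root so that the default `data` path resolves. A non-empty directory does not guarantee a complete download, so treat the result as a reminder rather than a validation.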

Developed at

The Jackson Laboratory, as part of the NIH Common Fund Data Ecosystem (CFDE).
