Dptk2311/PanCancer


🧬 Pan-Cancer Classification using Multi-Omics Data

Intermediate Fusion with Transformers, Gating & Hierarchical Attention


📌 Overview

This project implements a deep learning framework for pan-cancer classification from multi-omics data using intermediate fusion. It integrates four biological data modalities (mRNA expression, miRNA expression, copy number variation (CNV), and DNA methylation) and combines:

  • Transformer-based encoders (intra-modality learning)
  • Learnable modality gating
  • Cross-attention (inter-modality fusion)
  • Hierarchical attention for robust feature integration

🧠 Key Features

  • Multi-omics data integration
  • Transformer-based feature extraction
  • Attention-based fusion mechanism
  • Learnable modality importance (gating)
  • Scalable architecture for biomedical datasets
  • Extensible to survival analysis

📂 Dataset Structure

Ensure the following files are present:

mrna.csv
mirna.csv
cnv.csv
methylation.csv
labelnum.csv
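
The file layout is not documented here, so the following is only a sketch of how the tables might be loaded and aligned. It assumes each CSV stores samples as rows with the sample ID in the first column, and that `labelnum.csv` holds one integer class label per sample. The helper `align_modalities` and the tiny in-memory tables are hypothetical stand-ins for the real files.

```python
import io

import pandas as pd


def align_modalities(frames, labels):
    """Keep only the samples that appear in every modality and in the labels."""
    common = labels.index
    for df in frames.values():
        common = common.intersection(df.index)
    common = common.sort_values()
    aligned = {name: df.loc[common] for name, df in frames.items()}
    return aligned, labels.loc[common]


# In-memory stand-ins for mrna.csv, mirna.csv and labelnum.csv
# (assumed layout: sample ID in column 0, features in the remaining columns).
mrna_csv = "sample,g1,g2\ns1,0.1,0.2\ns2,0.3,0.4\ns3,0.5,0.6\n"
mirna_csv = "sample,m1\ns2,1.0\ns3,2.0\ns4,3.0\n"
label_csv = "sample,label\ns1,0\ns2,1\ns3,2\n"

frames = {
    "mrna": pd.read_csv(io.StringIO(mrna_csv), index_col=0),
    "mirna": pd.read_csv(io.StringIO(mirna_csv), index_col=0),
}
labels = pd.read_csv(io.StringIO(label_csv), index_col=0)["label"]

aligned, y = align_modalities(frames, labels)
print(list(y.index))  # samples shared by all tables
```

With real data, replace the `io.StringIO(...)` arguments with the file paths. If the actual files store features as rows, transpose each frame before aligning.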


⚙️ Pipeline Architecture

1. Data Preprocessing

  • Load datasets
  • Normalize features
  • Handle missing values
  • Optional dimensionality reduction (PCA)
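
The preprocessing steps above could be sketched with scikit-learn as follows. This is a minimal example under assumptions, not the notebook's actual code: median imputation, z-score scaling, and the `n_components` value are illustrative choices, and `build_preprocessor` is a hypothetical helper. Imputation is placed before scaling so the scaler never sees missing values.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_preprocessor(n_components=None):
    """Impute missing values, standardize features, optionally apply PCA."""
    steps = [SimpleImputer(strategy="median"), StandardScaler()]
    if n_components is not None:
        steps.append(PCA(n_components=n_components))
    return make_pipeline(*steps)


# Synthetic stand-in for one modality: 20 training and 5 test samples.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 10))
X_train[0, 3] = np.nan  # simulate a missing measurement
X_test = rng.normal(size=(5, 10))

prep = build_preprocessor(n_components=4)
Z_train = prep.fit_transform(X_train)  # fit on training data only
Z_test = prep.transform(X_test)        # reuse the fitted statistics
print(Z_train.shape, Z_test.shape)
```

Fitting one preprocessor per modality on the training split only, then applying it to the test split, avoids leaking test statistics into training.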

2. Feature Extraction

  • Transformer encoders applied to each modality
  • Captures intra-modality relationships
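
One way a per-modality Transformer encoder might look in PyTorch is sketched below. How the notebook turns a feature vector into tokens is not stated here, so this example assumes the features are split into fixed-size chunks, each projected to one token. `ModalityEncoder` and all hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Encode one omics modality as a sequence of chunk tokens (assumed scheme)."""

    def __init__(self, n_features, chunk_size=16, d_model=32, n_heads=4, n_layers=2):
        super().__init__()
        self.chunk_size = chunk_size
        # Zero-pad the feature axis so it divides evenly into chunks.
        self.pad = (-n_features) % chunk_size
        n_tokens = (n_features + self.pad) // chunk_size
        self.embed = nn.Linear(chunk_size, d_model)
        # Learned positional embedding, one vector per token position.
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=n_heads,
            dim_feedforward=4 * d_model,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):
        # x: (batch, n_features)
        x = nn.functional.pad(x, (0, self.pad))
        tokens = x.reshape(x.size(0), -1, self.chunk_size)
        # Self-attention models relationships between chunks of the same modality.
        return self.encoder(self.embed(tokens) + self.pos)


encoder = ModalityEncoder(n_features=100)
out = encoder(torch.randn(8, 100))
print(out.shape)  # (batch, n_tokens, d_model)
```

Each modality would get its own encoder instance, since feature counts differ across mRNA, miRNA, CNV, and methylation data.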

3. Modality Gating

  • Learns importance of each modality dynamically
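
A learnable gate of this kind can be sketched as a small network that scores every modality per sample and normalizes the scores with a softmax. The exact gating function used in the notebook is not specified here, so `ModalityGate` below is a hypothetical illustration that assumes each modality has already been summarized as one vector.

```python
import torch
import torch.nn as nn


class ModalityGate(nn.Module):
    """Per-sample softmax weights over modalities (assumed formulation)."""

    def __init__(self, d_model, n_modalities):
        super().__init__()
        # Scores are computed from all modality summaries jointly.
        self.score = nn.Linear(d_model * n_modalities, n_modalities)

    def forward(self, summaries):
        # summaries: (batch, n_modalities, d_model)
        b, m, d = summaries.shape
        weights = torch.softmax(self.score(summaries.reshape(b, m * d)), dim=-1)
        gated = summaries * weights.unsqueeze(-1)  # scale each modality vector
        return gated, weights


gate = ModalityGate(d_model=32, n_modalities=4)
gated, weights = gate(torch.randn(8, 4, 32))
print(weights.sum(dim=-1))  # each row of weights sums to 1
```

Averaging `weights` over a dataset gives one importance value per modality, which is the kind of quantity reported as gating weights.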

4. Fusion (Hierarchical Attention)

  • Cross-attention across modalities
  • Combines representations into unified embedding
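
The fusion step could be sketched as two attention levels: token-level cross-attention, where each modality queries the tokens of the other modalities, followed by modality-level attention pooling into one embedding. This two-level reading of "hierarchical attention" is an assumption, and `CrossModalFusion` is a hypothetical module, not the notebook's implementation.

```python
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Cross-attention between modalities, then attention pooling (a sketch)."""

    def __init__(self, d_model=32, n_heads=4):
        super().__init__()
        self.cross = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.modality_score = nn.Linear(d_model, 1)

    def forward(self, token_sets):
        # token_sets: list of (batch, n_tokens_i, d_model), at least two modalities
        summaries = []
        for i, query in enumerate(token_sets):
            # Level 1: tokens of modality i attend to tokens of all other modalities.
            context = torch.cat(
                [t for j, t in enumerate(token_sets) if j != i], dim=1
            )
            attended, _ = self.cross(query, context, context)
            summaries.append(self.norm(query + attended).mean(dim=1))
        stacked = torch.stack(summaries, dim=1)  # (batch, n_modalities, d_model)
        # Level 2: attention weights over modalities, then a weighted sum.
        weights = torch.softmax(self.modality_score(stacked), dim=1)
        fused = (weights * stacked).sum(dim=1)   # (batch, d_model)
        return fused, weights.squeeze(-1)


fusion = CrossModalFusion()
token_sets = [torch.randn(4, n, 32) for n in (7, 3, 5)]
fused, modality_weights = fusion(token_sets)
print(fused.shape, modality_weights.shape)
```

Because attention handles variable sequence lengths, the modalities may contribute different numbers of tokens, as long as they share `d_model`.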

5. Classification

  • Fully connected layers
  • Softmax output for cancer classification
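
A classification head of this shape might look like the sketch below. The layer sizes and the number of classes are assumptions, and `ClassificationHead` is a hypothetical name. The head returns raw logits because `nn.CrossEntropyLoss` applies log-softmax internally; an explicit softmax is only needed when reading out class probabilities.

```python
import torch
import torch.nn as nn


class ClassificationHead(nn.Module):
    """Fully connected layers mapping the fused embedding to class logits."""

    def __init__(self, d_model, n_classes, hidden=64, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, z):
        return self.net(z)  # logits, shape (batch, n_classes)


n_classes = 5  # placeholder; set to the number of cancer types in labelnum.csv
head = ClassificationHead(d_model=32, n_classes=n_classes)
fused = torch.randn(8, 32)                 # stand-in for the fused embedding
targets = torch.randint(0, n_classes, (8,))

logits = head(fused)
loss = nn.CrossEntropyLoss()(logits, targets)  # expects logits, not probabilities
probs = torch.softmax(logits, dim=-1)          # probabilities for reporting
print(probs.shape, float(loss))
```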

🏗️ Model Components

  • Transformer Encoder (per modality)
  • Multi-head Attention
  • Gating Mechanism
  • Cross-Modality Attention
  • Classification Head

🚀 How to Run

1. Clone Repository

git clone https://github.com/Dptk2311/PanCancer.git
cd PanCancer

2. Install Dependencies

pip install -r requirements.txt

3. Run Notebook

jupyter notebook pan_cancer_classification.ipynb


📊 Output

  • Classification accuracy
  • Loss curves
  • Modality importance (gating weights)
  • Learned feature embeddings

🔬 Research Contribution

  • Explores intermediate fusion in multi-omics learning
  • Uses attention-based integration across modalities
  • Applies transformers in bioinformatics tasks

📈 Future Work

  • Add proteomics / metabolomics data
  • Hyperparameter tuning
  • Model explainability (SHAP, attention maps)
  • Survival analysis (Cox model)
  • Deployment as clinical tool

🛠️ Tech Stack

  • Python
  • PyTorch
  • NumPy
  • Pandas
  • Scikit-learn

👨‍💻 Authors

Diptak Chattopadhyay
Aniket Sahu


📜 License

This project is intended for academic and research purposes.
