This project implements a deep learning-based multi-omics framework for pan-cancer classification using intermediate fusion techniques. It integrates multiple biological data modalities (mRNA, miRNA, CNV, methylation) and leverages:
- Transformer-based encoders (intra-modality learning)
- Learnable modality gating
- Cross-attention (inter-modality fusion)
- Hierarchical attention for robust feature integration
- Multi-omics data integration
- Transformer-based feature extraction
- Attention-based fusion mechanism
- Learnable modality importance (gating)
- Scalable architecture for biomedical datasets
- Extensible to survival analysis
Ensure the following files are present:
- mrna.csv
- mirna.csv
- cnv.csv
- methylation.csv
- labelnum.csv
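Before fusion, all modality tables need to describe the same samples in the same order. The sketch below shows one way to do that. It assumes each CSV has sample IDs in its first column (so they become the DataFrame index); that layout is not confirmed by this README, and `align_samples` is a hypothetical helper, not a function from the notebook.

```python
import pandas as pd


def align_samples(frames: dict) -> dict:
    """Keep only the samples present in every table, in one shared order."""
    shared = None
    for df in frames.values():
        # Intersect the sample IDs of each table with those seen so far.
        shared = df.index if shared is None else shared.intersection(df.index)
    shared = shared.sort_values()
    return {name: df.loc[shared] for name, df in frames.items()}


# Hypothetical usage, assuming sample IDs are in the first column of each file:
# names = ["mrna", "mirna", "cnv", "methylation", "labelnum"]
# frames = {n: pd.read_csv(f"{n}.csv", index_col=0) for n in names}
# frames = align_samples(frames)
```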
- Load datasets
- Normalize features
- Handle missing values
- Optional dimensionality reduction (PCA)
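The preprocessing steps above can be sketched for a single modality as follows. This is a minimal illustration, not the notebook's actual code: the function name `preprocess_modality`, median imputation, and z-score scaling are all assumptions, and missing values are filled before scaling so the scaler sees complete data.

```python
from typing import Optional

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler


def preprocess_modality(df: pd.DataFrame, n_components: Optional[int] = None) -> np.ndarray:
    """Impute, standardize, and optionally reduce one modality (samples x features)."""
    # Replace missing values with the per-feature median (assumed strategy).
    values = SimpleImputer(strategy="median").fit_transform(df.to_numpy(dtype=float))
    # Scale each feature to zero mean and unit variance.
    values = StandardScaler().fit_transform(values)
    # Optionally project onto the leading principal components.
    if n_components is not None:
        values = PCA(n_components=n_components).fit_transform(values)
    return values


# Toy input standing in for one modality table (20 samples, 8 features).
toy = pd.DataFrame(np.random.default_rng(0).normal(size=(20, 8)))
toy.iloc[0, 0] = np.nan
reduced = preprocess_modality(toy, n_components=3)
print(reduced.shape)
```

When evaluating on held-out samples, the imputer, scaler, and PCA would normally be fitted on the training split only and then applied to the test split, to avoid leaking test statistics into training.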
- Transformer encoders applied to each modality
- Captures intra-modality relationships
- Learns importance of each modality dynamically
- Cross-attention across modalities
- Combines representations into unified embedding
- Fully connected layers
- Softmax output for cancer classification
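The pipeline above (per-modality encoders, gating, cross-attention, classifier) can be sketched in PyTorch roughly as below. This is a simplified illustration under stated assumptions, not the architecture from the notebook: the class names, layer sizes, the token projection, and the softmax-based gate are all hypothetical choices, and the hierarchical attention component is not shown.

```python
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Splits one modality's feature vector into tokens and applies a Transformer encoder."""

    def __init__(self, in_dim, d_model=64, n_tokens=4, n_heads=4, n_layers=1):
        super().__init__()
        self.n_tokens, self.d_model = n_tokens, d_model
        self.proj = nn.Linear(in_dim, n_tokens * d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=2 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):
        # (batch, in_dim) -> (batch, n_tokens, d_model)
        tokens = self.proj(x).view(-1, self.n_tokens, self.d_model)
        # Self-attention within the modality, then mean-pool to (batch, d_model).
        return self.encoder(tokens).mean(dim=1)


class FusionClassifier(nn.Module):
    """Intermediate fusion: encode each modality, gate, cross-attend, classify."""

    def __init__(self, in_dims, n_classes, d_model=64, n_heads=4):
        super().__init__()
        self.encoders = nn.ModuleList([ModalityEncoder(d, d_model) for d in in_dims])
        self.gate = nn.Linear(d_model, 1)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, n_classes)
        )

    def forward(self, inputs):
        # One embedding per modality: (batch, n_modalities, d_model).
        h = torch.stack([enc(x) for enc, x in zip(self.encoders, inputs)], dim=1)
        # Learnable gate: one weight per modality, normalized to sum to 1.
        gates = torch.softmax(self.gate(h).squeeze(-1), dim=1)
        h = h * gates.unsqueeze(-1)
        # Attention across the modality embeddings, pooled into one vector.
        fused, _ = self.cross_attn(h, h, h)
        logits = self.head(fused.mean(dim=1))
        return logits, gates


# Toy usage: four modalities with made-up feature counts and five classes.
dims = [10, 6, 8, 12]
model = FusionClassifier(in_dims=dims, n_classes=5)
logits, gates = model([torch.randn(3, d) for d in dims])
print(logits.shape, gates.shape)
```

The model returns raw logits because `nn.CrossEntropyLoss` applies log-softmax internally; `torch.softmax(logits, dim=1)` gives class probabilities at inference time. Averaging the returned `gates` over a dataset is one simple way to inspect per-modality importance.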
- Transformer Encoder (per modality)
- Multi-head Attention
- Gating Mechanism
- Cross-Modality Attention
- Classification Head
git clone https://github.com/Dptk2311/PanCancer.git
cd PanCancer
pip install -r requirements.txt
jupyter notebook pan_cancer_classification.ipynb
- Classification accuracy
- Loss curves
- Modality importance (gating weights)
- Learned feature embeddings
- Explores intermediate fusion in multi-omics learning
- Uses attention-based integration across modalities
- Applies transformers in bioinformatics tasks
- Add proteomics / metabolomics data
- Hyperparameter tuning
- Model explainability (SHAP, attention maps)
- Survival analysis (Cox model)
- Deployment as a clinical tool
- Python
- PyTorch
- NumPy
- Pandas
- Scikit-learn
Diptak Chattopadhyay
Aniket Sahu
This project is intended for academic and research purposes.