SpatialPPIv2 is an advanced graph-neural-network-based model that predicts PPIs by using large language models to embed sequence features and Graph Attention Networks to capture structural information.
Input protein structures (PDB files) were used to extract structural and sequence features. The distance matrix within proteins was used to construct the edges in the protein graph representation. The protein sequence information encoded by language models was used as the features of nodes in the graph representation. The distance matrix within proteins was used to construct the edges in the protein graph representation. The protein sequence information encoded by language models was used as the node features in the graph representation. The residues between two proteins are fully connected to increase the message passing between proteins. The overall graph features of protein pairs were obtained by chain mean pooling the protein features directly obtained from language models and the relationship features output by GAT and used to calculate the possibility of interaction between proteins.
SpatialPPIv2 support multiple environment configuration methods, including docker, apptainer and conda.
You can find the environment file under env folder.
apptainer build -s --nv ${CONTAINER_PATH} ./env/Apptainer
docker build -t ${IMAGE_NAME} ./env
conda create -n pytorch221 python=3.11
conda activate pytorch221
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install numpy pandas seaborn tensorboard
pip install torch_geometric==2.5.1
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.2.0+cu118.html
conda install lightning -c conda-forge
pip install lightning[extra]
pip install matplotlib==3.8.3
pip install biopandas==0.5.1
pip install biopython==1.83
pip install transformers==4.40.2
pip install sentencepiece==0.2.0
pip install torchsummary==1.5.1
pip install scipy==1.12.0
pip install torch_cluster -f https://data.pyg.org/whl/torch-2.2.0+cu118.html
pip install git+https://github.com/yusuf1759/prodigy-cryst.git
pip install pinder[all]
pip install fair-esm
python inference.py --A demo/P33895.pdb --B demo/P40460.pdb
Options:
--A: Path to thepdb/ciffile--chain_A: The chain ID of the first protein. By default, the first chain in the file is used.--B: Path to thepdb/ciffile--chain_B: The chain ID of the first protein. By default, the first chain in the file is used.--model: The model to use. Forpdb/ciffile, this will beProtT5andRostlab/prot_t5_xl_uniref50embedding will be used.--device: Eithercudaorcpu. The device to use.
When there is no suitable protein structure file, fasta files can be used to make inferences based on protein sequences. This model is based on the attention contact of ESM-2. If the input file is in fasta format, the esm2_t33_650M_UR50D model will be forced to be used. Example:
python inference.py --A demo/D3INY1.fasta --B demo/P62593.fasta
Please check the notebook.
To replicate our research, please follow these steps:
You can refer to here
export PINDER_RELEASE=2024-02
export PINDER_BASE_DIR=~/my-custom-location-for-pinder/pinder
pinder_download
Change data_root to ~/my-custom-location-for-pinder/pinder/2024-02/pdbs
wget -O datasets/BIOGRID-ALL-4.4.238.tab3.zip https://downloads.thebiogrid.org/Download/BioGRID/Release-Archive/BIOGRID-4.4.238/BIOGRID-ALL-4.4.238.tab3.zip
unzip -d datasets/ datasets/BIOGRID-ALL-4.4.238.tab3.zip
Notice: Do not use exclude for test set.
python scripts/dataset_generator.py --exclude scripts/exclude.txt --split train --biogrid datasets/BIOGRID-ALL-4.4.238.tab3.txt
This will speed up training process
python scripts/calculate_embedding.py --split train --workers 8 --saveroot datasets/train
After the calculation, edit the config/default.yaml config file accordingly.
- Change the
pathto the used above - Change the
typetoondisk
The training log and model weights will be saved in lightning_logs
python main.py --task train
python main.py --task eval --checkpoint <path to ckpt checkpoint> --output pred.npy
- Hu W, Ohue M. SpatialPPIv2: Enhancing protein–protein interaction prediction through graph neural networks with protein language models. Computational and Structural Biotechnology Journal, 27: 508-518, 2025. doi: 10.1016/j.csbj.2025.01.022
