CrossFusion: A Multi-Scale Cross-Attention Convolutional Fusion Model for Cancer Survival Prediction
[arXiv]
Rustin Soraki, Huayu Wang, Joann G. Elmore, Linda Shapiro
@misc{soraki2025crossfusionmultiscalecrossattentionconvolutional,
title={CrossFusion: A Multi-Scale Cross-Attention Convolutional Fusion Model for Cancer Survival Prediction},
author={Rustin Soraki and Huayu Wang and Joann G. Elmore and Linda Shapiro},
year={2025},
eprint={2503.02064},
archivePrefix={arXiv},
primaryClass={eess.IV},
url={https://arxiv.org/abs/2503.02064},
}

Abstract: Cancer survival prediction from whole slide images (WSIs) is a challenging task in computational pathology due to the large size, irregular shape, and high granularity of the WSIs. These characteristics make it difficult to capture the full spectrum of patterns, from subtle cellular abnormalities to complex tissue interactions, which are crucial for accurate prognosis. To address this, we propose CrossFusion—a novel, multi-scale feature integration framework that extracts and fuses information from patches across different magnification levels. By effectively modeling both scale-specific patterns and their interactions, CrossFusion generates a rich feature set that enhances survival prediction accuracy. We validate our approach across six cancer types from public datasets, demonstrating significant improvements over existing state-of-the-art methods. Moreover, when coupled with domain-specific feature extraction backbones, our method shows further gains in prognostic performance compared to general-purpose backbones.
If you wish to install and run the code in this repository, use the provided environment.yml file to set up the necessary environment:
git clone https://github.com/RustinS/CrossFusion.git
cd CrossFusion
conda env create -f environment.yml
conda activate crossfusion

To download diagnostic WSIs (formatted as .svs files), please refer to the NIH Genomic Data Commons Data Portal. We used WSIs from these studies: BLCA, BRCA, COAD, LUAD, GBM & LGG, and UCEC. WSIs for each cancer type can be downloaded using the GDC Data Transfer Tool.
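As a sketch, the slides for one study can be fetched with the GDC Data Transfer Tool using a manifest exported from the portal. The manifest filename and output directory below are placeholders for your own setup:

```shell
# Export a manifest for the desired study (e.g., TCGA-BRCA) from the
# GDC Data Portal, then download the .svs files it lists.
# "gdc_manifest_brca.txt" and "./wsi/brca" are placeholder names.
gdc-client download -m gdc_manifest_brca.txt -d ./wsi/brca
```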
We use the methodology outlined in CLAM to create and embed patches from whole-slide images (WSIs). The patches are extracted at a fixed size of 256×256 pixels from three magnification levels: 20x, 10x, and 5x.
Since WSIs from the TCGA dataset have varying maximum magnifications, the actual magnification corresponding to each resolution level can differ. For instance, level 0 corresponds to a 40x magnification for WSIs with a maximum magnification of 40x, but it represents 20x magnification for WSIs with a maximum of 20x.
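One way to normalize for this difference is to compute how large a level-0 region must be read so that, after resizing, the patch corresponds to the target magnification. The helper below is a hypothetical sketch, not part of the repository; the slide's base magnification can be read from OpenSlide's `openslide.objective-power` property:

```python
def level0_read_size(base_mag: int, target_mag: int, out_size: int = 256) -> int:
    """Side length (in pixels) to read at level 0 so that resizing the
    region to `out_size` yields a patch at `target_mag` magnification.

    `base_mag` is the slide's maximum (level-0) magnification, e.g. the
    value of OpenSlide's "openslide.objective-power" property.
    """
    if base_mag % target_mag != 0:
        raise ValueError("base magnification must be a multiple of the target")
    return out_size * (base_mag // target_mag)

# A 40x slide needs 512x512 level-0 reads for 20x patches ...
print(level0_read_size(40, 20))  # 512
# ... while a 20x slide can read 256x256 at level 0 directly.
print(level0_read_size(20, 20))  # 256
```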
As an example, to generate 20x patches from WSIs with a 40x maximum magnification, we first extract 512×512 pixel patches at level 0. These patches are then resized to 256×256 pixels during the feature extraction phase. We subsequently follow CLAM's procedure to store patch coordinates and extracted features.
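The tiling step can be sketched as a simple grid over level-0 coordinates. This is a hypothetical helper for illustration only; CLAM's actual pipeline additionally performs tissue segmentation and handles edge cases:

```python
def patch_grid(width: int, height: int, read_size: int):
    """Top-left level-0 coordinates of non-overlapping read_size tiles,
    dropping partial tiles at the right and bottom edges."""
    return [(x, y)
            for y in range(0, height - read_size + 1, read_size)
            for x in range(0, width - read_size + 1, read_size)]

# A 2048x2048 slide tiled with 512x512 level-0 reads gives a 4x4 grid.
coords = patch_grid(2048, 2048, read_size=512)
print(len(coords))  # 16
```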
To train the model, please follow these steps using the provided train.sh script:
In the train.sh file, update the following variables to match your system's directory structure:
img_dir: Path to the directory containing patch images.
pt_dir: Path to the directory containing extracted features.
save_dir: Path to the directory where trained models and outputs will be saved.
Other necessary data path configurations reference data stored in the repository's data folder. Ensure these paths and filenames correctly reflect your local setup.
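Assuming the variables described above, the relevant block of train.sh might look like the following; all paths are placeholders for your own directory structure:

```shell
# Paths — adjust to your local setup.
img_dir=/data/wsi/patches        # patch images
pt_dir=/data/wsi/features        # extracted features
save_dir=./results/crossfusion   # trained models and outputs
```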
Specify the backbone model used to extract features with CLAM by setting the appropriate backbone name and its feature dimensions within train.sh. We've included examples of backbones we've tested, but you're free to use any compatible backbone.
Review and adjust additional hyperparameters within the train.sh file as needed to align with your experimental goals. The hyperparameters in the current version of train.sh are the ones we used for our experiments.
Finally, to start the training, you can do:
sh train.sh

You can then evaluate the trained model on a dataset using:
sh eval.sh

The list of variables in the eval.sh file should closely follow the list in train.sh.
This project is licensed under the MIT License - see the LICENSE file for details.
