TPbiocode/geneformer_benchmarking

Geneformer Benchmarking

This repository provides reproducible benchmarking scripts for single-cell RNA-seq cell type classification with Geneformer. It covers donor-level splits, tokenization, model training, and evaluation, and is set up to compare three Geneformer models: V1-10M, V2-104M, and V2-316M.
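To illustrate what a donor-level split means in practice, here is a minimal sketch in plain Python. It is not taken from the repository's scripts: the function name `donor_level_split`, its parameters, and the donor labels are hypothetical. The idea is that all cells from a given donor land on the same side of the split, so no donor leaks from training into evaluation.

```python
import random
from collections import defaultdict


def donor_level_split(donor_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that no donor appears in both sets.

    donor_ids: one donor label per cell, in cell order.
    Hypothetical helper for illustration; not the repository's API.
    """
    # Group cell indices by the donor they came from.
    cells_by_donor = defaultdict(list)
    for idx, donor in enumerate(donor_ids):
        cells_by_donor[donor].append(idx)

    if len(cells_by_donor) < 2:
        raise ValueError("need at least two donors to split by donor")

    # Sort before shuffling so the result depends only on the seed.
    donors = sorted(cells_by_donor)
    random.Random(seed).shuffle(donors)

    # Hold out at least one donor, and keep at least one for training.
    n_test = round(len(donors) * test_fraction)
    n_test = min(max(1, n_test), len(donors) - 1)

    test_idx = sorted(i for d in donors[:n_test] for i in cells_by_donor[d])
    train_idx = sorted(i for d in donors[n_test:] for i in cells_by_donor[d])
    return train_idx, test_idx


if __name__ == "__main__":
    donor_ids = ["d1", "d1", "d2", "d3", "d3", "d3", "d4", "d5"]
    train_idx, test_idx = donor_level_split(donor_ids, test_fraction=0.4, seed=1)
    print("train cells:", train_idx)
    print("test cells:", test_idx)
```

Splitting on donors rather than on individual cells tends to give a more conservative estimate of how a classifier generalizes to unseen individuals, which is presumably why the benchmark uses it.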

Overview

The repository includes:

  • Configurable SLURM training scripts for HPC environments
  • Jupyter notebooks for data preprocessing and results analysis
  • Utilities for fair model comparison with proportional layer freezing
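One way to read "proportional layer freezing" is that each model has the same fraction of its transformer layers frozen, rather than the same absolute number, so that models of different depth are fine-tuned under comparable conditions. The sketch below works under that assumption. The helper `layers_to_freeze` and the layer counts in the example are illustrative placeholders, not values taken from this repository or from the Geneformer documentation.

```python
def layers_to_freeze(total_layers, freeze_fraction):
    """Return the indices of the bottom layers to freeze.

    total_layers: number of transformer layers in the model.
    freeze_fraction: share of layers to freeze, between 0 and 1.
    Hypothetical helper for illustration; not the repository's API.
    """
    if total_layers <= 0:
        raise ValueError("total_layers must be positive")
    if not 0.0 <= freeze_fraction <= 1.0:
        raise ValueError("freeze_fraction must be between 0 and 1")

    # Round down so at least as many layers stay trainable as the fraction implies.
    n_frozen = int(total_layers * freeze_fraction)
    return list(range(n_frozen))


if __name__ == "__main__":
    # Placeholder depths for three models of increasing size (assumed values).
    example_depths = {"small": 6, "medium": 12, "large": 18}
    for name, depth in example_depths.items():
        frozen = layers_to_freeze(depth, freeze_fraction=0.5)
        print(f"{name}: freeze {len(frozen)} of {depth} layers -> {frozen}")
```

The resulting count or index list could then be handed to whichever freezing option the training script exposes. With a PyTorch model, that would typically mean setting `requires_grad = False` on the parameters of the selected layers.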

Requirements

  • Geneformer package from Hugging Face
  • Python 3.11+
  • See requirements.txt for dependencies

Usage

See scripts/slurm/README.md for detailed instructions.

About

Geneformer benchmarking for single-cell cell type classification from preprocessing to evaluation metrics and confusion-matrix analysis
