ProteoPy

ProteoPy is a Python library that brings quantitative proteomics into the AnnData ecosystem. It provides a unified and extensible framework for protein- and peptide-level analysis — from data import through quality control, preprocessing, and differential abundance testing — while storing all data and metadata in a single portable object.

Official documentation: proteopy.readthedocs.io

Why ProteoPy?

Mass spectrometry-based proteomics lacks a standardized data structure in Python comparable to what AnnData provides for single-cell transcriptomics. Existing tools rely on distinct formats and scripting environments, forcing researchers to learn multiple ecosystems and making multi-omics integration cumbersome. ProteoPy bridges this gap by adopting the proven AnnData framework, enabling:

Familiar workflows for users of scanpy, squidpy, and the broader single-cell Python ecosystem
Reproducible analyses with all processing steps tracked in a single object
Seamless multi-omics integration via direct compatibility with MuData and MUON
Direct scanpy integration for dimensionality reduction, clustering, and visualization, plus compatibility with single-cell analysis tools

Key Features

Flexible data import from DIA-NN, MaxQuant, and generic tabular formats
Quality control & filtering with completeness metrics, CV analysis, and contaminant removal
Preprocessing including normalization, batch correction (via scanpy), and missing-value imputation
Peptide-level analysis with overlapping peptide grouping, peptide-to-protein quantification, and per-protein peptide intensity visualization
Differential abundance analysis with t-test, Welch's t-test, and multiple-testing correction
Proteoform inference via a reimplementation of the COPF algorithm for detecting functional proteoform groups from peptide-level data
Publication-ready visualizations for QC, exploratory analysis, and statistical results
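
To make the QC terms in the list above concrete, here is a minimal NumPy sketch of per-protein completeness and coefficient of variation (CV). It is not ProteoPy's implementation. The matrix layout (samples in rows, proteins in columns, NaN for missing values, intensities on the linear scale), the 0.7 threshold, and the helper names `completeness` and `coefficient_of_variation` are all assumptions made for illustration.

```python
import numpy as np


def completeness(X):
    """Fraction of non-missing values per protein (column)."""
    return np.mean(~np.isnan(X), axis=0)


def coefficient_of_variation(X):
    """Per-protein CV (sample standard deviation / mean), ignoring NaNs."""
    mean = np.nanmean(X, axis=0)
    sd = np.nanstd(X, axis=0, ddof=1)
    return sd / mean


# Toy matrix: 4 samples (rows) x 3 proteins (columns), NaN = not quantified
X = np.array([
    [10.0, 5.0, np.nan],
    [12.0, np.nan, np.nan],
    [11.0, 7.0, 3.0],
    [9.0, 6.0, np.nan],
])

frac = completeness(X)       # third protein is quantified in only 1 of 4 samples
keep = frac >= 0.7           # hypothetical completeness threshold
X_filtered = X[:, keep]      # drop proteins below the threshold
cv = coefficient_of_variation(X_filtered)
```

Filtering on completeness before computing the CV avoids estimating variability from proteins with too few observed values.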

Installation

ProteoPy requires Python 3.10 or later. We recommend installing ProteoPy in a dedicated virtual environment:

# Using venv
python -m venv proteopy-env
source proteopy-env/bin/activate  # Linux/macOS
# proteopy-env\Scripts\activate   # Windows
pip install ipykernel
python -m ipykernel install --user --name=proteopy-env

# Using conda
conda create -n proteopy-env "python>=3.10"
conda activate proteopy-env
pip install ipykernel
python -m ipykernel install --user --name=proteopy-env

# Using uv
uv venv proteopy-env
source proteopy-env/bin/activate
uv pip install ipykernel
python -m ipykernel install --user --name=proteopy-env

Then install ProteoPy:

pip install proteopy

For notebook-centric workflows, the [usage] extra installs ipykernel, jupyterlab, and scanpy (for extended analysis functionality such as batch correction, PCA, and UMAP):

pip install "proteopy[usage]"  # quotes keep shells such as zsh from expanding the brackets

To install the development version from GitHub:

pip install git+https://github.com/UKHD-NP/proteopy.git
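
A quick way to confirm that the environment meets the requirements above is to check the interpreter version and look up the installed package version. The sketch below uses only the standard library. The helper name `installed_version` is illustrative, and the code assumes the installed distribution is named `proteopy`, as in the pip command above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# ProteoPy requires Python 3.10 or later
python_ok = sys.version_info >= (3, 10)
print(f"Python {sys.version_info.major}.{sys.version_info.minor} supported: {python_ok}")

# Assumption: the distribution name matches the name used with pip
proteopy_version = installed_version("proteopy")
if proteopy_version is None:
    print("proteopy is not installed in this environment")
else:
    print(f"proteopy {proteopy_version} is installed")
```

Run this inside the activated environment (or the registered Jupyter kernel) so that it reports on the interpreter you intend to use.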

Documentation

Full documentation, including API reference and tutorials, is available at proteopy.readthedocs.io.

Tutorials

Protein-level analysis — Complete workflow from data import to differential abundance analysis (notebook)
Proteoform inference — Detecting functional proteoform groups with COPF (notebook)

Quick Start

import proteopy as pr
import scanpy as sc

# Load example dataset
adata = pr.datasets.karayel_2020()

# Quality control: filter by completeness
pr.pp.filter_var_completeness(adata, min_fraction=0.8, zero_to_na=True)

# Preprocessing
pr.pp.normalize_median(adata, log_space=True)
pr.pp.impute_downshift(adata, downshift=1.8, width=0.3)

# Differential abundance analysis
pr.tl.differential_abundance(adata, method="ttest_two_sample", group_by="cell_type")

# Visualize results
pr.pl.volcano_plot(adata, varm_slot="ttest_two_sample;cell_type;Ortho_vs_rest")

# Seamless scanpy integration for dimensionality reduction
sc.tl.pca(adata)
sc.pl.pca(adata, color="cell_type")
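
For readers unfamiliar with the preprocessing steps in the Quick Start, the following NumPy sketch illustrates the general ideas behind median normalization and downshift imputation. It is not ProteoPy's code. It assumes log-transformed intensities with samples in rows and NaN for missing values, and it assumes that `downshift` and `width` follow the common convention of being expressed in units of each sample's standard deviation. The function names `sketch_median_normalize` and `sketch_impute_downshift` are hypothetical.

```python
import numpy as np


def sketch_median_normalize(X):
    """Shift each sample (row) so that all samples share the same median.

    On log-scale data this additive shift corresponds to a
    multiplicative scaling factor on the original intensity scale.
    """
    sample_medians = np.nanmedian(X, axis=1, keepdims=True)
    target = np.nanmean(sample_medians)
    return X - sample_medians + target


def sketch_impute_downshift(X, downshift=1.8, width=0.3, seed=0):
    """Fill NaNs per sample from a narrow, down-shifted normal distribution.

    Missing values are drawn from N(mean - downshift * sd, (width * sd)^2),
    where mean and sd come from the observed values of that sample.
    """
    rng = np.random.default_rng(seed)
    X_imputed = X.copy()
    for i in range(X.shape[0]):
        row = X[i]
        missing = np.isnan(row)
        if not missing.any():
            continue
        mean = np.nanmean(row)
        sd = np.nanstd(row, ddof=1)
        X_imputed[i, missing] = rng.normal(
            loc=mean - downshift * sd, scale=width * sd, size=missing.sum()
        )
    return X_imputed


# Toy log2 intensities: 3 samples (rows) x 5 proteins (columns)
X = np.array([
    [20.0, 22.0, 24.0, np.nan, 26.0],
    [21.0, 23.0, 25.0, 27.0, np.nan],
    [19.0, 21.0, np.nan, 25.0, 27.0],
])

X_norm = sketch_median_normalize(X)
X_full = sketch_impute_downshift(X_norm)
```

Drawing imputed values from the low end of each sample's distribution reflects the assumption that values are missing mainly because the protein fell below the detection limit. That assumption does not hold for every dataset.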

Support

Questions & discussions: GitHub Discussions
Bug reports & feature requests: GitHub Issues
Maintainer: @idf-io (Ian Dirk Fichtner)
See HISTORY.md for the changelog.

Citing ProteoPy

If you use ProteoPy in your research, please cite:

Fichtner ID, Sahm F, Gerstung M, Bludau I. ProteoPy: an AnnData-based framework for integrated proteomics analysis. UNPUBLISHED (2025).

@article{fichtner2025proteopy,
  title={ProteoPy: an AnnData-based framework for integrated proteomics analysis},
  author={Fichtner, Ian Dirk and Sahm, Felix and Gerstung, Moritz and Bludau, Isabell},
  journal={UNPUBLISHED},
  year={2025}
}

If you use the COPF proteoform inference functionality, please also cite:

Bludau I, et al. Systematic detection of functional proteoform groups from bottom-up proteomic datasets. Nat. Commun. 12, 3810 (2021). doi:10.1038/s41467-021-24030-x

License

ProteoPy was developed by the Bludau Lab at the Neuropathology Department Heidelberg and is freely available under the Apache 2.0 license. External Python dependencies (see pyproject.toml file) have their own licenses, which can be consulted on their respective websites.
