pepdiff

Differential abundance analysis for phosphoproteomics data.

Purpose

Identify peptides with significant abundance changes between experimental conditions. Answers: "What's differentially abundant?"

Project Status

v1.0.0 released (Feb 2026). The package is feature-complete with:

S3 classes: pepdiff_data and pepdiff_results
Three analysis methods: GLM (Gamma + emmeans), ART, pairwise
Four pairwise tests: wilcoxon, bootstrap_t, bayes_t, rankprod
GLM fit diagnostics with plot_fit_diagnostics()
Plot methods for both classes
Six vignettes covering all workflows
pkgdown documentation site: https://teammaclean.github.io/pepdiff/
devtools::check() passes with 0 errors/warnings

Companion to peppwR:

peppwR: "How many samples do I need?" (power analysis, planning)
pepdiff: "What's differentially abundant?" (analysis, results)

Core Workflow

CSV → read_pepdiff() → pepdiff_data → compare() → pepdiff_results → plots

Documentation

Getting Started: vignettes/basic_workflow.Rmd
GLM Analysis: vignettes/glm_analysis.Rmd
ART Analysis: vignettes/art_analysis.Rmd
Pairwise Tests: vignettes/pairwise_tests.Rmd
Checking Model Fit: vignettes/checking_fit.Rmd
Diagnostic Plots: vignettes/diagnostic_plots.Rmd

Online: https://teammaclean.github.io/pepdiff/

Design Constraints

Cross-Sectional Factorial Designs Only

pepdiff assumes independent biological replicates in each cell of the experimental design. NOT for longitudinal/repeated-measures studies where the same biological unit is tracked over time.

No random effects needed
Simple GLM is appropriate
Technical replicates must be combined explicitly before analysis

Flexible Factors

Factor names are user-defined at import, not hardcoded. Common setups:

treatment × timepoint
genotype × treatment
condition × timepoint × tissue

S3 Classes

`pepdiff_data`

Import/preprocessed data from read_pepdiff().

Slots:

data - tibble: peptide, gene_id, [factors...], bio_rep, value
factors - character: names of factor columns
design - tibble: unique factor combinations with n_reps, n_peptides
missingness - tibble: peptide, na_rate, mnar_score, mean_abundance
peptides - character vector
call - original function call

Methods:

print() - Summary of structure, missingness
summary() - Detailed breakdown by group
plot() - Multi-panel: PCA + distributions + missingness

`pepdiff_results`

Analysis results from compare(). Results in long format (tidy).

Slots:

results - tibble (long): peptide, gene_id, comparison, [factor levels], fold_change, log2_fc, test, p_value, fdr, significant
comparisons - tibble: comparison definitions
method - "glm", "art", or "pairwise"
diagnostics - tibble (nested): peptide, converged, deviance, residuals, std_residuals, fitted
params - list: alpha, fdr_method, formula, etc.
data - the pepdiff_data object used
call - original function call

Methods:

print() - Summary: n comparisons, n significant
summary() - Per-comparison breakdown
plot() - Multi-panel: volcano + p-value histogram + FC distribution

Core Functions

Import

read_pepdiff(file, id, gene, value, factors, replicate, tech_rep = NULL)
# Returns pepdiff_data

combine_tech_reps(data, fun = mean)
# Explicit step to average technical replicates

Analysis

Three methods:

Method	Model	Use Case
`"glm"` (default)	Gamma GLM + emmeans	Most proteomics data
`"art"`	Aligned Rank Transform	Non-parametric alternative
`"pairwise"`	Direct two-group tests	Simple comparisons

Simple interface:

compare(data,
        compare = "treatment",       # factor to contrast
        ref = "ctrl",                # reference level
        within = "timepoint",        # stratify by (optional)
        method = "glm")

Formula interface (power users):

compare(data,
        contrast = pairwise ~ treatment | genotype + timepoint,
        method = "glm")

Pairwise tests (matching peppwR):

"wilcoxon" - Wilcoxon rank-sum
"bootstrap_t" - Bootstrap t-test
"bayes_t" - Bayes factor t-test
"rankprod" - Rank products

Diagnostics

plot_fit_diagnostics(results)
# Returns 4-panel diagnostic plot for GLM model fit assessment

Plots

Class methods:

plot(pepdiff_data) - PCA + distributions + missingness
plot(pepdiff_results) - volcano + p-value hist + FC distribution

Individual functions:

plot_pca(), plot_distributions(), plot_missingness()
plot_volcano(), plot_heatmap(), plot_pvalue_hist(), plot_fc_distribution()
plot_fit_diagnostics() - GLM model fit assessment

File Structure

R/
  data.R            # read_pepdiff(), combine_tech_reps(), pepdiff_data class
  compare.R         # compare() generic and methods
  models.R          # GLM fitting, ART fitting, contrast extraction
  tests.R           # Pairwise statistical tests (wilcoxon, bootstrap_t, etc.)
  results.R         # pepdiff_results class, print/summary methods
  plots.R           # All plot functions and plot methods
  diagnostics.R     # plot_fit_diagnostics() and helpers
  utils.R           # Helpers, validation, internal utilities
  legacy.R          # Deprecated compare.data.frame method
  legacy-pepdiff.R  # Original v1 functions (preserved for compatibility)

tests/testthat/
  helper-fixtures.R # Synthetic test data generators
  test-data.R       # Import and preprocessing tests
  test-compare.R    # compare() function tests
  test-models.R     # Model fitting tests
  test-tests.R      # Statistical test implementations
  test-results.R    # Results class tests
  test-plots.R      # Plot output tests
  test-diagnostics.R # Diagnostics function tests
  test-legacy.R     # Backwards compatibility tests

vignettes/
  basic_workflow.Rmd   # Getting started guide
  glm_analysis.Rmd     # GLM method deep dive
  art_analysis.Rmd     # ART method guide
  pairwise_tests.Rmd   # Pairwise comparison methods
  checking_fit.Rmd     # GLM diagnostics guide
  diagnostic_plots.Rmd # Visualization options

Error Handling

Model Convergence Failures

If GLM/ART doesn't converge for a peptide:

Peptide excluded from results
Tracked in diagnostics slot
Warning in print(): "X peptides excluded (model did not converge)"

Philosophy: Fail is fail. User needs to know they may need a different design.

FDR Correction

Benjamini-Hochberg applied within each comparison, not globally across all comparisons.

Development Guidelines

Code Style

Tidyverse style (pipes, dplyr verbs)
S3 classes for core objects with print/plot methods
Document with roxygen2
Explicit namespace calls for non-base functions (dplyr::filter())
Keep test implementations in sync with peppwR (wilcoxon, bootstrap_t, bayes_t, rankprod)

Package Checks

# Fast check (skip vignettes)
devtools::check(vignettes = FALSE)

# Run tests only
devtools::test()

# Full check
devtools::check()

Dependencies

Imports

dplyr, tidyr, tibble, rlang, magrittr - data manipulation
readr - CSV import
ggplot2, cowplot - core plotting
emmeans - GLM contrast extraction
ARTool - ART method
stringr, forcats - string/factor utilities

Suggests

ComplexHeatmap - heatmaps (Bioconductor)
RankProd - rank products (Bioconductor)
UpSetR - upset plots
MKinfer - bootstrap tests
testthat - testing
knitr, rmarkdown - vignettes

Note: bayes_t uses a native JZS approximation (no BayesFactor dependency)

Development Workflow for Future Changes

See semi-autonomous-feature-development.md for detailed workflow.

Discuss → TDD → Ralph Loop

Discuss - Human describes feature/bug, Claude asks clarifying questions, agree on scope
TDD - Write failing test first (the test IS the spec)
Ralph Loop - Claude iterates autonomously until tests pass

Key Principles

Tests are the contract - No ambiguity about completion
No implementation until test fails - Red → Green → Refactor
Clear context before implementation - Commit test, start fresh session
Self-contained prompts - Reference files, not discussion history

For Bug Fixes / Features

1. Discuss requirements
2. Write failing test in tests/testthat/
3. Commit the test
4. /clear or new session
5. /ralph-loop with verification command: devtools::test(filter = "test-name")
6. Human smoke test

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pepdiff

Purpose

Project Status

Core Workflow

Documentation

Design Constraints

Cross-Sectional Factorial Designs Only

Flexible Factors

S3 Classes

`pepdiff_data`

`pepdiff_results`

Core Functions

Import

Analysis

Diagnostics

Plots

File Structure

Error Handling

Model Convergence Failures

FDR Correction

Development Guidelines

Code Style

Package Checks

Dependencies

Imports

Suggests

Development Workflow for Future Changes

Discuss → TDD → Ralph Loop

Key Principles

For Bug Fixes / Features

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

pepdiff

Purpose

Project Status

Core Workflow

Documentation

Design Constraints

Cross-Sectional Factorial Designs Only

Flexible Factors

S3 Classes

pepdiff_data

pepdiff_results

Core Functions

Import

Analysis

Diagnostics

Plots

File Structure

Error Handling

Model Convergence Failures

FDR Correction

Development Guidelines

Code Style

Package Checks

Dependencies

Imports

Suggests

Development Workflow for Future Changes

Discuss → TDD → Ralph Loop

Key Principles

For Bug Fixes / Features

`pepdiff_data`

`pepdiff_results`