Differential abundance analysis for phosphoproteomics data.
Identify peptides with significant abundance changes between experimental conditions. Answers: "What's differentially abundant?"
v1.0.0 released (Feb 2026). The package is feature-complete with:
- S3 classes:
pepdiff_dataandpepdiff_results - Three analysis methods: GLM (Gamma + emmeans), ART, pairwise
- Four pairwise tests: wilcoxon, bootstrap_t, bayes_t, rankprod
- GLM fit diagnostics with
plot_fit_diagnostics() - Plot methods for both classes
- Six vignettes covering all workflows
- pkgdown documentation site: https://teammaclean.github.io/pepdiff/
devtools::check()passes with 0 errors/warnings
Companion to peppwR:
- peppwR: "How many samples do I need?" (power analysis, planning)
- pepdiff: "What's differentially abundant?" (analysis, results)
CSV → read_pepdiff() → pepdiff_data → compare() → pepdiff_results → plots
- Getting Started:
vignettes/basic_workflow.Rmd - GLM Analysis:
vignettes/glm_analysis.Rmd - ART Analysis:
vignettes/art_analysis.Rmd - Pairwise Tests:
vignettes/pairwise_tests.Rmd - Checking Model Fit:
vignettes/checking_fit.Rmd - Diagnostic Plots:
vignettes/diagnostic_plots.Rmd
Online: https://teammaclean.github.io/pepdiff/
pepdiff assumes independent biological replicates in each cell of the experimental design. NOT for longitudinal/repeated-measures studies where the same biological unit is tracked over time.
- No random effects needed
- Simple GLM is appropriate
- Technical replicates must be combined explicitly before analysis
Factor names are user-defined at import, not hardcoded. Common setups:
- treatment × timepoint
- genotype × treatment
- condition × timepoint × tissue
Import/preprocessed data from read_pepdiff().
Slots:
data- tibble: peptide, gene_id, [factors...], bio_rep, valuefactors- character: names of factor columnsdesign- tibble: unique factor combinations with n_reps, n_peptidesmissingness- tibble: peptide, na_rate, mnar_score, mean_abundancepeptides- character vectorcall- original function call
Methods:
print()- Summary of structure, missingnesssummary()- Detailed breakdown by groupplot()- Multi-panel: PCA + distributions + missingness
Analysis results from compare(). Results in long format (tidy).
Slots:
results- tibble (long): peptide, gene_id, comparison, [factor levels], fold_change, log2_fc, test, p_value, fdr, significantcomparisons- tibble: comparison definitionsmethod- "glm", "art", or "pairwise"diagnostics- tibble (nested): peptide, converged, deviance, residuals, std_residuals, fittedparams- list: alpha, fdr_method, formula, etc.data- the pepdiff_data object usedcall- original function call
Methods:
print()- Summary: n comparisons, n significantsummary()- Per-comparison breakdownplot()- Multi-panel: volcano + p-value histogram + FC distribution
read_pepdiff(file, id, gene, value, factors, replicate, tech_rep = NULL)
# Returns pepdiff_data
combine_tech_reps(data, fun = mean)
# Explicit step to average technical replicatesThree methods:
| Method | Model | Use Case |
|---|---|---|
"glm" (default) |
Gamma GLM + emmeans | Most proteomics data |
"art" |
Aligned Rank Transform | Non-parametric alternative |
"pairwise" |
Direct two-group tests | Simple comparisons |
Simple interface:
compare(data,
compare = "treatment", # factor to contrast
ref = "ctrl", # reference level
within = "timepoint", # stratify by (optional)
method = "glm")Formula interface (power users):
compare(data,
contrast = pairwise ~ treatment | genotype + timepoint,
method = "glm")Pairwise tests (matching peppwR):
"wilcoxon"- Wilcoxon rank-sum"bootstrap_t"- Bootstrap t-test"bayes_t"- Bayes factor t-test"rankprod"- Rank products
plot_fit_diagnostics(results)
# Returns 4-panel diagnostic plot for GLM model fit assessmentClass methods:
plot(pepdiff_data)- PCA + distributions + missingnessplot(pepdiff_results)- volcano + p-value hist + FC distribution
Individual functions:
plot_pca(),plot_distributions(),plot_missingness()plot_volcano(),plot_heatmap(),plot_pvalue_hist(),plot_fc_distribution()plot_fit_diagnostics()- GLM model fit assessment
R/
data.R # read_pepdiff(), combine_tech_reps(), pepdiff_data class
compare.R # compare() generic and methods
models.R # GLM fitting, ART fitting, contrast extraction
tests.R # Pairwise statistical tests (wilcoxon, bootstrap_t, etc.)
results.R # pepdiff_results class, print/summary methods
plots.R # All plot functions and plot methods
diagnostics.R # plot_fit_diagnostics() and helpers
utils.R # Helpers, validation, internal utilities
legacy.R # Deprecated compare.data.frame method
legacy-pepdiff.R # Original v1 functions (preserved for compatibility)
tests/testthat/
helper-fixtures.R # Synthetic test data generators
test-data.R # Import and preprocessing tests
test-compare.R # compare() function tests
test-models.R # Model fitting tests
test-tests.R # Statistical test implementations
test-results.R # Results class tests
test-plots.R # Plot output tests
test-diagnostics.R # Diagnostics function tests
test-legacy.R # Backwards compatibility tests
vignettes/
basic_workflow.Rmd # Getting started guide
glm_analysis.Rmd # GLM method deep dive
art_analysis.Rmd # ART method guide
pairwise_tests.Rmd # Pairwise comparison methods
checking_fit.Rmd # GLM diagnostics guide
diagnostic_plots.Rmd # Visualization options
If GLM/ART doesn't converge for a peptide:
- Peptide excluded from results
- Tracked in
diagnosticsslot - Warning in
print(): "X peptides excluded (model did not converge)"
Philosophy: Fail is fail. User needs to know they may need a different design.
Benjamini-Hochberg applied within each comparison, not globally across all comparisons.
- Tidyverse style (pipes, dplyr verbs)
- S3 classes for core objects with print/plot methods
- Document with roxygen2
- Explicit namespace calls for non-base functions (
dplyr::filter()) - Keep test implementations in sync with peppwR (wilcoxon, bootstrap_t, bayes_t, rankprod)
# Fast check (skip vignettes)
devtools::check(vignettes = FALSE)
# Run tests only
devtools::test()
# Full check
devtools::check()- dplyr, tidyr, tibble, rlang, magrittr - data manipulation
- readr - CSV import
- ggplot2, cowplot - core plotting
- emmeans - GLM contrast extraction
- ARTool - ART method
- stringr, forcats - string/factor utilities
- ComplexHeatmap - heatmaps (Bioconductor)
- RankProd - rank products (Bioconductor)
- UpSetR - upset plots
- MKinfer - bootstrap tests
- testthat - testing
- knitr, rmarkdown - vignettes
Note: bayes_t uses a native JZS approximation (no BayesFactor dependency)
See semi-autonomous-feature-development.md for detailed workflow.
- Discuss - Human describes feature/bug, Claude asks clarifying questions, agree on scope
- TDD - Write failing test first (the test IS the spec)
- Ralph Loop - Claude iterates autonomously until tests pass
- Tests are the contract - No ambiguity about completion
- No implementation until test fails - Red → Green → Refactor
- Clear context before implementation - Commit test, start fresh session
- Self-contained prompts - Reference files, not discussion history
1. Discuss requirements
2. Write failing test in tests/testthat/
3. Commit the test
4. /clear or new session
5. /ralph-loop with verification command: devtools::test(filter = "test-name")
6. Human smoke test