Establishes some general functions relating to single-cell DGE #18

dtm2451 · 2025-12-29T22:55:12Z

In a recent DS Working Group meeting, we discussed the utility of adding some standardized functions for performing DGE with various tools. We also laid out a few helper functions -- pseudobulking, gene filtering -- that felt required across tools.

This PR directly includes the helper functions, and I'd propose that we use this sc-dge-functions-branch as the base branch that we PR all of our tool-specific DGE function builds into!

Planned functions:

- pseudobulking
  - Initial implementation
    - Uses Seurat::AggregateExpression, grouping by one cell.by metadata column and any number of sample.by metadata columns.
    - Adds a cell count to the metadata and allows trimming pseudobulks built from too few cells (fewer than 10 by default).
    - Also pulls in additional metadata, as Seurat only grabs the 'group.by' metadata automatically.
    - Allows targeting only certain features or cell identities.
    - Allows outputting as either a Seurat object or a list ("counts" matrix + "metadata" data.frame).
  - add dreamlet and scuttle implementations?
  - add normalization options?
- python pseudobulking
  - Initial implementation
    - Uses scanpy.get.aggregate.
    - Allows MuData (Muon) or AnnData (scanpy) objects.
    - Allows outputting as either AnnData or a dict ("counts" array + "feature_names" list + "metadata" DataFrame).
    - Very similar functionality to the R version, but definitely faster!
- gene filtering
- DGE_comparison_assessment: for parsing groups and checking whether there are sufficient samples to trust results.
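For readers new to the idea, here is a minimal, library-free sketch of what pseudobulking does: per-cell counts are summed within each (cell type, sample) group, and the number of contributing cells is recorded alongside. This is not the implementation in this PR, which relies on Seurat::AggregateExpression and scanpy.get.aggregate; the function name `pseudobulk` and its input layout are assumptions made purely for illustration.

```python
from collections import defaultdict


def pseudobulk(counts, cell_by, sample_by):
    """Sum per-cell counts within each (cell type, sample) group.

    Hypothetical helper for illustration only.
    counts: list of per-cell dicts mapping gene name -> count
    cell_by, sample_by: per-cell labels, same length as counts
    Returns (sums, n_cells), both keyed by (cell type, sample).
    """
    if not (len(counts) == len(cell_by) == len(sample_by)):
        raise ValueError("counts, cell_by and sample_by must have equal length")
    sums = defaultdict(lambda: defaultdict(int))
    n_cells = defaultdict(int)
    for cell, ctype, sample in zip(counts, cell_by, sample_by):
        key = (ctype, sample)
        n_cells[key] += 1      # cell count kept as pseudobulk metadata
        group = sums[key]      # created even if the cell has no counts
        for gene, value in cell.items():
            group[gene] += value
    return {k: dict(v) for k, v in sums.items()}, dict(n_cells)


# Three toy cells from one sample: two T cells and one B cell
counts = [
    {"GeneA": 3, "GeneB": 0},
    {"GeneA": 1, "GeneB": 2},
    {"GeneA": 5, "GeneB": 1},
]
cell_by = ["T", "T", "B"]
sample_by = ["s1", "s1", "s1"]
pb_counts, pb_n_cells = pseudobulk(counts, cell_by, sample_by)
```

The recorded cell counts are what a 'min.cells'-style trim would later act on.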


erflynn · 2026-01-05T19:16:49Z

this looks awesome!
in case you want the dreamlet version:

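 library(SingleCellExperiment)
 library(dreamlet)
 # Wrap the Seurat counts and cell metadata in a SingleCellExperiment
 sce <- SingleCellExperiment(list(counts = object@assays$RNA@counts), colData = object@meta.data)
 # Sum counts per cell type (cell.by) within each sample (sample.by)
 pb <- aggregateToPseudoBulk(sce,
                             assay = "counts",
                             cluster_id = cell.by,
                             sample_id = sample.by,
                             verbose = FALSE)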

erflynn · 2026-01-09T18:25:59Z

I think it would be worthwhile to include a pre-pseudobulk filter -- e.g. only pseudobulk a sample/cell type pair if there are at least X cells of that cell type in that sample.
I'm looking at dreamlet::processAssays() again and it does this as well; I will make a note on my wrapper.

And possibly, downstream, a corresponding DEG filter that only pulls a DEG comparison if there are at least N samples per group?

dtm2451 · 2026-01-09T19:45:20Z

single_cell/pseudobulk_function.R

+        too_small <- sum(psobject@meta.data[,output.metadata.cell.count] < min.cells)
+        if (too_small == ncol(psobject)) {
+            warning(paste0("Skipping trimming pseudobulks smaller than 'min.cells' as NONE were built from at least ", min.cells, " cells."))
+        } else if (too_small > 0) {
+            msg_if("\tTrimming ", too_small, " pseudobulks built from fewer than ", min.cells, " cells.")
+            psobject <- psobject[,psobject@meta.data[,output.metadata.cell.count] >= min.cells]
+        }


> I think it would be worthwhile to include a pre-pseudobulk filter -- e.g. only pseudobulk a sample/cell type pair if there are at least X cells of that cell type in that sample

This is included already, here in the R function! It currently runs after the pseudobulking, but we could move it to before instead if there's good reason.

erflynn · 2026-01-09

Oh awesome! Apologies, I should have looked more carefully. I just realized it is not in the dreamlet pseudobulk function but is instead implemented in processAssays, so I was kind of making a note to myself.

erflynn · 2026-01-09

I do think the downstream DEG filter requiring at least min.samples per category is also useful, though.

dtm2451 · 2026-01-09

agreed! got pulled away before posting that half =)
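As a reference for the behaviour discussed in this thread, the following is a rough Python sketch of the post-pseudobulk trim: count the pseudobulks below the threshold, skip trimming (with a warning) if that would remove all of them, and otherwise drop the small ones. The function name `trim_small_pseudobulks` and the dict-based input are assumptions for illustration, not the PR's actual code.

```python
import warnings


def trim_small_pseudobulks(n_cells, min_cells=10):
    """Return the ids of pseudobulks to keep.

    Hypothetical helper for illustration only.
    n_cells: dict mapping pseudobulk id -> number of cells it was built from
    """
    too_small = [k for k, n in n_cells.items() if n < min_cells]
    if n_cells and len(too_small) == len(n_cells):
        # Trimming would leave nothing, so keep everything and warn instead
        warnings.warn(
            f"Skipping trimming: no pseudobulk was built from at least {min_cells} cells."
        )
        return list(n_cells)
    return [k for k, n in n_cells.items() if n >= min_cells]


kept = trim_small_pseudobulks({"T_s1": 25, "T_s2": 4, "B_s1": 12})
```

Running the same check before aggregation instead would only change where the per-group cell counts come from, not the decision logic.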

dtm2451 · 2026-01-09T21:34:06Z

> And possibly downstream a corresponding DEG filter that only pulls a DEG comparison if there are at least N samples per group?

Hmm, agreed. Perhaps a function that assesses the requested DGE comps per the dge.group.by and related vars, plus the contrast setups, however we end up setting those up to work... Adding a ToDo for this, a 'dge_comparison_assessment' function 👍, but I'm feeling there are extra bits to scope out before I'd start filling this one in.
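To make the idea concrete while the scoping is still open, here is one possible shape for such a check, sketched in plain Python: count the samples in each of the two groups being compared and flag the comparison as trustworthy only if both reach a minimum. Everything here is hypothetical, including the name `assess_comparison`, the `min_samples` default of 3, and the input layout; contrast handling is deliberately left out.

```python
from collections import Counter


def assess_comparison(sample_groups, group_1, group_2, min_samples=3):
    """Check whether both groups have enough samples for a DGE comparison.

    Hypothetical helper for illustration only.
    sample_groups: dict mapping sample id -> group label
    Returns (ok, per_group_counts).
    """
    totals = Counter(sample_groups.values())
    per_group = {g: totals.get(g, 0) for g in (group_1, group_2)}
    ok = all(n >= min_samples for n in per_group.values())
    return ok, per_group


sample_groups = {
    "s1": "ctrl", "s2": "ctrl", "s3": "ctrl",
    "s4": "treated", "s5": "treated",
}
ok, per_group = assess_comparison(sample_groups, "ctrl", "treated")
```

In this toy case the comparison would be flagged as insufficient, because "treated" has only two samples.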


dtm2451 added 7 commits December 29, 2025 17:15

- 2402410: initialize 'dsco_pseudobulk' function
- af57a19: add missed 'Seurat::' callout
- f1281b2: actually remove ts_log need when 'verbose = FALSE'
- ec90754: initialize python pseudobulk function, slight parity alignments for R function
- 9d0aff7: python pseudobulk fxn docs update
- b4ef070: pseudobulk functions, multiple: messaging tweaks, skip 'min.cells' trim if would leave none, py-only catch if no metadata would be added, py-only catch and remove fake pseudobulks created by scanpy
- 1dfe483: pseudobulk functions: fix 'min.cells' checks

dtm2451 and others added 3 commits January 16, 2026 16:02

- 1934e13: stub file to create 'single_cell/dge' folder
- 13078a4: initial commit of deseq2 function
- 290fd95: Revert "initial commit of deseq2 function" (This reverts commit 13078a4.)
