dtm2451 commented Dec 29, 2025

In a recent DS Working Group meeting, we discussed the utility of adding some standardized functions for performing DGE with various tools. We also laid out a few helper functions -- pseudobulking, gene filtering -- that felt required across tools.

This PR will directly include the helper functions, and I'd propose that we use this sc-dge-functions-branch as the base branch into which we PR all of our tool-specific DGE function builds!

Planned functions:

  • pseudobulking
    • Initial implementation
      • Uses Seurat::AggregateExpression, grouping by a cell.by metadata and any number of sample.by metadata.
      • Adds a cell count to the metadata & allows trimming pseudobulks based on too few cells (10 by default)
      • Also pulls in additional metadata, as Seurat only grabs the 'group.by' metadata automatically.
      • Allows targeting only certain features or cell identities
      • Allows outputting as either Seurat or a list ("counts" matrix + "metadata" data.frame)
    • add dreamlet and scuttle implementations?
    • add normalization options?
  • python pseudobulking
    • Initial implementation
      • Uses scanpy.get.aggregate
      • Allows MuData (Muon) or AnnData (scanpy) objects.
      • Allows outputting as either AnnData or a dict ("counts" array + "feature_names" list + "metadata" DataFrame)
      • Very similar functionality to the R version, but definitely faster!
  • gene filtering
  • DGE_comparison_assessment: for parsing groups and checking whether there are enough samples to trust results.
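
As a rough illustration of the pseudobulking step listed above, here is a minimal sketch in plain Python. It is not the PR's implementation (which relies on Seurat::AggregateExpression and scanpy.get.aggregate); the function name `pseudobulk`, its arguments, and the list-based output are hypothetical, and it groups by a single sample column for brevity.

```python
from collections import defaultdict


def pseudobulk(counts, cell_meta, cell_by, sample_by, min_cells=10):
    """Sum per-cell counts into one profile per (cell_by, sample_by) group.

    counts    : list of per-cell count lists (cells x features)
    cell_meta : list of per-cell dicts holding the grouping columns
    Hypothetical helper for illustration only.
    """
    sums = {}
    n_cells = defaultdict(int)
    for row, meta in zip(counts, cell_meta):
        key = (meta[cell_by], meta[sample_by])
        if key not in sums:
            sums[key] = [0] * len(row)
        # Add this cell's counts to its group's running total.
        sums[key] = [a + b for a, b in zip(sums[key], row)]
        n_cells[key] += 1

    # Trim pseudobulks built from fewer than min_cells cells.
    kept = sorted(k for k in sums if n_cells[k] >= min_cells)
    return {
        "counts": [sums[k] for k in kept],
        "metadata": [
            {cell_by: k[0], sample_by: k[1], "n_cells": n_cells[k]} for k in kept
        ],
    }


# Toy data: 4 cells x 2 features.
counts = [[1, 0], [2, 1], [0, 3], [4, 4]]
meta = [
    {"cell_type": "T", "sample": "s1"},
    {"cell_type": "T", "sample": "s1"},
    {"cell_type": "B", "sample": "s1"},
    {"cell_type": "T", "sample": "s2"},
]
pb = pseudobulk(counts, meta, "cell_type", "sample", min_cells=2)
```

With `min_cells=2`, only the T-cell pseudobulk from sample s1 survives, since the other two groups each contain a single cell.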

erflynn commented Jan 5, 2026

this looks awesome!
in case you want the dreamlet version:

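 library(SingleCellExperiment)
 library(dreamlet)
 # assumes a Seurat v3/v4-style object, where raw counts live in the @counts slot
 sce <- SingleCellExperiment(assays = list(counts = object@assays$RNA@counts),
                             colData = object@meta.data)
 # cell.by / sample.by: names of the colData columns to group by
 pb <- aggregateToPseudoBulk(sce,
                             assay = "counts",
                             cluster_id = cell.by,
                             sample_id = sample.by,
                             verbose = FALSE)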

erflynn commented Jan 9, 2026

I think it would be worthwhile to include a pre-pseudobulk filter -- e.g. only pseudobulk a sample/cell type pair if there are at least X cells of that cell type in that sample.
I'm looking at dreamlet::processAssays() again and it does this as well; I'll make a note on my wrapper.

And possibly, downstream, a corresponding DEG filter that only pulls a DEG comparison if there are at least N samples per group?

Comment on lines +126 to +132
# Number of pseudobulks built from fewer than min.cells cells
too_small <- sum(psobject@meta.data[, output.metadata.cell.count] < min.cells)
if (too_small == ncol(psobject)) {
    warning("Skipping trimming of pseudobulks smaller than 'min.cells' as NONE were built from at least ", min.cells, " cells.")
} else if (too_small > 0) {
    msg_if("\tTrimming ", too_small, " pseudobulks built from fewer than ", min.cells, " cells.")
    psobject <- psobject[, psobject@meta.data[, output.metadata.cell.count] >= min.cells]
}

dtm2451 commented Jan 9, 2026

I think it would be worthwhile to include a pre-pseudobulk filter -- e.g. only pseudobulk a sample/cell type pair if there are at least X cells of that cell type in that sample

This is included already, here for the R function! It currently runs after the pseudobulking, but I could move it to before the pseudobulking instead if there's a good reason.
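
For comparison, a pre-pseudobulk version of the same filter could look like the sketch below (plain Python; `cells_in_large_enough_groups` and its arguments are hypothetical names, not part of the PR). Because each pseudobulk is summed only from its own cells, dropping undersized sample/cell type pairs before or after aggregation should retain the same pseudobulks; filtering first mainly avoids aggregating groups that would be trimmed anyway.

```python
from collections import Counter


def cells_in_large_enough_groups(cell_meta, cell_by, sample_by, min_cells=10):
    """Return indices of cells whose (cell_by, sample_by) pair has >= min_cells cells.

    Hypothetical helper for illustration only.
    """
    # Count cells per (cell type, sample) pair.
    n_per_pair = Counter((m[cell_by], m[sample_by]) for m in cell_meta)
    keep_pairs = {pair for pair, n in n_per_pair.items() if n >= min_cells}
    return [
        i for i, m in enumerate(cell_meta)
        if (m[cell_by], m[sample_by]) in keep_pairs
    ]


cell_meta = [
    {"cell_type": "T", "sample": "s1"},
    {"cell_type": "T", "sample": "s1"},
    {"cell_type": "B", "sample": "s1"},
    {"cell_type": "T", "sample": "s2"},
]
kept_cells = cells_in_large_enough_groups(cell_meta, "cell_type", "sample", min_cells=2)
```

Only the cells at the returned indices would then be passed on to the aggregation step.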

oh awesome! apologies, I should have looked more carefully. I just realized it is not in the dreamlet pseudobulk function but is instead implemented in processAssays, so I was kind of making a note to myself

I do think a downstream DEG filter requiring at least min.samples per category is also useful, though

Collaborator Author

agreed! got pulled away before posting that half =)

dtm2451 commented Jan 9, 2026

And possibly, downstream, a corresponding DEG filter that only pulls a DEG comparison if there are at least N samples per group?

Hmm, agreed. Perhaps a function that assesses the requested DGE comparisons per the dge.group.by and related variables, plus the contrast setup (however we end up implementing that)... Adding a ToDo for this, a 'DGE_comparison_assessment' function 👍, but I feel there are extra bits to scope out before I'd start filling this one in.
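
To make the idea concrete, here is one possible shape for such a check, sketched in plain Python. Everything in it is an assumption for illustration: the function name `assess_comparison`, its arguments, the returned dict, and the default `min_samples=3` (the thread leaves N unspecified). It ignores contrast setup entirely and only counts distinct samples per requested group.

```python
from collections import defaultdict


def assess_comparison(pb_meta, group_by, sample_by, groups, min_samples=3):
    """Count distinct samples per group and flag whether every group has enough.

    pb_meta : list of per-pseudobulk metadata dicts
    groups  : the group labels taking part in the requested comparison
    Hypothetical helper for illustration only.
    """
    samples = defaultdict(set)
    for m in pb_meta:
        if m[group_by] in groups:
            samples[m[group_by]].add(m[sample_by])
    # Groups with no pseudobulks at all are reported with 0 samples.
    n_samples = {g: len(samples[g]) for g in groups}
    return {
        "n_samples": n_samples,
        "ok": all(n >= min_samples for n in n_samples.values()),
    }


pb_meta = [
    {"condition": "ctrl", "sample": "s1"},
    {"condition": "ctrl", "sample": "s2"},
    {"condition": "treated", "sample": "s3"},
]
res = assess_comparison(pb_meta, "condition", "sample", ["ctrl", "treated"], min_samples=2)
```

Here the comparison would be flagged as not trustworthy, because the "treated" group has only one sample.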
