Make the v3 topic modeling workflow backend-agnostic and add tomotopy support #226

Open
sldyns wants to merge 1 commit into aertslab:pycistopic_v3 from sldyns:pycistopic_v3

@sldyns commented Apr 10, 2026

Summary

This PR refactors the current pycistopic_v3 topic-modeling workflow so that v3 artifacts are backend-agnostic instead of being tied to Mallet-specific readers and filenames, and adds a tomotopy LDA backend that writes the same artifact bundle.
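As a rough illustration of what "backend-agnostic artifacts" can mean in practice, the sketch below pairs a single object that owns the output file layout with an abstract backend interface. It is not the code from this PR: `ArtifactLayout`, `TopicModelBackend`, and every file suffix are hypothetical stand-ins (the PR's own class is `TopicModelFilenames`, whose fields are not shown here).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ArtifactLayout:
    """Hypothetical stand-in for TopicModelFilenames; suffixes are invented."""

    output_prefix: str
    n_topics: int

    def _path(self, suffix: str) -> Path:
        # Every artifact name derives from the same prefix and topic count,
        # so readers never need to know which backend produced the files.
        return Path(f"{self.output_prefix}.{self.n_topics}_topics.{suffix}")

    @property
    def parameters(self) -> Path:
        return self._path("parameters.json")

    @property
    def cell_topic(self) -> Path:
        return self._path("cell_topic.tsv")

    @property
    def topic_region(self) -> Path:
        return self._path("topic_region.tsv")

    def all_paths(self) -> list[Path]:
        return [self.parameters, self.cell_topic, self.topic_region]


class TopicModelBackend(ABC):
    """Hypothetical shared interface that each backend would implement."""

    name: str

    @abstractmethod
    def train(self, layout: ArtifactLayout) -> None:
        """Train a model and write the full artifact bundle under `layout`."""
```

With this shape, downstream steps depend only on the layout object, and adding a backend means adding one subclass that writes the same set of files.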

What changed

  • add a shared topic-model abstraction in topic_models.py
  • add TopicModelFilenames to centralize the v3 artifact layout
  • add backend resolution from the saved parameters file via load_topic_model_backend
  • add LDATomotopy to train directly from the binary accessibility matrix and write the standard v3 outputs
  • refactor LDAMallet to implement the same backend interface and emit the same artifact bundle
  • update the topic_modeling CLI to use a shared run entry point with --backend {mallet,tomotopy}
  • keep corpus creation as a Mallet-specific step, while allowing tomotopy to run directly from the matrix plus barcode/region inputs
  • update create_anndata, model-stat calculation, plotting, and topic binarization to load outputs through the backend abstraction instead of assuming Mallet-only artifacts
  • update log-likelihood/stat handling so backend-specific hyperparameters are interpreted correctly, including learned alpha values from tomotopy
  • add tomotopy>=0.14.0 as a dependency
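The `--backend {mallet,tomotopy}` flag and the idea of resolving the backend from the saved parameters file could look roughly like the sketch below. This is an assumption-laden illustration, not the PR's implementation: the parser, the JSON format, the `"backend"` key, the `--topics` flag, and the fallback to `mallet` for parameter files without a backend entry are all hypothetical. The PR's real entry point is `load_topic_model_backend`, whose signature is not shown here.

```python
import argparse
import json
from pathlib import Path

# Backend names taken from the --backend choices listed above.
BACKENDS = ("mallet", "tomotopy")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: one run entry point with a backend selector."""
    parser = argparse.ArgumentParser(prog="topic_modeling_sketch")
    parser.add_argument("--backend", choices=BACKENDS, default="mallet")
    parser.add_argument("--topics", type=int, required=True)
    return parser


def save_parameters(path: Path, backend: str, n_topics: int) -> None:
    """Record which backend produced the run next to its other parameters."""
    path.write_text(json.dumps({"backend": backend, "n_topics": n_topics}))


def resolve_backend(path: Path) -> str:
    """Read the backend name back from a saved parameters file.

    Assumption: files without a "backend" key are treated as Mallet runs.
    """
    params = json.loads(path.read_text())
    backend = params.get("backend", "mallet")
    if backend not in BACKENDS:
        raise ValueError(f"Unknown topic model backend: {backend!r}")
    return backend
```

Storing the backend name alongside the run parameters lets downstream steps such as stats, plotting, and binarization pick the right loader without the user repeating `--backend` on every command.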
