
[upgrade] Baysor: migrate Julia v0.7.1 → C++ cpp-0.8.2 (eliminates PARQUET_TO_CSV for Baysor paths) #164

@an-altosian

Current state

  • Pinned: Baysor v0.7.1 (Julia) at modules/local/baysor/run/main.nf:5
  • Source: https://github.com/kharchenkolab/Baysor master branch
  • Distribution: Julia binary, containerised via community.wave.seqera.io
  • Modules: baysor/{preprocess, create_dataset, run, preview, segfree}
  • Pipeline modes: image (cellpose → baysor), coordinate (proseg/segger → baysor), preview, segfree

Proposed upgrade

  • Target: Baysor cpp-0.8.2 (C++ port) — released 2026-04-30
  • Source: https://github.com/kharchenkolab/Baysor (cpp branch); release page: https://github.com/kharchenkolab/Baysor/releases/tag/cpp-0.8.2
  • Distribution: Native C++ binary via CMake, or Docker (image TBD)
  • Project description: "a faithful C++ implementation of the current Baysor algorithm with a more efficient runtime and broader modern I/O support"
  • Release notes for 0.8.2: "improves cross-platform builds, adds some optimizations for large runs (e.g. Xenium 5K)"

Why upgrade

  1. Direct parquet support: the C++ port reads parquet natively, sidestepping the Zstd issue with Julia's Parquet.jl. This would let us delete the PARQUET_TO_CSV step in BAYSOR_GENERATE_PREVIEW and BAYSOR_RUN_TRANSCRIPTS_PARQUET (tiled). That step runs out of memory on Atera data (Xenium WTA, 18k targets) at the default memory allocation; see the Atera compatibility report (docs/2026-05-28_REVIEW_atera-on-spatialaxe-compatibility.md).
  2. Performance: the project describes the C++ rewrite as having "a more efficient runtime", and the 0.8.2 release notes cite optimizations for large runs (e.g. Xenium 5K). That scale is directly relevant to Atera's ~18k-target workloads, although the gains at 18k targets still need to be measured.
  3. Direct experiment.xenium input: if the cpp port reads experiment.xenium directly, the bundle-staging logic in the image- and coordinate-mode subworkflows could be simplified.
  4. Algorithmic continuity: the cpp port is described as a faithful implementation of the current Baysor algorithm, so it should preserve the v0.7.1 algorithm and segmentation behavior should match; this still needs confirming in the end-to-end tests.
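Because cell IDs are arbitrary labels that will differ between a Julia run and a cpp run, a label-invariant score such as the adjusted Rand index (ARI) is one way to quantify whether per-transcript assignments match. The sketch below assumes both outputs have already been joined into two equal-length lists ordered by transcript; it does not assume anything about either version's file layout or column names, and treating unassigned (noise) transcripts as one more label is a simplification of ours, not Baysor's convention.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Agreement between two per-transcript cell assignments, ignoring label names.

    labels_a[i] and labels_b[i] must refer to the same transcript.
    Returns 1.0 for identical partitions and ~0.0 for chance-level agreement.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("assignments must cover the same transcripts in the same order")
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Pairs of transcripts placed in the same cell by both runs
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together by each run on its own
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        # Only happens for identical trivial partitions (all singletons or one cell)
        return 1.0
    return (pairs_both - expected) / (maximum - expected)


if __name__ == "__main__":
    # Same grouping, different cell ID names: counts as full agreement
    print(adjusted_rand_index(["c1", "c1", "c2"], ["7", "7", "9"]))
```

ARI penalizes both split and merged cells, so a value noticeably below 1.0 on the Xenium v1 and Atera test data would be a signal to inspect cell boundaries directly rather than proof of an algorithmic change.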

Migration plan

  • Identify an official container for cpp-0.8.2, or build one (native CMake binary vs. Docker image)
  • Update containers in baysor/run, baysor/preview, baysor/segfree, baysor/preprocess, baysor/create_dataset
  • Audit CLI argument compatibility: the cpp CLI may differ from the Julia CLI (e.g., column flag names, --scale)
  • Drop or refactor PARQUET_TO_CSV calls in the baysor_generate_preview and baysor_run_transcripts_parquet subworkflows
  • If direct experiment.xenium input works, simplify BAYSOR_PREPROCESS_TRANSCRIPTS accordingly
  • Run end-to-end tests on a Xenium v1 bundle (XOA 4.x) and an Atera Cell Pellet dataset
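One way to start the per-command CLI audit is to collect the flags each module passes today and check them against the new binary's `--help` output. A minimal sketch, assuming flags look like `-x` or `--long-name`; the function names are hypothetical, and the command and help strings in the example are made up for illustration, not the real CLI of either Baysor version.

```python
from __future__ import annotations

import re

# A flag is "-" or "--" followed by a letter, not preceded by a word
# character or dash (so "x-coordinate" or "a-b" inside paths are ignored).
FLAG_RE = re.compile(r"(?<![\w-])(--?[A-Za-z][\w-]*)")


def extract_flags(text: str) -> set[str]:
    """All option-like tokens in a command line or in --help text."""
    return set(FLAG_RE.findall(text))


def unsupported_flags(command: str, help_text: str) -> set[str]:
    """Flags used in `command` that the new CLI's help text never mentions."""
    return extract_flags(command) - extract_flags(help_text)


if __name__ == "__main__":
    # Made-up strings for illustration only; substitute the script block from a
    # module's main.nf and the actual `--help` output of the cpp binary.
    command = "baysor run -x x_location -y y_location --scale 5 transcripts.csv"
    help_text = "Options:\n  -x, --x-column <name>\n  -y, --y-column <name>\n  --scale-std <float>\n"
    print(sorted(unsupported_flags(command, help_text)))  # ['--scale']
```

This only catches flags that disappeared or were renamed; a flag that still exists but changed meaning or units will pass, so the check narrows the manual per-command review rather than replacing it.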

Risks

  • CLI compatibility: needs per-command verification
  • Output format compatibility: cpp may emit different polygon / transcript-assignment file structures; downstream XR import needs validation
  • Container ecosystem: no Wave container is known yet, so this may require a request to Seqera or a custom build
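For the output-format risk, a quick first check before testing the downstream XR import is to diff the header row of the transcript-assignment CSV produced by each version. A minimal sketch, assuming both versions write a CSV with a header row; the file paths and column names are hypothetical, and polygon outputs would need a separate, format-specific comparison.

```python
from __future__ import annotations

import csv
from typing import Iterable


def header_columns(lines: Iterable[str]) -> list[str]:
    """Header row of CSV text (any iterable of lines, e.g. an open file)."""
    return next(csv.reader(lines), [])


def schema_diff(old_header: list[str], new_header: list[str]) -> dict[str, list[str]]:
    """Columns dropped or introduced between two outputs, in original order."""
    return {
        "missing": [c for c in old_header if c not in new_header],
        "added": [c for c in new_header if c not in old_header],
    }


def compare_csv_files(old_path: str, new_path: str) -> dict[str, list[str]]:
    """Compare the headers of two CSV files on disk (paths are placeholders)."""
    with open(old_path, newline="") as old_fh, open(new_path, newline="") as new_fh:
        return schema_diff(header_columns(old_fh), header_columns(new_fh))


if __name__ == "__main__":
    # Hypothetical headers; real column names come from the two runs' outputs.
    julia = header_columns(["x,y,gene,cell\n", "1.0,2.0,GENE_A,c1\n"])
    cpp = header_columns(["x,y,gene,cell_id\n"])
    print(schema_diff(julia, cpp))
```

An empty diff does not prove compatibility (types, ID formats, and noise encoding can still differ), but a non-empty one pinpoints which downstream consumers need updating.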

Cross-links

  • Related (paired): Punkst / Ficture C++ migration — together these eliminate PARQUET_TO_CSV entirely from the pipeline.
  • Triggered by: Atera compatibility session 2026-05-28.
