[upgrade] Ficture: migrate seqscope/ficture (Python) → Yichen-Si/punkst (C++) #165

@an-altosian


Current state

  • Pinned: Python ficture package via Wave container community.wave.seqera.io/library/pip_ficture:ad8a1265a51b53cf
  • Source: https://github.com/seqscope/ficture
  • Distribution: pip install, containerised
  • Modules: ficture/preprocess, ficture/model
  • Pipeline modes: segfree (when --method ficture)

Proposed upgrade

  • Target: Punkst — C++ rewrite of the FICTURE algorithm by Yichen Si (FICTURE author).
  • Distribution: Docker image philo1984/punkst:latest (project notes "the Docker image is not always up to date"), or CMake from source.
  • Project state: 99.4% C++, 164 commits at audit time. No formal release tags yet.
  • Inputs: project mentions "http(s) / s3:// input support". Direct parquet support TBD — needs investigation.

Why upgrade

  1. C++ rewrite: "substantially more efficient and (hopefully) easier to use" per the project description, which speaks directly to our Atera-scale performance concerns.
  2. Potentially eliminates PARQUET_TO_CSV — if Punkst accepts parquet directly, we can drop the conversion step in FICTURE_PREPROCESS_MODEL. (TBD per investigation.)
  3. Smaller container surface: Python ficture pulls in a heavy dependency tree, whereas a compiled C++ binary should yield a much leaner image.
  4. Active maintenance — Punkst is where FICTURE development is moving.

Migration plan

  • Confirm Punkst accepts parquet (or what input formats it supports) — open question to the upstream
  • Map ficture_preprocess.py CLI to Punkst CLI (negative_control_regex, transcripts path, features, min_phred_score)
  • Pin a specific Punkst commit hash — there are no release tags yet (this is risky; should also open an upstream issue requesting tagged releases)
  • Update ficture/preprocess module container + invocation
  • Drop PARQUET_TO_CSV in FICTURE_PREPROCESS_MODEL subworkflow (if Punkst takes parquet)
  • Validate algorithmic equivalence — Punkst output should match Python ficture for the same input
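For the equivalence check, exact output equality is probably too strict. Factor numbering is arbitrary, and if training is not bit-for-bit reproducible the two tools may also differ slightly. The sketch below assumes both outputs can be reduced to one factor label per pixel (or per transcript) in the same order. It scores agreement with the adjusted Rand index, which does not care how factors are numbered. The function name and the MIN_ARI = 0.9 threshold are hypothetical placeholders, not part of ficture or Punkst.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence

# Hypothetical acceptance threshold; the real tolerance should come from
# comparing repeated runs of Python ficture against itself first.
MIN_ARI = 0.9


def adjusted_rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Agreement between two labelings of the same items, ignoring label names.

    1.0 means identical partitions (even if factor numbers differ);
    values near 0.0 mean no more agreement than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items in the same order")
    n_pairs = comb(len(labels_a), 2)
    if n_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree about
    # Pairs of items that both labelings put in the same factor.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs that each labeling, on its own, puts in the same factor.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / n_pairs
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        return 1.0  # both labelings are all-one-factor or all-singletons, i.e. identical
    return (together_both - expected) / (max_index - expected)


if __name__ == "__main__":
    # Toy per-pixel labels: same partition, different factor numbering.
    ficture_labels = [0, 0, 1, 1, 2, 2]
    punkst_labels = [4, 4, 0, 0, 1, 1]
    ari = adjusted_rand_index(ficture_labels, punkst_labels)
    print(f"ARI = {ari:.3f}, equivalent: {ari >= MIN_ARI}")
```

If scikit-learn is already in the validation environment, sklearn.metrics.adjusted_rand_score computes the same quantity and can replace the hand-rolled version.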

Risks

  • No formal release — pinning a commit hash is brittle for a public nf-core pipeline. Request tagged releases upstream first.
  • Container freshness: philo1984/punkst:latest "is not always up to date" per the project notes
  • CLI and output schema may differ: the downstream FICTURE process consumes the preprocess outputs, so any format change has to be handled there too
  • Documentation gap — Punkst's docs are sparser than ficture's
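To keep the commit-hash and container pins honest in CI, a small lint could reject floating references such as philo1984/punkst:latest. This is a sketch under our own assumptions. The helper names and the set of floating tags are hypothetical, and nothing here is an existing nf-core or Punkst tool.

```python
import re

FULL_SHA1 = re.compile(r"[0-9a-f]{40}")
SHA256_HEX = re.compile(r"[0-9a-f]{64}")
# Hypothetical set of tags that are conventionally moved to newer builds.
FLOATING_TAGS = {"latest", "main", "master", "dev"}


def is_pinned_commit(ref: str) -> bool:
    """True only for a full 40-character lowercase hex commit hash (no short SHAs, no branches)."""
    return FULL_SHA1.fullmatch(ref) is not None


def is_pinned_image(image: str) -> bool:
    """True if a container reference carries an explicit, non-floating tag or a digest."""
    if "@sha256:" in image:
        # Digest references (name@sha256:<hex>) are content-addressed and immutable.
        return SHA256_HEX.fullmatch(image.split("@sha256:", 1)[1]) is not None
    _, sep, tag = image.rpartition(":")
    # No ":" after the last "/" means no tag, which Docker resolves to "latest".
    if not sep or "/" in tag:
        return False
    return tag not in FLOATING_TAGS


if __name__ == "__main__":
    for image in (
        "community.wave.seqera.io/library/pip_ficture:ad8a1265a51b53cf",
        "philo1984/punkst:latest",
    ):
        print(image, "OK" if is_pinned_image(image) else "FLOATING")
```

Explicit tags (like the current Wave one) can still be overwritten in principle, so a digest reference is the strictest pin if one can be obtained.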

Cross-links

  • Related (paired): Baysor C++ migration — together these eliminate PARQUET_TO_CSV entirely.
  • Triggered by: Atera compatibility session 2026-05-28 (user note: "ficture has a new version called Punkst which support parquet").

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
