Improve Xenium performance, fix multinucleate cells bug #376

Merged
LucaMarconato merged 3 commits into main from more-xenium-performance
Feb 25, 2026

@LucaMarconato LucaMarconato commented Feb 25, 2026

The last commit message summarizes this PR, so I am copying it below.

Refactor xenium polygon reader to use zarr-based indices mapping
Replace slow parquet string-based cell_id grouping with zarr polygon_sets
for both nucleus and cell boundaries. This fixes a multinucleate cell bug
where multiple nuclei sharing the same cell_id were incorrectly merged
into a single polygon, and improves performance by avoiding expensive
string operations on large parquet columns.
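
To make the multinucleate problem concrete, here is a minimal sketch in plain Python. It is not the reader's code: the row layout `(cell_id, label_id, x, y)`, the sample values, and the helper `group_vertices` are all hypothetical, chosen only to show why grouping by `cell_id` loses a nucleus while grouping by an integer per-polygon key does not.

```python
from collections import defaultdict

# Hypothetical vertex rows: (cell_id, label_id, x, y).
# Cell "aaaa-1" has two nuclei (label_id 1 and 2); "bbbb-1" has one.
rows = [
    ("aaaa-1", 1, 0.0, 0.0),
    ("aaaa-1", 1, 1.0, 0.0),
    ("aaaa-1", 1, 0.0, 1.0),
    ("aaaa-1", 2, 5.0, 5.0),
    ("aaaa-1", 2, 6.0, 5.0),
    ("aaaa-1", 2, 5.0, 6.0),
    ("bbbb-1", 3, 9.0, 9.0),
    ("bbbb-1", 3, 10.0, 9.0),
    ("bbbb-1", 3, 9.0, 10.0),
]


def group_vertices(rows, key_position):
    """Collect (x, y) vertices into one vertex list per distinct key value."""
    rings = defaultdict(list)
    for row in rows:
        rings[row[key_position]].append((row[2], row[3]))
    return dict(rings)


# Grouping by the string cell_id fuses both nuclei of "aaaa-1" into one ring.
by_cell_id = group_vertices(rows, key_position=0)

# Grouping by the integer per-polygon key keeps every nucleus separate.
by_label_id = group_vertices(rows, key_position=1)

print(len(by_cell_id), len(by_label_id))
```

With this toy input, the `cell_id` grouping yields two vertex lists (one of them holding six vertices from two different nuclei), while the integer grouping yields three.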

Key changes:

  • Split _get_labels_and_indices_mapping into focused functions:
    _get_labels, _get_indices_mapping_from_zarr, _get_indices_mapping_legacy
  • Nucleus boundaries now use integer label_index as GeoDataFrame index
    (with cell_id as a column), correctly handling multinucleate cells
  • Cell boundaries keep string cell_id as GeoDataFrame index (legacy)
  • Read only needed parquet columns for faster I/O
  • Use integer label_id for fast change detection when available
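
As an illustration of the last two bullets on nucleus handling and change detection, the sketch below finds polygon boundaries by comparing neighbouring integer ids, then builds an integer-keyed mapping that keeps `cell_id` as a value rather than as the key. It assumes the vertex rows are ordered so that each polygon's vertices are contiguous; `polygon_starts` and the sample columns are hypothetical names, and the actual reader may do this differently (for example, vectorized over the whole column).

```python
def polygon_starts(label_ids):
    """Return the positions where a new polygon begins.

    Assumes vertices of one polygon are stored contiguously, so a polygon
    starts wherever the integer id differs from the previous row.
    """
    starts = [0] if label_ids else []
    for i in range(1, len(label_ids)):
        if label_ids[i] != label_ids[i - 1]:
            starts.append(i)
    return starts


# Hypothetical per-vertex columns; "aaaa-1" owns two nuclei (ids 1 and 2).
label_ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
cell_ids = ["aaaa-1"] * 6 + ["bbbb-1"] * 3

starts = polygon_starts(label_ids)

# Integer id as the index, cell_id kept as an ordinary column/value,
# so two nuclei of the same cell stay as two distinct entries.
index_to_cell_id = {label_ids[s]: cell_ids[s] for s in starts}

print(starts)
print(index_to_cell_id)
```

Comparing integers row to row avoids any string comparison on the `cell_id` column, which is the cost the commit message attributes to the old parquet-based grouping.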
