Copilot AI commented Dec 19, 2025

Batch writes to CellMapDatasetWriter failed with shape mismatches because the iteration logic passed entire batch arrays to each index instead of extracting individual items.

Changes

  • Batch iteration: Extract per-item data using batch_idx when iterating over batch indices
  • Channel indexing: Change array[:, c, ...] to array[c, ...] to correctly extract class channels from (classes, ...spatial) format
  • Metadata filtering: Skip special keys (e.g., "idx") that shouldn't be written to disk
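The batch-iteration and metadata-filtering changes above can be sketched roughly as follows. This is a minimal illustration and not the code from dataset_writer.py: the helper name `iter_batch_items` is hypothetical, NumPy stands in for torch, and `METADATA_KEYS` is an assumed stand-in for the PR's `_METADATA_KEYS` constant (only "idx" is named in this PR).

```python
import numpy as np

# Assumed stand-in for the PR's _METADATA_KEYS constant; only "idx" is named in the PR.
METADATA_KEYS = {"idx"}


def iter_batch_items(indices, arrays):
    """Yield (index, per-item arrays) pairs from a batched write request.

    Hypothetical helper: each array is assumed to have the batch on axis 0.
    """
    for batch_idx, index in enumerate(indices):
        item = {
            name: data[batch_idx]  # this item's slice, not the whole batch
            for name, data in arrays.items()
            if name not in METADATA_KEYS  # metadata is not written to disk
        }
        yield int(index), item


# Small shapes keep the sketch cheap; the issue used (32, ..., 256, 256).
batch = {
    "pred": np.zeros((4, 2, 8, 8), dtype=np.float32),  # (batch, classes, y, x)
    "idx": np.arange(4),
}
items = list(iter_batch_items(batch["idx"], batch))
print(len(items), items[0][1]["pred"].shape)
```

Each yielded item carries a (classes, y, x) array, which is the shape a single-index write expects, and the "idx" entry never reaches the writer.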

Example

import torch

# `writer` is an existing CellMapDatasetWriter instance (construction omitted).
# Before: Failed with "Data shape (32, 1, 256, 256) does not match expected shape (1, 256, 256)"
batch_indices = torch.arange(32)  # 32 indices: 0, 1, 2, ..., 31
predictions = torch.randn(32, 2, 256, 256)  # batch_size=32, classes=2
writer[batch_indices] = {"pred": predictions}

# After: Works correctly - extracts predictions[i] for each index i

Testing

Added 9 batch operation tests covering tensor/numpy/list indices, large batches, dict arrays, and 2D/3D data. All 260 existing tests pass.
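Because the tests exercise tensor, NumPy, and list indices, the writer has to treat all three uniformly. One way such inputs could be normalized is sketched below; the helper `to_index_list` is hypothetical and is not taken from the PR. It relies only on the `tolist()` method that both NumPy arrays and torch tensors provide.

```python
import numpy as np


def to_index_list(indices):
    """Convert a scalar, list, NumPy array, or tensor-like object to a list of ints.

    Hypothetical helper for illustration; not part of cellmap_data.
    """
    if hasattr(indices, "tolist"):
        # NumPy arrays, NumPy scalars, and torch tensors all expose tolist().
        indices = indices.tolist()
    if isinstance(indices, int):
        return [indices]  # a single index becomes a one-element batch
    return [int(i) for i in indices]


print(to_index_list(np.array([0, 1, 2])))
print(to_index_list([3, 4]))
print(to_index_list(7))
```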

Original prompt

This section details the original issue you should resolve

<issue_title>Dataset writer issue</issue_title>
<issue_description>Predicting orthogonal planes.
Temporary directory for predictions: /tmp/tmpvvmr6xzn
Batch SIze 32
0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/spuser/ws/latest2/cellmap-segmentation-challenge/examples/predict_2D.py", line 9, in
predict(config_path, crops="19", overwrite=True)
File "/home/spuser/ws/latest2/cellmap-segmentation-challenge/src/cellmap_segmentation_challenge/predict.py", line 322, in predict
predict_func(model, dataset_writer, batch_size)
File "/home/spuser/ws/latest2/cellmap-segmentation-challenge/src/cellmap_segmentation_challenge/predict.py", line 46, in predict_orthoplanes
_predict(
File "/home/spuser/ws/latest2/cellmap-segmentation-challenge/src/cellmap_segmentation_challenge/predict.py", line 150, in _predict
dataset_writer[batch["idx"]] = outputs

File "/data/Chromatin/miniconda3/envs/csc/lib/python3.11/site-packages/cellmap_data/dataset_writer.py", line 356, in __setitem__
self.__setitem__(i, arrays)
File "/data/Chromatin/miniconda3/envs/csc/lib/python3.11/site-packages/cellmap_data/dataset_writer.py", line 374, in __setitem__
self.target_array_writers[array_name][label][
File "/data/Chromatin/miniconda3/envs/csc/lib/python3.11/site-packages/cellmap_data/image_writer.py", line 274, in __setitem__
self._write_single_item(coords, data) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/Chromatin/miniconda3/envs/csc/lib/python3.11/site-packages/cellmap_data/image_writer.py", line 300, in _write_single_item
raise ValueError(
ValueError: Data shape (32, 1, 256, 256) does not match expected shape (1, 256, 256).</issue_description>

## Comments on the Issue (you are @copilot in this section)

<comments>
</comments>


Copilot AI and others added 3 commits December 19, 2025 05:16
…items

The __setitem__ method was passing entire batch arrays when iterating over
batch indices. Now properly extracts each item from the batch based on batch_idx.
Also filters out special 'idx' metadata key that shouldn't be written to disk.

Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
- Changed array[:, c, ...] to array[c, ...] in single-item write to correctly
  extract class channel from (classes, ...spatial...) format
- Updated batch tests to use proper data formats
- All 9 batch operation tests now pass
- All 16 existing dataset_writer tests still pass

Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
- Extract metadata keys filtering to constant _METADATA_KEYS to avoid duplication
- Add comprehensive docstring to writer_setup fixture
- Clarify import comment in test_batch_write_2d_data

Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix dataset writer issue during prediction" to "Fix batch write operations in CellMapDatasetWriter" Dec 19, 2025
Copilot AI requested a review from rhoadesScholar December 19, 2025 05:27
@rhoadesScholar rhoadesScholar requested review from Copilot and removed request for rhoadesScholar December 19, 2025 05:44
Copilot AI left a comment

Pull request overview

This PR fixes a critical bug in CellMapDatasetWriter that caused batch write operations to fail with shape mismatch errors. The issue occurred because the iteration logic incorrectly passed entire batch arrays to each index instead of extracting individual items, and the channel indexing was incorrect for single-item arrays.

Key changes:

  • Fixed batch iteration to extract per-item data using batch_idx when processing batch indices
  • Corrected channel indexing from array[:, c, ...] to array[c, ...] for single-item writes (after batch extraction)
  • Added metadata filtering to skip special keys like "idx" that shouldn't be written to disk
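The channel-indexing correction is easiest to see from the resulting shapes. The sketch below uses NumPy with small, made-up sizes and is an illustration only, not code from the PR: once a single item has been pulled out of the batch, the class axis is axis 0, so the batched form `array[:, c, ...]` would index a spatial axis instead.

```python
import numpy as np

batch = np.zeros((32, 2, 16, 16))  # (batch, classes, y, x)
item = batch[0]                    # (classes, y, x) after per-item extraction
c = 1                              # class channel to write

batched_channel = batch[:, c, ...]  # valid on the full batch: (32, 16, 16)
wrong = item[:, c, ...]             # on a single item this indexes y: (2, 16)
right = item[c, ...]                # selects the class channel: (16, 16)

print(batched_channel.shape, wrong.shape, right.shape)
```

Only `item[c, ...]` yields the full spatial plane for class `c`, which is why the single-item write path uses that form.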

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
src/cellmap_data/dataset_writer.py | Fixed batch write logic to correctly extract individual items from batches, corrected channel indexing for single items, and added metadata key filtering
tests/test_dataset_writer_batch.py | Added comprehensive test suite covering batch operations with tensor/numpy/list indices, large batches, dictionary arrays, 2D data, scalar values, and mixed data types

@rhoadesScholar rhoadesScholar marked this pull request as ready for review December 22, 2025 15:27
codecov bot commented Dec 22, 2025

Codecov Report

❌ Patch coverage is 80.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.11%. Comparing base (46531e7) to head (c85ec54).
⚠️ Report is 8 commits behind head on main.

Files with missing lines | Patch % | Lines
src/cellmap_data/dataset_writer.py | 78.57% | 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #55      +/-   ##
==========================================
+ Coverage   55.98%   61.11%   +5.12%     
==========================================
  Files          27       27              
  Lines        2490     2502      +12     
==========================================
+ Hits         1394     1529     +135     
+ Misses       1096      973     -123     

@rhoadesScholar rhoadesScholar merged commit 02373d6 into main Dec 22, 2025
12 checks passed
@rhoadesScholar rhoadesScholar deleted the copilot/fix-dataset-writer-issue branch December 22, 2025 15:44