-
Notifications
You must be signed in to change notification settings - Fork 2
Fix batch write operations in CellMapDatasetWriter #55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…items The __setitem__ method was passing entire batch arrays when iterating over batch indices. Now properly extracts each item from the batch based on batch_idx. Also filters out special 'idx' metadata key that shouldn't be written to disk. Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
- Changed array[:, c, ...] to array[c, ...] in single-item write to correctly extract class channel from (classes, ...spatial...) format - Updated batch tests to use proper data formats - All 9 batch operation tests now pass - All 16 existing dataset_writer tests still pass Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
- Extract metadata keys filtering to constant _METADATA_KEYS to avoid duplication - Add comprehensive docstring to writer_setup fixture - Clarify import comment in test_batch_write_2d_data Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a critical bug in CellMapDatasetWriter that caused batch write operations to fail with shape mismatch errors. The issue occurred because the iteration logic incorrectly passed entire batch arrays to each index instead of extracting individual items, and the channel indexing was incorrect for single-item arrays.
Key changes:
- Fixed batch iteration to extract per-item data using
batch_idxwhen processing batch indices - Corrected channel indexing from
array[:, c, ...]toarray[c, ...]for single-item writes (after batch extraction) - Added metadata filtering to skip special keys like "idx" that shouldn't be written to disk
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/cellmap_data/dataset_writer.py |
Fixed batch write logic to correctly extract individual items from batches, corrected channel indexing for single items, and added metadata key filtering |
tests/test_dataset_writer_batch.py |
Added comprehensive test suite covering batch operations with tensor/numpy/list indices, large batches, dictionary arrays, 2D data, scalar values, and mixed data types |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…cross multiple files
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #55 +/- ##
==========================================
+ Coverage 55.98% 61.11% +5.12%
==========================================
Files 27 27
Lines 2490 2502 +12
==========================================
+ Hits 1394 1529 +135
+ Misses 1096 973 -123 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Batch writes to
CellMapDatasetWriterfailed with shape mismatches because the iteration logic passed entire batch arrays to each index instead of extracting individual items.Changes
batch_idxwhen iterating over batch indicesarray[:, c, ...]→array[c, ...]to correctly extract class channels from(classes, ...spatial)formatExample
Testing
Added 9 batch operation tests covering tensor/numpy/list indices, large batches, dict arrays, and 2D/3D data. All 260 existing tests pass.
Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.