MACSIMA: Parsing subfolders is not intuitive when the machine writes subfolder for each cycle #378

@MSHelm

Hey everyone,
As initially reported by @nbonine, when preprocessing is performed during acquisition of MACSima data, an independent folder is created for each cycle.
The current reader handles this using the preprocessed_multiple_folders parsing style, which results in separate Image, Table and coordinate system elements for each cycle. This is not what a user typically expects, because the cycles belong together and are usually analyzed together. Currently there is no straightforward way for the user to specify this.
For example:

my_data
- 3_Scan2
--- some_images.tif
- 6_Cycle1
--- some_more_images.tif
- 7_Cycle2
--- even_more_images.tif

is parsed into:

SpatialData object
├── Images
│     ├── '3_Scan2_image': DataTree[cyx] (4, 15275, 27678), (4, 7637, 13839), (4, 3818, 6919), (4, 1909, 3459), (4, 954, 1729)
│     ├── '6_Cycle1_image': DataTree[cyx] (4, 15275, 27678), (4, 7637, 13839), (4, 3818, 6919), (4, 1909, 3459), (4, 954, 1729)
│     └── '7_Cycle2_image': DataTree[cyx] (4, 15275, 27678), (4, 7637, 13839), (4, 3818, 6919), (4, 1909, 3459), (4, 954, 1729)
└── Tables
      ├── '3_Scan2_table': AnnData (0, 4)
      ├── '6_Cycle1_table': AnnData (0, 4)
      └── '7_Cycle2_table': AnnData (0, 4)
with coordinate systems:
    ▸ '3_Scan2', with elements:
        3_Scan2_image (Images)
    ▸ '6_Cycle1', with elements:
        6_Cycle1_image (Images)
    ▸ '7_Cycle2', with elements:
        7_Cycle2_image (Images)

I propose the following:

  • Deprecate the auto discovery of the parsing style to use (the current default!).
  • Instead, preprocessed_single_folder becomes the new default. We change it so that all tifs in the specified path and in all of its subdirectories are parsed together into one Image element. This would handle both the regular case (one folder with all tifs) and the case that arises when preprocessing is run during acquisition (several subfolders, each with the tifs of one cycle).
  • We keep the preprocessed_multiple_folders option for users who want to do batch analysis of multiple ROIs. For example, a user could have multiple ROIs of a single well that are saved into separate folders. In these cases the images of the subfolders should stay separate, because they describe different image stacks.
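
To make the proposed grouping concrete, here is a minimal sketch of how the two styles could decide which tifs end up in the same element. It is not the reader's actual code: `group_tifs` is a hypothetical helper, the style names are taken from this issue as plain strings, and using the root folder name as the element name in the single-folder case is an assumption.

```python
from pathlib import Path


def group_tifs(path, parsing_style="preprocessed_single_folder"):
    """Hypothetical helper: map an element name to the tif files it would contain."""
    root = Path(path)
    if parsing_style == "preprocessed_single_folder":
        # Proposed default: every tif below root, including all subfolders,
        # is collected into a single group (one Image element).
        return {root.name: sorted(root.rglob("*.tif"))}
    if parsing_style == "preprocessed_multiple_folders":
        # Kept for batch analysis: each direct subfolder is its own group,
        # e.g. one image stack per ROI.
        return {
            sub.name: sorted(sub.rglob("*.tif"))
            for sub in sorted(root.iterdir())
            if sub.is_dir()
        }
    raise ValueError(f"Unknown parsing style: {parsing_style!r}")
```

With the `my_data` layout from above, the single-folder style would yield one group holding all three tifs, while the multiple-folders style would yield the three groups `3_Scan2`, `6_Cycle1` and `7_Cycle2`. Note that the `*.tif` pattern is an assumption too; files ending in `.tiff` would need a second pattern.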

I will submit an example implementation of this. However, since it touches the default settings of the reader, and I am not sure what @berombau originally intended with the different parsing styles, I would love to have a discussion on this.
