add slidescore url and slidescore study id to download manifest.#42
add slidescore url and slidescore study id to download manifest.#42YoniSchirris wants to merge 5 commits intomainfrom
Conversation
- save manifest as json - when starting the download, add slidescore_url and slidescore_study_id - add layer of slidescore_url and slidescore_study_id keys, might a user download multiple studies into a single directory, so as to not overwrite the initial values of `slidescore_study_id` if a second study is downloaded to the same directory.
- add option to save mappings without downloading WSIs - add option to save mappings as tsv instead of json - save image name instead of file name, since this is not helpful when we download .mrxs files as zips - flip image name -> image id mapping to image id -> image name mapping, since image name is not necessarily unique
|
Now also fixes #43 |
jonasteuwen
left a comment
There was a problem hiding this comment.
Some comments. Can you also fix mypy?
slidescore_api/cli.py
Outdated
| Will end up with something like | ||
| { | ||
| 'slidescore_url': str, | ||
| 'slidescore_study_id': int, | ||
| 'slide_filename_to_id_mapping': { | ||
| str: int | ||
| ... | ||
| } | ||
| } |
There was a problem hiding this comment.
This is not filename anymore is it?
There was a problem hiding this comment.
not the structure indeed.
changed
{
"url": {
"study_id": {
"slidescore_url": "url",
"slidescore_study_id": study_id,
"slide_filename_to_study_image_id_mapping": {
"image_id": "image_name",
...
...
}
}
}
}
| def append_to_tsv_mapping(save_dir: pathlib.Path, items: List[str]) -> None: | ||
| """ | ||
| Create a manifest mapping image id to the filename. | ||
| Create a manifest mapping image id to image name |
There was a problem hiding this comment.
Can you add that if the file does not exist yet, you write the first two lines?
# slidescore_url: <URL>
# study_id: <id>
There was a problem hiding this comment.
I do this manually in cli.download_wsis.
Otherwise I'd need to always pass the url and id to the function.
Also, if this would be the functionality, then downloading a second slidescory study into the same directory would NOT write the slidescore_url and study_id of the second study.
elif mapping_format == "tsv":
append_to_tsv_mapping(save_dir=save_dir, items=[f"# {slidescore_url}"])
append_to_tsv_mapping(save_dir=save_dir, items=[f"# {study_id}"])
slidescore_api/cli.py
Outdated
| # Collect image metadata | ||
| images = client.get_images(study_id) | ||
|
|
||
| # # Add study details to mapping manifest |
| disable_download: bool = False, | ||
| mapping_format: str = "json", |
There was a problem hiding this comment.
Is there no other way to get the mapping only?
There was a problem hiding this comment.
not that i can think of without duplicating a lot of code or entirely refactoring the function into less coherent parts.
| elif mapping_format == "tsv": | ||
| append_to_tsv_mapping( | ||
| save_dir=save_dir, | ||
| items=[str(image_id), image_name], | ||
| ) |
There was a problem hiding this comment.
If not one of these, it's better to raise an error
There was a problem hiding this comment.
Good point. Thought this was done by the argparser, but raising error here makes the function more generic
else:
raise ValueError(f"mapping_format should be either 'tsv' or 'json', but is {mapping_format}")
- update docstring of file structure of slidescore_mapping.json - remove double # - raise valueerror when mapping format is not properly given
slidescore_study_idif a second study is downloaded to the same directory.Fixes #40