Skip to content

Adding support for .jpk-qi-data and .bin files#190

Open
ahobbs7 wants to merge 45 commits into
mainfrom
ahobbs7/jpk-qi-data
Open

Adding support for .jpk-qi-data and .bin files#190
ahobbs7 wants to merge 45 commits into
mainfrom
ahobbs7/jpk-qi-data

Conversation

@ahobbs7
Copy link
Copy Markdown
Collaborator

@ahobbs7 ahobbs7 commented May 3, 2026

Adds support for .jpk-qi-data files, allowing the loading of the image for the selected channel. It also allows for the raw force-distance curve data as well as metadata for each curve to be loaded efficiently, using a lazy data structure approach that loads each section of the dataset into memory only when it is required.
The updates also allow for the conversion of .jpk-qi-data files into a HDF5 file format (.h5-jpk) for much faster mass analysis. Hence, the h5-jpk loading code has also been updated to support the loading of these converted files.

The update also allows .bin with different binary formats to be loaded.

closes #174 closes #173

…files and removing the ability to run curve analysis directly in the reader
…g to memory then saving in one go when duplicating data to h5
…sampled curves then are streamed directly into h5
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 4, 2026

Codecov Report

❌ Patch coverage is 22.79910% with 684 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.60%. Comparing base (d65d297) to head (0146542).
⚠️ Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
AFMReader/jpk_qi.py 10.07% 491 Missing ⚠️
AFMReader/h5_jpk.py 22.65% 99 Missing ⚠️
AFMReader/lazy_data_classes.py 38.77% 30 Missing ⚠️
AFMReader/general_loader.py 50.84% 29 Missing ⚠️
AFMReader/raw_bin.py 21.21% 26 Missing ⚠️
AFMReader/asd.py 72.72% 3 Missing ⚠️
AFMReader/jpk.py 89.28% 3 Missing ⚠️
AFMReader/spm.py 76.92% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #190       +/-   ##
===========================================
- Coverage   79.84%   51.60%   -28.25%     
===========================================
  Files          12       15        +3     
  Lines         928     1804      +876     
===========================================
+ Hits          741      931      +190     
- Misses        187      873      +686     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ahobbs7 ahobbs7 marked this pull request as ready for review May 5, 2026 07:54
@ahobbs7 ahobbs7 requested a review from SylviaWhittle May 5, 2026 07:54
Copy link
Copy Markdown
Collaborator

@SylviaWhittle SylviaWhittle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small partial review.

I'd like to call with you to talk about the interesting lazy loading of data if you have time?

Comment thread AFMReader/asd.py
return frames, pixel_to_nanometre_scaling_factor, header_dict


def get_asd_channels(file_path: Path):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have time to check this manually - does it work? There isn't a test but frankly we don't have the dev time to move slowly. If you say it works, this is fine with me.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The channel fetching seems to work though I'm happy to quickly make some tests for them as that should be pretty quick.

Comment thread AFMReader/general_loader.py Outdated
elif len(h5_returned) == 4:
image, pixel_to_nanometre_scaling_factor, _, curve_data = h5_returned # type: ignore[misc]
self.loaded_curves = True
print(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(
logger.info(

or just remove it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll just remove them, they are some accidentally left in prints for debugging.

Comment thread AFMReader/general_loader.py Outdated
f"Loaded image with shape {image.shape} and pixel to nanometre "
f"scaling factor {pixel_to_nanometre_scaling_factor}"
)
print(f"Image has max value {image.max()} and min value {image.min()}")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto :)

image, pixel_to_nanometre_scaling_factor = spm.load_spm(self.filepath, self.channel)
elif self.suffix == ".h5-jpk":
image, pixel_to_nanometre_scaling_factor, _ = h5_jpk.load_h5jpk(self.filepath, self.channel)
h5_returned = h5_jpk.load_h5jpk(self.filepath, self.channel, load_curves=not self.loaded_curves)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this entire block be absorbed into the h5_jpk.load_h5jpk function call? It's a little messy here.

I'll have a go too.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, is definitely a bit messy. My thinking was preventing the need to make changes to topostats by not changing what each load function returns if it isn't reading the curve data I'm adding support for. Though I think h5jpk is not used in TopoStats anyway? And it might be necessary to make changes to topostats reading anyway due to the changes I have been working on for returning the z unit for the read files?

raise e

# scope for a "check what channels are available" function similar to above.
def get_available_channels(self): # noqa: C901
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guessing this is for the napari feature to list the channels?

Well done if this is all working, that's a lot of work.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

Comment thread AFMReader/lazy_data_classes.py Outdated
A proxy object for the specified row.
"""

class RowProxy:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the reason RowProxy is defined within the scope of LazyQiData encapsulation? It'll get redefined each time __getitem__ runs, though this is likely a negligible and unimportant cost.

Comment thread AFMReader/lazy_data_classes.py Outdated
self.parent = parent
self.y = y

def __getitem__(self, x: int):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to need a bit of an explanation on how this is intended to work - could we set up a call to talk about it? There's a lot going on here. Plus, frankly I'm just curious.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely!

Comment thread AFMReader/logging.py
# Set the format to have blue time, green file, module, function and line, and white message
logger.add(
sys.stderr,
lambda msg: sys.stderr.write(msg), # pylint: disable=unnecessary-lambda
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What necessitated this change?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the general_loader and some of the spm tests which required were failing due to not being able to correctly reading the logged output (instead always reading an empty string), they seem to be failing when I run even the main branch locally for that reason. The lamda appears to be necessary due to a quirk of loguru where lamda makes sys.stderr be dynamically retrieved at the time of logging rather than once at import time.

Comment thread AFMReader/jpk_qi.py
return final_multiplier, final_offset, unit


class jpk_qi_loader:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalise this please :)

Comment thread AFMReader/jpk_qi.py
# Open the ZIP archive once and keep it open for the duration of the loading process
self.qi_archive = zipfile.ZipFile(self.filepath, "r") # pylint: disable=consider-using-with
logger.info(f"Opened JPK QI archive at {self.filepath}")
self.namelist = self.qi_archive.namelist()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this list_of_all_paths or something - since namelist could be anything? Just so future people (including ourselves) can tell at a glance what this is.

Maybe also add a comment above as to what it's used for? :)

Comment thread AFMReader/jpk_qi.py
self.qi_archive = zipfile.ZipFile(self.filepath, "r") # pylint: disable=consider-using-with
logger.info(f"Opened JPK QI archive at {self.filepath}")
self.namelist = self.qi_archive.namelist()
# Set path to the .jpk-qi-image file within the archive for later use
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Set path to the .jpk-qi-image file within the archive for later use
# For holding the reference to where the actual .jqk-qi image is (not the metadata).

Comment thread AFMReader/jpk_qi.py
if self.save_as_h5:
self.save_lite_data()

return (self.image, self.px2nm, (self.curve_data, self.channels_units, self.full_metadata))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly bundle these objects into one metadata dataclass, JPKQiCurveData? Gentle suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[feature] : Read .jpk-qi-data files [Bug]: .jpk-qi-data files not loading

3 participants