Add type hints to block and IO APIs#2048
Conversation
zacharyburnett
left a comment
There was a problem hiding this comment.
looks good to me, I like all the new signatures. This should make using ASDF with a lsp much easier
| msg = f"Unable to open {init} with mode {mode}" | ||
| raise ValueError(msg) from err | ||
| return get_file(fd, uri=uri or init, close=True) | ||
| return get_file(fd, uri=str(uri or init), close=True) |
There was a problem hiding this comment.
Would you comment on the need for str? It looks reasonable but I'm curious how it came about.
There was a problem hiding this comment.
At this point init can be either a str or Path, but we expect uri to be a str, so this does the conversion if needed.
There was a problem hiding this comment.
Thanks. And this came about because this function was type annotated? Does that mean we don't have a test for this bug?
There was a problem hiding this comment.
I don't think there is a bug now? str(uri or init) isn't doing a type-system cast: if init is a Path it will correctly convert it to a str at runtime. str(Path("/foo")) == "/foo"
But yeah this is because of type annotations. Initially I was typing uri as uri: str | Path, but then there were places inside asdf that are doing string manipulation on uri, so I had to convert it to just be a str.
| def test_checksum(tmp_path): | ||
| my_array = np.arange(0, 64, dtype="<i8") | ||
| target_checksum = b"\xcaM\\\xb8t_L|\x00\n+\x01\xf1\xcfP1" | ||
| # pyrefly: ignore [bad-argument-type] |
There was a problem hiding this comment.
Should we rerun the baseline instead of adding these ignores? Would that work here?
There was a problem hiding this comment.
I'd prefer not to add more things to the baseline file if possible. Ideally we'd eventually fix all of the items in the file and remove it.
Plus, if we ever want to switch to a different type-checker without baseline file support it would be a lot easier to just grep for and replace pyrefly: ignore instances.
| if last_block is None: | ||
| new_block_start = new_tree_size | ||
| elif last_block.data_offset is None: | ||
| msg = "Unable to update ASDF file because source file is not seekable" |
There was a problem hiding this comment.
We're already checking if the file is seekable:
Line 902 in a883bff
If we add another check inside this update (at the start) would that be enough to appease the type checker?
There was a problem hiding this comment.
So unfortunately the type checker is not able to infer that "file is not seekable" means "data_offset is not None". Technically even if a file is seekable, somewhere else in the code could set data_offset = None without the type checker complaining, which means it can't be assumed to be non-None here.
In general this type of pattern is not handled well by the type system.
|
|
||
| def update_offset(offset: int | None) -> int: | ||
| if offset is None: | ||
| msg = "Unable to update ASDF file because source file is not seekable" |
There was a problem hiding this comment.
Description
asdf._blockand its submodules now have 100% type coverage.GenericFileand its subclasses are mostly fully typed now.generic_ioclasses in the rest of the codebase, which will aid in future refactors.asdf.typing:ByteArray1Das an alias for a 1-D uint8 numpy array which is used all over the place in block IOBlockDataCallbackas an alias for a callable that returnsByteArray1D, which is used for lazy block loadingBlockHeadertyped dictionary to describe the block header fieldsTaggedDict[str, int]orTaggedList[float]Taggedis now also generic over its inner type, which means type checkers will now recognizeTaggedList[str]as a subtype ofTagged[list[str]].asdf._block.external.UseInternalto be a sentinel value that type checkers will understand: the actual value has been renamed toUSE_INTERNALandUseInternalis now a type alias for the value's type.As with the previous typing PR there should be no functional changes and no changes to the public API (other than adding type hints and the change to
UseInternal, which I'm not sure is public).I made a few refactoring changes when necessary. Most of them are downstream of the fact that
ReadBlockandWriteBlockcan have an offset ofNoneif the source file isn't seekable (or if the file being read was not seekable when written). There are a lot of places that assume that offset is an integer and would break if offset were ever actuallyNone, so I had to add some checks in those places to handle theNonecase.In the future I think that we should just drop support for non-seekable files, which would allow offset to always be an integer.