Skip to content

Add type hints to block and IO APIs#2048

Open
sydduckworth wants to merge 12 commits into
asdf-format:mainfrom
sydduckworth:block-type-hints
Open

Add type hints to block and IO APIs#2048
sydduckworth wants to merge 12 commits into
asdf-format:mainfrom
sydduckworth:block-type-hints

Conversation

@sydduckworth
Copy link
Copy Markdown
Member

@sydduckworth sydduckworth commented May 12, 2026

Description

  • asdf._block and its submodules now have 100% type coverage.
  • GenericFile and its subclasses are mostly fully typed now.
  • The advantage of adding this type coverage is now its possible to trace the precise usage of generic_io classes in the rest of the codebase, which will aid in future refactors.
  • Added new type aliases to asdf.typing:
    • ByteArray1D as an alias for a 1-D uint8 numpy array which is used all over the place in block IO
    • BlockDataCallback as an alias for a callable that returns ByteArray1D, which is used for lazy block loading
  • Added a BlockHeader typed dictionary to describe the block header fields
  • The tagged container types are now generic in the same way as their parent types, e.g. you can now annotate a value with TaggedDict[str, int] or TaggedList[float]
  • Similarly, Tagged is now also generic over its inner type, which means type checkers will now recognize TaggedList[str] as a subtype of Tagged[list[str]].
  • Converted asdf._block.external.UseInternal to be a sentinel value that type checkers will understand: the actual value has been renamed to USE_INTERNAL and UseInternal is now a type alias for the value's type.

As with the previous typing PR there should be no functional changes and no changes to the public API (other than adding type hints and the change to UseInternal, which I'm not sure is public).

I made a few refactoring changes when necessary. Most of them are downstream of the fact that ReadBlock and WriteBlock can have an offset of None if the source file isn't seekable (or if the file being read was not seekable when written). There are a lot of places that assume that offset is an integer and would break if offset were ever actually None, so I had to add some checks in those places to handle the None case.
In the future I think that we should just drop support for non-seekable files, which would allow offset to always be an integer.

@sydduckworth sydduckworth marked this pull request as ready for review May 20, 2026 19:18
@sydduckworth sydduckworth requested a review from a team as a code owner May 20, 2026 19:18
@sydduckworth sydduckworth requested a review from braingram May 20, 2026 19:18
Copy link
Copy Markdown
Member

@zacharyburnett zacharyburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, I like all the new signatures. This should make using ASDF with a lsp much easier

Comment thread asdf/_block/external.py
Comment thread asdf/_block/key.py
Comment thread asdf/_compression.py
Comment thread asdf/generic_io.py
msg = f"Unable to open {init} with mode {mode}"
raise ValueError(msg) from err
return get_file(fd, uri=uri or init, close=True)
return get_file(fd, uri=str(uri or init), close=True)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you comment on the need for str? It looks reasonable but I'm curious how it came about.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point init can be either a str or Path, but we expect uri to be a str, so this does the conversion if needed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. And this came about because this function was type annotated? Does that mean we don't have a test for this bug?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a bug now? str(uri or init) isn't doing a type-system cast: if init is a Path it will correctly convert it to a str at runtime. str(Path("/foo")) == "/foo"

But yeah this is because of type annotations. Initially I was typing uri as uri: str | Path, but then there were places inside asdf that are doing string manipulation on uri, so I had to convert it to just be a str.

Comment thread asdf/_tests/_block/test_writer.py Outdated
def test_checksum(tmp_path):
my_array = np.arange(0, 64, dtype="<i8")
target_checksum = b"\xcaM\\\xb8t_L|\x00\n+\x01\xf1\xcfP1"
# pyrefly: ignore [bad-argument-type]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rerun the baseline instead of adding these ignores? Would that work here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to add more things to the baseline file if possible. Ideally we'd eventually fix all of the items in the file and remove it.
Plus, if we ever want to switch to a different type-checker without baseline file support it would be a lot easier to just grep for and replace pyrefly: ignore instances.

Comment thread asdf/_block/manager.py
if last_block is None:
new_block_start = new_tree_size
elif last_block.data_offset is None:
msg = "Unable to update ASDF file because source file is not seekable"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're already checking if the file is seekable:

if not fd.seekable():

If we add another check inside this update (at the start) would that be enough to appease the type checker?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So unfortunately the type checker is not able to infer that "file is not seekable" means "data_offset is not None". Technically even if a file is seekable, somewhere else in the code could set data_offset = None without the type checker complaining, which means it can't be assumed to be non-None here.

In general this type of pattern is not handled well by the type system.

Comment thread asdf/_block/manager.py

def update_offset(offset: int | None) -> int:
if offset is None:
msg = "Unable to update ASDF file because source file is not seekable"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 22, 2026

Merging this PR will not alter performance

✅ 122 untouched benchmarks


Comparing sydduckworth:block-type-hints (34be1e1) with main (a883bff)

Open in CodSpeed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants