feat(torch_compat): Add torch_compat module
#194
Merged
## Compatibility module for `torch.save` and `torch.load`

This change adds a new compatibility module to tensorizer, `tensorizer.torch_compat`. It provides an interface, implemented with context managers, for using `torch.save` and `torch.load` with `tensorizer` as their backend for the serialization of tensors and tensor storages, while leaving serialization and deserialization of all other objects and metadata to the respective `torch` functions. A brief description and usage pointers for this module are in the changelog, copied below.

### Changelog
- `tensorizer.torch_compat` is a new module for using `tensorizer` as a backend for handling tensor data during standard `torch.save` and `torch.load` calls
  - To use `tensorizer` as a backend for `torch.save`, wrap the call in the `tensorizer_saving` context manager
  - To use `tensorizer` as a backend for `torch.load`, wrap the call in the `tensorizer_loading` context manager
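For example, a minimal sketch of that wrapping (passing the tensor-data destination as the first argument to each context manager is an assumption of this sketch; the docstrings are authoritative):

```python
import torch
from tensorizer.torch_compat import tensorizer_saving, tensorizer_loading

obj = {"weight": torch.ones(4, 4)}

# torch.save writes the metadata file; tensorizer handles the tensor data.
with tensorizer_saving("obj.tensors"):  # explicit path is an assumption
    torch.save(obj, "obj.pt")

with tensorizer_loading("obj.tensors"):
    obj = torch.load("obj.pt")
```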
### Highlights

This new module provides several advantages over using `tensorizer` directly, chiefly drop-in compatibility with existing code built around `torch.save` and `torch.load`, along with support for the other objects and metadata that `torch.save` serializes alongside the tensors.

### Not supported
`tensorizer.torch_compat.tensorizer_loading()` does not support device selection following the same rules as the `map_location` argument to `torch.load()`.

### Usage
This module is intended to be very simple to use, so that existing code using the `torch` serialization functions can easily be adapted to make use of `tensorizer`. An instance of `torch.nn.Module` can be serialized as follows:
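(A minimal sketch, assuming the destination for the tensor data may be passed directly as the first argument to the context managers; see the docstrings for the exact signatures.)

```python
import torch
from tensorizer.torch_compat import tensorizer_saving, tensorizer_loading

model = torch.nn.Linear(8, 8)

# Save: torch.save handles the module's structure and metadata,
# tensorizer handles the tensor storages.
with tensorizer_saving("model.tensors"):  # explicit path is an assumption
    torch.save(model, "model.pt")

# Load: recent torch defaults to weights_only=True, which rejects
# arbitrary pickled classes such as nn.Module, so it is disabled here.
with tensorizer_loading("model.tensors"):
    model = torch.load("model.pt", weights_only=False)
```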
There are only two public functions in `torch_compat`: `tensorizer_saving()` and `tensorizer_loading()`. The API of these functions is thoroughly documented in their docstrings, and several supported use cases are demonstrated in the test suite.

### Async checkpointing example
This module is compatible with async saving. The boilerplate for implementing simple async checkpointing in a `transformers.Trainer` training loop is shown below. It uses the `save_func` parameter of `tensorizer_saving` to route the call to the serializer through a thread pool managed by the calling code.
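A sketch of that boilerplate, under the assumption that `save_func` receives a callable performing the actual tensor serialization, so that passing a thread pool's `submit` method defers that work to a worker thread (the real parameter semantics are documented in the docstrings). It also assumes `save_safetensors=False` in the `TrainingArguments`, so that the `Trainer` saves weights through `torch.save`:

```python
from concurrent.futures import ThreadPoolExecutor

from transformers import Trainer
from tensorizer.torch_compat import tensorizer_saving

# One worker keeps checkpoints ordered while the training loop keeps running.
_checkpoint_pool = ThreadPoolExecutor(max_workers=1)


class AsyncCheckpointTrainer(Trainer):
    def save_model(self, output_dir=None, _internal_call=False):
        # save_func=submit is an assumption of this sketch: the tensor
        # serialization call is handed to the pool instead of running
        # synchronously; torch.save's own metadata write stays in-line.
        # No explicit tensor path is given here; it is assumed to be derived
        # from the filename the Trainer passes to torch.save.
        with tensorizer_saving(save_func=_checkpoint_pool.submit):
            super().save_model(output_dir, _internal_call)
```

Since the calling code owns the pool, it should also call `_checkpoint_pool.shutdown(wait=True)` after training to ensure the last checkpoint has finished writing.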
### Object storage support

These context managers support object storage paths. While providing an exact path is simplest, this can also be handled through the filename callback argument to `tensorizer_saving()` and `tensorizer_loading()`. Below is an example that uploads only tensor weights to object storage, while saving metadata at a local path. It operates entirely based on a callback that converts a filename passed to `torch.save` into a corresponding `s3://` URI. A similar hook could be added to a `save_func` to also convert local file paths for metadata to `s3://` URIs. (Note: none of this complexity is necessary if the calling code can choose file paths directly; using callbacks this way is mainly for intercepting third-party library code that calls `torch.save` and `torch.load` in inaccessible locations.)
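A sketch of that pattern, assuming the callback can be passed as the first argument to the context managers, receives the filename given to `torch.save`/`torch.load` as a string, and returns the destination for the tensor data; the bucket URI is hypothetical:

```python
import os

import torch
from tensorizer.torch_compat import tensorizer_saving, tensorizer_loading

BUCKET = "s3://example-bucket/checkpoints"  # hypothetical bucket


def tensor_uri(filename: str) -> str:
    # Tensor data goes to object storage; the metadata file written by
    # torch.save itself stays at the original local path.
    return f"{BUCKET}/{os.path.basename(filename)}.tensors"


model = torch.nn.Linear(8, 8)

# Passing the callback positionally is an assumption of this sketch.
with tensorizer_saving(tensor_uri):
    torch.save(model, "checkpoint.pt")   # metadata -> ./checkpoint.pt

with tensorizer_loading(tensor_uri):
    model = torch.load("checkpoint.pt", weights_only=False)
```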
### Version update

This PR also updates the code version to 2.11.0a0. The full update to 2.11.0 for the release of this module will come in a subsequent PR.
### Future work
This module is subject to the same issue as `torch.save` with overly large storages backing tensors: full storages are always saved for all tensors present, even if only small views of them actually appear in the tensors being saved (illustrated below). This is part of `torch`'s implementation, both incidentally and to support tied weights between separate tensors. To also support tied weights, we inherit this behaviour, but we have a way planned out to make this less of an issue. That work isn't finished yet, so it is not included in this PR.
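To make the inherited behaviour concrete, here is a small self-contained illustration of how plain `torch.save` already stores the full backing storage of a view:

```python
import os

import torch

big = torch.zeros(1_000_000)   # ~4 MB of float32 storage
small = big[:10]               # a 10-element view of that storage

torch.save(small, "small.pt")
# The saved file contains the entire 1,000,000-element storage backing
# the view, not just the 10 visible elements.
print(os.path.getsize("small.pt"))  # roughly 4 MB, not ~40 bytes
```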