security: add class-loading allowlist to PyTorch unpicklers (CWE-502) #3014
Open
SnailSploit wants to merge 1 commit into google:main from
MetadataUnpickler.find_class() falls through to super().find_class() for unrecognized classes, allowing arbitrary class instantiation from malicious .pt checkpoint files. CustomTorchUnpickler has no find_class override at all, making it equally vulnerable.

Changes:
- Add _SAFE_UNPICKLE_CLASSES allowlist covering torch storage types, tensor reconstruction functions, numpy array reconstruction, and standard container types (OrderedDict, _codecs.encode)
- Add find_class() override to CustomTorchUnpickler with allowlist
- Replace MetadataUnpickler's super().find_class() fallthrough with allowlist check and clear UnpicklingError for blocked classes

Backward compatible: legitimate PyTorch checkpoints only use tensor reconstruction, storage types, and standard containers.
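To illustrate the fall-through described in this commit message, here is a minimal sketch using only the standard library. `RecordingUnpickler` and `Payload` are hypothetical names invented for this example and are not part of the PR. The payload calls the harmless `len` builtin; the same mechanism would resolve any other importable callable.

```python
import io
import pickle


class RecordingUnpickler(pickle.Unpickler):
    """Hypothetical unpickler that logs every global it is asked to resolve."""

    def __init__(self, file):
        super().__init__(file)
        self.resolved = []

    def find_class(self, module, name):
        self.resolved.append((module, name))
        # Unrestricted fall-through: any importable global is returned as-is.
        return super().find_class(module, name)


class Payload:
    """Stand-in for attacker-controlled data; it only calls len("abc")."""

    def __reduce__(self):
        # On load, the unpickler resolves the callable and invokes it
        # with these arguments.
        return (len, ("abc",))


blob = pickle.dumps(Payload())
unpickler = RecordingUnpickler(io.BytesIO(blob))
result = unpickler.load()

print(unpickler.resolved)
print(result)
```

The loaded value is the return value of the call, not a `Payload` instance, which is why an unrestricted `find_class()` amounts to calling whatever the file names.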
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Summary
Add class-loading allowlist to MetadataUnpickler and CustomTorchUnpickler in PyTorchLayout to prevent arbitrary code execution via malicious .pt/.pth checkpoint files.

Security Issue
Both MetadataUnpickler and CustomTorchUnpickler extend pickle.Unpickler. While MetadataUnpickler.find_class() intercepts specific torch/numpy reconstruction functions, unrecognized classes fall through to super().find_class(), which resolves and returns any global (class or function) from any importable module. CustomTorchUnpickler does not override find_class() at all.

This allows a crafted .pt checkpoint file to embed pickle opcodes that resolve and invoke dangerous callables (e.g., os.system, subprocess.Popen, builtins.eval), achieving arbitrary code execution when the checkpoint is loaded.

Attack scenario: A user loads a PyTorch checkpoint from an untrusted source. The data.pkl inside the .pt zip archive contains pickle opcodes referencing os.system or subprocess.Popen. The unpickler resolves and calls these callables, executing attacker-controlled commands.

Changes
- _SAFE_UNPICKLE_CLASSES allowlist: An explicit set of (module, name) pairs covering torch storage types, tensor reconstruction functions, numpy array reconstruction, and standard container types (OrderedDict, _codecs.encode).
- MetadataUnpickler.find_class(): After checking intercepted classes, falls back to the allowlist instead of unrestricted super().find_class(). Raises pickle.UnpicklingError for any class not in the allowlist.
- CustomTorchUnpickler.find_class() (new override): Restricts class loading to the same allowlist.

Backward Compatibility
Legitimate PyTorch checkpoints contain only tensor reconstruction functions, storage types, and standard containers, all of which are included in the allowlist. Checkpoints embedding arbitrary Python classes will now raise a clear UnpicklingError with guidance to file an issue if a legitimate class is missing from the allowlist.
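The allowlist approach described under Changes could look roughly like the sketch below. It is an illustration, not the PR's code: the allowlist is trimmed to the two standard-library entries named above (the real `_SAFE_UNPICKLE_CLASSES` also covers torch and numpy entries), and `AllowlistUnpickler`, `safe_loads`, `Blocked`, and the error wording are hypothetical.

```python
import collections
import io
import pickle

# Assumption: a trimmed-down allowlist for illustration only.
_SAFE_UNPICKLE_CLASSES = frozenset({
    ("collections", "OrderedDict"),
    ("_codecs", "encode"),
})


class AllowlistUnpickler(pickle.Unpickler):
    """Hypothetical unpickler that resolves only allowlisted globals."""

    def find_class(self, module, name):
        if (module, name) not in _SAFE_UNPICKLE_CLASSES:
            raise pickle.UnpicklingError(
                f"Blocked class {module}.{name}: not in the unpickling "
                "allowlist. File an issue if this class is legitimate."
            )
        return super().find_class(module, name)


def safe_loads(data: bytes):
    """Deserialize bytes through the allowlisting unpickler."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Blocked:
    """Stand-in for a payload referencing a non-allowlisted callable."""

    def __reduce__(self):
        return (len, ("abc",))


# An allowlisted container round-trips normally.
ok = safe_loads(pickle.dumps(collections.OrderedDict(a=1)))

# A non-allowlisted global is rejected before it can be called.
try:
    safe_loads(pickle.dumps(Blocked()))
    blocked_error = None
except pickle.UnpicklingError as exc:
    blocked_error = str(exc)

print(ok)
print(blocked_error)
```

Matching on the exact `(module, name)` pair, rather than on the module alone, keeps one allowlisted entry from exposing every other attribute of the same module.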