security: add class-loading allowlist to PyTorch unpicklers (CWE-502) #3014

Open
SnailSploit wants to merge 1 commit into google:main from SnailSploit:fix/pytorch-layout-unpickler-rce

@SnailSploit

Summary

Add a class-loading allowlist to MetadataUnpickler and CustomTorchUnpickler in PyTorchLayout to prevent arbitrary code execution via malicious .pt/.pth checkpoint files.

Security Issue

Both MetadataUnpickler and CustomTorchUnpickler extend pickle.Unpickler. While MetadataUnpickler.find_class() intercepts specific torch/numpy reconstruction functions, unrecognized classes fall through to super().find_class(), which resolves and returns any class from any importable module. CustomTorchUnpickler does not override find_class() at all.

This allows a crafted .pt checkpoint file to embed pickle opcodes that resolve and invoke dangerous callables (e.g., os.system, subprocess.Popen, builtins.eval), achieving arbitrary code execution when the checkpoint is loaded.

Attack scenario: A user loads a PyTorch checkpoint from an untrusted source. The data.pkl inside the .pt zip archive contains pickle opcodes referencing os.system or subprocess.Popen. The unpickler resolves these callables and invokes them with attacker-supplied arguments, executing attacker-controlled commands.
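To illustrate the mechanism, here is a minimal sketch using only the standard library. It is not code from this PR: `HarmlessProbe` and `RecordingUnpickler` are hypothetical names, and `len` stands in for a dangerous callable such as os.system. It shows that an unrestricted `find_class()` resolves whatever global the pickle stream names, and that the callable runs during loading.

```python
import io
import pickle

# Hypothetical probe: __reduce__ tells pickle "call len('abc') on load".
# A real attack would name os.system or subprocess.Popen instead of len.
class HarmlessProbe:
    def __reduce__(self):
        return (len, ("abc",))

# Hypothetical unpickler that logs every lookup but restricts nothing,
# mirroring the fall-through to super().find_class() described above.
seen = []

class RecordingUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        seen.append((module, name))
        return super().find_class(module, name)

blob = pickle.dumps(HarmlessProbe())
result = RecordingUnpickler(io.BytesIO(blob)).load()

print(seen)    # the global the stream asked for
print(result)  # the return value of the call made during load
```

The loaded "object" is simply the return value of the call, so no instance of `HarmlessProbe` is ever reconstructed; the side effect of the call is the whole point of the attack.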

Changes

  1. Added _SAFE_UNPICKLE_CLASSES allowlist: An explicit set of (module, name) pairs covering torch storage types, tensor reconstruction functions, numpy array reconstruction, and standard containers and helpers (OrderedDict, _codecs.encode).
  2. MetadataUnpickler.find_class(): After checking intercepted classes, falls back to the allowlist instead of unrestricted super().find_class(). Raises pickle.UnpicklingError for any class not in the allowlist.
  3. CustomTorchUnpickler.find_class() (new override): Restricts class loading to the same allowlist.
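The allowlist approach in the changes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the diff itself: `AllowlistUnpickler`, `safe_loads`, and `_Probe` are hypothetical names, the error message wording is invented, and the allowlist entries shown are only examples of the kinds of (module, name) pairs described (the PR's actual set is not reproduced here).

```python
import io
import pickle
from collections import OrderedDict

# Illustrative entries only; the real allowlist in the PR may differ.
_SAFE_UNPICKLE_CLASSES = frozenset({
    ("collections", "OrderedDict"),
    ("_codecs", "encode"),
    ("torch._utils", "_rebuild_tensor_v2"),  # never imported unless requested
})

class AllowlistUnpickler(pickle.Unpickler):
    """Hypothetical stand-in for the patched unpicklers."""

    def find_class(self, module, name):
        # Resolve only globals that are explicitly allowlisted.
        if (module, name) in _SAFE_UNPICKLE_CLASSES:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            f"Blocked unpickling of {module}.{name}: not in the allowlist"
        )

def safe_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()

# Hypothetical probe that asks the unpickler to call builtins.len on load.
class _Probe:
    def __reduce__(self):
        return (len, ("abc",))

# An allowlisted container round-trips normally.
allowed = safe_loads(pickle.dumps(OrderedDict(a=1)))

# A non-allowlisted global is rejected before it can be called.
try:
    safe_loads(pickle.dumps(_Probe()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Checking the (module, name) pair before delegating to `super().find_class()` means a blocked module is never imported, which avoids import-time side effects as well as the call itself.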

Backward Compatibility

Legitimate PyTorch checkpoints contain only tensor reconstruction functions, storage types, and standard containers, all of which are included in the allowlist. Checkpoints embedding arbitrary Python classes will now raise a clear UnpicklingError, with guidance to file an issue if a legitimate class is missing from the allowlist.

MetadataUnpickler.find_class() falls through to super().find_class()
for unrecognized classes, allowing arbitrary class instantiation from
malicious .pt checkpoint files. CustomTorchUnpickler has no find_class
override at all, making it equally vulnerable.

Changes:
- Add _SAFE_UNPICKLE_CLASSES allowlist covering torch storage types,
  tensor reconstruction functions, numpy array reconstruction, and
  standard container types (OrderedDict, _codecs.encode)
- Add find_class() override to CustomTorchUnpickler with allowlist
- Replace MetadataUnpickler's super().find_class() fallthrough with
  allowlist check and clear UnpicklingError for blocked classes

Backward compatible: legitimate PyTorch checkpoints only use tensor
reconstruction, storage types, and standard containers.
@google-cla

google-cla bot commented Mar 23, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.
