Implement EXT1, EXT2, and EXT4 pickle opcode support #172
Add support for extension registry opcodes (EXT1, EXT2, EXT4), which are part of pickle protocol 2+. These opcodes allow pickles to reference pre-registered objects from `copyreg._extension_registry` using integer codes instead of full module/name paths.

Implementation:
- Added an `Ext1` opcode class that generates AST code showing the registry lookup
- Added `Ext2` and `Ext4` as subclasses (they inherit the same logic and differ only in argument size)
- Uses `copyreg._extension_registry.get(code, (None, None))` for safe lookup
- Generates informative code for security analysis

The fix uses the Middle Ground approach:
- Shows which extension code is being used (valuable for auditing)
- Won't crash if the registry isn't populated (uses `.get()` with a default)
- Generates readable AST output

Example generated code:

```python
import copyreg
_var0 = copyreg._extension_registry.get(42, (None, None))
```

Fixes `NotImplementedError` when analyzing pickles with EXT opcodes. All existing tests pass (20/20).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The implementation uses the wrong registry.

EXT opcodes are used during unpickling: they take an integer code and need to resolve it to a `(module, name)` tuple.

```python
# Current (incorrect):
copyreg._extension_registry.get(code, (None, None))
# This looks up code as if it were a (module, name) tuple

# Should be:
copyreg._inverted_registry.get(code, (None, None))
# This correctly maps code -> (module, name)
```

You can verify this in a Python REPL:

```python
>>> import copyreg
>>> copyreg.add_extension('mymodule', 'MyClass', 42)
>>> copyreg._extension_registry
{('mymodule', 'MyClass'): 42}
>>> copyreg._inverted_registry
{42: ('mymodule', 'MyClass')}
```

The EXT opcode receives the integer code, so the lookup has to go through `_inverted_registry`.
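A short, self-contained check of the two lookup directions, assuming only the standard library: it registers a throwaway code for `collections.OrderedDict` (the code 42 and the class are arbitrary choices), compares both lookups, then removes the registration so the process-wide registry is left as it was. Both dictionaries are private `copyreg` attributes, so their layout is an implementation detail of CPython.

```python
import copyreg

# Arbitrary, illustration-only registration.
MODULE, NAME, CODE = "collections", "OrderedDict", 42

copyreg.add_extension(MODULE, NAME, CODE)
try:
    # _extension_registry is keyed by (module, name), so an int key misses
    # and the (None, None) default comes back.
    wrong = copyreg._extension_registry.get(CODE, (None, None))
    # _inverted_registry is keyed by the integer code itself.
    right = copyreg._inverted_registry.get(CODE, (None, None))
finally:
    # Leave the process-wide registry as we found it.
    copyreg.remove_extension(MODULE, NAME, CODE)

print(wrong)  # (None, None)
print(right)  # ('collections', 'OrderedDict')
```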
Minor issue: the implementation appends an `import copyreg` statement to `module_body` every time an EXT opcode runs:

```python
def run(self, interpreter: Interpreter):
    # ...
    interpreter.module_body.append(
        ast.Import(names=[ast.alias('copyreg', None)])
    )
```

If a pickle contains multiple EXT opcodes, the generated AST will have duplicate imports:

```python
import copyreg
import copyreg  # duplicate
import copyreg  # duplicate
_var0 = copyreg._inverted_registry.get(1, (None, None))
_var1 = copyreg._inverted_registry.get(2, (None, None))
_var2 = copyreg._inverted_registry.get(3, (None, None))
```

This isn't a correctness bug (Python tolerates duplicate imports), but it's worth noting. Other opcodes like …
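One way to avoid the repeated import is to scan the statement list before appending. The sketch below uses a hypothetical helper, `ensure_import`, on a plain list of `ast` statements standing in for the interpreter's `module_body`; it illustrates the idea under that assumption and is not Fickling's API.

```python
import ast


def ensure_import(module_body: list[ast.stmt], module_name: str) -> None:
    """Append `import <module_name>` to module_body unless it is already there."""
    for stmt in module_body:
        if isinstance(stmt, ast.Import) and any(
            alias.name == module_name and alias.asname is None for alias in stmt.names
        ):
            return  # An equivalent import already exists; do nothing.
    module_body.append(ast.Import(names=[ast.alias(name=module_name, asname=None)]))


# Simulate three EXT opcodes, each requesting the copyreg import.
body: list[ast.stmt] = []
for i, code in enumerate((1, 2, 3)):
    ensure_import(body, "copyreg")
    body.append(
        ast.parse(f"_var{i} = copyreg._inverted_registry.get({code}, (None, None))").body[0]
    )

print(ast.unparse(ast.Module(body=body, type_ignores=[])))
```

The scan is linear in the number of statements; an interpreter that emits many statements could instead remember the modules it has already imported in a set.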
- Use `_inverted_registry` instead of `_extension_registry` to look up extension codes
- Generate code that resolves the `(module, name)` tuple to the actual object by importing the module and using `getattr` on `sys.modules`

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
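A runtime sketch of the resolution step this commit describes: look up `(module, name)` in `_inverted_registry`, import the module, then `getattr` on `sys.modules`. `resolve_extension` is a hypothetical name, and this is ordinary Python rather than the AST Fickling emits; it also assumes `name` is a plain attribute, not a dotted qualified name.

```python
import copyreg
import importlib
import os
import sys


def resolve_extension(code: int):
    """Map an extension code to the registered object (hypothetical helper)."""
    module_name, name = copyreg._inverted_registry.get(code, (None, None))
    if module_name is None:
        raise ValueError(f"unregistered extension code {code}")
    # import_module also handles submodules such as "os.path" and guarantees
    # that sys.modules contains an entry under exactly this name.
    importlib.import_module(module_name)
    # Plain getattr: assumes `name` is a top-level attribute, not a dotted path.
    return getattr(sys.modules[module_name], name)


# Usage with an arbitrary code registered for os.path.join.
copyreg.add_extension("os.path", "join", 4242)
try:
    resolved = resolve_extension(4242)
finally:
    copyreg.remove_extension("os.path", "join", 4242)

print(resolved is os.path.join)  # True
```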
Branch updated from 551ca81 to 63c7bd4.
Test coverage for:
- EXT1: 1-byte extension code with a class (OrderedDict)
- EXT2: 2-byte extension code with a class (Counter)
- EXT4: 4-byte extension code with a class (deque)
- EXT1 with a function: submodule import (os.path.join)

Each test verifies that the generated AST code produces the same result as `pickle.loads()`.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
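Fixtures for cases like these can be assembled byte by byte, which keeps the opcode under test explicit instead of depending on which opcode the pickler would choose. The sketch below assumes only the standard library: it uses the `pickle.PROTO`, `pickle.EXT1`, `pickle.EXT2`, `pickle.EXT4`, and `pickle.STOP` opcode constants with arbitrary codes, and checks the result of `pickle.loads()` against the expected object. It does not exercise Fickling itself.

```python
import collections
import copyreg
import os
import pickle
import struct

# Argument encoding for each EXT opcode (little-endian).
ARG_FORMAT = {pickle.EXT1: "<B", pickle.EXT2: "<H", pickle.EXT4: "<i"}


def ext_pickle(opcode: bytes, code: int) -> bytes:
    """Assemble PROTO 2, a single EXT opcode with its argument, then STOP."""
    return pickle.PROTO + bytes([2]) + opcode + struct.pack(ARG_FORMAT[opcode], code) + pickle.STOP


# (opcode, code, module, name, expected object); the codes are arbitrary.
CASES = [
    (pickle.EXT1, 0x01, "collections", "OrderedDict", collections.OrderedDict),
    (pickle.EXT2, 0x0102, "collections", "Counter", collections.Counter),
    (pickle.EXT4, 0x010203, "collections", "deque", collections.deque),
    (pickle.EXT1, 0x02, "os.path", "join", os.path.join),
]

results = []
for opcode, code, module, name, expected in CASES:
    copyreg.add_extension(module, name, code)
    try:
        results.append(pickle.loads(ext_pickle(opcode, code)) is expected)
    finally:
        # remove_extension also evicts the code from copyreg's load cache.
        copyreg.remove_extension(module, name, code)

print(results)  # [True, True, True, True]
```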
Implement EXT1, EXT2, and EXT4 Pickle Opcode Support
Problem
Fickling raises NotImplementedError: TODO: Add support for Opcode EXT1 (and EXT2, EXT4) when analyzing pickle files that use extension registry opcodes from pickle protocol 2+.
What Are EXT Opcodes?
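EXT opcodes allow pickles to reference pre-registered objects from the global extension registry using integer codes instead of full module/name paths:
- EXT1: 1-byte extension code
- EXT2: 2-byte extension code
- EXT4: 4-byte extension code

When unpickling, the integer code is resolved through `copyreg._inverted_registry`, which maps integer codes to `(module, name)` tuples; `copyreg._extension_registry` holds the reverse mapping, from `(module, name)` to code, which the pickler consults.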
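Which of the three opcodes appears depends on the size of the registered code. The sketch below, which assumes only the standard library and uses arbitrary codes, registers `collections.OrderedDict` under a code, pickles the class with protocol 2, and reports the EXT opcode that `pickletools.genops` finds in the stream.

```python
import collections
import copyreg
import pickle
import pickletools


def ext_opcode_for(code: int) -> str:
    """Return the EXT opcode name the pickler emits for a given code."""
    copyreg.add_extension("collections", "OrderedDict", code)
    try:
        data = pickle.dumps(collections.OrderedDict, protocol=2)
        names = [op.name for op, _arg, _pos in pickletools.genops(data)]
        # Round-trip sanity check while the registration is still active.
        assert pickle.loads(data) is collections.OrderedDict
    finally:
        copyreg.remove_extension("collections", "OrderedDict", code)
    return next(name for name in names if name.startswith("EXT"))


for code in (42, 300, 70_000):
    print(code, ext_opcode_for(code))
# 42 EXT1
# 300 EXT2
# 70000 EXT4
```

CPython's pickler writes codes up to 0xff as EXT1, codes up to 0xffff as EXT2, and anything larger as EXT4.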
Solution
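Implemented three new opcode classes:
- `Ext1`: generates AST code showing the registry lookup
- `Ext2` and `Ext4`: subclasses of `Ext1` that inherit the same logic and differ only in argument size

The implementation uses the "Middle Ground" approach:
- Shows which extension code is being used (valuable for auditing)
- Won't crash if the registry isn't populated (uses `.get()` with a default)
- Generates readable AST output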
Example Output
Before: `NotImplementedError: TODO: Add support for Opcode EXT1`

After:

```python
import copyreg
_var0 = copyreg._extension_registry.get(42, (None, None))
_var1 = _var0()
_var1.__setstate__({'value': 42})
result0 = _var1
```
Testing
Benefits