@dhalf
Contributor

@dhalf dhalf commented Nov 25, 2025

Implement EXT1, EXT2, and EXT4 Pickle Opcode Support

Problem

Fickling raises NotImplementedError: TODO: Add support for Opcode EXT1 (and EXT2, EXT4) when analyzing pickle files that use extension registry opcodes from pickle protocol 2+.

What Are EXT Opcodes?

EXT opcodes allow pickles to reference pre-registered objects from the global extension registry using integer codes instead of full module/name paths:

  • EXT1 (0x82): 1-byte unsigned integer (0-255)
  • EXT2 (0x83): 2-byte unsigned integer (0-65,535)
  • EXT4 (0x84): 4-byte signed integer (0-2,147,483,647)

These opcodes look up objects in copyreg._extension_registry which maps integer codes to (module, name) tuples.
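How the pickler chooses between the three opcodes can be observed with the standard library alone. The following is a minimal sketch, not code from this PR: the helper `ext_opcode_for` and the codes 42, 300, and 70000 are illustrative assumptions, and `collections.OrderedDict` is only a convenient stand-in for a registered class.

```python
import collections
import copyreg
import pickle
import pickletools


def ext_opcode_for(code):
    """Register OrderedDict under `code`, pickle the class, and return the
    (opcode name, argument) pair the pickler emitted for it.

    Hypothetical helper for illustration; it assumes `code` is not already
    registered in this process.
    """
    key = ("collections", "OrderedDict")
    copyreg.add_extension(*key, code)
    try:
        # Extension codes are only consulted for protocol 2 and later.
        data = pickle.dumps(collections.OrderedDict, protocol=2)
    finally:
        # Always undo the registration so the global registry stays clean.
        copyreg.remove_extension(*key, code)
    ops = [(op.name, arg) for op, arg, _pos in pickletools.genops(data)]
    return ops[1]  # ops[0] is PROTO, ops[-1] is STOP


# The smallest opcode that can hold the code is selected.
chosen = {code: ext_opcode_for(code) for code in (42, 300, 70000)}
for code, (name, arg) in chosen.items():
    print(f"code {code:>6} -> {name} {arg}")
```

Each registration is removed again in the `finally` block because the extension registry is global to the process.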

Solution

Implemented three new opcode classes:

  1. Ext1: Base implementation that generates AST code showing the registry lookup
  2. Ext2: Inherits from Ext1 (same logic, 2-byte arg)
  3. Ext4: Inherits from Ext1 (same logic, 4-byte arg)

The implementation uses the "Middle Ground" approach:

  • Generates code showing copyreg._extension_registry.get(code, (None, None))
  • Uses .get() with a safe default so it won't crash if registry isn't populated
  • Provides informative output for security analysis

Example Output

Before: NotImplementedError: TODO: Add support for Opcode EXT1

After:
import copyreg
_var0 = copyreg._extension_registry.get(42, (None, None))
_var1 = _var0()
_var1.__setstate__({'value': 42})
result0 = _var1

Testing

  • ✅ Created test pickle with EXT1 opcode (using copyreg.add_extension)
  • ✅ Verified fickling successfully analyzes it without errors
  • ✅ All existing tests pass (20/20)
  • ✅ No regressions

Benefits

  • Fixes the NotImplementedError: Pickles with EXT opcodes now analyze successfully
  • Security analysis: Shows what extension codes are being used
  • Safe implementation: Won't crash if the extension isn't registered
  • Educational: Generated code shows the registry lookup mechanism

Add support for extension registry opcodes (EXT1, EXT2, EXT4) which are
part of pickle protocol 2+. These opcodes allow pickles to reference
pre-registered objects from copyreg._extension_registry using integer codes
instead of full module/name paths.

Implementation:
- Added Ext1 opcode class that generates AST code showing registry lookup
- Added Ext2 and Ext4 as subclasses (inherit same logic, different arg sizes)
- Uses copyreg._extension_registry.get(code, (None, None)) for safe lookup
- Generates informative code for security analysis

The fix uses the Middle Ground approach:
- Shows what extension code is being used (valuable for auditing)
- Won't crash if registry isn't populated (uses .get() with default)
- Generates readable AST output

Example generated code:
```python
import copyreg
_var0 = copyreg._extension_registry.get(42, (None, None))
```

Fixes NotImplementedError when analyzing pickles with EXT opcodes.
All existing tests pass (20/20).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@dhalf dhalf requested a review from ESultanik as a code owner November 25, 2025 16:39
@dhalf
Contributor Author

dhalf commented Nov 25, 2025

#114

@dguido
Member

dguido commented Jan 9, 2026

The implementation uses the wrong registry. copyreg has two registries:

| Registry | Maps | Used For |
| --- | --- | --- |
| _extension_registry | (module, name) → code | Pickling |
| _inverted_registry | code → (module, name) | Unpickling |

EXT opcodes are used during unpickling - they take an integer code and need to resolve it to a (module, name) tuple. The current implementation uses _extension_registry which maps in the opposite direction.

# Current (incorrect):
copyreg._extension_registry.get(code, (None, None))
# This looks up code as if it were a (module, name) tuple

# Should be:
copyreg._inverted_registry.get(code, (None, None))
# This correctly maps code -> (module, name)

You can verify this in a Python REPL:

>>> import copyreg
>>> copyreg.add_extension('mymodule', 'MyClass', 42)
>>> copyreg._extension_registry
{('mymodule', 'MyClass'): 42}
>>> copyreg._inverted_registry
{42: ('mymodule', 'MyClass')}

The EXT opcode receives 42 and needs to get ('mymodule', 'MyClass'), so it should use _inverted_registry.
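The difference between the two lookups can also be checked end to end against `pickle.loads`. This is a minimal sketch under stated assumptions: code 42 is assumed to be unregistered in the current process, `collections.OrderedDict` is a stand-in class, and the payload is assembled by hand (PROTO 2, EXT1 42, STOP) rather than taken from the PR's tests.

```python
import collections
import copyreg
import pickle

CODE = 42  # assumed to be free in this process
KEY = ("collections", "OrderedDict")  # stand-in class for the demo

copyreg.add_extension(*KEY, CODE)
try:
    # Hand-built protocol 2 pickle: PROTO 2, EXT1 <code>, STOP.
    payload = b"\x80\x02" + b"\x82" + bytes([CODE]) + b"."

    # Keyed by (module, name), so an integer code never matches.
    wrong = copyreg._extension_registry.get(CODE, (None, None))
    # Keyed by code, which is what an EXT opcode carries.
    right = copyreg._inverted_registry.get(CODE, (None, None))

    # The real unpickler resolves the code to the registered object.
    obj = pickle.loads(payload)
finally:
    copyreg.remove_extension(*KEY, CODE)

print(wrong)  # the default: the lookup found nothing
print(right)  # the registered (module, name) pair
print(obj)
```

With `_extension_registry` the lookup silently falls through to the `(None, None)` default, so the generated code would never name the object that the pickle actually loads.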

@dguido
Member

dguido commented Jan 9, 2026

Minor issue: The implementation adds import copyreg to the module body every time an EXT opcode runs:

def run(self, interpreter: Interpreter):
    # ...
    interpreter.module_body.append(
        ast.Import(names=[ast.alias('copyreg', None)])
    )

If a pickle contains multiple EXT opcodes, the generated AST will have duplicate imports:

import copyreg
import copyreg  # duplicate
import copyreg  # duplicate
_var0 = copyreg._inverted_registry.get(1, (None, None))
_var1 = copyreg._inverted_registry.get(2, (None, None))
_var2 = copyreg._inverted_registry.get(3, (None, None))

This isn't a correctness bug (Python tolerates duplicate imports), but it's worth noting. Other opcodes like Global have the same pattern, so this is consistent with the existing codebase - just something to be aware of for potential future cleanup.
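One possible shape for that cleanup is a helper that appends the import only when an equivalent one is not already present. This is a sketch only: `ensure_import` is a hypothetical name that does not exist in fickling, and the plain list `body` stands in for `interpreter.module_body`.

```python
import ast


def ensure_import(module_body, module_name):
    """Append `import <module_name>` unless an identical plain import exists.

    Hypothetical helper for illustration. Returns True if a node was added.
    """
    for node in module_body:
        if isinstance(node, ast.Import) and any(
            alias.name == module_name and alias.asname is None
            for alias in node.names
        ):
            return False
    module_body.append(
        ast.Import(names=[ast.alias(name=module_name, asname=None)])
    )
    return True


# Simulate three EXT opcodes each requesting the import.
body = []
for _ in range(3):
    ensure_import(body, "copyreg")

module = ast.fix_missing_locations(ast.Module(body=body, type_ignores=[]))
source = ast.unparse(module)  # requires Python 3.9+
print(source)
```

The linear scan keeps the sketch simple; a set of already-imported names kept alongside the module body would avoid rescanning for large pickles.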

dguido and others added 2 commits January 20, 2026 11:28
- Use _inverted_registry instead of _extension_registry to look up
  extension codes
- Generate code that resolves the (module, name) tuple to the actual
  object by importing the module and using getattr on sys.modules

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
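The resolution step described in this commit can be sketched in plain Python as follows. This is an illustration rather than the merged code: `resolve_extension` and the code `0x7F00` are hypothetical, and `os.path.join` is used because a dotted module name shows why the lookup goes through `sys.modules` (a bare `__import__("os.path")` returns the top-level `os` package, not the submodule).

```python
import copyreg
import importlib
import os.path
import sys


def resolve_extension(code):
    """Map an extension code to the object it names.

    Hypothetical helper mirroring the described approach: look the code up
    in the inverted registry, import the module, then getattr on the entry
    in sys.modules.
    """
    module_name, attr_name = copyreg._inverted_registry.get(code, (None, None))
    if module_name is None:
        raise KeyError(f"unregistered extension code {code}")
    importlib.import_module(module_name)  # make sure the module is loaded
    return getattr(sys.modules[module_name], attr_name)


CODE = 0x7F00  # assumed to be free in this process
copyreg.add_extension("os.path", "join", CODE)
try:
    resolved = resolve_extension(CODE)
finally:
    copyreg.remove_extension("os.path", "join", CODE)

print(resolved is os.path.join)
```

Raising on an unregistered code is a choice made for this sketch; the generated code discussed earlier in the thread falls back to a `(None, None)` default instead.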
@dguido dguido force-pushed the implement-ext-opcodes branch from 551ca81 to 63c7bd4 Compare January 20, 2026 16:44
Test coverage for:
- EXT1: 1-byte extension code with class (OrderedDict)
- EXT2: 2-byte extension code with class (Counter)
- EXT4: 4-byte extension code with class (deque)
- EXT1 with function: submodule import (os.path.join)

Each test verifies the generated AST code produces the same result
as pickle.loads().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@dguido dguido merged commit c2087d9 into master Jan 20, 2026
13 checks passed
@dguido dguido deleted the implement-ext-opcodes branch January 20, 2026 16:52