Skip to content

fix(picklescanner): handle GET/BINGET to missing memo without aborting scan#348

Open
palios-taey wants to merge 1 commit into
protectai:mainfrom
palios-taey:fix-edp-invalid-memo-binget
Open

fix(picklescanner): handle GET/BINGET to missing memo without aborting scan#348
palios-taey wants to merge 1 commit into
protectai:mainfrom
palios-taey:fix-edp-invalid-memo-binget

Conversation

@palios-taey
Copy link
Copy Markdown

Summary

When PickleUnsafeOpScan walks a pickle stream and encounters a STACK_GLOBAL whose operands resolve via GET / BINGET / LONG_BINGET into a memo index that was never PUT, the unhandled KeyError propagates and the scanner returns no issues for the whole file — even when unsafe globals were already seen earlier in the byte stream.

This PR adds the missing handling: when a memo reference is missing, treat the resolved value as "unknown" (the same convention already used at the next branch of the same conditional for unrecognized opcodes) and continue scanning. A regression test reproduces the case.

Refs #338 (Invalid Memo Reference category).

Why

This pattern is one instance of the EDP (Exception-Directed Programming) class described in arXiv:2508.19774 and credited via GHSA-9gvj-pp9x-gcfr: a malformed-but-parseable pickle whose sole purpose is to crash the scanner before the malicious payload is reported. Without this fix, a CI gate using modelscan lets affected files through silently — the JSON output shows "total_issues": 0, and the only signal is a one-character errors[].description field (the missing memo key as a bare string) which most consumers ignore.

The fix is the minimal, locally-correct change: a single conditional inside the existing STACK_GLOBAL operand-resolution loop. It does not alter behavior for well-formed pickles. It does not bias toward any particular architectural direction (denylist expansion vs allowlist) — the test asserts only that unsafe globals appearing before the malformed reference are still reported.

Testing

Added tests/test_modelscan.py::test_scan_picklescanner_resilient_to_invalid_memo_binget. The test generator (edp_invalid_memo_binget_gen) constructs a minimal pickle:

  1. GLOBAL + REDUCE invoking os.system("echo pwned") (benign demo command)
  2. Two BINGET opcodes referencing memo indices 7 and 8 (never PUT)
  3. STACK_GLOBAL (would consume those references)
  4. STOP

Verified locally:

  • Before fix: len(ms.issues.all_issues) == 0 — the os.system global is silently dropped.
  • After fix: len(ms.issues.all_issues) == 2 — the os.system CRITICAL is reported, plus an unknown/unknown CRITICAL from the malformed STACK_GLOBAL itself.

The fixture is self-contained, runs offline (no network fetch), and contains no exploitation primitives beyond what is already public via the Hide-and-Seek paper, GHSA-9gvj-pp9x-gcfr, and the corresponding artifact repo.

…g scan

PickleUnsafeOpScan currently raises KeyError when STACK_GLOBAL analysis
encounters a GET/BINGET/LONG_BINGET opcode referencing a memo index that
was never PUT. The unhandled exception propagates up and the scanner
returns zero issues for the whole file, even when unsafe globals appear
earlier in the byte stream.

This is one instance of the EDP (Exception-Directed Programming) class
discussed in arXiv:2508.19774 ("Hide and Seek"; GHSA-9gvj-pp9x-gcfr) and
the "Invalid Memo Reference" category in protectai#338: a malformed-but-parseable
pickle whose only function is to crash the scanner before the malicious
payload is reported.

Fix: when GET/BINGET references a missing memo index, treat the resolved
value as "unknown" (matching the existing convention at the next branch
of the same conditional for unrecognized opcodes) and continue scanning
the remainder of the stream.

Test: tests/test_modelscan.py::test_scan_picklescanner_resilient_to_invalid_memo_binget
generates a minimal pickle that calls os.system via GLOBAL+REDUCE and
follows it with STACK_GLOBAL referencing memo indices 7 and 8 (never
PUT). Before this fix, modelscan returns 0 issues. After, it reports
the os.system CRITICAL.

Refs protectai#338.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant