
Parser: implement pstring multi-byte length prefix variants (/B, /H, /L) #171

@unclesp1d3r

Description


Summary

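Implement multi-byte length-prefix variants for the pstring type: /B (1-byte, default), /H and /h (2-byte), and /L and /l (4-byte). As in GNU libmagic, the uppercase suffixes select a big-endian length prefix and the lowercase suffixes select a little-endian one.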

Context

GNU libmagic supports length-prefix width suffixes on the pstring type keyword:

  • pstring or pstring/B -- 1-byte length prefix (0-255), already implemented in #43 (Parser: implement pstring (Pascal string) type)
  • pstring/H -- 2-byte big-endian length prefix (0-65535)
  • pstring/h -- 2-byte little-endian length prefix (0-65535)
  • pstring/L -- 4-byte big-endian length prefix (0-4294967295)
  • pstring/l -- 4-byte little-endian length prefix (0-4294967295)

The current implementation supports only the default 1-byte prefix, so magic files that use the /H, /h, /L, or /l suffixes fail to parse.
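A minimal sketch of how the suffixes could map onto a single enum, assuming the PStringLengthWidth name suggested in the acceptance criteria and a hypothetical parse_pstring_keyword helper (neither is the project's actual parser API). Byte order follows GNU file: uppercase /H and /L are big-endian, lowercase /h and /l are little-endian. Combined modifiers (more than one character after the slash) are deliberately rejected here.

```rust
/// Hypothetical selector for the width and byte order of a pstring length prefix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PStringLengthWidth {
    /// `/B` (default): 1-byte length
    Byte,
    /// `/H`: 2-byte big-endian length
    Be16,
    /// `/h`: 2-byte little-endian length
    Le16,
    /// `/L`: 4-byte big-endian length
    Be32,
    /// `/l`: 4-byte little-endian length
    Le32,
}

impl PStringLengthWidth {
    /// Number of bytes the length prefix occupies in the input.
    pub fn size(self) -> usize {
        match self {
            Self::Byte => 1,
            Self::Be16 | Self::Le16 => 2,
            Self::Be32 | Self::Le32 => 4,
        }
    }

    /// Map a single suffix character to a width; `None` if unsupported.
    pub fn from_suffix(c: char) -> Option<Self> {
        match c {
            'B' => Some(Self::Byte),
            'H' => Some(Self::Be16),
            'h' => Some(Self::Le16),
            'L' => Some(Self::Be32),
            'l' => Some(Self::Le32),
            _ => None,
        }
    }
}

/// Hypothetical helper: parse "pstring" or "pstring/<suffix>" into a width.
/// Returns `None` for other keywords or unknown/multiple suffix characters.
pub fn parse_pstring_keyword(keyword: &str) -> Option<PStringLengthWidth> {
    let rest = keyword.strip_prefix("pstring")?;
    if rest.is_empty() {
        // Bare `pstring` keeps the existing 1-byte default.
        return Some(PStringLengthWidth::Byte);
    }
    let suffix = rest.strip_prefix('/')?;
    let mut chars = suffix.chars();
    let c = chars.next()?;
    if chars.next().is_some() {
        // Only one suffix character is handled in this sketch.
        return None;
    }
    PStringLengthWidth::from_suffix(c)
}
```

Folding byte order into the same enum means TypeKind::PString only needs one new field, and the evaluator can derive both the prefix size and the decoding from it.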

Acceptance Criteria

  • Add a PStringLengthWidth enum (or equivalent) to represent prefix widths and byte order
  • Extend TypeKind::PString with a field for the length-prefix width
  • Parser recognizes pstring/B, pstring/H, pstring/h, pstring/L, pstring/l suffixes
  • Evaluator reads 2-byte and 4-byte length prefixes in the correct byte order (big-endian for /H and /L, little-endian for /h and /l)
  • Bounds checking works correctly for all prefix widths
  • Unit tests for each variant
  • Update documentation
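On the evaluator side, the work reduces to decoding an unsigned prefix in the right byte order and then checking that both the prefix and the declared string body fit in the buffer. The sketch below assumes a hypothetical read_pstring_sketch that takes the prefix size and byte order directly; the real read_pstring() signature in src/evaluator/types/string.rs may differ.

```rust
/// Decode an unsigned length prefix of `prefix_size` bytes (1, 2, or 4) at `offset`.
/// Returns `None` if the prefix does not fit in `buf` or the size is unsupported.
fn read_length_prefix(buf: &[u8], offset: usize, prefix_size: usize, big_endian: bool) -> Option<usize> {
    let end = offset.checked_add(prefix_size)?;
    let bytes = buf.get(offset..end)?; // bounds check for the prefix itself
    let len = match (prefix_size, big_endian) {
        (1, _) => u32::from(bytes[0]),
        (2, true) => u32::from(u16::from_be_bytes([bytes[0], bytes[1]])),
        (2, false) => u32::from(u16::from_le_bytes([bytes[0], bytes[1]])),
        (4, true) => u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]),
        (4, false) => u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]),
        _ => return None,
    };
    usize::try_from(len).ok()
}

/// Hypothetical pstring reader: return the string body that follows the prefix,
/// or `None` if the declared length runs past the end of `buf`.
fn read_pstring_sketch(buf: &[u8], offset: usize, prefix_size: usize, big_endian: bool) -> Option<&[u8]> {
    let len = read_length_prefix(buf, offset, prefix_size, big_endian)?;
    // Cannot overflow: offset + prefix_size was already checked above.
    let start = offset + prefix_size;
    let end = start.checked_add(len)?; // guards against huge 4-byte lengths
    buf.get(start..end) // bounds check for the string body
}
```

Using `checked_add` together with `slice::get` keeps a large 4-byte length from either overflowing or panicking; a prefix that claims more bytes than remain simply produces no match.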

Impact

LOW -- Multi-byte pstring prefixes are uncommon in real-world magic files but are needed for full GNU libmagic compatibility.

Files to Modify

  • src/parser/ast.rs -- Add length-width enum, extend PString variant
  • src/parser/types.rs -- Parse the /B, /H, /h, /L, and /l suffixes after the pstring keyword
  • src/evaluator/types/string.rs -- Extend read_pstring() for 2-byte and 4-byte prefixes in both byte orders
  • src/parser/codegen.rs -- Update serialization for new field


Labels

  • enhancement -- New feature or request
  • evaluator -- Rule evaluation engine and logic
  • parser -- Magic file parsing components and grammar
  • priority:low -- Nice to have, can defer
