Work in Progress — yarforge is actively under development. Expect missing features, rough edges, and breaking changes.
A command-line tool that automatically generates YARA signature rules from PE (Portable Executable) files by extracting meaningful information and filtering out noise.
yarforge parses a PE binary, extracts candidate ASCII & Unicode strings, filters out low-value strings, and writes a ready-to-use .yar rule file into a signatures/ directory.
- Load PE: The target binary is loaded via pe-lib.
- Extract Strings: Strings of length ≥ 8 are pulled from the image.
- Filter: Each string is checked against several filter layers:
  - Prefix-based exclusion (section names, MSVC-mangled names, common runtime DLL names)
  - Import table cross-reference: any string matching a known imported function name is dropped
  - A large static blocklist covering MSVC runtime error messages, CRT/heap diagnostics, and more
- PDB Prompt: If a .pdb path string is found, the user is asked interactively whether to include it.
- Serialize: The collected strings and metadata are written out as a .yar rule file.
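The extraction and filtering steps above can be sketched in Python. This is a minimal illustration, not yarforge's actual implementation: it scans raw bytes with regular expressions instead of loading the file through pe-lib, the import names are passed in as a plain set, and `EXCLUDED_PREFIXES` and `BLOCKLIST` are hypothetical, heavily shortened stand-ins for the real filter data.

```python
import re

MIN_LEN = 8  # minimum string length stated in the pipeline

# Runs of printable ASCII bytes, at least MIN_LEN long.
ASCII_RE = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# UTF-16LE runs: a printable ASCII byte followed by a NUL byte, repeated.
WIDE_RE = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)

# Hypothetical filter data for illustration only (not yarforge's real lists).
EXCLUDED_PREFIXES = (".text", ".rdata", "?", "api-ms-win-")
BLOCKLIST = {"Unknown exception", "bad allocation"}


def extract_strings(data):
    """Return (text, encoding) pairs for ASCII and UTF-16LE runs in data."""
    found = [(m.group().decode("ascii"), "ascii")
             for m in ASCII_RE.finditer(data)]
    found += [(m.group().decode("utf-16-le"), "wide")
              for m in WIDE_RE.finditer(data)]
    return found


def keep(text, imports):
    """Apply the three filter layers in the order listed above."""
    if text.startswith(EXCLUDED_PREFIXES):  # prefix-based exclusion
        return False
    if text in imports:                     # import table cross-reference
        return False
    if text in BLOCKLIST:                   # static blocklist
        return False
    return True


def filter_strings(candidates, imports):
    """Drop low-value candidates, keeping the original order."""
    return [(t, enc) for t, enc in candidates if keep(t, imports)]
```

Given a buffer containing `GetProcAddress`, a UTF-16LE `ConfigLoaderV2`, and the mangled name `?func@@YAXXZ`, calling `filter_strings(extract_strings(data), {"GetProcAddress"})` would keep only the wide `ConfigLoaderV2` entry: the first string is an import and the last matches the `?` prefix.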
Generated rules follow this structure:
    rule target_binary {
        meta:
            description = "yarforge generated this!"
            author = "crim"
            reference = "https://github.com/NtProtectVirtualMemory/yarforge"
            date = "2025-01-01"
            hash = ""
        strings:
            $s0 = "SomeString" fullword ascii
            $s1 = "AnotherString" fullword wide
        condition:
            true
    }

Rule names are derived from the input filename, sanitized to be valid YARA identifiers (lowercase alphanumeric, underscores, no leading digit, no consecutive underscores).
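As a rough sketch of the naming rules and the rule layout, the Python below sanitizes a filename and renders a rule in the structure shown above. The function names are hypothetical, and some choices are assumptions rather than documented yarforge behavior: prefixing an underscore when the name starts with a digit, trimming leading and trailing underscores, and falling back to `unnamed` for an empty result. The `author` and `reference` meta fields are omitted for brevity.

```python
import re
from pathlib import Path


def sanitize_rule_name(filename):
    """Turn a filename into a lowercase YARA-safe identifier."""
    stem = Path(filename).stem.lower()
    # Replace every run of other characters with one underscore; this
    # also collapses consecutive underscores.
    name = re.sub(r"[^a-z0-9]+", "_", stem).strip("_")
    if not name:
        return "unnamed"      # assumption: fallback for empty names
    if name[0].isdigit():
        name = "_" + name     # assumption: avoid a leading digit
    return name


def escape_yara(text):
    """Escape backslashes and double quotes for a YARA text string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def render_rule(filename, strings, date, file_hash=""):
    """Render (text, encoding) pairs as a rule with a placeholder condition."""
    lines = [
        f"rule {sanitize_rule_name(filename)} {{",
        "    meta:",
        '        description = "yarforge generated this!"',
        f'        date = "{date}"',
        f'        hash = "{file_hash}"',
        "    strings:",
    ]
    for i, (text, encoding) in enumerate(strings):
        lines.append(f'        $s{i} = "{escape_yara(text)}" fullword {encoding}')
    lines += ["    condition:", "        true", "}"]
    return "\n".join(lines) + "\n"
```

For example, `sanitize_rule_name("Target-Binary.exe")` gives `target_binary`, and `sanitize_rule_name("7zip__setup.exe")` gives `_7zip_setup` under the assumptions above.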
yarforge is still in early development. The following areas are incomplete or placeholder:
- condition block is always true: No real condition logic is generated yet. Future versions will build meaningful conditions (e.g. any of them, PE header checks, minimum string matches).
- No confidence scoring: All surviving strings are included equally with no ranking or duplicate removal.
- No import-based condition hints: Import table data is used for filtering only; it could also inform condition generation (e.g. detecting specific API patterns).
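To make the first two gaps concrete, here is a speculative Python sketch of duplicate removal plus a simple generated condition. None of this exists in yarforge today; `build_condition` and its `min_matches` parameter are hypothetical. The generated text combines the common `uint16(0) == 0x5A4D` check for the MZ header with YARA's `N of them` form.

```python
def build_condition(strings, min_matches=3):
    """Deduplicate strings and build a basic YARA condition expression.

    Returns (unique_strings, condition_text).
    """
    # dict.fromkeys keeps the first occurrence of each string, in order.
    unique = list(dict.fromkeys(strings))
    mz_check = "uint16(0) == 0x5A4D"  # file starts with the MZ magic
    if not unique:
        # "of them" needs at least one string, so use the header check alone.
        return unique, mz_check
    needed = min(min_matches, len(unique))
    return unique, f"{mz_check} and {needed} of them"
```

Capping the threshold at the number of unique strings avoids emitting a condition that could never match when only a few strings survive filtering.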
This project is licensed under the MIT License - see the LICENSE file for details.
Generated rules are a starting point — always review and refine before operational use.