0x355/ida-stackstrings

ida-stackstrings

A stack string detection and reconstruction plugin for IDA Pro 8.x and 9.x.

ida-stackstrings walks every function in your database, replays the constant writes that compilers and packers use to materialise short strings on the stack, and reports the reconstructed text right inside your listing. It is small, dependency-free (pure IDAPython) and designed to integrate into a malware analysis workflow rather than sit in a sandbox project.

[ida-stackstrings] [+] scanning function sub_140007280
[ida-stackstrings] [+] stack string @ 0x14000729f -> 'cmd.exe' (score=0.87)
[ida-stackstrings] [+] utf16 stack string @ 0x140007631 -> 'powershell.exe' (score=0.90)
[ida-stackstrings] [+] stack string @ 0x140007abc -> 'Secret!' (score=0.62)
[ida-stackstrings] [+] applied 5 comment(s), renamed 4 stkvar(s)

Why stackstrings matter

A stack string is a string literal that does not exist in the binary's .rdata / .data segment. Instead of being stored as a contiguous blob, the bytes are written into a local stack buffer one or a handful at a time:

mov     byte ptr [rsp+20h], 63h    ; 'c'
mov     byte ptr [rsp+21h], 6Dh    ; 'm'
mov     byte ptr [rsp+22h], 64h    ; 'd'
mov     byte ptr [rsp+23h], 2Eh    ; '.'
mov     byte ptr [rsp+24h], 65h    ; 'e'
mov     byte ptr [rsp+25h], 78h    ; 'x'
mov     byte ptr [rsp+26h], 65h    ; 'e'
mov     byte ptr [rsp+27h], 0      ; '\0'

Compilers occasionally do this for very small literals, but it is much more common as a manual obfuscation trick:

  • Malware authors use stack strings to keep API names, paths and C2 indicators out of static string tools (strings, the static-string pass of FLOSS, AV signatures over .rdata). Loaders, droppers, packers and shellcode payloads are the most prolific users.
  • CTF challenges and commercial protectors love them because they break the simplest possible RE workflow without costing more than a few extra instructions.
  • Compilers under aggressive optimisation sometimes emit a stack string when a small literal is written into a function-local buffer that escapes via &buf; this is rarer, but it still shows up in real code.

The result is a binary that, opened in IDA, shows hundreds of mov [rsp+X], 0xNN instructions and zero useful strings in the cross references. This plugin replays those writes back into text.
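To see why `strings`-style tools come up empty, the Python sketch below assembles the listing above by hand (each line is `mov byte ptr [rsp+disp8], imm8`, encoded as `C6 44 24 disp8 imm8`) and runs a `strings -n 4`-style scan over the bytes. The `printable_runs` helper is a rough stand-in written for this illustration, not part of the plugin.

```python
import re

# "cmd.exe" plus its terminator, written one byte per instruction as in the listing above.
PLAINTEXT = b"cmd.exe\x00"

# mov byte ptr [rsp+disp8], imm8  ->  C6 44 24 <disp8> <imm8>
code = b"".join(
    bytes([0xC6, 0x44, 0x24, 0x20 + i, ch]) for i, ch in enumerate(PLAINTEXT)
)


def printable_runs(blob: bytes, min_len: int = 4) -> list:
    """Rough equivalent of `strings -n 4`: maximal runs of printable ASCII bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, blob)]


runs = printable_runs(code)
print(runs)  # short opcode/ModRM noise such as 'D$ c', never 'cmd.exe'
```

The string's characters only ever appear as the last byte of an instruction, sandwiched between the `C6 44 24` prefix bytes, so a scanner looking for contiguous text finds nothing useful.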


Features

  • x86 and x86-64 support. Operand widths from byte through qword, both [rsp+disp] and [rbp+disp] addressing, and indexed [rsp+rax*scale+disp] forms (MSVC /Od and similar patterns).
  • Recognises:
    • mov [stack +/- disp], imm (byte / word / dword / qword)
    • mov [stack +/- disp], reg with a tracked constant in reg
    • xor [stack +/- disp], imm (single instruction key XORs)
    • push imm (writes one slot at the current rsp top)
  • Constant propagation through mov reg, imm, imul, movzx / movsx from a stack cell, and xor / and / or reg, imm|reg. This is what makes register-routed stores and the in-register half of a load-modify-store XOR loop work.
  • ASCII and UTF-16 LE reconstruction. UTF-16 detection is alignment aware.
  • Confidence scoring based on length, printable ratio, byte entropy and a hint dictionary tuned for malware indicators.
  • Repeatable comments are added to the first instruction of each reconstructed string so the text shows up:
    • in the assembly listing
    • in the Hex-Rays decompiler output
    • on every cross reference back to the function
  • Optional stkvar renaming when the score crosses a configurable threshold and the slot still has an IDA-generated name. Skipped on IDA 9.x, where ida_struct has been removed (comments still work).
  • Sortable chooser with double click navigation for browsing results.
  • Idempotent. Running it twice does not duplicate comments or clobber names you have already edited.
  • No third-party dependencies. Only idaapi, idc, idautils and a few ida_* modules such as ida_ua.
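The idempotency rule can be sketched as a pure function that decides what comment to write. This is a hypothetical helper, not the plugin's code, and it assumes that the `[stackstring]` prefix seen in the example disassembly is what marks a plugin-owned line. Inside IDA, the existing text would come from `idc.get_cmt(ea, 1)` and the result would be written back with `idc.set_cmt(ea, text, 1)`, where `1` selects the repeatable comment.

```python
from typing import Optional

TAG = "[stackstring]"  # assumed marker, matching the listing format shown in this README


def merge_comment(existing: Optional[str], text: str) -> Optional[str]:
    """Return the repeatable comment to write, or None if nothing needs to change.

    Hypothetical helper: keeps any analyst-written lines and appends the
    stack-string line only if an identical one is not already present.
    """
    line = f'{TAG} "{text}"'
    current = existing or ""
    if line in current.splitlines():
        return None  # already annotated on a previous run: do not duplicate
    return f"{current}\n{line}" if current else line


first = merge_comment(None, "cmd.exe")
second = merge_comment(first, "cmd.exe")
print(first)   # [stackstring] "cmd.exe"
print(second)  # None
```

Returning `None` for "no change" lets a caller skip the write entirely, so a second run leaves both the comment and any analyst edits untouched.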

Usage

  • Hotkey: press Ctrl+Alt+S from anywhere in IDA. A small prompt asks whether to scope the analysis to the function under the cursor or to the whole database.

  • Menu: Edit -> Plugins -> Stack Strings.

  • Scripted: call into the package directly from the IDAPython console.

  • Custom configuration:

    from ida_stackstrings.analyzer import AnalyzerOptions, analyze_function
    from ida_stackstrings.heuristics import HeuristicConfig
    from ida_stackstrings.comments import CommentOptions, apply_results
    import idc
    
    opts = AnalyzerOptions(
        heuristics=HeuristicConfig(min_length=5, min_score=0.55),
        follow_xor=True,
        follow_push=True,
        verbose=True,
    )
    result = analyze_function(idc.get_screen_ea(), opts)
    apply_results(result.strings, CommentOptions(rename_stkvars=False))

After a run, the chooser opens with one row per reconstructed string. Columns are sortable; double-click jumps to the EA.


Example output

Running the plugin on a small demo binary that mixes ASCII byte writes, UTF-16 register-routed stores and a per-byte XOR loop:

Address    Function          Encoding   Len   Score   Offset      String
0x1400074ac  sub_140007490    ascii      12    1.00    sp+0x40    kernel32.dll
0x1400077e2  sub_1400077C0    ascii      24    1.00    sp+0x60    http://c2.example/beacon
0x140007631  sub_140007610    utf16le    14    0.90    sp+0x30    powershell.exe
0x14000729f  sub_140007280    ascii       7    0.87    sp+0x68    cmd.exe
0x140007abc  sub_140007AA0    ascii       7    0.62    sp+0x28    Secret!

After the run, the disassembly shows:

.text:0000000140001020   mov    byte ptr [rsp+20h], 'c'    ; [stackstring] "cmd.exe"
.text:0000000140001025   mov    byte ptr [rsp+21h], 'm'
...

and (on IDA 8.x, where stkvar renaming is available) the renamed local variables (ss_cmd_exe, ss_kernel32_dll, …) appear in the Hex-Rays decompiler output.
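The renamed locals follow an `ss_` plus sanitised-text pattern. Below is a minimal sketch of how such names could be derived. It assumes that IDA's default `var_XX` / `arg_XX` naming marks a slot nobody has renamed by hand; the regular expressions, the 32-character cap and the `ss_str` fallback are illustrative choices, not the plugin's.

```python
import re

# Assumed pattern for IDA's auto-generated stack variable names (var_28, arg_0, ...).
AUTO_NAME = re.compile(r"^(?:var|arg)_[0-9A-F]+$")


def is_auto_name(name: str) -> bool:
    """True if the slot still carries an IDA-style generated name (heuristic)."""
    return bool(AUTO_NAME.match(name))


def stkvar_name(text: str, prefix: str = "ss_", max_len: int = 32) -> str:
    """Turn a decoded string into an identifier such as ss_cmd_exe (hypothetical scheme)."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    body = re.sub(r"[^0-9A-Za-z]+", "_", text).strip("_")
    return (prefix + (body or "str"))[:max_len].rstrip("_")


for s in ("cmd.exe", "kernel32.dll", "http://c2.example/beacon"):
    print(stkvar_name(s))
```

Checking `is_auto_name` before renaming is what keeps a second run from overwriting a name the analyst already chose.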


Implementation notes

How the analyzer works

For each function:

  1. We iterate every instruction with idautils.FuncItems and decode it through ida_ua.decode_insn.

  2. We recognise four families of stack-store shape:

    • mov dst, imm where dst is [stack_base (+ index * scale) + disp]
    • mov dst, reg where reg carries a tracked constant
    • xor dst, imm where dst is on the stack
    • push imm

    Every other instruction is either passed to the register tracker (so subsequent stack writes can resolve their operands) or ignored — the analyzer is not a full emulator.

  3. Each recognised instruction becomes a StackWrite record that carries the normalised base register ("sp" or "bp"), the signed displacement, the bytes deposited, the operation, and the instruction's position in program order.
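A StackWrite-style record might look like the dataclass below. The field names, the `from_imm` constructor and the `normalise_base` mapping are assumptions made for illustration; the README only states which pieces of information the real record carries. The key detail is converting an immediate of a given operand width into the little-endian bytes it deposits.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from register names to the normalised base.
_BASES = {"rsp": "sp", "esp": "sp", "rbp": "bp", "ebp": "bp"}


def normalise_base(reg_name: str) -> Optional[str]:
    """Map a stack base register to "sp"/"bp"; None for anything else."""
    return _BASES.get(reg_name.lower())


@dataclass(frozen=True)
class StackWrite:
    base: str    # "sp" or "bp"
    disp: int    # signed displacement from the base
    data: bytes  # bytes deposited, in memory (little-endian) order
    op: str      # "mov", "xor" or "push"
    index: int   # position in program order

    @classmethod
    def from_imm(cls, base: str, disp: int, imm: int, width: int,
                 op: str, index: int) -> "StackWrite":
        # Mask to the operand width first so negative immediates such as -1
        # become their two's-complement bytes (0xFF for a byte store).
        mask = (1 << (8 * width)) - 1
        return cls(base, disp, (imm & mask).to_bytes(width, "little"), op, index)


# mov dword ptr [rsp+20h], 2E646D63h deposits the bytes "cmd." at sp+0x20
w = StackWrite.from_imm(normalise_base("rsp"), 0x20, 0x2E646D63, 4, "mov", 0)
print(w.data)  # b'cmd.'
```

Storing raw bytes rather than the immediate lets a single dword store and four byte stores feed the same byte-level replay later.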

Register tracker

A small constant propagator runs alongside the instruction walker. It models mov reg, imm, imul (2- and 3-operand), movzx / movsx from memory or a register, and the in-register xor / and / or. It is intentionally minimal: any unrecognised write to a register drops that register from the tracker, so a stale value can never leak into a stack-write displacement.

Byte-register IDs are canonicalised: in IDA 9.x's metapc, al/cl/dl/bl/spl/bpl/sil/dil are encoded as 16-23 (REX form) while the full-width registers are 0-7. The tracker collapses 16-23 onto 0-7 so a value loaded into eax is still visible when later code reads al.
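A minimal sketch of such a tracker, with hypothetical class and method names. Register IDs follow the numbering described above, and the canonicalisation folds 16-23 onto 0-7. To stay short, it models only `mov reg, imm` and two-operand `xor` / `and` / `or` / `imul`, and it treats every write as a full-register write. That is a simplification: on real x86, a byte-register write preserves the upper bits.

```python
import operator
from typing import Dict, Optional, Tuple, Union

MASK64 = (1 << 64) - 1
OPS = {"xor": operator.xor, "and": operator.and_, "or": operator.or_, "imul": operator.mul}
Source = Union[int, Tuple[str, int]]  # immediate, or ("reg", register id)


def canon(reg_id: int) -> int:
    """Fold byte-register IDs 16-23 onto 0-7, per the numbering described above."""
    return reg_id - 16 if 16 <= reg_id <= 23 else reg_id


class RegTracker:
    def __init__(self) -> None:
        self.values: Dict[int, int] = {}  # canonical register id -> known constant

    def mov_imm(self, reg: int, imm: int) -> None:
        # Simplification: every write is treated as a full-register write.
        self.values[canon(reg)] = imm & MASK64

    def forget(self, reg: int) -> None:
        """Any write the tracker does not model drops the register."""
        self.values.pop(canon(reg), None)

    def binop(self, op: str, reg: int, src: Source) -> None:
        """Two-operand xor/and/or/imul with an immediate or register source."""
        dst = canon(reg)
        if op == "xor" and isinstance(src, tuple) and canon(src[1]) == dst:
            self.values[dst] = 0  # xor eax, eax: result is 0 regardless of input
            return
        rhs = self.values.get(canon(src[1])) if isinstance(src, tuple) else src
        lhs = self.values.get(dst)
        if op not in OPS or lhs is None or rhs is None:
            self.forget(reg)  # unknown input: drop rather than guess
            return
        self.values[dst] = OPS[op](lhs, rhs) & MASK64

    def read(self, reg: int, width: int = 8) -> Optional[int]:
        """Known value truncated to `width` bytes, or None if untracked."""
        value = self.values.get(canon(reg))
        return None if value is None else value & ((1 << (8 * width)) - 1)
```

For example, `mov eax, 41h` followed by `xor eax, 20h` leaves 0x61 (`'a'`) visible through `read(16, 1)`, the al alias. Any operation on an untracked register simply removes it, so a later stack store through that register is ignored rather than decoded from a stale value.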

How the decoder works

Writes are replayed against a sparse dict[(base, offset)] -> byte map in instruction order:

  • mov / push overwrite the cell.
  • xor XORs the cell with the immediate. Uninitialised cells are treated as zero, which matches a buffer that a zeroing prologue has already cleared.

The map is then split into maximal contiguous ranges per base, each range is split on \x00 terminators, and each segment is fed into:

  1. An ASCII pass. The segment is trimmed of non-printable leading and trailing bytes, then run through the printable filter and the scoring function.
  2. A UTF-16 LE pass. Both starting parities are tried because UTF-16 buffers occasionally sit immediately after an ASCII string in the same locals area, on an odd offset.
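The replay and splitting steps can be sketched in a few lines of standalone Python. For self-containment, writes are plain `(index, base, disp, data, op)` tuples instead of the plugin's StackWrite records, and the function names are hypothetical. Splitting on every NUL is taken literally from the description above. Note that it would also cut UTF-16 text at its zero high bytes, so a wide-string pass needs to see the unsplit run.

```python
from itertools import groupby
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int]                     # (base, offset)
Write = Tuple[int, str, int, bytes, str]  # (index, base, disp, data, op)


def replay(writes: Iterable[Write]) -> Dict[Key, int]:
    """Apply writes in program order to a sparse byte-level stack shadow."""
    shadow: Dict[Key, int] = {}
    for _, base, disp, data, op in sorted(writes, key=lambda w: w[0]):
        for i, byte in enumerate(data):
            key = (base, disp + i)
            if op == "xor":
                shadow[key] = shadow.get(key, 0) ^ byte  # unseen cell counts as zero
            else:
                shadow[key] = byte                       # mov / push overwrite
    return shadow


def segments(shadow: Dict[Key, int]) -> List[Tuple[str, int, bytes]]:
    """Split the shadow into contiguous runs per base, then on NUL terminators."""
    out = []
    for base, group in groupby(sorted(shadow), key=lambda k: k[0]):
        offsets = [off for _, off in group]
        # Consecutive offsets share the same (offset - position) value.
        for _, run in groupby(enumerate(offsets), key=lambda p: p[1] - p[0]):
            run_offsets = [off for _, off in run]
            start = run_offsets[0]
            blob = bytes(shadow[(base, off)] for off in run_offsets)
            for piece in blob.split(b"\x00"):
                if piece:
                    out.append((base, start, piece))
                start += len(piece) + 1  # skip the piece and its terminator
    return out


writes = [(0, "sp", 0x20, b"cmd.", "mov"), (1, "sp", 0x24, b"exe\x00", "mov")]
print(segments(replay(writes)))  # [('sp', 32, b'cmd.exe')]
```

Sorting by the program-order index before replaying matters for XOR: a cell must be written before it is XORed, or the result is the key itself.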

UTF-16 heuristics

A buffer is considered UTF-16 when:

  • its length is even (after trimming a trailing single byte),
  • every byte at an odd offset is 0x00,
  • at least 85% of the bytes at even offsets are printable ASCII.

This deliberately ignores BMP characters past 0x7F. Those would need real Unicode tables, and stack-string UTF-16 in real malware is almost always plain ASCII smuggled into a wide buffer.
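The three rules, plus the two-parity retry from the decoder pass, can be expressed as a short standalone check. The function names are hypothetical, and "printable" is taken here to mean ASCII 0x20 to 0x7E, which is an assumption about the plugin's filter.

```python
import string
from typing import Optional, Tuple

# Printable ASCII byte values (space included, other whitespace excluded).
PRINTABLE = set(string.printable.encode("ascii")) - set(b"\t\n\r\x0b\x0c")


def looks_utf16le(buf: bytes, threshold: float = 0.85) -> bool:
    """Apply the three UTF-16 LE rules listed above."""
    if len(buf) % 2:
        buf = buf[:-1]                      # trim a trailing single byte
    if len(buf) < 2:
        return False
    low, high = buf[0::2], buf[1::2]        # even-offset and odd-offset bytes
    if any(high):
        return False                        # every odd-offset byte must be 0x00
    return sum(b in PRINTABLE for b in low) / len(low) >= threshold


def decode_utf16le(buf: bytes, threshold: float = 0.85) -> Optional[Tuple[int, str]]:
    """Try both starting parities; return (parity, text) for the first match."""
    for parity in (0, 1):
        cand = buf[parity:]
        if looks_utf16le(cand, threshold):
            cand = cand[: len(cand) - len(cand) % 2]
            return parity, cand.decode("utf-16-le")
    return None


print(decode_utf16le("powershell.exe".encode("utf-16-le")))  # (0, 'powershell.exe')
print(decode_utf16le(b"A" + "cmd".encode("utf-16-le")))      # (1, 'cmd')
```

The second call shows why both parities are tried: a stray byte in front of the wide buffer makes the even-parity check fail, while starting one byte later lines the zero high bytes up again.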

XOR decoding

ida-stackstrings handles two XOR shapes:

; (a) constant-key XOR against a stack cell
mov  [rsp+X], imm1
xor  [rsp+X], imm2
; (b) load-modify-store XOR via a register
movzx  eax, byte ptr [rsp+X]
xor    eax, K
mov    [rsp+X], al

Both are replayed against the same byte-level stack shadow, so a per-byte XOR loop with a constant key reconstructs to the plaintext once every cell has been visited.
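Shape (b) can be replayed against a byte-level shadow as sketched below. The `replay_lms_xor` helper is hypothetical; it performs the load, the in-register XOR and the store back once per cell, which is what a per-byte loop over the buffer amounts to. The ciphertext and the key 0x5A are made-up sample values. Shape (a) is simpler still: the cell ends up holding `imm1 ^ imm2`.

```python
from typing import Dict, Iterable, Tuple

Shadow = Dict[Tuple[str, int], int]


def replay_lms_xor(shadow: Shadow, base: str, offsets: Iterable[int], key: int) -> Shadow:
    """Replay shape (b) once per cell, as each iteration of a per-byte XOR loop would."""
    for off in offsets:
        eax = shadow.get((base, off), 0)  # movzx eax, byte ptr [base+off] (unseen -> 0)
        eax ^= key                        # xor eax, K  (the in-register half)
        shadow[(base, off)] = eax & 0xFF  # mov [base+off], al  (store the low byte)
    return shadow


# Encrypted "Secret!" placed on the stack with plain byte stores (sample values).
KEY = 0x5A
encrypted = bytes(b ^ KEY for b in b"Secret!")
shadow = {("sp", 0x28 + i): b for i, b in enumerate(encrypted)}
replay_lms_xor(shadow, "sp", range(0x28, 0x28 + len(encrypted)), KEY)
print(bytes(shadow[("sp", 0x28 + i)] for i in range(len(encrypted))))  # b'Secret!'
```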

Confidence score

The score is a value in [0.0, 1.0] that combines:

Factor                    Weight
Length term               up to 1.0
Printable ratio           +0.0 .. +0.2
Byte-dominance penalty    -0.0 .. -0.4
Hint-keyword bonus        +0.0 / +0.25
Length ≥ 8 bonus          +0.05

The default cut-off is min_score = 0.45, which in practice keeps out runs of repeating bytes (AAAAAA) and three-letter false positives while still surfacing genuine 4-character names.
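Below is a minimal sketch of how the factors in the table could combine. Only the weight ranges come from the table. The linear length term (saturating at 13 characters), the 50% dominance threshold and the hint list are assumptions, chosen so that the default cut-off behaves as described: AAAAAA and three-letter strings fall below 0.45, while four-letter strings clear it. The sketch does not reproduce the exact scores in the example output.

```python
import string
from collections import Counter

# Hypothetical hint list; the plugin's real dictionary is not documented here.
HINTS = ("kernel32", ".dll", ".exe", "http", "cmd", "powershell")
PRINTABLE = set(string.printable) - set("\t\n\r\x0b\x0c")


def score(text: str) -> float:
    """Illustrative confidence score in [0.0, 1.0]."""
    if not text:
        return 0.0
    n = len(text)
    length_term = min(1.0, n / 13.0)                    # up to 1.0 (assumed shape)
    printable = sum(ch in PRINTABLE for ch in text) / n
    printable_term = 0.2 * printable                     # +0.0 .. +0.2
    dominant = Counter(text).most_common(1)[0][1] / n
    penalty = 0.4 * max(0.0, (dominant - 0.5) / 0.5)    # -0.0 .. -0.4
    lowered = text.lower()
    hint_bonus = 0.25 if any(h in lowered for h in HINTS) else 0.0
    long_bonus = 0.05 if n >= 8 else 0.0
    raw = length_term + printable_term - penalty + hint_bonus + long_bonus
    return max(0.0, min(1.0, raw))                       # clamp into [0.0, 1.0]


MIN_SCORE = 0.45
for s in ("AAAAAA", "abc", "abcd", "kernel32.dll"):
    print(s, round(score(s), 2), score(s) >= MIN_SCORE)
```

The dominance penalty only starts once a single byte makes up more than half the string. That keeps ordinary words such as "kernel32.dll" unpenalised while pushing repeated-byte runs well under the cut-off.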


Limitations

The analyzer is intentionally narrow. Things it does not handle yet:

  • Table-driven decoders (sub, add, multi-byte rotations, custom shuffles).
  • XOR loops with a register-resident key that itself depends on loop state (rolling keys, key derived from a counter).
  • Strings reconstructed across function boundaries (memcpy from a wrapper that builds the string in another frame).
  • Heap-allocated buffers. This is by design — the project name says stack strings.
  • Stkvar renaming on IDA 9.x: ida_struct was removed and the replacement API (ida_typeinf) is not yet wired up. Repeatable comments still work; only the rename step is skipped.

The plugin also assumes IDA's stack-pointer tracking (idc.get_spd) is correct for the function being analyzed. If IDA fails to compute frame info (very rare on x86/x64 binaries), the push imm recogniser will skip those instructions rather than guess.
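A sketch of how a `push imm` recogniser could place its bytes, written as a pure function so it runs outside IDA. The name `push_imm_write` is hypothetical. `spd` is assumed to be the SP delta in effect just before the push, relative to the function's entry SP (the kind of value `idc.get_spd` provides), and `None` stands for "frame info unavailable". In 64-bit mode, `push imm32` sign-extends its operand to 8 bytes.

```python
from typing import Optional, Tuple


def push_imm_write(spd: Optional[int], imm: int, bitness: int = 64) -> Optional[Tuple[int, bytes]]:
    """Return (offset from entry SP, bytes) deposited by `push imm`, or None.

    `spd` is assumed to be the SP delta before the push, relative to the
    function's entry SP (negative once the frame is set up).
    """
    if spd is None:
        return None  # no frame info: skip the instruction rather than guess
    width = 8 if bitness == 64 else 4
    if 0 <= imm <= 0xFFFFFFFF and imm & 0x80000000:
        imm -= 1 << 32  # treat a raw imm32 as signed: push sign-extends it
    data = (imm & ((1 << (8 * width)) - 1)).to_bytes(width, "little")
    return spd - width, data  # push decrements SP first, then stores


print(push_imm_write(-0x10, 0x6C6C642E))  # (-24, b'.dll\x00\x00\x00\x00')
```

If rsp-relative stores are expressed in the same coordinates (SP delta plus displacement), pushes and `mov [rsp+X]` writes land in a single shadow. That is also why an incorrect SP delta would misplace bytes instead of merely dropping them.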


Roadmap

In rough priority order:

  1. Stkvar renaming on IDA 9.x via ida_typeinf to restore the feature that ida_struct's removal disabled.
  2. Loop XOR / arithmetic decoders. Detect the canonical for (i = 0; i < n; ++i) buf[i] op= key shape for sub / add and apply the operation symbolically.
  3. Symbolic reconstruction for buffers built via small helper functions (_decode_string("…")).
  4. Emulation-assisted decoding as an opt-in mode using unicorn-engine, for binaries where the instruction-level recogniser is no longer enough.
  5. Export of results to JSON / CSV for diffing across samples.
  6. Deobfuscation helpers — automatic patching of the contributing instruction range with a db directive once the analyst confirms the decoded string.

Contributing

Bug reports and pull requests are welcome. When opening an issue please include:

  • IDA Pro version (Help -> About).
  • Architecture (x86 / x64) and bitness of the binary.
  • A minimal reproducer when possible — a self-contained C snippet compiled with the relevant /Od or optnone flags is the easiest way to get a fix landed.

The codebase is intentionally modular — adding a new instruction shape almost always means a new branch inside _track_reg_writes or _decode_mov_or_xor in analyzer.py.


License

See LICENSE.
