A stack string detection and reconstruction plugin for IDA Pro 8.x and 9.x.
ida-stackstrings walks every function in your database, replays the
constant writes that compilers and packers use to materialise short
strings on the stack, and reports the reconstructed text right inside
your listing. It is small, dependency free (pure IDAPython) and
designed to integrate into a malware analysis workflow rather than
sit in a sandbox project.
[ida-stackstrings] [+] scanning function sub_140007280
[ida-stackstrings] [+] stack string @ 0x14000729f -> 'cmd.exe' (score=0.87)
[ida-stackstrings] [+] utf16 stack string @ 0x140007631 -> 'powershell.exe' (score=0.90)
[ida-stackstrings] [+] stack string @ 0x140007abc -> 'Secret!' (score=0.62)
[ida-stackstrings] [+] applied 5 comment(s), renamed 4 stkvar(s)
A stack string is a string literal that does not exist in the
binary's .rdata / .data segment. Instead of being stored as a
contiguous blob, the bytes are written into a local stack buffer one
or a handful at a time:
mov byte ptr [rsp+20h], 63h ; 'c'
mov byte ptr [rsp+21h], 6Dh ; 'm'
mov byte ptr [rsp+22h], 64h ; 'd'
mov byte ptr [rsp+23h], 2Eh ; '.'
mov byte ptr [rsp+24h], 65h ; 'e'
mov byte ptr [rsp+25h], 78h ; 'x'
mov byte ptr [rsp+26h], 65h ; 'e'
mov byte ptr [rsp+27h], 0 ; '\0'
Compilers occasionally do this for very small literals, but it is much more common as a manual obfuscation trick:
- Malware authors use stack strings to keep API names, paths and C2 indicators out of static string tools (strings, floss first pass, AV signatures over .rdata). Loaders, droppers, packers and shellcode payloads are the most prolific users.
- CTF challenges and commercial protectors love them because they break the simplest possible RE workflow without costing more than a few extra instructions.
- Compilers under aggressive optimisation sometimes emit a stack string when a small literal is written into a function-local buffer that escapes via &buf; this is rarer but still shows up in real code.
The result is a binary that, opened in IDA, shows hundreds of
mov [rsp+X], 0xNN instructions and zero useful strings in the cross
references. This plugin replays those writes back into text.
- x86 and x86-64 support. Operand widths from byte through qword, both [rsp+disp] and [rbp+disp] addressing, and indexed [rsp+rax*scale+disp] forms (MSVC /Od and similar patterns).
- Recognises:
  - mov [stack +/- disp], imm (byte / word / dword / qword)
  - mov [stack +/- disp], reg with a tracked constant in reg
  - xor [stack +/- disp], imm (single instruction key XORs)
  - push imm (writes one slot at the current rsp top)
- Constant propagation through mov reg, imm, imul, movzx / movsx from a stack cell, and xor / and / or reg, imm|reg. This is what makes register-routed stores and the in-register half of a load-modify-store XOR loop work.
- ASCII and UTF-16 LE reconstruction. UTF-16 detection is alignment aware.
- Confidence scoring based on length, printable ratio, byte entropy and a hint dictionary tuned for malware indicators.
- Repeatable comments are added to the first instruction of each reconstructed string so the text shows up:
  - in the assembly listing
  - in the Hex-Rays decompiler output
  - on every cross reference back to the function
- Optional stkvar renaming when the score crosses a configurable threshold and the slot still has an IDA generated name. Skipped on IDA 9.x where ida_struct has been removed (comments still work).
- Sortable chooser with double click navigation for browsing results.
- Idempotent. Running it twice does not duplicate comments or clobber names you have already edited.
- No third party dependencies. Only idaapi, idc, idautils and a few sub modules.
- Hotkey: press Ctrl+Alt+S from anywhere in IDA. A small prompt asks whether to scope the analysis to the function under the cursor or to the whole database.
- Menu: Edit -> Plugins -> Stack Strings.
- Scripted: call into the package directly from the IDAPython console.
- Custom configuration:
from ida_stackstrings.analyzer import AnalyzerOptions, analyze_function
from ida_stackstrings.heuristics import HeuristicConfig
from ida_stackstrings.comments import CommentOptions, apply_results
import idc
# Options for a single-function run with stricter heuristics.
opts = AnalyzerOptions(
    heuristics=HeuristicConfig(min_length=5, min_score=0.55),
    follow_xor=True,
    follow_push=True,
    verbose=True,
)
# Analyse the function under the cursor, then apply comments without renaming stkvars.
result = analyze_function(idc.get_screen_ea(), opts)
apply_results(result.strings, CommentOptions(rename_stkvars=False))
After a run, the chooser opens with one row per reconstructed string. Columns are sortable; double-click jumps to the EA.
Running the plugin on a small demo binary that mixes ASCII byte writes, UTF-16 register-routed stores and a per-byte XOR loop:
Address Function Encoding Len Score Offset String
0x1400074ac sub_140007490 ascii 12 1.00 sp+0x40 kernel32.dll
0x1400077e2 sub_1400077C0 ascii 24 1.00 sp+0x60 http://c2.example/beacon
0x140007631 sub_140007610 utf16le 14 0.90 sp+0x30 powershell.exe
0x14000729f sub_140007280 ascii 7 0.87 sp+0x68 cmd.exe
0x140007abc sub_140007AA0 ascii 7 0.62 sp+0x28 Secret!
After the run, the disassembly shows:
.text:0000000140001020 mov byte ptr [rsp+20h], 'c' ; [stackstring] "cmd.exe"
.text:0000000140001025 mov byte ptr [rsp+21h], 'm'
...
and (on IDA 8.x, where stkvar renaming is available) the renamed
local variables (ss_cmd_exe, ss_kernel32_dll, …) appear in the
Hex-Rays decompiler output.
For each function:
- We iterate every instruction with idautils.FuncItems and decode it through ida_ua.decode_insn.
- We recognise four families of stack-store shape:
  - mov dst, imm where dst is [stack_base (+ index * scale) + disp]
  - mov dst, reg where reg carries a tracked constant
  - xor dst, imm where dst is on the stack
  - push imm

  Every other instruction is either passed to the register tracker (so subsequent stack writes can resolve their operands) or ignored; the analyzer is not a full emulator.
- Each recognised instruction becomes a StackWrite record that carries the normalised base register ("sp" or "bp"), the signed displacement, the bytes deposited, the operation, and the instruction's position in program order.
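As a rough illustration of the record described above, the sketch below models one stack write as a frozen dataclass. The field names (`base`, `disp`, `data`, `op`, `order`) and the helper `bytes_for_imm` are assumptions made for this example; the actual definitions in analyzer.py may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackWrite:
    """Hypothetical shape of one recognised stack write."""
    base: str    # normalised base register: "sp" or "bp"
    disp: int    # signed displacement from the base
    data: bytes  # bytes deposited, in memory (little-endian) order
    op: str      # "mov", "xor" or "push"
    order: int   # position in program order


def bytes_for_imm(imm: int, width: int) -> bytes:
    """Encode an immediate as `width` little-endian bytes, wrapping negatives."""
    mask = (1 << (8 * width)) - 1
    return (imm & mask).to_bytes(width, "little")


# mov dword ptr [rsp+20h], 2E646D63h deposits the four bytes "cmd."
w = StackWrite("sp", 0x20, bytes_for_imm(0x2E646D63, 4), "mov", 0)
```

Storing the bytes in memory order means a dword or qword store can later be replayed one cell at a time, exactly like a run of byte stores.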
A small constant propagator runs alongside the instruction walker.
It models mov reg, imm, imul (2- and 3-operand), movzx /
movsx from memory or a register, and the in-register xor / and
/ or. It is intentionally minimal: any unrecognised write to a
register drops that register from the tracker, so a stale value can
never leak into a stack-write displacement.
Byte-register IDs are canonicalised: in IDA 9.x's metapc, al/cl/dl/bl/spl/bpl/sil/dil are encoded as 16-23 (REX form) while the
full-width registers are 0-7. The tracker collapses 16-23 onto 0-7
so a value loaded into eax is still visible when later code reads
al.
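A minimal sketch of such a tracker, assuming the 16-23 to 0-7 register numbering described above, could look like this. The class and method names (`RegTracker`, `set_const`, `xor_imm`, `clobber`, `read`) are hypothetical, and for brevity it models full-width constant loads only, not partial-register writes.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


class RegTracker:
    """Hypothetical constant tracker keyed by canonical register number."""

    def __init__(self):
        self.values = {}  # canonical register id -> known constant

    @staticmethod
    def canon(reg: int) -> int:
        # Fold the byte-register range 16-23 onto the full-width ids 0-7.
        return reg - 16 if 16 <= reg <= 23 else reg

    def set_const(self, reg: int, value: int) -> None:
        # mov reg, imm (full-width load)
        self.values[self.canon(reg)] = value & MASK64

    def xor_imm(self, reg: int, imm: int) -> None:
        # xor reg, imm: only meaningful if the register is already known.
        r = self.canon(reg)
        if r in self.values:
            self.values[r] = (self.values[r] ^ imm) & MASK64

    def clobber(self, reg: int) -> None:
        # Any unrecognised write drops the register from the tracker.
        self.values.pop(self.canon(reg), None)

    def read(self, reg: int, width: int):
        # Return the low `width` bytes of the tracked value, or None if unknown.
        r = self.canon(reg)
        if r not in self.values:
            return None
        return self.values[r] & ((1 << (8 * width)) - 1)


t = RegTracker()
t.set_const(0, 0x1234)      # mov eax, 1234h
low = t.read(16, 1)         # later read through al
t.clobber(0)                # unrecognised write to eax
after = t.read(16, 1)       # value is no longer trusted
```

Dropping a register on any unmodelled write trades recall for safety: a store through an unknown register is skipped rather than replayed with a stale constant.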
Writes are replayed against a sparse dict[(base, offset)] -> byte map in instruction order:
- mov / push overwrite the cell.
- xor XORs the cell with the immediate (uninitialised cells are treated as zero, the same behaviour as the CPU after a zeroing prologue).
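The replay rule can be sketched in a few lines of plain Python. The tuple layout `(base, disp, data, op)` and the function name `replay` are assumptions made for this example, not the plugin's internal interface.

```python
def replay(writes):
    """Replay (base, disp, data, op) tuples, in program order, into a byte map."""
    shadow = {}  # (base, offset) -> byte value
    for base, disp, data, op in writes:
        for i, b in enumerate(data):
            cell = (base, disp + i)
            if op == "xor":
                # An uninitialised cell reads as zero before the XOR.
                shadow[cell] = shadow.get(cell, 0) ^ b
            else:
                # "mov" and "push" simply overwrite the cell.
                shadow[cell] = b
    return shadow


writes = [
    ("sp", 0x20, b"\x21\x2f\x26", "mov"),  # encoded bytes land on the stack
    ("sp", 0x20, b"\x42\x42\x42", "xor"),  # each cell is XORed with key 42h
]
shadow = replay(writes)
text = bytes(shadow[("sp", 0x20 + i)] for i in range(3))
```

Here `text` ends up as `b"cmd"`. Because the map is keyed per byte, a single dword store and four byte stores to the same range produce the same final state.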
The map is then split into maximal contiguous ranges per base, each
range is split on \x00 terminators, and each segment is fed into:
- An ASCII pass. The segment is trimmed of non-printable leading and trailing bytes, then run through the printable filter and the scoring function.
- A UTF-16 LE pass. Both starting parities are tried because UTF-16 buffers occasionally sit immediately after an ASCII string in the same locals area, on an odd offset.
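The range and terminator split that feeds these passes might be sketched as follows. This is the single-byte view used for ASCII candidates; the names `segments` and `_split_nul` are hypothetical, and how the plugin keeps wide strings intact across their interleaved zero bytes is not shown here.

```python
def _split_nul(base, start, data):
    """Split one contiguous run on NUL bytes, keeping each piece's offset."""
    pieces, pos = [], 0
    for part in data.split(b"\x00"):
        if part:
            pieces.append((base, start + pos, part))
        pos += len(part) + 1  # skip the piece and the terminator after it
    return pieces


def segments(shadow):
    """Return (base, offset, bytes) for every NUL-delimited run of adjacent cells."""
    by_base = {}
    for (base, off), val in shadow.items():
        by_base.setdefault(base, {})[off] = val

    out = []
    for base, cells in by_base.items():
        run_start, run, prev = None, bytearray(), None
        for off in sorted(cells):
            if prev is not None and off != prev + 1:
                # Gap in the offsets: close the current maximal range.
                out.extend(_split_nul(base, run_start, bytes(run)))
                run_start, run = None, bytearray()
            if run_start is None:
                run_start = off
            run.append(cells[off])
            prev = off
        if run:
            out.extend(_split_nul(base, run_start, bytes(run)))
    return out


shadow = {("sp", 0x20 + i): b for i, b in enumerate(b"cmd\x00ab")}
shadow[("sp", 0x40)] = ord("Z")
found = segments(shadow)
```

Carrying the starting offset with each segment is what lets a result be reported as, for example, sp+0x20 rather than just as text.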
A buffer is considered UTF-16 when:
- its length is even (after trimming trailing single bytes),
- every odd byte is 0x00,
- at least 85% of the even bytes are printable ASCII.
This deliberately ignores BMP characters past 0x7F. Those would
need real Unicode tables, and stack-string UTF-16 in real malware is
almost always plain ASCII smuggled into a wide buffer.
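The three conditions translate fairly directly into code. The sketch below is one possible reading of them; the function names and the choice of 0x20-0x7E as "printable" are assumptions for this example.

```python
PRINTABLE = set(range(0x20, 0x7F))  # assumed definition of printable ASCII


def decode_utf16le_ascii(buf: bytes, threshold: float = 0.85):
    """Return the text if buf looks like ASCII stored as UTF-16 LE, else None."""
    if len(buf) % 2:
        buf = buf[:-1]              # trim a trailing single byte
    if len(buf) < 2:
        return None
    low, high = buf[0::2], buf[1::2]
    if any(high):                   # every odd-indexed byte must be 0x00
        return None
    printable = sum(b in PRINTABLE for b in low)
    if printable / len(low) < threshold:
        return None
    return low.decode("latin-1")


def try_both_parities(buf: bytes):
    """Try the buffer from offset 0 and offset 1; return (start, text) or None."""
    for start in (0, 1):
        text = decode_utf16le_ascii(buf[start:])
        if text:
            return start, text
    return None


wide = "cmd.exe".encode("utf-16-le")
aligned = try_both_parities(wide)
shifted = try_both_parities(b"A" + wide)  # wide buffer starting on an odd offset
```

Trying both parities costs one extra pass per segment and covers the case where a wide buffer directly follows an odd-length ASCII string.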
ida-stackstrings handles two XOR shapes:
; (a) constant-key XOR against a stack cell
mov [rsp+X], imm1
xor [rsp+X], imm2
; (b) load-modify-store XOR via a register
movzx eax, byte ptr [rsp+X]
xor eax, K
mov [rsp+X], al
Both are replayed against the same byte-level stack shadow, so a per-byte XOR loop with a constant key reconstructs to the plaintext once every cell has been visited.
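To make shape (b) concrete, the sketch below applies the three-instruction sequence to each cell of a small shadow map. The Python loop stands in for "every cell has been visited"; it does not claim to show how the plugin discovers those visits, and `run_xor_loop` is a hypothetical name.

```python
def run_xor_loop(shadow, base, start, length, key):
    """Apply movzx / xor / mov to `length` consecutive cells, one byte at a time."""
    for i in range(length):
        cell = (base, start + i)
        eax = shadow.get(cell, 0)   # movzx eax, byte ptr [rsp+X]
        eax ^= key                  # xor eax, K
        shadow[cell] = eax & 0xFF   # mov [rsp+X], al
    return shadow


# Stack buffer initially holds "cmd.exe" XORed with the constant key 5Ah.
encoded = bytes(b ^ 0x5A for b in b"cmd.exe")
shadow = {("sp", 0x20 + i): b for i, b in enumerate(encoded)}

run_xor_loop(shadow, "sp", 0x20, len(encoded), 0x5A)
plain = bytes(shadow[("sp", 0x20 + i)] for i in range(len(encoded)))
```

After the call, `plain` is `b"cmd.exe"`. The store writes only the low byte of the register, which is why the register tracker needs al and eax to resolve to the same tracked value.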
The score is a value in [0.0, 1.0] that combines:
| Factor | Weight |
|---|---|
| Length term | up to 1.0 |
| Printable ratio | +0.0 .. 0.2 |
| Byte-dominance penalty | -0.0 .. 0.4 |
| Hint-keyword bonus | +0.0 / 0.25 |
| Length ≥ 8 bonus | +0.05 |
The default cut-off is min_score = 0.45, which in practice keeps
out runs of repeating bytes (AAAAAA) and three-letter false
positives while still surfacing genuine 4-character names.
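One way the factors in the table could combine is sketched below. Only the factor names and weight ranges come from the table; the individual formulas (a length term of n/14, a penalty that starts once one byte exceeds half the string) and the `HINTS` tuple are assumptions chosen for illustration, so this will not reproduce the scores shown in the sample output.

```python
from collections import Counter

HINTS = ("cmd", "http", ".dll", ".exe", "powershell")  # illustrative keywords only


def score(text: str) -> float:
    """Combine the five factors into a value clamped to [0.0, 1.0]."""
    if not text:
        return 0.0
    n = len(text)
    length_term = min(n / 14.0, 1.0)                    # up to 1.0 (assumed ramp)
    printable = sum(32 <= ord(c) < 127 for c in text) / n
    printable_bonus = 0.2 * printable                   # +0.0 .. 0.2
    dominance = Counter(text).most_common(1)[0][1] / n  # share of the commonest char
    penalty = 0.4 * max(0.0, dominance - 0.5) / 0.5     # -0.0 .. 0.4
    hint_bonus = 0.25 if any(h in text.lower() for h in HINTS) else 0.0
    long_bonus = 0.05 if n >= 8 else 0.0
    total = length_term + printable_bonus - penalty + hint_bonus + long_bonus
    return max(0.0, min(1.0, total))


MIN_SCORE = 0.45
kept = [s for s in ("AAAAAA", "abc", "open", "kernel32.dll") if score(s) >= MIN_SCORE]
```

With these assumed formulas, the repeated-byte run and the three-letter string fall below 0.45 while the 4-character name and the DLL name pass, which matches the behaviour described for the default cut-off.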
The analyzer is intentionally narrow. Things it does not handle yet:
- Table-driven decoders (sub, add, multi-byte rotations, custom shuffles).
- XOR loops with a register-resident key that itself depends on loop state (rolling keys, key derived from a counter).
- Strings reconstructed across function boundaries (memcpy from a wrapper that builds the string in another frame).
- Heap-allocated buffers. This is by design: the project name says stack strings.
- Stkvar renaming on IDA 9.x: ida_struct was removed and the replacement API (ida_typeinf) is not yet wired up. Repeatable comments still work; only the rename step is skipped.
The plugin also assumes IDA's stack-pointer tracking
(idc.get_spd) is correct for the function being analyzed. If IDA
fails to compute frame info (very rare on x86/x64 binaries), the
push imm recogniser will skip those instructions rather than
guess.
In rough priority order:
- Stkvar renaming on IDA 9.x via ida_typeinf to restore the feature that ida_struct's removal disabled.
- Loop XOR / arithmetic decoders. Detect the canonical for (i = 0; i < n; ++i) buf[i] op= key shape for sub / add and apply the operation symbolically.
- Symbolic reconstruction for buffers built via small helper functions (_decode_string("…")).
- Emulation-assisted decoding as an opt-in mode using unicorn-engine, for binaries where the instruction-level recogniser is no longer enough.
- Export of results to JSON / CSV for diffing across samples.
- Deobfuscation helpers: automatic patching of the contributing instruction range with a db directive once the analyst confirms the decoded string.
Bug reports and pull requests are welcome. When opening an issue please include:
- IDA Pro version (Help -> About).
- Architecture (x86 / x64) and bitness of the binary.
- A minimal reproducer when possible: a self-contained C snippet compiled with the relevant /Od or optnone flags is the easiest way to get a fix landed.
The codebase is intentionally modular — adding a new instruction
shape almost always means a new branch inside _track_reg_writes or
_decode_mov_or_xor in analyzer.py.
See LICENSE.