,
╓╗╗, ,╓▄▄▄Φ▓▓██▌╫D
║▌ `▓L ,,, ╓▄▄▄Φ▓▓▀▀▀╫╫╫╫╫╫╫▀▀╫▓▓▄
▓▄▓▓▓ ,▄▄B░▀╫Ñ╬░░╫╫▓▓▓▓╫╫╫╫▓▓▓╫╫╫╫╣▓▓▓▄
║████L ,╓#▀▀▀╨╫ÑÑ╦▄▒▀╣▓▄▄▀╣▌╫▀ ██╫╫╫╫▓▓╫▓▓φ
▓╫╫╫▀]Ñ░░░░ÑÑÑÑ░░░░░╠▀W▄╠▀▓▒░╫Ñ╖ ╙└"╜▀▓▓▓▓▓█▓▓
║░░░╦╬╫╫╫╫╫╫╫╫╫╫╫╫╫ÑÑ░░░╠Ñ░╨╫Ñ░╫╫╫╫N ▀▓▓▓╫██▓╕
,]░╦╬╫╫╫╫╫╫╫▓▓▓▓▓▓╫╫╫╫╫╫╫Ñ░░╠░░╫M░╠╫╫╫╫╦, ▀▓▓▓▓▓▓⌐
╗▄╦ ]░░╬╫╫╫╫╫▓▓██████████▓▓▒╫╫╫╫Ñ░░╟▒╟▓▒ñ▓▓▓▓░N ╙▓▓▓▓▓▓
║███╫█╫ ]░░╫╫╫╫╫▓███▓▓▓▓▓▓▓▓▓▓███▓╫╫╫╫╫░░╟▒╟▓Ü╟▓▓▓▓░H ╟▓▓▓▓▓L
║███╫█╫ ]░░╫╫╫╫▓██▓╫▓▓▓▀▀╠╠╬▀▓▓▓╫▓██▓╫╫╫╫░░ÑÑ╠▄░╠▓▓▓▄▄▄▄▄▓▓▓╫╫╫╫
╓▄▄╫█╫╖╖╖╦░╫╫╫╫╫██▓▓▓▓▀░╬Ñ╣╬╫Ñ░╟▓▓▓▓██╫╫╫╫Ñ░╦]░░░║████▀▀╫╫╫▓╩╨╟╫
╟▓▓╫█╫▀▀▀╩╬╩╫╫▓██▓▓▓▓▌░╫░╟▓▓K╫Ñ░▓▓▓▓╫██▓▒╩╩╩╩ ╙╩╨▀▓M╨╩╨╙╝╣N╦╗Φ╝
╫█╫ ▀███▀╣▓▓▓▓▓░╫Ñ░╠▀░╫Ü░▓▓▓▓▓▀▀███╕ ▐▓▌╖
▄▄▄▄▓█▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄╛
▀╩╫╫╫╠╣▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▀░╫╫╫╫▌
╗▄╫╫Ñ░╠▀▓▓▓▓▓▓▓▓▓▓▓▓▀░╦╬╫╫∩
`⌠╫╫╫Ñ░░Å╣▀▀▀▀▀▒░╦╬╫╫╫`█
╙╙""╫╫╫½╫╫╫╬╫╫╫╫╫M"▓╛
└╙└ ▄▓╩`║▓╩ Å▀
Cut through the MITRE ATT&CK framework and extract relevant identifiers for searching and hunting.
At its core, MITRESaw creates a CSV-formatted version of the MITRE ATT&CK Framework and outputs individual Threat Actor ATT&CK Navigator JSON files, depending on keywords provided.
MITRESaw has evolved to also produce search queries based on extracted indicators (aligned with Threat Group TTPs). Searches currently provided are compatible with Splunk, Azure Sentinel and Elastic/Kibana. SIGMA will be included soon.
python3 -m pip install -r requirements.txt
./MITRESaw.py [options]
All arguments are optional named flags with sensible defaults. To display usage, simply run: ./MITRESaw.py -h
usage: MITRESaw.py [-h] [-f FRAMEWORK] [-p PLATFORMS] [-s STRINGS]
[-g THREATGROUPS] [-a] [-n] [-o] [-Q] [-q] [-t]
[-c COLUMNS] [-D] [-x {csv,json,xml}] [-E] [-C]
[-w MAX_WORKERS] [-A] [-I [DIR]]
[-rS] [-rN] [--clear-cache] [-F]
options:
-h, --help show this help message and exit
-f, --framework FRAMEWORK Specify which framework - Enterprise, ICS or Mobile (default: all three)
-p, --platforms PLATFORMS Filter by platform e.g. Windows,Linux,IaaS (default: . for all)
-s, --strings TERMS Filter by industry e.g. mining,technology,defense (default: . for all)
-g, --threatgroups GROUPS Filter by group e.g. APT29,HAFNIUM,Turla (default: . for all)
-a, --asciiart Show ASCII Art of the saw
-n, --navlayers Obtain ATT&CK Navigator layers for identified Groups
-o, --showotherlogsources Show log sources with less than 1% coverage
-Q, --queries Build search queries for Splunk, Azure Sentinel, Elastic/Kibana
-q, --quiet Suppress per-identifier output; print only group completion
-t, --truncate Truncate indicator output (still written to file)
-c, --columns COLUMNS Export filtered CSV with specified columns (comma-separated)
-D, --default Export key procedure columns to mitre_procedures.csv
-I, --import-citations Import manually saved PDF/HTML citations (default: data/citations/)
-x, --export {csv,json,xml} Export format for output files (default: csv)
-E, --evidence-report Generate styled XLSX evidence report (one row per indicator)
-C, --citations Collect citation sources with multi-method fallback (requires -E)
-w, --max-workers N Max parallel threads for fetching (1-50, default: 50)
-A, --auto Skip the pre-run ETA confirmation prompt
-rS, --retry-stix Retry citations that fell back to STIX metadata
-rN, --retry-nocontent Retry citations that had no content at all
--clear-cache Clear the entire citation cache before running
-F, --fetch Force fresh download of ATT&CK STIX data
The -D (default) flag is the catch-all option. It extracts all groups across all platforms with the key procedure columns and produces a clean CSV ready for SIEM ingestion. Combine with -E to also get the styled XLSX evidence report:
./MITRESaw.py -D -E -C
This gives you everything you need to get started: mitre_procedures.csv for lookups, mitre_procedures.xlsx for analysis, and citation source content from blog posts, vendor reports, and PDFs. Add -q for quieter output, or layer on -g, -p, -s filters to narrow scope.
# Default export with all groups (fastest way to get results)
./MITRESaw.py -D
# Quiet mode - show group completion instead of every indicator
./MITRESaw.py -D -q
# Filter by platform and threat group
./MITRESaw.py -p Windows -g APT29
# Export as JSON
./MITRESaw.py -g APT29 -x json
# Build search queries for specific industries on Windows/Linux
./MITRESaw.py -p Windows,Linux -s mining,technology,defense -Q
# Export filtered columns with industry keyword tagging
./MITRESaw.py -c group_sw_name,technique_id,technique_name,keywords
# Evidence report with SIEM queries
./MITRESaw.py -g APT29,APT33,OilRig -p Windows -E -Q
# Evidence report with citation collection
./MITRESaw.py -g APT29 -p Windows -E -C
# Retry only stix_metadata failures (keeps successful cache)
./MITRESaw.py -rS -D -E -C
# Retry only no-content failures
./MITRESaw.py -rN -D -E -C
# Retry both stix_metadata and no-content failures
./MITRESaw.py -rS -rN -D -E -C
# Nuclear option: clear entire cache and re-fetch everything
./MITRESaw.py --clear-cache -D -E -C
# Force refresh STIX data and clear citation cache
./MITRESaw.py -D -E -C --clear-cache -F
Valid column names for --columns:
group_sw_id, group_sw_name, group_sw_description, technique_id,
technique_name, technique_description, tactic, platforms, framework,
procedure_example, evidence, detectable_via, keywords
When -E is used, MITRESaw produces:
Outputs written to: data/2026-03-28/Windows__APT29/
🏛️ mitre_procedures.csv
📎 mitre_procedures.xlsx
🍠 citations_failed.yaml
When no group/platform/term filters are provided, files are placed in the date root directory (e.g. data/2026-03-28/).
mitre_procedures.csv — One row per group+technique pair. Suitable for direct ingestion as a lookup table into Splunk (| inputlookup), Microsoft Defender for Endpoint, Elastic, or any SIEM. Fields are properly quoted per RFC 4180. Columns: group_sw_id, group_sw_name, group_sw_description, technique_id, technique_name, technique_description, tactic, platforms, framework, procedure_example, evidence, detectable_via.
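As a rough illustration of using the CSV outside a SIEM, the sketch below groups technique IDs by threat group with Python's standard csv module. Only the column names group_sw_name and technique_id come from the column list above; the function name and the sample rows are made up for the example, and a real run would open the generated file instead of the in-memory sample.

```python
import csv
import io
from collections import defaultdict


def techniques_by_group(csv_file):
    """Map each group_sw_name to the sorted list of technique IDs it uses."""
    groups = defaultdict(set)
    for row in csv.DictReader(csv_file):
        groups[row["group_sw_name"]].add(row["technique_id"])
    return {name: sorted(ids) for name, ids in groups.items()}


# Illustrative rows only (not real MITRESaw output). For a real run, use e.g.:
#   with open("mitre_procedures.csv", newline="", encoding="utf-8") as fh:
#       result = techniques_by_group(fh)
sample = io.StringIO(
    "group_sw_name,technique_id,procedure_example\n"
    'GroupA,T1059.001,"Example text, with a quoted comma"\n'
    "GroupA,T1003,Example text\n"
    "GroupB,T1059.001,Example text\n"
)
result = techniques_by_group(sample)
print(result)
```

Because the fields are quoted per RFC 4180, csv.DictReader handles commas and newlines inside procedure text without extra parsing.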
mitre_procedures.xlsx — Styled evidence report with multiple sheets:
| Sheet | Description |
|---|---|
| Evidence Report | One row per atomic indicator with 14 columns (see schema below) |
| Group Summary | Per-group stats: technique count, indicator count, tactic coverage, invocation coverage |
| Tactic Pivot | Indicators per tactic, sorted by count, with example technique IDs |
| Technique Matrix | Intersection matrix (only when 2+ groups): techniques as rows, groups as columns, 1 where a group uses that technique, sorted by group coverage descending for prioritising hunting |
| Reference Detail | Citation sources with extracted content, collection method, and URL (only with -C) |
citations_failed.yaml — List of citations that fell back to STIX metadata (URL fetch failed across all methods). Includes the full attempt chain for diagnostics. Only generated with -C.
The --evidence-report / -E flag generates a styled XLSX evidence report (mitre_procedures.xlsx) with one row per atomic indicator extracted from MITRE ATT&CK procedure examples, plus a companion mitre_procedures.csv for SIEM ingestion.
| # | Column | Description |
|---|---|---|
| 1 | Evidential Element | The atomic indicator (command, registry key, CVE, port, path, software, event ID) |
| 2 | Threat Group | Canonical group name |
| 3 | Procedure Example | MITRE ATT&CK procedure text (cleaned: markdown links shown as Name (ID), citations removed) |
| 4 | Technique ID | ATT&CK technique ID (e.g. T1059.001) |
| 5 | Technique Name | ATT&CK technique name |
| 6 | Tactic | ATT&CK tactic |
| 7 | Platforms | Target platforms (e.g. Windows, Linux, macOS) |
| 8 | Framework | ATT&CK framework (Enterprise, ICS, Mobile) |
| 9 | MITRE Invocations | Invocation strings extracted from procedure text — backtick-wrapped commands, CLI flags, registry paths, file paths as MITRE documented them |
| 10 | Detection Guidance | Detection context per indicator type (Sysmon EIDs, detection methods) |
| 11 | Log Sources | MITRESaw-mapped log sources (e.g. Sysmon: 1, Security EventLog: 4688, AppLocker EventLog, netflow, PCAP, *nix /var/log) |
| 12 | Reference URL | URL from procedure text or constructed ATT&CK technique URL |
| 13 | Navigation Layer URL | ATT&CK Navigator JSON layer URL for the group |
| 14 | Source Type | Website or GitHub |
When 2+ groups are provided (e.g. -g APT29,APT33,OilRig), a Technique Matrix sheet is added showing which techniques are shared across groups. Techniques are sorted by the number of groups that use them (descending), helping prioritise which TTPs to hunt for first — techniques used by all targeted groups offer the highest detection ROI.
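The ordering described above can be sketched in a few lines of Python. This is not MITRESaw's own code: the function name, the input shape (a list of group/technique pairs) and the tie-break on technique ID are assumptions for illustration, and the group names are placeholders.

```python
from collections import defaultdict


def technique_matrix(pairs, groups):
    """Return (technique_id, [0/1 per group]) rows, most widely shared first."""
    used_by = defaultdict(set)
    for group, technique in pairs:
        used_by[technique].add(group)
    rows = [
        (technique, [1 if g in members else 0 for g in groups])
        for technique, members in used_by.items()
    ]
    # Sort by group coverage descending; tie-break on technique ID (an assumption)
    rows.sort(key=lambda row: (-sum(row[1]), row[0]))
    return rows


# Placeholder data, not real group-to-technique mappings
groups = ["GroupA", "GroupB", "GroupC"]
pairs = [
    ("GroupA", "T1059.001"), ("GroupB", "T1059.001"), ("GroupC", "T1059.001"),
    ("GroupA", "T1003"), ("GroupC", "T1003"),
    ("GroupB", "T1566.001"),
]
matrix = technique_matrix(pairs, groups)
for technique, flags in matrix:
    print(technique, flags)
```

Rows at the top of the result are the techniques shared by the most groups, which is where hunting effort pays off first.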
The -C / --citations flag collects ALL citation source material for each technique — blog posts, vendor reports, government advisories, PDFs, and more. Citations are collected inline during extraction and displayed per technique, with indicators extracted from the fetched content.
For each (Citation: X) found in procedure text, technique descriptions, and detection guidance, the collector tries multiple methods in order until content is obtained:
| Method | Description | Status Icon |
|---|---|---|
| direct | Standard HTTP fetch with browser-like headers | ✅ |
| headless | Playwright Chromium for Cloudflare/JS-protected sites | ✅ |
| wayback | Wayback Machine (web.archive.org) archived snapshot | ✅ |
| google_cache | Google's cached version of the page | ✅ |
| pdf:PyPDF2 | PDF downloaded and text extracted | ✅ |
| cached | Previously fetched, loaded from .citation_cache/ | ✅ |
| stix_metadata | STIX description field only (author, title, date) | |
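The try-in-order behaviour in the table can be pictured as a simple fallback chain. The sketch below is an assumption about the general shape, not the collector's actual implementation: the collect function, the attempt-log format and the stand-in fetchers are all hypothetical, and real fetchers would perform HTTP, headless-browser or archive lookups.

```python
def collect(url, methods, stix_description):
    """Try each (name, fetch) pair in order; fall back to STIX metadata."""
    attempts = []
    for name, fetch in methods:
        try:
            text = fetch(url)
        except Exception as exc:  # a failed method must not stop the chain
            attempts.append(f"{name}: {type(exc).__name__}")
            continue
        if text:
            attempts.append(f"{name}: ok")
            return name, text, attempts
        attempts.append(f"{name}: empty")
    # Every method failed: keep only the STIX description (author, title, date)
    return "stix_metadata", stix_description, attempts


# Hypothetical stand-ins for the real fetch methods
def direct(url):
    raise ConnectionError("blocked")

def headless(url):
    return None

def wayback(url):
    return "archived page text"

method, text, attempts = collect(
    "https://example.com/report",
    [("direct", direct), ("headless", headless), ("wayback", wayback)],
    stix_description="Author. (2020). Title.",
)
print(method, attempts)
```

Recording every attempt, including the failures, is what makes a diagnostic file such as citations_failed.yaml possible.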
Known migrated URLs are automatically rewritten:
- www.mandiant.com/resources/... → cloud.google.com/blog/topics/threat-intelligence/...
- www.fireeye.com/blog/... → cloud.google.com/blog/topics/threat-intelligence/...
Homepages and documentation sites are automatically skipped (7-zip, WinRAR, Wikipedia, Microsoft docs, Cisco product docs, etc.) — these have no threat intelligence value.
When a citation page is successfully fetched, MITRESaw runs its extraction patterns against the content to find additional indicators not present in the MITRE procedure text. The same patterns used for native extraction are applied:
| Emoji | Type | What's extracted |
|---|---|---|
| 💻 | cmd | Commands, CLI invocations, backtick-quoted strings |
| 🔑 | reg | Windows registry paths |
| 🔒 | cve | CVE identifiers |
| 📁 | paths | Windows and Unix file/directory paths |
| 📦 | software | Executables, DLLs, tools |
| 🌐 | ports | Network port numbers |
Only new indicators are shown — anything already extracted by MITRESaw's native pipeline is deduplicated. This means techniques that had no native indicators (e.g. T1621 MFA Request Generation) can still gain indicators from their citation sources.
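To make the deduplication step concrete, here is a minimal sketch for one indicator type (CVE identifiers). The regular expression and the case-insensitive comparison are assumptions for illustration; MITRESaw's own extraction patterns are not shown in this document and may differ.

```python
import re

# Assumed pattern for CVE identifiers (CVE-YYYY-NNNN with four or more digits)
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def new_cves(citation_text, native_indicators):
    """Return CVEs found in citation text that were not already extracted natively."""
    seen = {indicator.upper() for indicator in native_indicators}
    fresh = []
    for match in CVE_PATTERN.findall(citation_text):
        cve = match.upper()
        if cve not in seen:
            seen.add(cve)  # also suppresses repeats within the same page
            fresh.append(cve)
    return fresh


citation_text = (
    "The actor chained CVE-2021-26855 with cve-2021-27065, "
    "then reused CVE-2021-26855 on a second host."
)
found = new_cves(citation_text, native_indicators=["CVE-2021-26855"])
print(found)
```

With an empty native list the same function returns every CVE on the page, which is how a technique with no native indicators can still gain some from its citations.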
Citation-extracted indicators are:
- Displayed in the terminal under each citation with emojis
- Injected as native evidence rows into mitre_procedures.csv and mitre_procedures.xlsx
- Atomised in the evidence report (one row per indicator, same as native indicators)
- Included in the Technique Matrix — techniques gain group coverage from citation indicators
The procedure example column for these rows shows "Indicators extracted from citation: <name> (<url>)" to distinguish them from MITRE-sourced indicators.
For sites that block automated access, save the page as PDF or HTML from your browser and import it:
# Save blocked pages into data/citations/
# e.g. securelist.com_apt-report.pdf, unit42_medusa.html
# Import and run
./MITRESaw.py -I -D -E -C
# Or specify a different directory
./MITRESaw.py -I /path/to/saved/pages -D -E -C
Supported formats: .pdf, .html/.htm, .txt. Imported files are cached and used on all future runs.
| Icon | Meaning |
|---|---|
| ✅ | Content freshly fetched from source |
| 💾 | Content loaded from local cache |
| | STIX metadata only (author/title/date; fetch failed) |
| ❌ | No content at all |
Fetched pages are cached locally to avoid re-downloading on subsequent runs. Failed URLs are also cached within the same run to avoid re-trying the same broken URL across multiple procedures — this is the single biggest performance optimisation (see below).
| Flag | What it removes | When to use |
|---|---|---|
| -rS / --retry-stix | Cache entries where fetch failed, only STIX metadata captured | After fixing SSL/network issues, since sites that were unreachable may now work |
| -rN / --retry-nocontent | Cache entries with completely empty text (❌) | After installing Playwright, since previously unfetchable pages may now parse |
| --clear-cache | Everything | Start completely fresh; re-fetches all ~5,000+ URLs |
Use -rS and -rN together to retry all failures while keeping successful cache:
./MITRESaw.py -rS -rN -D -E -C
When using -C, MITRESaw scans the cache before starting and shows a summary:
┌─────────────────────────────────────────────
│ Procedures: 4750
│ Citations: 17451
│ Cached: 4562
│ Uncached: 1306
│ Workers: 50
│ Estimated time: 1m 18s
└─────────────────────────────────────────────
Continue? [Y/n]
Use -A / --auto to skip the confirmation and start immediately.
Before processing procedures, all uncached citations are fetched in a single parallel batch using all available workers. This maximises parallelism — instead of fetching 1-3 citations per procedure sequentially, all uncached URLs are fetched at once. A live progress counter shows completion and ETA.
After pre-fetching, the main processing loop reads exclusively from cache and runs in seconds.
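The pre-fetch idea can be sketched with the standard library's thread pool. Everything here is an assumption made for illustration: the prefetch function, the dict-based cache and the stand-in fetcher are hypothetical, and the real cache lives on disk rather than in memory.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def prefetch(urls, cache, fetch, max_workers=50):
    """Fetch every uncached URL in one parallel batch; return how many were tried."""
    pending = [u for u in dict.fromkeys(urls) if u not in cache]  # dedupe, keep order
    if not pending:
        return 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in pending}
        for future in as_completed(futures):
            url = futures[future]
            try:
                cache[url] = future.result()
            except Exception:
                cache[url] = ""  # cache the failure so the URL is not retried
    return len(pending)


# Hypothetical stand-in for a real network fetch
def fake_fetch(url):
    if "broken" in url:
        raise TimeoutError(url)
    return f"content of {url}"


cache = {"https://a.example/report": "already cached"}
urls = [
    "https://a.example/report",
    "https://b.example/blog",
    "https://b.example/blog",      # duplicate citation across procedures
    "https://broken.example/page",
]
fetched = prefetch(urls, cache, fake_fetch, max_workers=4)
print(fetched, sorted(cache))
```

After this call, later processing can read from cache alone, including the empty entries that mark known failures.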
Times are approximate for a full all-groups run (~4,750 procedures, ~17,000 citations, 50 workers). Subsequent runs with a warm cache are significantly faster.
| Command | First Run | Cached Run | What It Does |
|---|---|---|---|
| -D | ~2 min | ~2 min | Extract procedures to CSV |
| -D -E | ~3 min | ~3 min | + styled XLSX evidence report |
| -D -E -C | ~5-15 min | ~3 min | + citation collection (pre-fetch + extraction) |
| -D -E -C -Q | ~5-15 min | ~4 min | + search queries (Splunk/Sentinel/Elastic) |
| -D -E -C -n | ~8-18 min | ~6 min | + Navigator layer downloads |
| -D -E -C -rS | ~5-15 min | ~5-15 min | + retry STIX-metadata failures |
The citation pre-fetch phase accounts for most of the first-run time. The -rS flag clears cached failures and re-fetches them, so it always takes first-run time. A 30-day cooldown warning is shown if -rS was used recently, since the same URLs will likely fail again.
Workers start at the configured maximum (default 50) and automatically adjust during execution:
- On 429 rate-limit: workers halve (e.g. 50 → 25)
- After 50 clean procedures: workers increase by 2 (e.g. 25 → 27)
- Current worker count and rate-limit count are shown in the progress bar
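The adjustment rules above amount to a small state machine. The sketch below is one possible reading, not the tool's code: the class name, the floor of one worker, the cap at the configured maximum and the reset of the clean streak after a 429 are all assumptions.

```python
class WorkerGovernor:
    """Halve workers on HTTP 429; add 2 after every 50 clean procedures."""

    def __init__(self, max_workers=50, min_workers=1, clean_needed=50, step=2):
        self.max_workers = max_workers
        self.min_workers = min_workers    # assumed floor
        self.clean_needed = clean_needed
        self.step = step
        self.workers = max_workers        # start at the configured maximum
        self.rate_limit_hits = 0
        self._clean_streak = 0

    def record(self, status_code):
        """Update the worker count after one procedure and return it."""
        if status_code == 429:
            self.rate_limit_hits += 1
            self.workers = max(self.min_workers, self.workers // 2)
            self._clean_streak = 0        # assumed: a 429 resets the streak
        else:
            self._clean_streak += 1
            if self._clean_streak >= self.clean_needed:
                # assumed: never grow past the configured maximum
                self.workers = min(self.max_workers, self.workers + self.step)
                self._clean_streak = 0
        return self.workers


governor = WorkerGovernor(max_workers=50)
after_limit = governor.record(429)
for _ in range(50):
    after_recovery = governor.record(200)
print(after_limit, after_recovery, governor.rate_limit_hits)
```

Halving quickly and recovering slowly keeps the run from hammering a site that has just rate-limited it.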
| Optimisation | Detail |
|---|---|
| Request timeout | 8s (direct, wayback fetch, google cache) |
| Wayback Machine API timeout | 5s |
| Per-domain rate limit | 0.5s between requests to same domain |
| Global rate limit | Disabled — per-domain delay is sufficient |
| Cached failure recognition | Empty cache entries skip the full method chain instantly |
| Pre-fetch batch | All uncached URLs fetched in one parallel batch before processing |
| Gibberish filtering | Garbled PDF content (base64, binary) rejected before indicator extraction |
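The per-domain delay in the table can be illustrated with a small throttle keyed on hostname. This is a single-threaded sketch under stated assumptions: the DomainThrottle class and its injectable clock are hypothetical, and a version shared by many worker threads would also need locking.

```python
import time
from urllib.parse import urlparse


class DomainThrottle:
    """Enforce a minimum interval between requests to the same domain."""

    def __init__(self, min_interval=0.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # domain -> time of the most recent request

    def wait(self, url):
        """Sleep if needed before requesting url; return the delay applied."""
        domain = urlparse(url).netloc.lower()
        last = self._last_request.get(domain)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (self._clock() - last))
        if delay:
            self._sleep(delay)
        self._last_request[domain] = self._clock()
        return delay


# Demo with a fake clock so the example runs instantly
now = [0.0]
throttle = DomainThrottle(
    min_interval=0.5,
    clock=lambda: now[0],
    sleep=lambda seconds: now.__setitem__(0, now[0] + seconds),
)
delays = [
    throttle.wait("https://a.example/one"),
    throttle.wait("https://a.example/two"),  # same domain: must wait
    throttle.wait("https://b.example/one"),  # different domain: no wait
]
print(delays)
```

Keying the delay on the domain rather than applying it globally lets requests to unrelated sites proceed in parallel at full speed.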
| Package | Purpose | Install |
|---|---|---|
| playwright | Headless browser for JS/Cloudflare sites | pip install playwright && playwright install chromium |
Playwright is optional — the collector works without it but will skip headless browsing for Cloudflare/JS-protected sites.
Citations that fell back to stix_metadata are written to citations_failed.yaml in the output directory, with the full attempt chain for each URL.
For large runs (all groups with citation collection), MITRESaw can take a significant amount of time. Use tmux to run it in the background and reconnect later:
# Start a tmux session
tmux new -s mitresaw
# Run MITRESaw
./MITRESaw.py -D -E -C
# Detach from tmux: press Ctrl+B then D
# Re-attach anytime to see live progress
tmux attach -t mitresaw
Alternatively, run with nohup and monitor the log:
# Run in background
nohup ./MITRESaw.py -D -E -C -q > mitresaw.log 2>&1 &
# Check progress
tail -5 mitresaw.log
# Watch live
tail -f mitresaw.log
A dual progress bar is pinned to the bottom of the terminal showing:
- Procedures — extraction progress across all group+technique pairs
- Citations — collection progress across all citation sources
- ETA — estimated time remaining
The progress bar stays in place while extraction output scrolls above it. The current worker count and rate-limit count are shown alongside the ETA.
MITRESaw supports an exclusion list to filter out known false-positive indicators. Edit data/exclusions.csv with two columns:
indicator,reason
whoami,Common benign command
ipconfig,Common benign command
Exclusions are case-insensitive and apply to both native and citation-extracted indicators. Excluded indicators are silently removed from terminal output and export files.
The exclusion list can also be managed via the web interface.
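The exclusion behaviour described above can be approximated as follows. The two-column layout (indicator, reason) comes from the example file; the function names and the choice of exact, whole-string matching after case folding are assumptions, since the matching rule itself is not spelled out here.

```python
import csv
import io


def load_exclusions(csv_file):
    """Read the indicator column into a case-folded set."""
    return {
        row["indicator"].strip().casefold()
        for row in csv.DictReader(csv_file)
        if row.get("indicator")
    }


def apply_exclusions(indicators, excluded):
    """Drop indicators whose case-folded value is in the exclusion set."""
    return [i for i in indicators if i.strip().casefold() not in excluded]


# In-memory copy of the example file; a real run would open data/exclusions.csv
exclusions_csv = io.StringIO(
    "indicator,reason\n"
    "whoami,Common benign command\n"
    "ipconfig,Common benign command\n"
)
excluded = load_exclusions(exclusions_csv)
kept = apply_exclusions(
    ["WhoAmI", "rundll32.exe", "IPCONFIG", "CVE-2021-26855"], excluded
)
print(kept)
```

Applying the same filter to native and citation-extracted indicators keeps terminal output and export files consistent.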
MITRESaw includes a single-page web interface for running and monitoring from a browser:
pip install fastapi uvicorn sse-starlette
python mitresaw_web.py
# Open http://localhost:6729
Features:
- Run configuration — checkbox flags, group/platform filters, worker count
- Live log streaming — real-time output via Server-Sent Events
- Cache statistics — total cached, success/failed counts, disk usage
- Output file browser — download CSV/XLSX results directly
- Exclusion editor — add/remove exclusions from the browser
- Stop button — cancel a running extraction
No authentication is included — intended for local use only.
Because MITRE ATT&CK is built and maintained in the United States, keywords must be given in US English rather than UK English (e.g. defense, not defence).