MITRESaw

                                                                         ,
                                 ╓╗╗,                          ,╓▄▄▄Φ▓▓██▌╫D
                                ║▌ `▓L            ,,, ╓▄▄▄Φ▓▓▀▀▀╫╫╫╫╫╫╫▀▀╫▓▓▄
                                 ▓▄▓▓▓        ,▄▄B░▀╫Ñ╬░░╫╫▓▓▓▓╫╫╫╫▓▓▓╫╫╫╫╣▓▓▓▄
                                 ║████L   ,╓#▀▀▀╨╫ÑÑ╦▄▒▀╣▓▄▄▀╣▌╫▀    ██╫╫╫╫▓▓╫▓▓φ
                                  ▓╫╫╫▀]Ñ░░░░ÑÑÑÑ░░░░░╠▀W▄╠▀▓▒░╫Ñ╖   ╙└"╜▀▓▓▓▓▓█▓▓
                                  ║░░░╦╬╫╫╫╫╫╫╫╫╫╫╫╫╫ÑÑ░░░╠Ñ░╨╫Ñ░╫╫╫╫N     ▀▓▓▓╫██▓╕
                                ,]░╦╬╫╫╫╫╫╫╫▓▓▓▓▓▓╫╫╫╫╫╫╫Ñ░░╠░░╫M░╠╫╫╫╫╦,    ▀▓▓▓▓▓▓⌐
                       ╗▄╦     ]░░╬╫╫╫╫╫▓▓██████████▓▓▒╫╫╫╫Ñ░░╟▒╟▓▒ñ▓▓▓▓░N    ╙▓▓▓▓▓▓
                   ║███╫█╫    ]░░╫╫╫╫╫▓███▓▓▓▓▓▓▓▓▓▓███▓╫╫╫╫╫░░╟▒╟▓Ü╟▓▓▓▓░H    ╟▓▓▓▓▓L
                   ║███╫█╫   ]░░╫╫╫╫▓██▓╫▓▓▓▀▀╠╠╬▀▓▓▓╫▓██▓╫╫╫╫░░ÑÑ╠▄░╠▓▓▓▄▄▄▄▄▓▓▓╫╫╫╫
                    ╓▄▄╫█╫╖╖╖╦░╫╫╫╫╫██▓▓▓▓▀░╬Ñ╣╬╫Ñ░╟▓▓▓▓██╫╫╫╫Ñ░╦]░░░║████▀▀╫╫╫▓╩╨╟╫
                    ╟▓▓╫█╫▀▀▀╩╬╩╫╫▓██▓▓▓▓▌░╫░╟▓▓K╫Ñ░▓▓▓▓╫██▓▒╩╩╩╩ ╙╩╨▀▓M╨╩╨╙╝╣N╦╗Φ╝
                       ╫█╫     ▀███▀╣▓▓▓▓▓░╫Ñ░╠▀░╫Ü░▓▓▓▓▓▀▀███╕      ▐▓▌╖
                   ▄▄▄▄▓█▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄╛
                                ▀╩╫╫╫╠╣▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▀░╫╫╫╫▌
                                 ╗▄╫╫Ñ░╠▀▓▓▓▓▓▓▓▓▓▓▓▓▀░╦╬╫╫∩
                                   `⌠╫╫╫Ñ░░Å╣▀▀▀▀▀▒░╦╬╫╫╫`█
                                    ╙╙""╫╫╫½╫╫╫╬╫╫╫╫╫M"▓╛
                                       └╙└ ▄▓╩`║▓╩ Å▀

Cut through the MITRE ATT&CK framework and extract relevant identifiers for searching and hunting.

About The Project

At its core, MITRESaw creates a CSV-formatted version of the MITRE ATT&CK Framework and outputs individual Threat Actor ATT&CK Navigator JSON files, depending on keywords provided.
MITRESaw has evolved to also produce search queries based on extracted indicators (aligned with Threat Group TTPs). Searches currently provided are compatible with Splunk, Azure Sentinel and Elastic/Kibana. SIGMA will be included soon.

Installation

python3 -m pip install -r requirements.txt

Usage

./MITRESaw.py [options]

All arguments are optional named flags with sensible defaults. To display usage, simply run: ./MITRESaw.py -h

usage: MITRESaw.py [-h] [-f FRAMEWORK] [-p PLATFORMS] [-s STRINGS]
                   [-g THREATGROUPS] [-a] [-n] [-o] [-Q] [-q] [-t]
                   [-c COLUMNS] [-D] [-x {csv,json,xml}] [-E] [-C]
                   [-w MAX_WORKERS] [-A] [-I [DIR]]
                   [-rS] [-rN] [--clear-cache] [-F]

options:
  -h, --help                  show this help message and exit
  -f, --framework FRAMEWORK   Specify which framework - Enterprise, ICS or Mobile (default: all three)
  -p, --platforms PLATFORMS   Filter by platform e.g. Windows,Linux,IaaS (default: . for all)
  -s, --strings TERMS         Filter by industry e.g. mining,technology,defense (default: . for all)
  -g, --threatgroups GROUPS   Filter by group e.g. APT29,HAFNIUM,Turla (default: . for all)
  -a, --asciiart              Show ASCII Art of the saw
  -n, --navlayers             Obtain ATT&CK Navigator layers for identified Groups
  -o, --showotherlogsources   Show log sources with less than 1% coverage
  -Q, --queries               Build search queries for Splunk, Azure Sentinel, Elastic/Kibana
  -q, --quiet                 Suppress per-identifier output; print only group completion
  -t, --truncate              Truncate indicator output (still written to file)
  -c, --columns COLUMNS       Export filtered CSV with specified columns (comma-separated)
  -D, --default               Export key procedure columns to mitre_procedures.csv
  -I, --import-citations      Import manually saved PDF/HTML citations (default: data/citations/)
  -x, --export {csv,json,xml} Export format for output files (default: csv)
  -E, --evidence-report       Generate styled XLSX evidence report (one row per indicator)
  -C, --citations             Collect citation sources with multi-method fallback (requires -E)
  -w, --max-workers N         Max parallel threads for fetching (1-50, default: 50)
  -A, --auto                  Skip the pre-run ETA confirmation prompt
  -rS, --retry-stix           Retry citations that fell back to STIX metadata
  -rN, --retry-nocontent      Retry citations that had no content at all
  --clear-cache               Clear the entire citation cache before running
  -F, --fetch                 Force fresh download of ATT&CK STIX data

Quick Start — If In Doubt

The -D (default) flag is the catch-all option. It extracts all groups across all platforms with the key procedure columns and produces a clean CSV ready for SIEM ingestion. Combine it with -E for the styled XLSX evidence report and -C for citation collection:

./MITRESaw.py -D -E -C

This gives you everything you need to get started: mitre_procedures.csv for lookups, mitre_procedures.xlsx for analysis, and citation source content from blog posts, vendor reports, and PDFs. Add -q for quieter output, or layer on -g, -p, -s filters to narrow scope.

Examples

# Default export with all groups (fastest way to get results)
./MITRESaw.py -D

# Quiet mode - show group completion instead of every indicator
./MITRESaw.py -D -q

# Filter by platform and threat group
./MITRESaw.py -p Windows -g APT29

# Export as JSON
./MITRESaw.py -g APT29 -x json

# Build search queries for specific industries on Windows/Linux
./MITRESaw.py -p Windows,Linux -s mining,technology,defense -Q

# Export filtered columns with industry keyword tagging
./MITRESaw.py -c group_sw_name,technique_id,technique_name,keywords

# Evidence report with SIEM queries
./MITRESaw.py -g APT29,APT33,OilRig -p Windows -E -Q

# Evidence report with citation collection
./MITRESaw.py -g APT29 -p Windows -E -C

# Retry only stix_metadata failures (keeps successful cache)
./MITRESaw.py -rS -D -E -C

# Retry only no-content failures
./MITRESaw.py -rN -D -E -C

# Retry both stix_metadata and no-content failures
./MITRESaw.py -rS -rN -D -E -C

# Nuclear option: clear entire cache and re-fetch everything
./MITRESaw.py --clear-cache -D -E -C

# Force refresh STIX data and clear citation cache
./MITRESaw.py -D -E -C --clear-cache -F

Valid column names for --columns:

group_sw_id, group_sw_name, group_sw_description, technique_id,
technique_name, technique_description, tactic, platforms, framework,
procedure_example, evidence, detectable_via, keywords

Output Files

When -E is used, MITRESaw produces:

Outputs written to: data/2026-03-28/Windows__APT29/
  🏛️ mitre_procedures.csv
  📎 mitre_procedures.xlsx
  🍠 citations_failed.yaml

When no group/platform/term filters are provided, files are placed in the date root directory (e.g. data/2026-03-28/).

mitre_procedures.csv — One row per group+technique pair. Suitable for direct ingestion as a lookup table into Splunk (| inputlookup), Microsoft Defender for Endpoint, Elastic, or any SIEM. Fields are properly quoted per RFC 4180. Columns: group_sw_id, group_sw_name, group_sw_description, technique_id, technique_name, technique_description, tactic, platforms, framework, procedure_example, evidence, detectable_via.
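For use outside a SIEM, the CSV can also be read with Python's standard csv module. The sketch below groups technique IDs by group; the column names are taken from the list above, while the function name and the sample rows are illustrative assumptions rather than part of MITRESaw.

```python
import csv
import io
from collections import defaultdict


def techniques_by_group(csv_file):
    """Map each group_sw_name to a sorted list of its unique technique_id values."""
    grouped = defaultdict(set)
    for row in csv.DictReader(csv_file):
        grouped[row["group_sw_name"]].add(row["technique_id"])
    return {group: sorted(ids) for group, ids in grouped.items()}


# Illustrative sample only; a real run would open mitre_procedures.csv instead.
SAMPLE = io.StringIO(
    "group_sw_name,technique_id,tactic\n"
    "APT29,T1059.001,Execution\n"
    "APT29,T1003,Credential Access\n"
    "OilRig,T1059.001,Execution\n"
)

if __name__ == "__main__":
    print(techniques_by_group(SAMPLE))
```

For a real file, pass `open(path, newline="", encoding="utf-8")` so that quoted multi-line fields are parsed correctly.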

mitre_procedures.xlsx — Styled evidence report with multiple sheets:

Sheet	Description
Evidence Report	One row per atomic indicator with 14 columns (see schema below)
Group Summary	Per-group stats: technique count, indicator count, tactic coverage, invocation coverage
Tactic Pivot	Indicators per tactic, sorted by count, with example technique IDs
Technique Matrix	Intersection matrix (only when 2+ groups): techniques as rows, groups as columns, `1` where a group uses that technique, sorted by group coverage descending for prioritising hunting
Reference Detail	Citation sources with extracted content, collection method, and URL (only with `-C`)

citations_failed.yaml — List of citations that fell back to STIX metadata (URL fetch failed across all methods). Includes the full attempt chain for diagnostics. Only generated with -C.

Evidence Report (-E)

The --evidence-report / -E flag generates a styled XLSX evidence report (mitre_procedures.xlsx) with one row per atomic indicator extracted from MITRE ATT&CK procedure examples, plus a companion mitre_procedures.csv for SIEM ingestion.

14-Column Schema

#	Column	Description
1	Evidential Element	The atomic indicator (command, registry key, CVE, port, path, software, event ID)
2	Threat Group	Canonical group name
3	Procedure Example	MITRE ATT&CK procedure text (cleaned: markdown links shown as `Name (ID)`, citations removed)
4	Technique ID	ATT&CK technique ID (e.g. T1059.001)
5	Technique Name	ATT&CK technique name
6	Tactic	ATT&CK tactic
7	Platforms	Target platforms (e.g. Windows, Linux, macOS)
8	Framework	ATT&CK framework (Enterprise, ICS, Mobile)
9	MITRE Invocations	Invocation strings extracted from procedure text — backtick-wrapped commands, CLI flags, registry paths, file paths as MITRE documented them
10	Detection Guidance	Detection context per indicator type (Sysmon EIDs, detection methods)
11	Log Sources	MITRESaw-mapped log sources (e.g. `Sysmon: 1`, `Security EventLog: 4688`, `AppLocker EventLog`, `netflow`, `PCAP`, `*nix /var/log`)
12	Reference URL	URL from procedure text or constructed ATT&CK technique URL
13	Navigation Layer URL	ATT&CK Navigator JSON layer URL for the group
14	Source Type	Website or GitHub | Website

Technique Matrix

When 2+ groups are provided (e.g. -g APT29,APT33,OilRig), a Technique Matrix sheet is added showing which techniques are shared across groups. Techniques are sorted by the number of groups that use them (descending), helping prioritise which TTPs to hunt for first — techniques used by all targeted groups offer the highest detection ROI.
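The ordering described above can be sketched in a few lines. This is a minimal illustration, assuming the input is a mapping from group name to the set of technique IDs it uses; the function name, the use of 0 for "not used", and the tie-break on technique ID are assumptions, not MITRESaw's actual implementation.

```python
def technique_matrix(usage):
    """Build rows of (technique_id, [0/1 per group]) sorted by group coverage, descending.

    `usage` maps a group name to the set of technique IDs it uses.
    """
    groups = sorted(usage)
    techniques = set().union(*usage.values()) if usage else set()
    rows = [(t, [1 if t in usage[g] else 0 for g in groups]) for t in techniques]
    # Most widely shared techniques first; ties broken by technique ID for stable output.
    rows.sort(key=lambda row: (-sum(row[1]), row[0]))
    return groups, rows
```

A technique whose row is all ones is used by every targeted group, so it sorts to the top.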

Citation Collection (-C)

The -C / --citations flag collects ALL citation source material for each technique — blog posts, vendor reports, government advisories, PDFs, and more. Citations are collected inline during extraction and displayed per technique, with indicators extracted from the fetched content.

Multi-Method Fallback Chain

For each (Citation: X) found in procedure text, technique descriptions, and detection guidance, the collector tries multiple methods in order until content is obtained:

Method	Description	Status Icon
direct	Standard HTTP fetch with browser-like headers	✅
headless	Playwright Chromium for Cloudflare/JS-protected sites	✅
wayback	Wayback Machine (web.archive.org) archived snapshot	✅
google_cache	Google's cached version of the page	✅
pdf:PyPDF2	PDF downloaded and text extracted	✅
cached	Previously fetched, loaded from `.citation_cache/`	✅
stix_metadata	STIX description field only (author, title, date)	⚠️
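The "try each method in order until content is obtained" behaviour can be sketched as a simple loop. The function name, the (name, fetcher) pairing, and the attempt log format are assumptions made for illustration; the real collector's internals are not shown in this README.

```python
def collect_citation(url, methods, stix_description):
    """Try each (name, fetcher) pair in order; return (method_name, text, attempts)."""
    attempts = []
    for name, fetch in methods:
        try:
            text = fetch(url)
        except Exception as exc:  # a failed method should not stop the chain
            attempts.append((name, f"error: {exc}"))
            continue
        if text and text.strip():
            attempts.append((name, "ok"))
            return name, text, attempts
        attempts.append((name, "empty"))
    # Every method failed: fall back to the STIX description field.
    return "stix_metadata", stix_description, attempts
```

Keeping the attempt list alongside the result is what would make a diagnostic report of the full chain possible when a URL ends up as stix_metadata.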

URL Rewriting

Known migrated URLs are automatically rewritten:

www.mandiant.com/resources/... → cloud.google.com/blog/topics/threat-intelligence/...
www.fireeye.com/blog/... → cloud.google.com/blog/topics/threat-intelligence/...

Filtered Citations

Homepages and documentation sites are automatically skipped (7-zip, WinRAR, Wikipedia, Microsoft docs, Cisco product docs, etc.) — these have no threat intelligence value.

Indicator Extraction from Citations

When a citation page is successfully fetched, MITRESaw runs its extraction patterns against the content to find additional indicators not present in the MITRE procedure text. The same patterns used for native extraction are applied:

Emoji	Type	What's extracted
💻	`cmd`	Commands, CLI invocations, backtick-quoted strings
🔑	`reg`	Windows registry paths
🔒	`cve`	CVE identifiers
📁	`paths`	Windows and Unix file/directory paths
📦	`software`	Executables, DLLs, tools
🌐	`ports`	Network port numbers

Only new indicators are shown — anything already extracted by MITRESaw's native pipeline is deduplicated. This means techniques that had no native indicators (e.g. T1621 MFA Request Generation) can still gain indicators from their citation sources.
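As a rough sketch of extraction followed by deduplication, the example below covers two of the indicator types. The regular expressions are simplified assumptions and not MITRESaw's own patterns, and the function name is hypothetical.

```python
import re

# Simplified patterns for two indicator types; assumptions, not MITRESaw's own regexes.
PATTERNS = {
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "reg": re.compile(r"\b(?:HKLM|HKCU|HKEY_[A-Z_]+)\\[^\s\"',;)]+"),
}


def new_indicators(text, already_seen):
    """Return {type: [indicators]} found in text that are not in already_seen (case-insensitive)."""
    seen = {item.lower() for item in already_seen}
    found = {}
    for kind, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            if match.lower() not in seen:
                seen.add(match.lower())  # also suppresses repeats within the same page
                found.setdefault(kind, []).append(match)
    return found
```

Passing the natively extracted indicators as `already_seen` leaves only the additions contributed by the citation page.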

Citation-extracted indicators are:

Displayed in the terminal under each citation with emojis
Injected as native evidence rows into mitre_procedures.csv and mitre_procedures.xlsx
Atomised in the evidence report (one row per indicator, same as native indicators)
Included in the Technique Matrix — techniques gain group coverage from citation indicators

The procedure example column for these rows shows "Indicators extracted from citation: <name> (<url>)" to distinguish them from MITRE-sourced indicators.

Manual Import (-I)

For sites that block automated access, save the page as PDF or HTML from your browser and import it:

# Save blocked pages into data/citations/
# e.g. securelist.com_apt-report.pdf, unit42_medusa.html

# Import and run
./MITRESaw.py -I -D -E -C

# Or specify a different directory
./MITRESaw.py -I /path/to/saved/pages -D -E -C

Supported formats: .pdf, .html/.htm, .txt. Imported files are cached and used on all future runs.

Status Icons

Icon	Meaning
✅	Content freshly fetched from source
💾	Content loaded from local cache
⚠️	STIX metadata only (author/title/date — fetch failed)
❌	No content at all

Cache

Fetched pages are cached locally to avoid re-downloading on subsequent runs. Failed URLs are also cached within the same run to avoid re-trying the same broken URL across multiple procedures — this is the single biggest performance optimisation (see below).

Retry Options

Flag	What it removes	When to use
`-rS` / `--retry-stix`	Cache entries where fetch failed, only STIX metadata captured (⚠️)	After fixing SSL/network issues — sites that were unreachable may now work
`-rN` / `--retry-nocontent`	Cache entries with completely empty text (❌)	After installing Playwright — previously unfetchable pages may now parse
`--clear-cache`	Everything	Start completely fresh — re-fetches all ~5,000+ URLs

Use -rS and -rN together to retry all failures while keeping successful cache:

./MITRESaw.py -rS -rN -D -E -C
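Conceptually, the retry flags remove selected failures from the cache so that the next run fetches them again. The sketch below assumes a hypothetical in-memory layout of url -> {"method", "text"}; the on-disk format of .citation_cache/ is not documented here and may differ.

```python
def prune_cache(entries, retry_stix=False, retry_nocontent=False):
    """Return a copy of entries without the failures selected for retry.

    Assumed layout (hypothetical): url -> {"method": str, "text": str}.
    """
    kept = {}
    for url, entry in entries.items():
        is_stix = entry.get("method") == "stix_metadata"
        is_empty = not is_stix and not entry.get("text", "").strip()
        if (retry_stix and is_stix) or (retry_nocontent and is_empty):
            continue  # dropped, so the next run fetches this URL again
        kept[url] = entry
    return kept
```

Successful entries are never touched, which is what distinguishes both retry flags from --clear-cache.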

Pre-Run ETA Estimate

When using -C, MITRESaw scans the cache before starting and shows a summary:

    ┌─────────────────────────────────────────────
    │  Procedures:         4750
    │  Citations:         17451
    │  Cached:            4562
    │  Uncached:          1306
    │  Workers:              50
    │  Estimated time:   1m 18s
    └─────────────────────────────────────────────

    Continue? [Y/n]

Use -A / --auto to skip the confirmation and start immediately.
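The README does not state how the estimate is calculated. One plausible model, shown purely as an assumption, is uncached URLs divided by workers, multiplied by an average fetch time. The 3-second default below is simply the value that reproduces the sample figures above and is not taken from the source.

```python
def estimate_eta(uncached, workers, seconds_per_fetch=3.0):
    """Rough ETA assuming all workers stay busy and each fetch takes seconds_per_fetch."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    total = int(uncached / workers * seconds_per_fetch)
    minutes, seconds = divmod(total, 60)
    return f"{minutes}m {seconds}s"
```

Under this model, halving the worker count with -w doubles the estimate, which matches the intuition that the pre-fetch phase is network-bound.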

Pre-Fetch Phase

Before processing procedures, all uncached citations are fetched in a single parallel batch using all available workers. This maximises parallelism — instead of fetching 1-3 citations per procedure sequentially, all uncached URLs are fetched at once. A live progress counter shows completion and ETA.

After pre-fetching, the main processing loop reads exclusively from cache and runs in seconds.
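A single parallel batch of this kind can be sketched with the standard library's thread pool. The function name, the dict-based cache, and the choice to record failures as empty strings are assumptions for illustration; `fetch` stands in for whatever retrieval function is used.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def prefetch(urls, cache, fetch, max_workers=50):
    """Fetch every URL missing from cache in one parallel batch; failures are cached as ''."""
    pending = [u for u in dict.fromkeys(urls) if u not in cache]  # de-duplicate, keep order
    if not pending:
        return 0
    with ThreadPoolExecutor(max_workers=min(max_workers, len(pending))) as pool:
        futures = {pool.submit(fetch, url): url for url in pending}
        for future in as_completed(futures):
            url = futures[future]
            try:
                cache[url] = future.result()
            except Exception:
                cache[url] = ""  # remember the failure so it is not retried this run
    return len(pending)
```

Once this returns, every URL has a cache entry, so a later processing loop can read from the cache without touching the network.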

Estimated Run Times

Times are approximate for a full all-groups run (~4,750 procedures, ~17,000 citations, 50 workers). Subsequent runs with a warm cache are significantly faster.

Command	First Run	Cached Run	What It Does
`-D`	~2 min	~2 min	Extract procedures to CSV
`-D -E`	~3 min	~3 min	+ styled XLSX evidence report
`-D -E -C`	~5-15 min	~3 min	+ citation collection (pre-fetch + extraction)
`-D -E -C -Q`	~5-15 min	~4 min	+ search queries (Splunk/Sentinel/Elastic)
`-D -E -C -n`	~8-18 min	~6 min	+ Navigator layer downloads
`-D -E -C -rS`	~5-15 min	~5-15 min	+ retry STIX-metadata failures

The citation pre-fetch phase accounts for most of the first-run time. The -rS flag clears cached failures and re-fetches them, so it always takes first-run time. A 30-day cooldown warning is shown if -rS was used recently, since the same URLs will likely fail again.

Adaptive Worker Throttling

Workers start at the configured maximum (default 50) and automatically adjust during execution:

On 429 rate-limit: workers halve (e.g. 50 → 25)
After 50 clean procedures: workers increase by 2 (e.g. 25 → 27)
Current worker count and rate-limit count are shown in the progress bar
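The halve-on-429, grow-after-50 rule above can be sketched as a small state holder. The class and method names are hypothetical, and three details are assumptions not stated in this README: the floor of one worker, the cap at the configured maximum, and resetting the clean streak after each adjustment.

```python
class AdaptiveWorkers:
    """Track the current worker count under the halve/grow rule (illustrative sketch)."""

    def __init__(self, max_workers=50, clean_streak_needed=50, step=2):
        self.max_workers = max_workers
        self.current = max_workers
        self.clean_streak_needed = clean_streak_needed
        self.step = step
        self.clean_streak = 0
        self.rate_limit_hits = 0

    def on_rate_limited(self):
        """HTTP 429 seen: halve the worker count (never below 1) and reset the streak."""
        self.rate_limit_hits += 1
        self.current = max(1, self.current // 2)
        self.clean_streak = 0

    def on_clean_procedure(self):
        """A procedure finished without a 429: grow by `step` after enough clean ones."""
        self.clean_streak += 1
        if self.clean_streak >= self.clean_streak_needed:
            self.current = min(self.max_workers, self.current + self.step)
            self.clean_streak = 0
```

Halving quickly and recovering slowly is the same multiplicative-decrease, additive-increase shape used in network congestion control.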

Performance

Optimisation	Detail
Request timeout	8s (direct, wayback fetch, google cache)
Wayback Machine API timeout	5s
Per-domain rate limit	0.5s between requests to same domain
Global rate limit	Disabled — per-domain delay is sufficient
Cached failure recognition	Empty cache entries skip the full method chain instantly
Pre-fetch batch	All uncached URLs fetched in one parallel batch before processing
Gibberish filtering	Garbled PDF content (base64, binary) rejected before indicator extraction
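The per-domain delay in the table can be illustrated with a small limiter keyed on hostname. This is a single-threaded sketch with hypothetical names; a version shared between worker threads would also need a lock around the bookkeeping.

```python
import time
from urllib.parse import urlparse


class DomainRateLimiter:
    """Enforce a minimum interval between requests to the same host (illustrative sketch)."""

    def __init__(self, min_interval=0.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock  # injectable so the limiter can be tested without real waiting
        self.sleep = sleep
        self.last_request = {}

    def wait(self, url):
        """Sleep just long enough to keep min_interval between hits on the same host."""
        host = urlparse(url).netloc.lower()
        now = self.clock()
        last = self.last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay:
            self.sleep(delay)
        self.last_request[host] = now + delay
        return delay
```

Because the delay is tracked per host, requests to different domains never wait on each other, which is why a separate global limit can be left disabled.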

Optional Dependencies

Package	Purpose	Install
`playwright`	Headless browser for JS/Cloudflare sites	`pip install playwright && playwright install chromium`

Playwright is optional — the collector works without it but will skip headless browsing for Cloudflare/JS-protected sites.

Failed Citations Report

Citations that fell back to stix_metadata are written to citations_failed.yaml in the output directory, with the full attempt chain for each URL.

Running in the Background

For large runs (all groups with citation collection), MITRESaw can take a significant amount of time. Use tmux to run it in the background and reconnect later:

# Start a tmux session
tmux new -s mitresaw

# Run MITRESaw
./MITRESaw.py -D -E -C

# Detach from tmux: press Ctrl+B then D

# Re-attach anytime to see live progress
tmux attach -t mitresaw

Alternatively, run with nohup and monitor the log:

# Run in background
nohup ./MITRESaw.py -D -E -C -q > mitresaw.log 2>&1 &

# Check progress
tail -5 mitresaw.log

# Watch live
tail -f mitresaw.log

Progress Bar

A dual progress bar is pinned to the bottom of the terminal showing:

Procedures — extraction progress across all group+technique pairs
Citations — collection progress across all citation sources
ETA — estimated time remaining

The progress bar stays in place while extraction output scrolls above it. The current worker count and rate-limit count are shown alongside the ETA.

Exclusion List

MITRESaw supports an exclusion list to filter out known false-positive indicators. Edit data/exclusions.csv with two columns:

indicator,reason
whoami,Common benign command
ipconfig,Common benign command

Exclusions are case-insensitive and apply to both native and citation-extracted indicators. Excluded indicators are silently removed from terminal output and export files.
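A minimal sketch of how such a list could be applied is shown below. The function names are hypothetical, and it assumes whole-indicator matching rather than substring matching, which this README does not specify.

```python
import csv
import io


def load_exclusions(csv_file):
    """Read the `indicator` column of an exclusions CSV into a lower-cased set."""
    return {
        row["indicator"].strip().lower()
        for row in csv.DictReader(csv_file)
        if row.get("indicator")
    }


def apply_exclusions(indicators, exclusions):
    """Drop indicators whose lower-cased form appears in the exclusion set."""
    return [i for i in indicators if i.strip().lower() not in exclusions]


if __name__ == "__main__":
    sample = io.StringIO("indicator,reason\nwhoami,Common benign command\n")
    print(apply_exclusions(["WHOAMI", "net user"], load_exclusions(sample)))
```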

The exclusion list can also be managed via the web interface.

Web Interface

MITRESaw includes a single-page web interface for running and monitoring from a browser:

pip install fastapi uvicorn sse-starlette
python mitresaw_web.py
# Open http://localhost:6729

Features:

Run configuration — checkbox flags, group/platform filters, worker count
Live log streaming — real-time output via Server-Sent Events
Cache statistics — total cached, success/failed counts, disk usage
Output file browser — download CSV/XLSX results directly
Exclusion editor — add/remove exclusions from the browser
Stop button — cancel a running extraction

No authentication is included — intended for local use only.

Notices

Because MITRE ATT&CK is built and maintained in the United States, keywords must be given in US English rather than UK English (e.g. defense, not defence).
