cmdaltr/MITRESaw

                                                                         ,
                                 ╓╗╗,                          ,╓▄▄▄Φ▓▓██▌╫D
                                ║▌ `▓L            ,,, ╓▄▄▄Φ▓▓▀▀▀╫╫╫╫╫╫╫▀▀╫▓▓▄
                                 ▓▄▓▓▓        ,▄▄B░▀╫Ñ╬░░╫╫▓▓▓▓╫╫╫╫▓▓▓╫╫╫╫╣▓▓▓▄
                                 ║████L   ,╓#▀▀▀╨╫ÑÑ╦▄▒▀╣▓▄▄▀╣▌╫▀    ██╫╫╫╫▓▓╫▓▓φ
                                  ▓╫╫╫▀]Ñ░░░░ÑÑÑÑ░░░░░╠▀W▄╠▀▓▒░╫Ñ╖   ╙└"╜▀▓▓▓▓▓█▓▓
                                  ║░░░╦╬╫╫╫╫╫╫╫╫╫╫╫╫╫ÑÑ░░░╠Ñ░╨╫Ñ░╫╫╫╫N     ▀▓▓▓╫██▓╕
                                ,]░╦╬╫╫╫╫╫╫╫▓▓▓▓▓▓╫╫╫╫╫╫╫Ñ░░╠░░╫M░╠╫╫╫╫╦,    ▀▓▓▓▓▓▓⌐
                       ╗▄╦     ]░░╬╫╫╫╫╫▓▓██████████▓▓▒╫╫╫╫Ñ░░╟▒╟▓▒ñ▓▓▓▓░N    ╙▓▓▓▓▓▓
                   ║███╫█╫    ]░░╫╫╫╫╫▓███▓▓▓▓▓▓▓▓▓▓███▓╫╫╫╫╫░░╟▒╟▓Ü╟▓▓▓▓░H    ╟▓▓▓▓▓L
                   ║███╫█╫   ]░░╫╫╫╫▓██▓╫▓▓▓▀▀╠╠╬▀▓▓▓╫▓██▓╫╫╫╫░░ÑÑ╠▄░╠▓▓▓▄▄▄▄▄▓▓▓╫╫╫╫
                    ╓▄▄╫█╫╖╖╖╦░╫╫╫╫╫██▓▓▓▓▀░╬Ñ╣╬╫Ñ░╟▓▓▓▓██╫╫╫╫Ñ░╦]░░░║████▀▀╫╫╫▓╩╨╟╫
                    ╟▓▓╫█╫▀▀▀╩╬╩╫╫▓██▓▓▓▓▌░╫░╟▓▓K╫Ñ░▓▓▓▓╫██▓▒╩╩╩╩ ╙╩╨▀▓M╨╩╨╙╝╣N╦╗Φ╝
                       ╫█╫     ▀███▀╣▓▓▓▓▓░╫Ñ░╠▀░╫Ü░▓▓▓▓▓▀▀███╕      ▐▓▌╖
                   ▄▄▄▄▓█▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄╛
                                ▀╩╫╫╫╠╣▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▀░╫╫╫╫▌
                                 ╗▄╫╫Ñ░╠▀▓▓▓▓▓▓▓▓▓▓▓▓▀░╦╬╫╫∩
                                   `⌠╫╫╫Ñ░░Å╣▀▀▀▀▀▒░╦╬╫╫╫`█
                                    ╙╙""╫╫╫½╫╫╫╬╫╫╫╫╫M"▓╛
                                       └╙└ ▄▓╩`║▓╩ Å▀

MITRESaw

Cut through the MITRE ATT&CK framework and extract relevant identifiers for searching and hunting.

License: MIT. Code style: black.

About The Project

At its core, MITRESaw creates a CSV-formatted version of the MITRE ATT&CK Framework and outputs individual Threat Actor ATT&CK Navigator JSON files, depending on keywords provided.
MITRESaw has evolved to also produce search queries based on extracted indicators (aligned with Threat Group TTPs). Searches currently provided are compatible with Splunk, Azure Sentinel and Elastic/Kibana. SIGMA will be included soon.


Installation

python3 -m pip install -r requirements.txt


Usage

./MITRESaw.py [options]

All arguments are optional named flags with sensible defaults. To display usage, simply run: ./MITRESaw.py -h

usage: MITRESaw.py [-h] [-f FRAMEWORK] [-p PLATFORMS] [-s STRINGS]
                   [-g THREATGROUPS] [-a] [-n] [-o] [-Q] [-q] [-t]
                   [-c COLUMNS] [-D] [-x {csv,json,xml}] [-E] [-C]
                   [-w MAX_WORKERS] [-A] [-I [DIR]]
                   [-rS] [-rN] [--clear-cache] [-F]

options:
  -h, --help                  show this help message and exit
  -f, --framework FRAMEWORK   Specify which framework - Enterprise, ICS or Mobile (default: all three)
  -p, --platforms PLATFORMS   Filter by platform e.g. Windows,Linux,IaaS (default: . for all)
  -s, --strings TERMS         Filter by industry e.g. mining,technology,defense (default: . for all)
  -g, --threatgroups GROUPS   Filter by group e.g. APT29,HAFNIUM,Turla (default: . for all)
  -a, --asciiart              Show ASCII Art of the saw
  -n, --navlayers             Obtain ATT&CK Navigator layers for identified Groups
  -o, --showotherlogsources   Show log sources with less than 1% coverage
  -Q, --queries               Build search queries for Splunk, Azure Sentinel, Elastic/Kibana
  -q, --quiet                 Suppress per-identifier output; print only group completion
  -t, --truncate              Truncate indicator output (still written to file)
  -c, --columns COLUMNS       Export filtered CSV with specified columns (comma-separated)
  -D, --default               Export key procedure columns to mitre_procedures.csv
  -I, --import-citations      Import manually saved PDF/HTML citations (default: data/citations/)
  -x, --export {csv,json,xml} Export format for output files (default: csv)
  -E, --evidence-report       Generate styled XLSX evidence report (one row per indicator)
  -C, --citations             Collect citation sources with multi-method fallback (requires -E)
  -w, --max-workers N         Max parallel threads for fetching (1-50, default: 50)
  -A, --auto                  Skip the pre-run ETA confirmation prompt
  -rS, --retry-stix           Retry citations that fell back to STIX metadata
  -rN, --retry-nocontent      Retry citations that had no content at all
  --clear-cache               Clear the entire citation cache before running
  -F, --fetch                 Force fresh download of ATT&CK STIX data

Quick Start — If In Doubt

The -D (default) flag is the catch-all option. It extracts all groups across all platforms with the key procedure columns and produces a clean CSV ready for SIEM ingestion. Combine it with -E to also get the styled XLSX evidence report, and with -C to collect citation sources:

./MITRESaw.py -D -E -C

This gives you everything you need to get started: mitre_procedures.csv for lookups, mitre_procedures.xlsx for analysis, and citation source content from blog posts, vendor reports, and PDFs. Add -q for quieter output, or layer on -g, -p and -s filters to narrow scope.

Examples

# Default export with all groups (fastest way to get results)
./MITRESaw.py -D

# Quiet mode - show group completion instead of every indicator
./MITRESaw.py -D -q

# Filter by platform and threat group
./MITRESaw.py -p Windows -g APT29

# Export as JSON
./MITRESaw.py -g APT29 -x json

# Build search queries for specific industries on Windows/Linux
./MITRESaw.py -p Windows,Linux -s mining,technology,defense -Q

# Export filtered columns with industry keyword tagging
./MITRESaw.py -c group_sw_name,technique_id,technique_name,keywords

# Evidence report with SIEM queries
./MITRESaw.py -g APT29,APT33,OilRig -p Windows -E -Q

# Evidence report with citation collection
./MITRESaw.py -g APT29 -p Windows -E -C

# Retry only stix_metadata failures (keeps successful cache)
./MITRESaw.py -rS -D -E -C

# Retry only no-content failures
./MITRESaw.py -rN -D -E -C

# Retry both stix_metadata and no-content failures
./MITRESaw.py -rS -rN -D -E -C

# Nuclear option: clear entire cache and re-fetch everything
./MITRESaw.py --clear-cache -D -E -C

# Force refresh STIX data and clear citation cache
./MITRESaw.py -D -E -C --clear-cache -F

Valid column names for --columns:

group_sw_id, group_sw_name, group_sw_description, technique_id,
technique_name, technique_description, tactic, platforms, framework,
procedure_example, evidence, detectable_via, keywords

Output Files

When -E is used, MITRESaw produces:

Outputs written to: data/2026-03-28/Windows__APT29/
  🏛️ mitre_procedures.csv
  📎 mitre_procedures.xlsx
  🍠 citations_failed.yaml

When no group/platform/term filters are provided, files are placed in the date root directory (e.g. data/2026-03-28/).

mitre_procedures.csv — One row per group+technique pair. Suitable for direct ingestion as a lookup table into Splunk (| inputlookup), Microsoft Defender for Endpoint, Elastic, or any SIEM. Fields are properly quoted per RFC 4180. Columns: group_sw_id, group_sw_name, group_sw_description, technique_id, technique_name, technique_description, tactic, platforms, framework, procedure_example, evidence, detectable_via.
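
Outside a SIEM, the same file can be read with Python's standard csv module. The sketch below is illustrative only: it assumes the group_sw_name and technique_id columns listed above, and the helper name techniques_by_group and the sample rows are made up for the example.

```python
import csv
import io
from collections import defaultdict


def techniques_by_group(csv_file):
    """Map each group_sw_name to the sorted, de-duplicated technique IDs seen for it."""
    groups = defaultdict(set)
    # DictReader handles RFC 4180 quoting (embedded commas, quotes, newlines).
    for row in csv.DictReader(csv_file):
        groups[row["group_sw_name"]].add(row["technique_id"])
    return {name: sorted(ids) for name, ids in groups.items()}


# Illustrative sample only; real rows come from mitre_procedures.csv.
SAMPLE = io.StringIO(
    "group_sw_name,technique_id,procedure_example\n"
    'APT29,T1059.001,"Used PowerShell, encoded"\n'
    "APT29,T1021.001,Used RDP\n"
    "OilRig,T1059.001,Used PowerShell\n"
)
lookup = techniques_by_group(SAMPLE)
```

For a real run, pass open("mitre_procedures.csv", newline="", encoding="utf-8") from the output directory instead of the in-memory sample.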

mitre_procedures.xlsx — Styled evidence report with multiple sheets:

Sheet             Description
Evidence Report   One row per atomic indicator with 14 columns (see schema below)
Group Summary     Per-group stats: technique count, indicator count, tactic coverage, invocation coverage
Tactic Pivot      Indicators per tactic, sorted by count, with example technique IDs
Technique Matrix  Intersection matrix (only when 2+ groups): techniques as rows, groups as columns, 1 where a group uses that technique, sorted by group coverage descending for prioritising hunting
Reference Detail  Citation sources with extracted content, collection method, and URL (only with -C)

citations_failed.yaml — List of citations that fell back to STIX metadata (URL fetch failed across all methods). Includes the full attempt chain for diagnostics. Only generated with -C.

Evidence Report (-E)

The --evidence-report / -E flag generates a styled XLSX evidence report (mitre_procedures.xlsx) with one row per atomic indicator extracted from MITRE ATT&CK procedure examples, plus a companion mitre_procedures.csv for SIEM ingestion.

14-Column Schema

#   Column                Description
1   Evidential Element    The atomic indicator (command, registry key, CVE, port, path, software, event ID)
2   Threat Group          Canonical group name
3   Procedure Example     MITRE ATT&CK procedure text (cleaned: markdown links shown as Name (ID), citations removed)
4   Technique ID          ATT&CK technique ID (e.g. T1059.001)
5   Technique Name        ATT&CK technique name
6   Tactic                ATT&CK tactic
7   Platforms             Target platforms (e.g. Windows, Linux, macOS)
8   Framework             ATT&CK framework (Enterprise, ICS, Mobile)
9   MITRE Invocations     Invocation strings extracted from procedure text: backtick-wrapped commands, CLI flags, registry paths, file paths as MITRE documented them
10  Detection Guidance    Detection context per indicator type (Sysmon EIDs, detection methods)
11  Log Sources           MITRESaw-mapped log sources (e.g. Sysmon: 1, Security EventLog: 4688, AppLocker EventLog, netflow, PCAP, *nix /var/log)
12  Reference URL         URL from procedure text or constructed ATT&CK technique URL
13  Navigation Layer URL  ATT&CK Navigator JSON layer URL for the group
14  Source Type           Website or GitHub | Website

Technique Matrix

When 2+ groups are provided (e.g. -g APT29,APT33,OilRig), a Technique Matrix sheet is added showing which techniques are shared across groups. Techniques are sorted by the number of groups that use them (descending), helping prioritise which TTPs to hunt for first — techniques used by all targeted groups offer the highest detection ROI.
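
The ordering described above can be sketched in a few lines of Python. This is an illustration of the idea, not MITRESaw's own code: the function name technique_matrix is hypothetical, and the (group, technique) pairs are sample data rather than real ATT&CK mappings.

```python
from collections import defaultdict


def technique_matrix(pairs, groups):
    """Return (technique_id, [0/1 per group]) rows, most widely shared first."""
    used_by = defaultdict(set)
    for group, technique in pairs:
        used_by[technique].add(group)
    # Sort by number of groups using the technique (descending), then by ID
    # so that ties come out in a stable order.
    ordered = sorted(used_by, key=lambda t: (-len(used_by[t]), t))
    return [(t, [1 if g in used_by[t] else 0 for g in groups]) for t in ordered]


# Illustrative (group, technique) pairs; not real ATT&CK mappings.
pairs = [
    ("APT29", "T1059.001"), ("APT33", "T1059.001"), ("OilRig", "T1059.001"),
    ("APT29", "T1021.001"), ("OilRig", "T1021.001"),
    ("APT33", "T1110.003"),
]
matrix = technique_matrix(pairs, ["APT29", "APT33", "OilRig"])
```

Rows at the top of the result are the techniques shared by the most groups, which is where hunting effort pays off first.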

Citation Collection (-C)

The -C / --citations flag collects ALL citation source material for each technique — blog posts, vendor reports, government advisories, PDFs, and more. Citations are collected inline during extraction and displayed per technique, with indicators extracted from the fetched content.

Multi-Method Fallback Chain

For each (Citation: X) found in procedure text, technique descriptions, and detection guidance, the collector tries multiple methods in order until content is obtained:

Method         Description                                            Status Icon
direct         Standard HTTP fetch with browser-like headers
headless       Playwright Chromium for Cloudflare/JS-protected sites
wayback        Wayback Machine (web.archive.org) archived snapshot
google_cache   Google's cached version of the page
pdf:PyPDF2     PDF downloaded and text extracted
cached         Previously fetched, loaded from .citation_cache/
stix_metadata  STIX description field only (author, title, date)      ⚠️
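
A fallback chain of this kind can be sketched as a loop over (name, fetcher) pairs that stops at the first method returning content and records every attempt. The sketch is a minimal illustration under assumptions: collect and the stand-in fetchers are hypothetical names, and the real collector's interfaces are not shown in this README.

```python
def collect(url, methods):
    """Try each (name, fetch) pair in order; return (name, text, attempts).

    `fetch` is any callable that takes a URL and returns text. Raising an
    exception or returning empty text counts as a failure. `attempts`
    records every method tried, in order.
    """
    attempts = []
    for name, fetch in methods:
        try:
            text = fetch(url)
        except Exception as exc:  # a failed method must not abort the chain
            attempts.append((name, f"error: {exc}"))
            continue
        if text and text.strip():
            attempts.append((name, "ok"))
            return name, text, attempts
        attempts.append((name, "empty"))
    return None, "", attempts


# Stand-in fetchers for illustration; real ones would perform network I/O.
def direct(url):
    raise TimeoutError("timed out")


def wayback(url):
    return "archived report text"


def stix_metadata(url):
    return "author, title, date"


method, text, attempts = collect(
    "https://example.com/report",
    [("direct", direct), ("wayback", wayback), ("stix_metadata", stix_metadata)],
)
```

Keeping the attempt list alongside the result is what makes a diagnostic report such as citations_failed.yaml possible.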

URL Rewriting

Known migrated URLs are automatically rewritten:

  • www.mandiant.com/resources/... → cloud.google.com/blog/topics/threat-intelligence/...
  • www.fireeye.com/blog/... → cloud.google.com/blog/topics/threat-intelligence/...

Filtered Citations

Homepages and documentation sites are automatically skipped (7-zip, WinRAR, Wikipedia, Microsoft docs, Cisco product docs, etc.) — these have no threat intelligence value.

Indicator Extraction from Citations

When a citation page is successfully fetched, MITRESaw runs its extraction patterns against the content to find additional indicators not present in the MITRE procedure text. The same patterns used for native extraction are applied:

Emoji  Type      What's extracted
💻     cmd       Commands, CLI invocations, backtick-quoted strings
🔑     reg       Windows registry paths
🔒     cve       CVE identifiers
📁     paths     Windows and Unix file/directory paths
📦     software  Executables, DLLs, tools
🌐     ports     Network port numbers

Only new indicators are shown — anything already extracted by MITRESaw's native pipeline is deduplicated. This means techniques that had no native indicators (e.g. T1621 MFA Request Generation) can still gain indicators from their citation sources.
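
As a rough illustration of pattern-based extraction followed by de-duplication, the sketch below runs three simplified regular expressions over fetched text and keeps only values not already seen. The patterns and the name new_indicators are assumptions made for the example; MITRESaw's actual patterns are not reproduced in this README and are likely broader.

```python
import re

# Simplified patterns for illustration; not MITRESaw's own.
PATTERNS = {
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "cmd": re.compile(r"`([^`\n]+)`"),  # backtick-quoted strings
    "reg": re.compile(r"\bHK(?:LM|CU|CR|U|CC)\\[^\s`]+"),
}


def new_indicators(text, already_seen):
    """Return {type: [indicators]} found in text that are not in already_seen."""
    seen = {s.lower() for s in already_seen}
    found = {}
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Use the capture group when the pattern defines one.
            value = match.group(1) if pattern.groups else match.group(0)
            key = value.lower()
            if key not in seen:
                seen.add(key)
                found.setdefault(kind, []).append(value)
    return found


citation_text = (
    "Exploited CVE-2021-44228, then ran `whoami /all` and wrote "
    "HKCU\\Software\\Example\\Run for persistence."
)
# Pretend the native pipeline already extracted the CVE.
extra = new_indicators(citation_text, already_seen={"cve-2021-44228"})
```

Only the command and the registry path are returned here, because the CVE is already in the seen set.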

Citation-extracted indicators are:

  • Displayed in the terminal under each citation with emojis
  • Injected as native evidence rows into mitre_procedures.csv and mitre_procedures.xlsx
  • Atomised in the evidence report (one row per indicator, same as native indicators)
  • Included in the Technique Matrix — techniques gain group coverage from citation indicators

The procedure example column for these rows shows "Indicators extracted from citation: <name> (<url>)" to distinguish them from MITRE-sourced indicators.

Manual Import (-I)

For sites that block automated access, save the page as PDF or HTML from your browser and import it:

# Save blocked pages into data/citations/
# e.g. securelist.com_apt-report.pdf, unit42_medusa.html

# Import and run
./MITRESaw.py -I -D -E -C

# Or specify a different directory
./MITRESaw.py -I /path/to/saved/pages -D -E -C

Supported formats: .pdf, .html/.htm, .txt. Imported files are cached and used on all future runs.

Status Icons

Icon  Meaning
      Content freshly fetched from source
💾    Content loaded from local cache
⚠️    STIX metadata only (author/title/date; fetch failed)
❌    No content at all

Cache

Fetched pages are cached locally to avoid re-downloading on subsequent runs. Failed URLs are also cached within the same run to avoid re-trying the same broken URL across multiple procedures — this is the single biggest performance optimisation (see below).

Retry Options

Flag                     What it removes                                                     When to use
-rS / --retry-stix       Cache entries where fetch failed, only STIX metadata captured (⚠️)  After fixing SSL/network issues; sites that were unreachable may now work
-rN / --retry-nocontent  Cache entries with completely empty text (❌)                        After installing Playwright; previously unfetchable pages may now parse
--clear-cache            Everything                                                          Start completely fresh; re-fetches all ~5,000+ URLs

Use -rS and -rN together to retry all failures while keeping successful cache:

./MITRESaw.py -rS -rN -D -E -C

Pre-Run ETA Estimate

When using -C, MITRESaw scans the cache before starting and shows a summary:

    ┌─────────────────────────────────────────────
    │  Procedures:         4750
    │  Citations:         17451
    │  Cached:            4562
    │  Uncached:          1306
    │  Workers:              50
    │  Estimated time:   1m 18s
    └─────────────────────────────────────────────

    Continue? [Y/n]

Use -A / --auto to skip the confirmation and start immediately.

Pre-Fetch Phase

Before processing procedures, all uncached citations are fetched in a single parallel batch using all available workers. This maximises parallelism — instead of fetching 1-3 citations per procedure sequentially, all uncached URLs are fetched at once. A live progress counter shows completion and ETA.

After pre-fetching, the main processing loop reads exclusively from cache and runs in seconds.
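
The pre-fetch idea can be sketched with concurrent.futures from the standard library. This is a minimal sketch under assumptions: the cache is modelled as a plain dict, fetch is any callable returning text, and the name prefetch is hypothetical. Failures are stored as empty entries, mirroring the cached-failure behaviour described in the Cache section.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def prefetch(urls, cache, fetch, max_workers=50):
    """Fetch every URL missing from `cache` in one parallel batch; return count fetched."""
    # De-duplicate while keeping order, then drop anything already cached.
    uncached = [u for u in dict.fromkeys(urls) if u not in cache]
    if not uncached:
        return 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in uncached}
        for future in as_completed(futures):
            url = futures[future]
            try:
                cache[url] = future.result()
            except Exception:
                cache[url] = ""  # cache the failure so it is not retried this run
    return len(uncached)


cache = {"https://a.example/1": "cached"}
fetched = prefetch(
    ["https://a.example/1", "https://b.example/2", "https://b.example/2"],
    cache,
    fetch=lambda url: "body of " + url,  # stand-in for a real network fetch
    max_workers=4,
)
```

After this call every URL is in the cache, so a later processing loop never needs to touch the network.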

Estimated Run Times

Times are approximate for a full all-groups run (~4,750 procedures, ~17,000 citations, 50 workers). Subsequent runs with a warm cache are significantly faster.

Command       First Run   Cached Run  What It Does
-D            ~2 min      ~2 min      Extract procedures to CSV
-D -E         ~3 min      ~3 min      + styled XLSX evidence report
-D -E -C      ~5-15 min   ~3 min      + citation collection (pre-fetch + extraction)
-D -E -C -Q   ~5-15 min   ~4 min      + search queries (Splunk/Sentinel/Elastic)
-D -E -C -n   ~8-18 min   ~6 min      + Navigator layer downloads
-D -E -C -rS  ~5-15 min   ~5-15 min   + retry STIX-metadata failures

The citation pre-fetch phase accounts for most of the first-run time. The -rS flag clears cached failures and re-fetches them, so it always takes first-run time. A 30-day cooldown warning is shown if -rS was used recently, since the same URLs will likely fail again.

Adaptive Worker Throttling

Workers start at the configured maximum (default 50) and automatically adjust during execution:

  • On 429 rate-limit: workers halve (e.g. 50 → 25)
  • After 50 clean procedures: workers increase by 2 (e.g. 25 → 27)
  • Current worker count and rate-limit count are shown in the progress bar
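
The throttling rules above amount to a small state machine. The sketch below is one way to express them, not the project's implementation: the class name AdaptiveWorkers is hypothetical, and the floor of one worker, the cap at the configured maximum, and resetting the clean streak on a 429 are assumptions not stated in this README.

```python
class AdaptiveWorkers:
    """Worker count that halves on HTTP 429 and recovers by 2 after 50 clean procedures."""

    def __init__(self, max_workers=50, clean_needed=50, step=2):
        self.max_workers = max_workers
        self.current = max_workers
        self.clean_needed = clean_needed
        self.step = step
        self.clean_streak = 0
        self.rate_limited = 0  # total 429 responses seen

    def on_rate_limit(self):
        """Call when a request comes back with HTTP 429."""
        self.rate_limited += 1
        self.clean_streak = 0  # assumption: a 429 restarts the clean count
        self.current = max(1, self.current // 2)  # assumption: never below 1

    def on_clean_procedure(self):
        """Call when a procedure completes without any rate limiting."""
        self.clean_streak += 1
        if self.clean_streak >= self.clean_needed:
            self.clean_streak = 0
            # assumption: never exceed the configured maximum
            self.current = min(self.max_workers, self.current + self.step)


workers = AdaptiveWorkers()
workers.on_rate_limit()            # 50 -> 25
after_429 = workers.current
for _ in range(50):
    workers.on_clean_procedure()   # 25 -> 27 after 50 clean procedures
after_recovery = workers.current
```

Halving quickly and recovering slowly keeps the run moving while backing off sharply from sites that push back.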

Performance

Optimisation                 Detail
Request timeout              8s (direct, wayback fetch, google cache)
Wayback Machine API timeout  5s
Per-domain rate limit        0.5s between requests to same domain
Global rate limit            Disabled; per-domain delay is sufficient
Cached failure recognition   Empty cache entries skip the full method chain instantly
Pre-fetch batch              All uncached URLs fetched in one parallel batch before processing
Gibberish filtering          Garbled PDF content (base64, binary) rejected before indicator extraction
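
A per-domain delay like the 0.5s limit in the table can be sketched as a small thread-safe limiter that reserves the next allowed start time for each domain. This is an illustrative design under assumptions: the class name DomainRateLimiter is hypothetical, and the injectable clock and sleep parameters exist only to make the sketch easy to test.

```python
import threading
import time
from urllib.parse import urlparse


class DomainRateLimiter:
    """Space out requests to the same domain by at least `delay` seconds."""

    def __init__(self, delay=0.5, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = {}  # domain -> earliest time the next request may start
        self.lock = threading.Lock()

    def wait(self, url):
        """Block until a request to url's domain is allowed; return seconds waited."""
        domain = urlparse(url).netloc.lower()
        with self.lock:
            now = self.clock()
            start = max(now, self.next_allowed.get(domain, now))
            self.next_allowed[domain] = start + self.delay  # reserve the slot
        pause = start - now
        if pause > 0:
            self.sleep(pause)  # sleep outside the lock so other domains proceed
        return pause


limiter = DomainRateLimiter()  # 0.5s matches the per-domain delay in the table
waited = limiter.wait("https://example.com/report")  # first request to a domain does not wait
```

Because the delay is tracked per domain, workers fetching from different sites never block each other, which is consistent with the global limit being disabled.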

Optional Dependencies

Package     Purpose                                  Install
playwright  Headless browser for JS/Cloudflare sites  pip install playwright && playwright install chromium

Playwright is optional — the collector works without it but will skip headless browsing for Cloudflare/JS-protected sites.

Failed Citations Report

Citations that fell back to stix_metadata are written to citations_failed.yaml in the output directory, with the full attempt chain for each URL.

Running in the Background

For large runs (all groups with citation collection), MITRESaw can take a significant amount of time. Use tmux to run it in the background and reconnect later:

# Start a tmux session
tmux new -s mitresaw

# Run MITRESaw
./MITRESaw.py -D -E -C

# Detach from tmux: press Ctrl+B then D

# Re-attach anytime to see live progress
tmux attach -t mitresaw

Alternatively, run with nohup and monitor the log:

# Run in background
nohup ./MITRESaw.py -D -E -C -q > mitresaw.log 2>&1 &

# Check progress
tail -5 mitresaw.log

# Watch live
tail -f mitresaw.log

Progress Bar

A dual progress bar is pinned to the bottom of the terminal showing:

  • Procedures — extraction progress across all group+technique pairs
  • Citations — collection progress across all citation sources
  • ETA — estimated time remaining

The progress bar stays in place while extraction output scrolls above it. The current worker count and rate-limit count are shown alongside the ETA.

Exclusion List

MITRESaw supports an exclusion list to filter out known false-positive indicators. Edit data/exclusions.csv with two columns:

indicator,reason
whoami,Common benign command
ipconfig,Common benign command

Exclusions are case-insensitive and apply to both native and citation-extracted indicators. Excluded indicators are silently removed from terminal output and export files.
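
Case-insensitive filtering against this file can be sketched as follows. The sketch assumes whole-indicator matching (an indicator is dropped only when it equals an excluded value, ignoring case), which this README does not specify; load_exclusions and apply_exclusions are hypothetical helper names.

```python
import csv
import io


def load_exclusions(csv_file):
    """Read the indicator column of an exclusions CSV into a lower-cased set."""
    return {
        row["indicator"].strip().lower()
        for row in csv.DictReader(csv_file)
        if row.get("indicator")
    }


def apply_exclusions(indicators, exclusions):
    """Drop indicators whose lower-cased form is in the exclusion set."""
    return [i for i in indicators if i.strip().lower() not in exclusions]


# In practice: open("data/exclusions.csv", newline="", encoding="utf-8")
sample = io.StringIO(
    "indicator,reason\n"
    "whoami,Common benign command\n"
    "ipconfig,Common benign command\n"
)
exclusions = load_exclusions(sample)
kept = apply_exclusions(["WHOAMI", "IpConfig", "vssadmin delete shadows"], exclusions)
```

Here only the third indicator survives, because the first two match excluded values once case is ignored.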

The exclusion list can also be managed via the web interface.

Web Interface

MITRESaw includes a single-page web interface for running and monitoring from a browser:

pip install fastapi uvicorn sse-starlette
python mitresaw_web.py
# Open http://localhost:6729

Features:

  • Run configuration — checkbox flags, group/platform filters, worker count
  • Live log streaming — real-time output via Server-Sent Events
  • Cache statistics — total cached, success/failed counts, disk usage
  • Output file browser — download CSV/XLSX results directly
  • Exclusion editor — add/remove exclusions from the browser
  • Stop button — cancel a running extraction

No authentication is included — intended for local use only.

Notices

Because MITRE ATT&CK is built and maintained in the United States, keywords must be given in US English rather than UK English (e.g. defense, not defence).

