PROTEUS

Advanced zero-day static analysis engine built with Rust and Python

Proteus is a high-performance malware analysis tool built with Rust and Python, designed to detect zero-day threats through static analysis, heuristics, and machine learning.

Features

Core Analysis

PE/ELF Binary Analysis - Deep inspection of Windows and Linux executables
Entropy Calculation - Detect packed/encrypted malware (section-level granularity)
Heuristic Scoring - Threat score on a 0-100 scale with configurable thresholds
String Extraction - ASCII and wide string analysis with pattern detection
IOC Detection - Automatic extraction of URLs, IPs, registry keys, file paths
High Performance - Rust-powered core with parallel processing via Rayon
Batch Processing - Scan entire directories efficiently
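String Extraction covers both ASCII and wide (UTF-16LE) strings. The following is a minimal Python sketch of that idea, not the Rust code in `string_extractor.rs`. It assumes a minimum run length of 4 printable characters, which is a common default in tools like `strings` but not a documented Proteus parameter. `extract_strings` is a hypothetical name.

```python
import re

# Assumed minimum string length; the real Proteus threshold is not documented here.
MIN_LEN = 4

# A run of at least MIN_LEN printable ASCII bytes.
ASCII_RE = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# A wide (UTF-16LE) string: printable ASCII byte followed by a zero byte, repeated.
WIDE_RE = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def extract_strings(data: bytes) -> list[str]:
    """Return ASCII strings followed by wide strings found in `data`."""
    ascii_strings = [m.group().decode("ascii") for m in ASCII_RE.finditer(data)]
    wide_strings = [m.group().decode("utf-16-le") for m in WIDE_RE.finditer(data)]
    return ascii_strings + wide_strings


if __name__ == "__main__":
    blob = b"\x00\x01cmd.exe /c\x00\x02" + "keylog".encode("utf-16-le") + b"\xff"
    print(extract_strings(blob))  # ['cmd.exe /c', 'keylog']
```

Scanning for wide strings separately matters on Windows binaries. In UTF-16LE text the zero bytes break every ASCII run, so an ASCII-only scan would miss those strings entirely.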

Detection Engines

ML Detection - Random Forest (96% accuracy) + Isolation Forest anomaly detection
YARA Engine - 40+ industry-standard detection rules
- Ransomware: WannaCry, Ryuk, Maze, Locky families
- RAT Detection: NanoCore, njRAT, DarkComet, Quasar, AsyncRAT
- Banking Trojans: Emotet, TrickBot, Dridex, Zeus, Formbook, AgentTesla
- Packer Detection: UPX, ASPack, Themida, VMProtect, PECompact, MPRESS
- Suspicious Behaviors: Code injection, credential dumping, keyloggers, browser theft
Multi-Layer Analysis - Combine heuristic, ML, and YARA results for a more robust verdict

Advanced Features

ML Ready - Feature extraction pipeline for machine learning
Feature Engineering - 16+ features including entropy, imports, exports, strings
Detection Metrics - Built-in accuracy, precision, recall tracking
Extensible - Modular architecture for custom analyzers

Web Dashboard

Proteus v0.3.0 includes a modern web interface for easy analysis.

Launching the Dashboard

Start the API Server:

# Windows
.\venv\Scripts\activate
python -m uvicorn server:app --reload --port 8000

# Linux/Mac
source venv/bin/activate
python -m uvicorn server:app --reload --port 8000

Open Browser: Navigate to http://localhost:8000

Dashboard Features

Drag & Drop Analysis: Upload PE/ELF files instantly
Live Stats: Monitor system health, rule counts, and detection rates
History: Local storage tracking of past scans
Visual Reports: Graphical breakdown of entropy, threat scores, and indicators

Detection Metrics (Real-World Dataset)

Metric                   Value
Test Accuracy            96.22%
Precision (Malicious)    95%
Recall (Malicious)       97%
F1-Score                 0.96
False Positive Rate      0.97%
Training Dataset         1,190 samples
Real Malware Samples     576
Clean Samples            614
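The metrics above follow the standard confusion-matrix definitions. The sketch below shows how they relate to each other. The counts are made up purely for illustration; they are not the Proteus test-set counts, which are not published here.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # malicious samples predicted malicious
    fp: int  # clean samples predicted malicious
    tn: int  # clean samples predicted clean
    fn: int  # malicious samples predicted clean

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        # Of everything flagged malicious, how much really was malicious.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Of all malicious samples, how many were caught.
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)

    def false_positive_rate(self) -> float:
        # Of all clean samples, how many were wrongly flagged.
        return self.fp / (self.fp + self.tn)


# Hypothetical counts for illustration only.
cm = ConfusionMatrix(tp=90, fp=10, tn=190, fn=10)
print(f"accuracy={cm.accuracy():.4f} precision={cm.precision():.2f} "
      f"recall={cm.recall():.2f} f1={cm.f1():.2f} fpr={cm.false_positive_rate():.4f}")
```

Precision is computed over predicted-malicious samples, while the false positive rate is computed over actually-clean samples. Both depend on the same false-positive count but answer different questions.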

Quick Start

Prerequisites

Rust 1.83+
Python 3.10+
Windows 10/11 or Linux
YARA 4.5+ (Optional, required for Rust build)
MalwareBazaar API (Optional, for dataset collection - included in code)

Installation

git clone https://github.com/ChronoCoders/proteus.git
cd proteus

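python -m venv venv

# Windows
venv\Scripts\activate

# Linux/Mac
source venv/bin/activate

pip install -r requirements.txt

# Build the Rust extension and install it into the active virtual environment
maturin develop --release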

Basic Usage

Analyze a single file:

python cli.py file C:\path\to\sample.exe

Analyze with ML prediction:

python cli.py file C:\path\to\sample.exe --ml

Analyze with YARA rules:

python cli.py file C:\path\to\sample.exe --yara

Complete analysis (Heuristic + ML + YARA):

python cli.py file C:\path\to\sample.exe --ml --yara

Full analysis with strings:

python cli.py file C:\path\to\sample.exe --ml --yara --strings

String-only analysis:

python cli.py strings C:\path\to\sample.exe

Batch scan directory:

python cli.py dir C:\path\to\samples --output results.json

Collecting Real Malware Dataset

Collect malware samples from MalwareBazaar (default: 50 samples per tag, ~500 total):

python malware_collector.py

Collect with custom sample count:

# Collect 100 samples per tag (~1000 total)
python malware_collector.py --samples=100

# Collect 20 samples per tag (~200 total)
python malware_collector.py --samples=20

Enable verbose debugging mode:

python malware_collector.py --verbose

Combine options:

python malware_collector.py --samples=100 --verbose

Collector Features:

Automatic AES-encrypted ZIP extraction
Retry logic for failed downloads (2 attempts per sample)
Real-time progress tracking
Graceful interrupt handling (Ctrl+C saves progress)
Metadata persistence (resume capability)
10 malware categories: ransomware, trojan, rat, stealer, backdoor, loader, miner, banker, spyware, worm
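The retry, interrupt, and resume behaviour listed above can be sketched with the standard library alone. This is not the code in `malware_collector.py`. The `download_sample` callable, the `progress.json` metadata file, and its `{"done": [...]}` layout are hypothetical stand-ins. Only the limit of 2 attempts per sample comes from the list above.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

MAX_ATTEMPTS = 2  # matches "2 attempts per sample" above


def load_done(metadata: Path) -> set[str]:
    """Return sample hashes recorded by a previous run, enabling resume."""
    if metadata.exists():
        return set(json.loads(metadata.read_text())["done"])
    return set()


def save_done(metadata: Path, done: set[str]) -> None:
    metadata.write_text(json.dumps({"done": sorted(done)}))


def collect(hashes: list[str], download_sample: Callable[[str], None], metadata: Path) -> set[str]:
    done = load_done(metadata)
    try:
        for sha256 in hashes:
            if sha256 in done:
                continue  # fetched in an earlier run
            for attempt in range(1, MAX_ATTEMPTS + 1):
                try:
                    download_sample(sha256)
                    done.add(sha256)
                    break
                except OSError:  # urllib and requests network errors subclass OSError
                    if attempt == MAX_ATTEMPTS:
                        print(f"giving up on {sha256}")
    except KeyboardInterrupt:
        print("interrupted, saving progress")
    finally:
        save_done(metadata, done)  # runs on normal exit, errors, and Ctrl+C
    return done


if __name__ == "__main__":
    attempts: dict[str, int] = {}

    def flaky_download(sha256: str) -> None:
        # Hypothetical downloader: "bbb" fails once, "ccc" always fails.
        attempts[sha256] = attempts.get(sha256, 0) + 1
        if sha256 == "ccc" or (sha256 == "bbb" and attempts[sha256] == 1):
            raise OSError("simulated network error")

    with tempfile.TemporaryDirectory() as tmp:
        meta = Path(tmp) / "progress.json"
        print(sorted(collect(["aaa", "bbb", "ccc"], flaky_download, meta)))
        print(sorted(collect(["aaa", "bbb", "ccc"], flaky_download, meta)))  # resumes, skips aaa/bbb
```

Writing the metadata in a `finally` block means progress is persisted whether the loop finishes, hits an unexpected error, or is stopped with Ctrl+C.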

Collection Statistics:

Default: ~500 samples in ~17 minutes
Large: ~1000 samples in ~33 minutes
Custom: configurable via --samples=N

Building Test Dataset

python test_dataset_builder.py

Training ML Models

python ml_trainer.py

Documentation

Example Output

╔═══════════════════════════════════════╗
║         PROTEUS v0.2.0                ║
║   Zero-Day Static Analysis Engine     ║
╚═══════════════════════════════════════╝

[*] Analysis: suspicious.exe
[+] Type: PE
[+] Entropy: 7.85
[+] Threat Score: 66.00/100
[+] Verdict: MALICIOUS
[!] Suspicious Indicators:
    - VirtualAlloc
    - CreateRemoteThread
    - WriteProcessMemory

[*] YARA Scan:
[!] YARA Matches: 3
    Rule: Suspicious_Code_Injection
      Severity: HIGH
      Family: suspicious
    Rule: Emotet_Trojan
      Severity: CRITICAL
      Family: trojan
    Rule: UPX_Packer
      Severity: MEDIUM
      Family: packer

[*] ML Analysis:
[+] ML Prediction: MALICIOUS
[+] Confidence: 100.00%
[+] Probabilities:
    Clean: 0.00%
    Malicious: 100.00%

[*] String Analysis:
[+] Total strings: 342
[+] Encoded strings: 15

[!] URLs (2):
    http://malicious-c2.com/payload
    https://evil.net/download

[!] Suspicious strings (8):
    cmd.exe /c powershell
    Disable-WindowsDefender
    keylogger.dll

Architecture

proteus/
├── src/                      # Rust core engine
│   ├── lib.rs                # Module entry point
│   ├── pe_parser.rs          # PE file parsing (goblin)
│   ├── elf_parser.rs         # ELF file parsing
│   ├── entropy.rs            # Shannon entropy calculation
│   ├── heuristics.rs         # Threat scoring algorithms
│   ├── string_extractor.rs   # String analysis engine
│   └── python_bindings.rs    # PyO3 FFI bindings
├── python/                   # Python orchestration
│   ├── __init__.py
│   ├── analyzer.py           # Main analyzer class
│   ├── ml_detector.py        # ML model integration
│   ├── yara_engine.py        # YARA rule engine
│   ├── config.py             # Configuration management
│   ├── validators.py         # Security validators
│   └── rate_limiter.py       # API rate limiting
├── yara_rules/               # YARA detection rules
│   ├── ransomware.yar        # Ransomware signatures
│   ├── rats.yar              # RAT detection
│   ├── trojans.yar           # Banking trojans
│   ├── packers.yar           # Packer detection
│   └── suspicious_behavior.yar # Behavioral analysis
├── web/                      # Web dashboard
├── cli.py                    # Command-line interface
├── server.py                 # FastAPI server (API and dashboard)
├── malware_collector.py      # MalwareBazaar dataset collector
├── ml_trainer.py             # ML training pipeline
├── test_dataset_builder.py   # Dataset generation
├── requirements.txt          # Python dependencies
├── Cargo.toml                # Rust dependencies
└── pyproject.toml            # Python project configuration

Feature Extraction

Proteus extracts 16+ features per sample:

Binary Features:

Global entropy
Section count
Max section entropy
Import count
Export count
Suspicious API count

String Features:

Total strings
URL count
IP count
Registry key count
Suspicious keyword count
File path count
Encoded string count
Encoded ratio
Suspicious ratio
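One way to picture the per-sample feature vector is as a flat record with a fixed field order. The sketch below is an assumption about layout, not the actual Proteus schema. The field names are hypothetical, and both ratios are assumed to be computed relative to the total string count. Entropy 7.85, 342 total strings, 15 encoded strings, and 2 URLs are taken from the example output further down; the remaining values are placeholders.

```python
from dataclasses import astuple, dataclass


@dataclass
class SampleFeatures:
    # Binary features
    global_entropy: float
    section_count: int
    max_section_entropy: float
    import_count: int
    export_count: int
    suspicious_api_count: int
    # String features
    total_strings: int
    url_count: int
    ip_count: int
    registry_key_count: int
    suspicious_keyword_count: int
    file_path_count: int
    encoded_string_count: int

    @property
    def encoded_ratio(self) -> float:
        # Assumed definition: share of extracted strings that look encoded.
        return self.encoded_string_count / self.total_strings if self.total_strings else 0.0

    @property
    def suspicious_ratio(self) -> float:
        # Assumed definition: suspicious keyword hits per extracted string.
        return self.suspicious_keyword_count / self.total_strings if self.total_strings else 0.0

    def to_vector(self) -> list[float]:
        """Flatten into a fixed-order numeric vector for a classifier."""
        return [float(v) for v in astuple(self)] + [self.encoded_ratio, self.suspicious_ratio]


sample = SampleFeatures(
    global_entropy=7.85, section_count=5, max_section_entropy=7.9,
    import_count=120, export_count=0, suspicious_api_count=3,
    total_strings=342, url_count=2, ip_count=0, registry_key_count=1,
    suspicious_keyword_count=8, file_path_count=4, encoded_string_count=15,
)
print(round(sample.encoded_ratio, 4))  # 15 / 342, about 0.0439
```

Ratios make the string features comparable across small and large binaries. Guarding against a zero string count keeps stripped or tiny samples from raising `ZeroDivisionError`.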

Threat Detection Patterns

High Entropy Indicators:

Entropy > 7.8: Likely packed/encrypted
Entropy > 7.5: Suspicious compression
Entropy > 7.2: Elevated entropy
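Shannon entropy over byte values is $H = -\sum_i p_i \log_2 p_i$. It ranges from 0 (one repeated byte) to 8 bits per byte (all 256 values equally likely), which is why the thresholds above sit just below 8. Below is a Python sketch of the calculation and the threshold bands. The band labels mirror the list above, but the function names are illustrative; the real implementation lives in `entropy.rs`.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Equivalent to -sum(p * log2(p)); written as p * log2(1/p) to avoid returning -0.0.
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def entropy_band(h: float) -> str:
    # Thresholds taken from the High Entropy Indicators list above.
    if h > 7.8:
        return "likely packed/encrypted"
    if h > 7.5:
        return "suspicious compression"
    if h > 7.2:
        return "elevated entropy"
    return "normal"


if __name__ == "__main__":
    samples = [("zeros", bytes(4096)), ("text", b"hello world " * 300), ("random", os.urandom(65536))]
    for label, blob in samples:
        h = shannon_entropy(blob)
        print(f"{label:>6}: {h:.2f} -> {entropy_band(h)}")
```

Applied per section rather than to the whole file, the same function catches a single packed section hidden inside an otherwise ordinary binary. That is the "section-level granularity" mentioned under Core Analysis.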

Suspicious APIs (PE):

VirtualAlloc, VirtualProtect, WriteProcessMemory,
CreateRemoteThread, LoadLibrary, GetProcAddress,
WinExec, ShellExecute, URLDownloadToFile,
CreateProcess, OpenProcess, ReadProcessMemory,
SetWindowsHookEx, GetAsyncKeyState, InternetOpen

Suspicious Symbols (ELF):

execve, system, fork, ptrace, mprotect,
mmap, dlopen, socket, bind
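Matching imports against these lists is mostly a question of name normalization. Many Win32 APIs are exported in ANSI and wide variants (for example `LoadLibraryA` / `LoadLibraryW`, `CreateProcessW`), so an exact string comparison against `LoadLibrary` would miss them. The sketch below strips a trailing `A` or `W` before lookup and counts distinct matches. That normalization rule and the function names are assumptions, not documented Proteus behaviour.

```python
SUSPICIOUS_PE_APIS = frozenset({
    "VirtualAlloc", "VirtualProtect", "WriteProcessMemory",
    "CreateRemoteThread", "LoadLibrary", "GetProcAddress",
    "WinExec", "ShellExecute", "URLDownloadToFile",
    "CreateProcess", "OpenProcess", "ReadProcessMemory",
    "SetWindowsHookEx", "GetAsyncKeyState", "InternetOpen",
})

SUSPICIOUS_ELF_SYMBOLS = frozenset({
    "execve", "system", "fork", "ptrace", "mprotect",
    "mmap", "dlopen", "socket", "bind",
})


def normalize_pe_import(name: str) -> str:
    """Map ANSI/wide variants (e.g. LoadLibraryA, CreateProcessW) to their base name."""
    if name not in SUSPICIOUS_PE_APIS and name[-1:] in ("A", "W"):
        return name[:-1]
    return name


def suspicious_api_count(imports: list[str], binary_type: str) -> int:
    """Count distinct suspicious APIs (PE) or symbols (ELF) among `imports`."""
    if binary_type == "PE":
        names = {normalize_pe_import(n) for n in imports}
        return len(names & SUSPICIOUS_PE_APIS)
    if binary_type == "ELF":
        return len(set(imports) & SUSPICIOUS_ELF_SYMBOLS)
    raise ValueError(f"unsupported binary type: {binary_type}")


pe_imports = ["VirtualAlloc", "CreateRemoteThread", "WriteProcessMemory",
              "LoadLibraryA", "LoadLibraryW", "GetModuleHandleW", "ExitProcess"]
print(suspicious_api_count(pe_imports, "PE"))  # LoadLibraryA/W count once as LoadLibrary
```

Counting distinct base names, rather than raw import entries, keeps a binary that imports both `A` and `W` variants from being scored twice for the same capability.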

Suspicious Keywords (Strings):

cmd, powershell, eval, exec, system, shell,
download, upload, exploit, payload, inject,
keylog, screenshot, webcam, ransomware,
encrypt, bitcoin, miner, bypass, disable
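The keyword list above and the IOC extraction from Core Analysis (URLs, IPs, registry keys) can both be sketched as pattern checks over extracted strings. The regular expressions below are deliberately simple illustrations, not the patterns Proteus ships. Keyword matching here is case-insensitive substring matching, which will over-match short words such as `cmd` or `exec`. `analyze_strings` is a hypothetical name.

```python
import re

SUSPICIOUS_KEYWORDS = (
    "cmd", "powershell", "eval", "exec", "system", "shell",
    "download", "upload", "exploit", "payload", "inject",
    "keylog", "screenshot", "webcam", "ransomware",
    "encrypt", "bitcoin", "miner", "bypass", "disable",
)

# Simplified IOC patterns (illustrative assumptions, not Proteus's own).
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")
REGISTRY_RE = re.compile(r"\b(?:HKEY_[A-Z_]+|HKLM|HKCU)\\[^\s\"']+", re.IGNORECASE)


def analyze_strings(strings: list[str]) -> dict[str, list[str]]:
    """Collect IOCs and keyword-flagged strings from already-extracted strings."""
    found: dict[str, list[str]] = {"urls": [], "ips": [], "registry": [], "suspicious": []}
    for s in strings:
        found["urls"] += URL_RE.findall(s)
        found["ips"] += IPV4_RE.findall(s)
        found["registry"] += REGISTRY_RE.findall(s)
        lowered = s.lower()
        if any(keyword in lowered for keyword in SUSPICIOUS_KEYWORDS):
            found["suspicious"].append(s)
    return found


report = analyze_strings([
    "http://malicious-c2.com/payload",
    "cmd.exe /c powershell",
    "connect 10.0.0.5:4444",
    r"HKLM\Software\Microsoft\Windows\CurrentVersion\Run",
    "KERNEL32.dll",
])
for category, items in report.items():
    print(f"{category} ({len(items)}): {items}")
```

A single string can land in more than one category. The URL above is both an IOC and keyword-flagged because it contains `payload`. Each list is therefore a view on the same strings rather than a partition of them.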

Development

Build & Test

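# Development build
maturin develop

# Optimized (release) build
maturin develop --release

# Rust unit tests
cargo test

# Python tests
python -m pytest

# Linting and type checking
cargo clippy
mypy .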

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to branch (git push origin feature/amazing-feature)
Open a Pull Request

Code Style

Rust: Follow rustfmt and clippy recommendations
Python: Follow PEP 8, type hints required
No comments in code (self-documenting code preferred)
Use latest stable versions of dependencies

Roadmap

v0.2.0 (Released)

YARA rule engine (40+ detection rules)
Ransomware, RAT, Trojan, Packer detection
Suspicious behavior analysis
CLI --yara flag integration
Multi-layer detection (Heuristic + ML + YARA)

v0.3.0 (Current)

Web Dashboard - Modern SPA with real-time stats and drag-drop analysis
Sandbox Integration - Dockerized dynamic analysis environment
Real ML Models - Random Forest trained on 1000+ real samples
Deep Static Analysis - Imphash, Rich Header, and Authenticode support
API Server - FastAPI backend for programmatic scanning

v0.4.0 (Planned)

Advanced packer detection enhancements
Digital signature validation
PE resource section analysis
Retrain ML models on a larger real-world dataset (beyond the current 1,190 samples)
Custom YARA rule support via CLI

v0.5.0 (Future)

Performance

Benchmarks (Intel i7, 16GB RAM):

Single file analysis: ~50ms
Batch processing (100 files): ~3 seconds
String extraction: ~20ms
ML prediction: ~5ms
YARA scanning: ~100ms

Limitations

As of v0.2.0:

ML models require training on collected real-world samples
No dynamic analysis capabilities
Windows-focused (PE analysis more mature than ELF)
Dataset collection requires MalwareBazaar API access

Recommended Use:

Educational purposes
Research projects
Malware analysis training
Static analysis component in larger systems
Dataset collection for ML training

Security & Legal

Important Notes:

Always analyze malware in isolated environments (VMs/sandboxes)
Do not use on production systems without proper testing
Obey local laws regarding malware possession and analysis
This tool is for educational and research purposes only

Disclaimer: The authors are not responsible for misuse of this tool. Users are solely responsible for ensuring their usage complies with applicable laws and regulations.

License

MIT License - see LICENSE file for details

Authors

ChronoCoders Team

Advanced static analysis engine
ML integration
YARA rule engine
Performance optimization

Acknowledgments

goblin - Excellent binary parsing library
PyO3 - Seamless Rust-Python integration
Rayon - Parallel processing made easy
scikit-learn - ML algorithms
pyzipper - AES-encrypted ZIP extraction
MalwareBazaar - Real-world malware sample repository
YARA - Industry-standard malware detection framework

Additional Resources

If you find Proteus useful, please star the repository!

Found a bug? Open an issue

Have a feature request? Start a discussion
