Advanced AI-powered email security scanner that detects spam and phishing attempts using machine learning and LLM validation.
Email Guardian is a comprehensive AI-powered toolkit designed to provide robust email security by detecting spam and phishing attempts. It leverages a fine-tuned DistilBERT model with optional LLM validation for enhanced accuracy, offering a full-stack approach to email security with a user-friendly web interface and secure cloud backend.
- AI-Powered Detection: Fine-tuned DistilBERT model for highly accurate multi-class email classification
- LLM Validation: Optional integration with Groq LLM for enhanced threat detection
- Web Interface: Intuitive HTML frontend with scan, history, and settings tabs
- CLI Tool: Command-line interface for batch processing and direct interaction
- Secure API: RESTful backend with API key authentication and input validation
- Comprehensive History: Track and review past classifications and potential threats
- Containerized: Dockerized backend for consistent deployment across environments
- Python 3.8+
- Git
- Docker (optional, for local backend deployment)
```bash
git clone https://github.com/OMARomd23/Email-Guardian.git
cd Email-Guardian
pip install gdown
gdown 1u3oESbMvc-XD9iqwm0LN8JrteS6JbtuU
cd backend
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your configuration
python app.py
```

The backend API will be accessible at http://localhost:5000.
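Once `python app.py` is running, a small script can wait for the backend to come up before sending scans. This is a minimal sketch using only the Python standard library; it assumes that `/health` answers with HTTP 200 when the service is ready, and the helper names `http_ok` and `wait_for_backend` are hypothetical, not part of the repository.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200 (assumed 'healthy' signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx) and socket timeouts are all OSError subclasses.
        return False


def wait_for_backend(
    base_url: str = "http://localhost:5000",
    attempts: int = 10,
    delay: float = 1.0,
    check: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll <base_url>/health until it succeeds or the attempts run out."""
    health_url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        if check(health_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # back off before the next try
    return False
```

For example, `wait_for_backend()` returns `True` as soon as the local server responds on `/health`, or `False` after roughly ten seconds. The `check` parameter exists so the polling logic can be exercised without a running server.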
Open `frontend/index.html` in your web browser or visit the live demo at https://email-guardian-liard.vercel.app/
To use the hosted version:

- API Key: `contact_me_for_an_api_key`
- Backend URL: https://email-guardian-production.up.railway.app

Enter these credentials in the Settings tab of the web interface.
- Visit the web application
- Go to the Settings tab and enter your API key and backend URL
- Navigate to the Scan tab
- Enter the email content and click "Scan Email"
- View the results and check the History tab for past scans
```bash
cd ai
python email_guard.py --text "Your email content here"
```

Example output:

```json
{
  "classification": "phishing",
  "confidence": 0.98,
  "probabilities": {
    "spam": 0.01,
    "phishing": 0.98,
    "legitimate": 0.01
  }
}
```

The Email Guardian backend exposes the following RESTful API endpoints:
| Endpoint | Method | Description | Auth Required |
|---|---|---|---|
| `/health` | GET | System health check | No |
| `/api/scan` | POST | Classify email content | Yes |
| `/api/history` | GET | Retrieve scan history | Yes |
| `/api/stats` | GET | Get classification statistics | Yes |
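The same endpoints can also be called from Python. The sketch below uses only the standard library and reuses the `Content-Type: application/json` and `X-API-Key` headers and the `{"text": ...}` body from the curl example below; the helper names `build_request` and `call_api` are hypothetical, and it assumes the endpoints respond with JSON.

```python
import json
import urllib.request
from typing import Any, Optional


def build_request(
    base_url: str,
    path: str,
    api_key: Optional[str] = None,
    payload: Optional[dict] = None,
) -> urllib.request.Request:
    """Build a GET (no payload) or POST (JSON payload) request for the API."""
    headers = {"Accept": "application/json"}
    if api_key is not None:
        headers["X-API-Key"] = api_key  # required for the /api/* endpoints
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers=headers,
        method="POST" if data is not None else "GET",
    )


def call_api(req: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON body (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage, with a real key in place of the placeholder: `call_api(build_request("https://email-guardian-production.up.railway.app", "/api/scan", "your_api_key", {"text": "..."}))`. A GET such as `build_request(base_url, "/api/history", key)` works the same way. A missing or invalid key will most likely surface as `urllib.error.HTTPError`.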
```bash
curl -X POST "https://email-guardian-production.up.railway.app/api/scan" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_api_key" \
  -d '{"text": "Congratulations! You have won $1,000,000. Click here to claim your prize!"}'
```

- Base Model: DistilBERT (lightweight transformer from HuggingFace)
- Task: 3-class classification (spam, phishing, legitimate)
- Output: Class probabilities with confidence scores
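The sketch below shows how class probabilities and a confidence score can be derived from a 3-class classifier. It applies a softmax to raw logits and reports the top class, producing a dictionary shaped like the CLI's example output. It is only an illustration, not the project's `model_handler.py`. Two choices are assumptions: the label order `("spam", "phishing", "legitimate")`, and treating confidence as the top-class probability. The second matches the example output, where confidence equals the phishing probability.

```python
import math
from typing import Dict, Sequence

# Assumed label order; the real model's id2label mapping may differ.
LABELS = ("spam", "phishing", "legitimate")


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_result(logits: Sequence[float], labels: Sequence[str] = LABELS) -> Dict:
    """Convert raw logits into a classification/confidence/probabilities dict."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # index of the top class
    return {
        "classification": labels[best],
        "confidence": round(probs[best], 2),
        "probabilities": {lab: round(p, 2) for lab, p in zip(labels, probs)},
    }


if __name__ == "__main__":
    # Hypothetical logits strongly favouring the "phishing" class.
    print(to_result([-2.3, 2.3, -2.3]))
    # → {'classification': 'phishing', 'confidence': 0.98, 'probabilities': {'spam': 0.01, 'phishing': 0.98, 'legitimate': 0.01}}
```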
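The model was fine-tuned with the following Hugging Face `TrainingArguments`:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",          # checkpoints and logs
    eval_strategy="epoch",           # evaluate after every epoch (older transformers releases call this evaluation_strategy)
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
    gradient_accumulation_steps=2,   # effective train batch size: 16 x 2 = 32 per device
    fp16=True,                       # mixed-precision training; needs a supported GPU
)
```

- Final Accuracy: 98.25%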
- Precision: 98.26%
- Recall: 98.26%
- F1-Score: 98.26%
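With `per_device_train_batch_size=16` and `gradient_accumulation_steps=2`, gradients from two forward/backward passes are accumulated before each optimizer update, so every update sees 16 × 2 = 32 examples per device. The sketch below works through that arithmetic and the approximate number of optimizer steps over `num_train_epochs=5`. The training-split size is a hypothetical value, since the train/eval split is not stated here. The step count is also approximate: it ignores how the Trainer rounds a final partial accumulation batch.

```python
import math


def effective_batch_size(per_device: int, accumulation_steps: int, n_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * accumulation_steps * n_devices


def approx_optimizer_steps(
    n_train: int, per_device: int, accumulation_steps: int, epochs: int, n_devices: int = 1
) -> int:
    """Rough total of optimizer updates (ignores framework-specific rounding)."""
    per_step = effective_batch_size(per_device, accumulation_steps, n_devices)
    return math.ceil(n_train / per_step) * epochs


if __name__ == "__main__":
    print(effective_batch_size(16, 2))  # → 32 (single GPU)
    # Hypothetical training split: 138,480 examples (80% of the 173,100 samples).
    print(approx_optimizer_steps(138_480, 16, 2, epochs=5))  # → 21640
```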
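The model was trained on a curated dataset of 173,100 emails. Spam and phishing are roughly equal in size, while legitimate emails make up about 53.5% of the samples: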
| Class | Samples |
|---|---|
| Legitimate | 92,690 |
| Spam | 40,396 |
| Phishing | 40,014 |
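The class shares in the table can be checked directly. The short calculation below computes each class's fraction of the total: spam (about 23.3%) and phishing (about 23.1%) are nearly equal, and legitimate emails account for about 53.5%. The `CLASS_COUNTS` name is illustrative.

```python
# Sample counts copied from the table above.
CLASS_COUNTS = {"legitimate": 92_690, "spam": 40_396, "phishing": 40_014}


def class_shares(counts: dict) -> dict:
    """Return each class's fraction of the total sample count."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    print(sum(CLASS_COUNTS.values()))  # → 173100
    for label, share in class_shares(CLASS_COUNTS).items():
        print(f"{label:<11} {share:.1%}")
```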
Data Sources:
- Enron Spam Dataset
- CEAS Spam Dataset
- SpamAssassin Dataset
- ealvaradob/phishing-dataset (HuggingFace)
- Phishing dataset from Kaggle
```
Email-Guardian/
├── ai/
│   └── email_guard.py          # CLI tool
├── backend/
│   ├── app.py                  # Flask application
│   ├── model_handler.py        # AI model wrapper
│   ├── database.py             # Database management
│   ├── groq_validator.py       # LLM validation
│   ├── requirements.txt        # Python dependencies
│   └── Dockerfile              # Container configuration
├── frontend/
│   └── index.html              # Web interface
├── docs/
│   ├── README.md               # A copy of this file
│   ├── Email-Guard_distilbert-fine-tuned.ipynb   # Model training notebook
│   └── Data_processing.ipynb   # Data processing notebook
├── requirements.txt            # Root dependencies
├── README.md                   # This file
└── reflection.md               # Project reflection
```
The web frontend is deployed on Vercel for global accessibility and optimal performance.
The backend runs as a Docker container on Railway, with the image hosted on Docker Hub.
Docker Image: omar669/email-guardian
```bash
# Build the image
docker build -t email-guardian ./backend
# Run the container
docker run -p 5000:5000 -e API_KEY=your-api-key email-guardian
```

- API Key Authentication: All `/api/*` endpoints require a valid API key, sent in the `X-API-Key` header
- Input Validation: Comprehensive validation and sanitization of user inputs
- HTTPS Deployment: Both frontend and backend use secure HTTPS connections
- Rate Limiting: Protection against abuse with configurable rate limits
- Environment Variables: Sensitive configuration managed through environment variables
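API key checks like the one listed above are commonly done with a constant-time comparison, so that response timing does not reveal how much of a guessed key matched. The framework-agnostic sketch below is an illustration under assumptions, not the code in `backend/app.py`. It reads the expected key from the `API_KEY` environment variable (the variable passed to `docker run` above) and compares it with the `X-API-Key` request header used in the curl example. The function name `is_authorized` is hypothetical.

```python
import hmac
import os
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_key: Optional[str] = None) -> bool:
    """Return True if the X-API-Key header matches the configured key."""
    if expected_key is None:
        expected_key = os.environ.get("API_KEY")
    if not expected_key:
        # Fail closed: with no key configured, nothing is authorized.
        return False
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    provided = lowered.get("x-api-key", "")
    # compare_digest does not short-circuit on the first mismatching byte.
    return hmac.compare_digest(provided.encode("utf-8"), expected_key.encode("utf-8"))
```

In a Flask view, for instance, this could be called as `is_authorized(request.headers)` before processing a scan. Failing closed means a misconfigured deployment rejects requests instead of silently accepting them.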
This project is licensed under the MIT License - see the LICENSE file for details.
- HuggingFace for the DistilBERT model and datasets
- Railway and Vercel for hosting platforms
- The open-source community for various datasets used in training
OUMESSAOUD Omar
- Email: oumessaoud-omar@proton.me
- LinkedIn: Profile
- GitHub: @OMARomd23
Built with ❤️ for enhanced email security