OMARomd23/Email-Guardian


Email Guardian 🛡️

Advanced AI-powered email security scanner that detects spam and phishing attempts using machine learning and LLM validation.


🎯 Overview

Email Guardian is an AI-powered toolkit for detecting spam and phishing emails. It classifies messages with a fine-tuned DistilBERT model and can optionally validate the result with an LLM for enhanced accuracy. The project is full-stack: a web interface, a CLI tool, and a secured backend API deployed in the cloud.

✨ Features

  • πŸ€– AI-Powered Detection: Fine-tuned DistilBERT model for highly accurate multi-class email classification
  • πŸ” LLM Validation: Optional integration with Groq LLM for enhanced threat detection
  • 🌐 Web Interface: Intuitive HTML frontend with scan, history, and settings tabs
  • ⚑ CLI Tool: Command-line interface for batch processing and direct interaction
  • πŸ”’ Secure API: RESTful backend with API key authentication and input validation
  • πŸ“Š Comprehensive History: Track and review past classifications and potential threats
  • 🐳 Containerized: Dockerized backend for consistent deployment across environments

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • Git
  • Docker (optional, for local backend deployment)

1. Clone the Repository

git clone https://github.com/OMARomd23/Email-Guardian.git
cd Email-Guardian

2. Download the AI Model

pip install gdown
gdown 1u3oESbMvc-XD9iqwm0LN8JrteS6JbtuU

3. Set Up the Backend

cd backend
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your configuration

4. Start the Application

python app.py

The backend API will be accessible at http://localhost:5000.

5. Access the Frontend

Open frontend/index.html in your web browser or visit the live demo at:

🌐 https://email-guardian-liard.vercel.app/

🔑 API Configuration

To use the hosted version:

  • API Key: contact_me_for_an_api_key
  • Backend URL: https://email-guardian-production.up.railway.app

Enter these credentials in the Settings tab of the web interface.

🛠️ Usage

Web Interface

  1. Visit the web application
  2. Go to the Settings tab and enter your API key and backend URL
  3. Navigate to the Scan tab
  4. Enter the email content and click "Scan Email"
  5. View the results, and check the History tab for past scans

CLI Tool

cd ai
python email_guard.py --text "Your email content here"

Example Output:

{
  "classification": "phishing",
  "confidence": 0.98,
  "probabilities": {
    "spam": 0.01,
    "phishing": 0.98,
    "legitimate": 0.01
  }
}
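
For batch processing, the JSON printed by the CLI can be consumed from another script. The following is a minimal sketch, assuming the output has exactly the shape shown above. The helper name `parse_scan_result` and the 0.9 review threshold are hypothetical and not part of the project.

```python
import json
import math


def parse_scan_result(raw: str, review_below: float = 0.9) -> dict:
    """Parse CLI JSON output and flag low-confidence results for review.

    `review_below` is an illustrative threshold, not a project setting.
    """
    result = json.loads(raw)
    probs = result["probabilities"]

    # Sanity checks: probabilities should sum to ~1 and the reported
    # label should be the most probable class.
    if not math.isclose(sum(probs.values()), 1.0, abs_tol=0.02):
        raise ValueError("probabilities do not sum to 1")
    top_label = max(probs, key=probs.get)
    if top_label != result["classification"]:
        raise ValueError("classification is not the most probable class")

    return {
        "label": top_label,
        "confidence": result["confidence"],
        "needs_review": result["confidence"] < review_below,
    }


example = """
{
  "classification": "phishing",
  "confidence": 0.98,
  "probabilities": {"spam": 0.01, "phishing": 0.98, "legitimate": 0.01}
}
"""
print(parse_scan_result(example))
```

The tolerance on the probability sum allows for values that were rounded before printing.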

📡 API Endpoints

The Email Guardian backend exposes the following RESTful API endpoints:

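| Endpoint     | Method | Description                   | Auth Required |
|--------------|--------|-------------------------------|---------------|
| /health      | GET    | System health check           | No            |
| /api/scan    | POST   | Classify email content        | Yes           |
| /api/history | GET    | Retrieve scan history         | Yes           |
| /api/stats   | GET    | Get classification statistics | Yes           |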

Example API Request

curl -X POST "https://email-guardian-production.up.railway.app/api/scan" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_api_key" \
  -d '{"text": "Congratulations! You have won $1,000,000. Click here to claim your prize!"}'
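
The same request can be issued from Python. This is a minimal sketch using only the standard library. `build_scan_request` is a hypothetical helper, the base URL assumes a backend running locally as in the Quick Start, and the response format is not documented here, so the commented-out call simply prints whatever JSON comes back.

```python
import json
import urllib.request


def build_scan_request(base_url: str, api_key: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /api/scan endpoint."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/scan",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


request = build_scan_request(
    "http://localhost:5000",
    "your_api_key",
    "Congratulations! You have won $1,000,000. Click here to claim your prize!",
)
print(request.get_method(), request.full_url)

# To actually send it (requires a running backend and a valid key):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes the headers and body easy to inspect before any network traffic occurs.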

🤖 AI Model Details

Model Architecture

  • Base Model: DistilBERT (lightweight transformer from HuggingFace)
  • Task: 3-class classification (spam, phishing, legitimate)
  • Output: Class probabilities with confidence scores

Training Configuration

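from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",          # where checkpoints are written
    eval_strategy="epoch",           # evaluate once per epoch (older releases call this evaluation_strategy)
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
    gradient_accumulation_steps=2,   # accumulate gradients over 2 batches per update
    fp16=True,                       # mixed-precision training; requires a supported GPU
)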
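
As a worked example of what these settings imply, the sketch below computes the effective batch size per optimizer update from the values above. The helper `optimizer_steps_per_epoch` and the training-set size passed to it are hypothetical, since the README does not state the train/eval split.

```python
import math

per_device_train_batch_size = 16
gradient_accumulation_steps = 2

# Samples contributing to each optimizer update, per device.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 32


def optimizer_steps_per_epoch(num_train_samples: int, num_devices: int = 1) -> int:
    """Approximate optimizer updates per epoch for a given training-set size."""
    samples_per_update = effective_batch_size * num_devices
    return math.ceil(num_train_samples / samples_per_update)


# Hypothetical training-set size, for illustration only.
print(optimizer_steps_per_epoch(10_000))  # 313
```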

Performance Metrics

  • Final Accuracy: 98.25%
  • Precision: 98.26%
  • Recall: 98.26%
  • F1-Score: 98.26%

📊 Dataset Information

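The model was trained on a curated dataset of 173,100 emails. The spam and phishing classes are roughly equal in size, while legitimate emails make up a little over half of the samples: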

| Class      | Samples |
|------------|---------|
| Legitimate | 92,690  |
| Spam       | 40,396  |
| Phishing   | 40,014  |
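
The class shares follow directly from these counts. The sketch below computes them, along with inverse-frequency class weights as one common way to compensate for an uneven class distribution. Whether any class weighting was used in training is not stated in this README, so the weights are purely illustrative.

```python
counts = {"legitimate": 92_690, "spam": 40_396, "phishing": 40_014}
total = sum(counts.values())  # 173,100

# Share of each class in the full dataset.
for label, n in counts.items():
    print(f"{label:<11}{n:>8,}  {n / total:6.1%}")

# "Balanced" inverse-frequency weights: total / (num_classes * class_count).
weights = {label: total / (len(counts) * n) for label, n in counts.items()}
print({label: round(w, 3) for label, w in weights.items()})
```

With this formula, a class that is larger than average gets a weight below 1 and a smaller class gets a weight above 1.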

Data Sources:

  • Enron Spam Dataset
  • CEAS Spam Dataset
  • SpamAssassin Dataset
  • ealvaradob/phishing-dataset (HuggingFace)
  • Phishing dataset from Kaggle

🏗️ Project Structure

Email-Guardian/
├── ai/
│   └── email_guard.py          # CLI tool
├── backend/
│   ├── app.py                  # Flask application
│   ├── model_handler.py        # AI model wrapper
│   ├── database.py             # Database management
│   ├── groq_validator.py       # LLM validation
│   ├── requirements.txt        # Python dependencies
│   └── Dockerfile              # Container configuration
├── frontend/
│   └── index.html              # Web interface
├── docs/
│   ├── README.md               # A copy of this file
│   ├── Email-Guard_distilbert-fine-tuned.ipynb   # Model training notebook
│   └── Data_processing.ipynb   # Data processing notebook
├── requirements.txt            # Root dependencies
├── README.md                   # This file
└── reflection.md               # Project reflection

🚀 Deployment

Frontend (Vercel)

The web frontend is deployed on Vercel for global accessibility and optimal performance.

Backend (Railway + Docker)

The backend runs as a Docker container on Railway, with the image hosted on Docker Hub.

Docker Image: omar669/email-guardian

Local Deployment with Docker

# Build the image
docker build -t email-guardian ./backend

# Run the container
docker run -p 5000:5000 -e API_KEY=your-api-key email-guardian

🔒 Security Features

  • API Key Authentication: All critical endpoints require valid API keys
  • Input Validation: Comprehensive validation and sanitization of user inputs
  • HTTPS Deployment: Both frontend and backend use secure HTTPS connections
  • Rate Limiting: Protection against abuse with configurable rate limits
  • Environment Variables: Sensitive configuration managed through environment variables

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • HuggingFace for the DistilBERT model and datasets
  • Railway and Vercel for hosting platforms
  • The open-source community for various datasets used in training

📞 Contact

OUMESSAOUD Omar


Built with ❤️ for enhanced email security
