Email Guardian 🛡️

Advanced AI-powered email security scanner that detects spam and phishing attempts using machine learning and LLM validation.

🎯 Overview

Email Guardian is an AI-powered toolkit that detects spam and phishing emails. It classifies messages with a fine-tuned DistilBERT model, can optionally validate results with an LLM for enhanced accuracy, and ships with a web interface, a CLI tool, and a cloud-hosted backend API protected by API key authentication.

✨ Features

🤖 AI-Powered Detection: Fine-tuned DistilBERT model for three-class email classification (spam, phishing, legitimate)
🔍 LLM Validation: Optional integration with Groq LLM for enhanced threat detection
🌐 Web Interface: Intuitive HTML frontend with scan, history, and settings tabs
⚡ CLI Tool: Command-line interface for batch processing and direct interaction
🔒 Secure API: RESTful backend with API key authentication and input validation
📊 Comprehensive History: Track and review past classifications and potential threats
🐳 Containerized: Dockerized backend for consistent deployment across environments

🚀 Quick Start

Prerequisites

Python 3.8+
Git
Docker (optional, for local backend deployment)

1. Clone the Repository

git clone https://github.com/OMARomd23/Email-Guardian.git
cd Email-Guardian

2. Download the AI Model

pip install gdown
gdown 1u3oESbMvc-XD9iqwm0LN8JrteS6JbtuU

3. Set Up the Backend

cd backend
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your configuration

4. Start the Application

python app.py

The backend API will be accessible at http://localhost:5000.

5. Access the Frontend

Open frontend/index.html in your web browser or visit the live demo at:

🌐 https://email-guardian-liard.vercel.app/

🔑 API Configuration

To use the hosted version:

API Key: request one from the author (see Contact)
Backend URL: https://email-guardian-production.up.railway.app

Enter these credentials in the Settings tab of the web interface.

🛠️ Usage

Web Interface

1. Visit the web application.
2. Go to the Settings tab and enter your API key and backend URL.
3. Navigate to the Scan tab.
4. Enter the email content and click "Scan Email".
5. View the results, and check the History tab for past scans.

CLI Tool

cd ai
python email_guard.py --text "Your email content here"

Example Output:

{
  "classification": "phishing",
  "confidence": 0.98,
  "probabilities": {
    "spam": 0.01,
    "phishing": 0.98,
    "legitimate": 0.01
  }
}
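The JSON above is easy to post-process in scripts. The following is a minimal sketch, assuming the CLI prints exactly the structure shown in the example; the `summarize` helper and its `threshold` parameter are hypothetical and not part of the project.

```python
import json

# Sample output copied from the example above.
raw_output = """
{
  "classification": "phishing",
  "confidence": 0.98,
  "probabilities": {
    "spam": 0.01,
    "phishing": 0.98,
    "legitimate": 0.01
  }
}
"""


def summarize(raw: str, threshold: float = 0.9) -> dict:
    """Parse the scanner's JSON and flag low-confidence results.

    `threshold` is a hypothetical cut-off chosen for illustration.
    """
    result = json.loads(raw)
    probabilities = result["probabilities"]
    # The reported label should be the class with the highest probability.
    top_label = max(probabilities, key=probabilities.get)
    return {
        "label": result["classification"],
        "confidence": result["confidence"],
        "matches_top_probability": top_label == result["classification"],
        "needs_review": result["confidence"] < threshold,
    }


summary = summarize(raw_output)
print(summary)
```

A wrapper like this could route low-confidence results to manual review instead of trusting the label outright.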

📡 API Endpoints

The Email Guardian backend exposes the following RESTful API endpoints:

Endpoint	Method	Description	Auth Required
`/health`	GET	System health check	No
`/api/scan`	POST	Classify email content	Yes
`/api/history`	GET	Retrieve scan history	Yes
`/api/stats`	GET	Get classification statistics	Yes

Example API Request

curl -X POST "https://email-guardian-production.up.railway.app/api/scan" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_api_key" \
  -d '{"text": "Congratulations! You have won $1,000,000. Click here to claim your prize!"}'
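The same request can be issued from Python. This is a sketch using only the standard library; it mirrors the URL, headers, and body of the curl example, while the helper names (`build_scan_request`, `scan_email`) are hypothetical and the response is assumed to be a JSON object.

```python
import json
import urllib.request

BACKEND_URL = "https://email-guardian-production.up.railway.app"


def build_scan_request(text: str, api_key: str,
                       base_url: str = BACKEND_URL) -> urllib.request.Request:
    """Build the POST request for /api/scan without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/scan",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,
        },
    )


def scan_email(text: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_scan_request(text, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request performs no network I/O.
request = build_scan_request(
    "Congratulations! You have won $1,000,000. Click here to claim your prize!",
    "your_api_key",
)
print(request.get_method(), request.full_url)
```

Calling `scan_email(...)` with a valid key would perform the actual request; keeping request construction separate makes the client easy to test offline.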

🤖 AI Model Details

Model Architecture

Base Model: DistilBERT (lightweight transformer from HuggingFace)
Task: 3-class classification (spam, phishing, legitimate)
Output: Class probabilities with confidence scores
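To illustrate how a 3-class classifier head typically yields class probabilities and a confidence score, here is a small sketch that applies softmax to raw logits. The label order and the logit values are assumptions made for illustration; the project's actual mapping lives in its model code.

```python
import math

# Assumed label order; the real model's id-to-label mapping may differ.
LABELS = ["spam", "phishing", "legitimate"]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def to_result(logits):
    """Shape the probabilities like the scanner's JSON output."""
    probabilities = softmax(logits)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return {
        "classification": LABELS[best],
        "confidence": probabilities[best],
        "probabilities": dict(zip(LABELS, probabilities)),
    }


# Hypothetical logits, not taken from the trained model.
result = to_result([0.2, 4.1, -1.3])
print(result)
```

Under this scheme the confidence is simply the probability of the winning class.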

Training Configuration

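from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",            # where checkpoints are written
    eval_strategy="epoch",             # evaluate at the end of every epoch
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
    gradient_accumulation_steps=2,     # accumulate gradients over 2 batches per update
    fp16=True                          # mixed-precision training
)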
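Because gradients are accumulated over two batches before each optimizer update, the effective batch size is larger than the per-device value. The arithmetic can be sketched as follows, assuming training on a single device (the device count is not stated in this README).

```python
def effective_batch_size(per_device_batch_size: int,
                         accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer update."""
    return per_device_batch_size * accumulation_steps * num_devices


# Values from the training configuration; num_devices=1 is an assumption.
batch = effective_batch_size(per_device_batch_size=16, accumulation_steps=2)
print(batch)  # 32
```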

Performance Metrics

Final Accuracy: 98.25%
Precision: 98.26%
Recall: 98.26%
F1-Score: 98.26%
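For readers unfamiliar with these metrics, the sketch below computes accuracy and macro-averaged precision, recall, and F1 from a confusion matrix. The matrix is a made-up toy example and does not reproduce the figures above; the averaging method used for the reported numbers is not stated here, so macro averaging is an assumption.

```python
LABELS = ["legitimate", "spam", "phishing"]

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
confusion = [
    [95, 3, 2],
    [4, 90, 6],
    [1, 5, 94],
]


def macro_metrics(matrix):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(size)) / total
    precisions, recalls, f1_scores = [], [], []
    for i in range(size):
        true_positive = matrix[i][i]
        predicted = sum(matrix[row][i] for row in range(size))  # column sum
        actual = sum(matrix[i])                                 # row sum
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / size,
        "recall": sum(recalls) / size,
        "f1": sum(f1_scores) / size,
    }


metrics = macro_metrics(confusion)
print(metrics)
```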

📊 Dataset Information

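The model was trained on a curated dataset. Legitimate emails (92,690) roughly balance the combined spam and phishing samples (80,410), although each malicious class on its own is less than half the size of the legitimate class: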

Class	Samples
Legitimate	92,690
Spam	40,396
Phishing	40,014
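The class distribution in the table can be quantified with a few lines of Python. The inverse-frequency weights at the end are one common way to compensate for imbalance; whether the project used class weights during training is not stated here, so treat that part as illustrative only.

```python
# Sample counts from the table above.
samples = {"legitimate": 92_690, "spam": 40_396, "phishing": 40_014}

total = sum(samples.values())  # 173100

# Fraction of the dataset that each class represents.
shares = {label: count / total for label, count in samples.items()}

# Inverse-frequency weights: total / (number_of_classes * class_count).
# Rarer classes get larger weights (illustrative, not the project's setup).
weights = {
    label: total / (len(samples) * count)
    for label, count in samples.items()
}

for label in samples:
    print(f"{label}: share={shares[label]:.3f}, weight={weights[label]:.3f}")
```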

Data Sources:

Enron Spam Dataset
CEAS Spam Dataset
SpamAssassin Dataset
ealvaradob/phishing-dataset (HuggingFace)
Phishing dataset from Kaggle

🏗️ Project Structure

Email-Guardian/
├── ai/
│   └── email_guard.py          # CLI tool
├── backend/
│   ├── app.py                  # Flask application
│   ├── model_handler.py        # AI model wrapper
│   ├── database.py             # Database management
│   ├── groq_validator.py       # LLM validation
│   ├── requirements.txt        # Python dependencies
│   └── Dockerfile              # Container configuration
├── frontend/
│   └── index.html              # Web interface
├── docs/
│   ├── README.md               # A copy of this file
│   ├── Email-Guard_distilbert-fine-tuned.ipynb   # Model training notebook
│   └── Data_processing.ipynb   # Data processing notebook
├── requirements.txt            # Root dependencies
├── README.md                   # This file
└── reflection.md               # Project reflection

🚀 Deployment

Frontend (Vercel)

The web frontend is deployed on Vercel at https://email-guardian-liard.vercel.app/.

Backend (Railway + Docker)

The backend runs as a Docker container on Railway, with the image hosted on Docker Hub.

Docker Image: omar669/email-guardian

Local Deployment with Docker

# Build the image
docker build -t email-guardian ./backend

# Run the container
docker run -p 5000:5000 -e API_KEY=your-api-key email-guardian

🔒 Security Features

API Key Authentication: Every endpoint except `/health` requires a valid API key
Input Validation: Comprehensive validation and sanitization of user inputs
HTTPS Deployment: Both frontend and backend use secure HTTPS connections
Rate Limiting: Protection against abuse with configurable rate limits
Environment Variables: Sensitive configuration managed through environment variables
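Two of the measures above, API key authentication and rate limiting, can be sketched in plain Python. This is not the project's implementation: the function and class names are hypothetical, and a sliding-window limiter is just one possible strategy.

```python
import hmac
import time
from collections import defaultdict, deque
from typing import Optional


def is_valid_api_key(provided: Optional[str], expected: str) -> bool:
    """Compare keys in constant time to avoid leaking timing information."""
    if not provided or not expected:
        return False
    return hmac.compare_digest(provided.encode("utf-8"),
                               expected.encode("utf-8"))


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> request timestamps

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60)
print(is_valid_api_key("secret", "secret"), limiter.allow("client-a"))
```

In a real deployment the expected key would come from an environment variable such as `API_KEY`, as in the Docker example above.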

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

HuggingFace for the DistilBERT model and datasets
Railway and Vercel for hosting platforms
The open-source community for various datasets used in training

📞 Contact

OUMESSAOUD Omar

Email: oumessaoud-omar@proton.me
LinkedIn: Profile
GitHub: @OMARomd23

Built with ❤️ for enhanced email security
