A spam detection system that combines a traditional machine learning baseline with deep learning models. The project implements LSTM, CNN-LSTM, and ensemble approaches and exposes them through an interactive web interface.
Try it live: https://finsecure-ai-uepz.onrender.com
- Deep Learning Models: LSTM, CNN-LSTM, and Ensemble neural networks
- Traditional ML: Naive Bayes with TF-IDF (baseline)
- Ensemble Method: Weighted combination of all models for optimal accuracy
- Real-time Model Switching: Compare different algorithms instantly
- Text Classification: Multi-model spam detection with confidence scores
- URL Safety Analysis: Domain trust scoring and phishing detection
- Interactive Comparison: Side-by-side model performance analysis
- Real-time Processing: Fast inference with multiple model options
- Responsive Design: Works on desktop and mobile devices
- Model Selection: Easy switching between AI algorithms
- Visual Analytics: Charts and confidence meters
- Dark/Light Theme: Modern UI with smooth animations

Interface mockup (main screen): 🛡️ Advanced Spam Text Detector
- Select AI Model: Ensemble (Best accuracy), Deep Learning (LSTM/CNN), or Traditional ML (Naive Bayes); each is marked ✅ Available
- Text input: "Enter text to analyze..."
- Buttons: Analyze, Clear, Compare All Models

Interface mockup (result panel): SPAM DETECTED
- Confidence: 89.3%
- Model: Deep Learning (LSTM/CNN)
- Ensemble Breakdown: Traditional ML (30%): Not Spam (62.1%); Deep Learning (70%): Spam (89.3%)
- Final Decision: Spam (78.5%)

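
The ensemble breakdown above combines a traditional ML vote and a deep learning vote with 30%/70% weights. The exact combination rule is not spelled out in this README, so the following is only a minimal sketch that assumes a weighted average of each model's spam probability. The function name `combine_predictions` and the 50% decision threshold are illustrative assumptions, and the sketch is not guaranteed to reproduce the figures shown in the mockup.

```python
def combine_predictions(traditional_spam_prob, dl_spam_prob,
                        traditional_weight=0.3, dl_weight=0.7,
                        threshold=50.0):
    """Weighted average of two spam probabilities (both in percent)."""
    total = traditional_weight + dl_weight
    spam_prob = (traditional_weight * traditional_spam_prob
                 + dl_weight * dl_spam_prob) / total
    is_spam = spam_prob >= threshold  # assumed decision rule
    return {
        "prediction": "Spam" if is_spam else "Not Spam",
        "spam_probability": round(spam_prob, 1),
        # Confidence is reported for whichever class was chosen.
        "confidence": round(spam_prob if is_spam else 100.0 - spam_prob, 1),
    }


if __name__ == "__main__":
    print(combine_predictions(20.0, 90.0))
```

Dividing by the sum of the weights keeps the result a valid percentage even if the two weights are later tuned so that they no longer add up to 1.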
# Clone the repository
git clone https://github.com/yourusername/spam-text-detector.git
cd spam-text-detector
# Run automated setup
python setup.py
# Start the application
python main_dl.py

# Install dependencies
pip install -r requirements.txt
# Download NLTK data
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
# Train models (requires dataset)
python train_models.py --model both
# Start enhanced application
python main_dl.py
# Or start original version
python main.py

Required: SMS Spam Collection Dataset
- Source: UCI ML Repository SMS Spam Collection dataset
- File: Save as `mail_data.csv` in the project root
- Format: CSV with columns `Category`, `Message`
- Size: ~5,572 SMS messages (ham/spam labeled)
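
As a quick sanity check on the file described above, the following sketch loads a CSV with `Category` and `Message` columns using only the standard library and counts the labels. The function names `load_sms_dataset` and `label_counts` are hypothetical, and the assumption that labels are the strings `ham` and `spam` comes from the dataset description rather than from this project's code.

```python
import csv
from collections import Counter


def load_sms_dataset(path="mail_data.csv"):
    """Return (messages, labels) read from a Category,Message CSV file."""
    messages, labels = [], []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Skip rows with a missing label or an empty message.
            category = (row.get("Category") or "").strip().lower()
            message = (row.get("Message") or "").strip()
            if category and message:
                labels.append(category)
                messages.append(message)
    return messages, labels


def label_counts(labels):
    """Count how many messages carry each label (e.g. ham vs. spam)."""
    return Counter(labels)
```

Running `label_counts` on the loaded labels is a cheap way to confirm that both classes are present before training.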
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Embedding, Bidirectional, LSTM, Conv1D, MaxPooling1D,
    Dense, BatchNormalization, Dropout,
)

# Bidirectional LSTM model
Sequential([
    Embedding(10000, 128, input_length=100),
    Bidirectional(LSTM(64, return_sequences=True, dropout=0.3)),
    Bidirectional(LSTM(32, dropout=0.3)),
    Dense(64, activation='relu'),
    BatchNormalization(),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

# CNN-LSTM model
Sequential([
    Embedding(10000, 128, input_length=100),
    Conv1D(128, 5, activation='relu'),
    MaxPooling1D(5),
    Conv1D(64, 5, activation='relu'),
    MaxPooling1D(5),
    LSTM(64, dropout=0.3),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid')
])

- LSTM Branch: Bidirectional LSTM + GlobalMaxPooling
- CNN Branch: Conv1D + GlobalMaxPooling
- Fusion: Concatenate + Dense layers
- Output: Sigmoid activation for binary classification
- Optimizer: Adam (lr=0.001)
- Loss: Binary crossentropy
- Metrics: Accuracy, Precision, Recall
- Callbacks: EarlyStopping, ReduceLROnPlateau
- Data Split: 70% train, 15% validation, 15% test
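
The 70/15/15 split listed above can be sketched with the standard library alone. This is an illustrative assumption about how such a split might be done (a seeded shuffle followed by slicing); the helper name `split_dataset` is hypothetical, and the project's training script may well stratify by label or use a library routine instead.

```python
import random


def split_dataset(items, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle items and return (train, validation, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_dataset(range(5572))
    print(len(train), len(val), len(test))
```

Because spam is the minority class in this dataset, a stratified split would likely be the safer choice in practice; the plain shuffle is used here only to keep the sketch short.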
# Enhanced version with deep learning
python main_dl.py
# Original version (traditional ML only)
python main.py
# Open browser
http://localhost:8000

import requests
# Single model analysis
response = requests.post(
    'http://localhost:8000/api/analyze',
    json={
        'text': 'URGENT: Your account has been compromised!',
        'model_type': 'ensemble'  # or 'deep_learning', 'traditional'
    }
)
result = response.json()
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.1f}%")

# Compare all models on same text
response = requests.get(
    'http://localhost:8000/api/models/compare',
    params={'text': 'Free money! Click here now!'}
)
comparison = response.json()
for model, result in comparison['model_predictions'].items():
    print(f"{model}: {result['prediction']} ({result['confidence']:.1f}%)")

response = requests.get('http://localhost:8000/api/models/status')
status = response.json()
print("Available models:", list(status.keys()))

# Train all models
python train_models.py --model both --epochs 30
# Train specific model
python train_models.py --model deep_learning --epochs 50
python train_models.py --model traditional
# Get help
python train_models.py --help

from deep_learning_model import SpamDetectorDL
# Load trained model
detector = SpamDetectorDL.load_model(
    'spam_detector_ensemble.h5',
    'tokenizer_ensemble.pkl'
)
# Make predictions
result = detector.predict("Congratulations! You won $1000!")
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.1f}%")
print(f"Spam Probability: {result['spam_probability']:.1f}%")

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Traditional ML | ~96% | ~95% | ~94% | ~94% |
| LSTM | ~97-98% | ~96% | ~95% | ~95% |
| CNN-LSTM | ~97-98% | ~96% | ~95% | ~95% |
| Ensemble | ~98-99% | ~97% | ~96% | ~96% |
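
The F1-Score column is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. A minimal sketch follows; the function name `precision_recall_f1` is hypothetical and the confusion-matrix counts in the usage lines are made-up illustration values, not results from this project.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts: 95 spam caught, 5 false alarms, 6 spam missed.
    p, r, f1 = precision_recall_f1(tp=95, fp=5, fn=6)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```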
- Traditional ML: <10ms per prediction
- Deep Learning: <100ms per prediction
- Ensemble: <150ms per prediction
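
Latency figures like these depend on hardware, so it helps to measure them locally. The sketch below times any prediction callable with `time.perf_counter`; `measure_latency_ms` and the stand-in `fake_predict` function are hypothetical names used for illustration and would be replaced by a real model's predict method.

```python
import statistics
import time


def measure_latency_ms(predict, text, runs=50, warmup=5):
    """Return (median, max) latency in milliseconds for predict(text)."""
    for _ in range(warmup):
        predict(text)  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(text)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings), max(timings)


def fake_predict(text):
    """Stand-in for a real model call (assumption for this sketch)."""
    return {"prediction": "Spam" if "free" in text.lower() else "Not Spam"}


if __name__ == "__main__":
    median_ms, worst_ms = measure_latency_ms(fake_predict, "Free money!")
    print(f"median={median_ms:.3f} ms, worst={worst_ms:.3f} ms")
```

Reporting the median alongside the worst case keeps a single slow call (for example, the first inference after model load) from distorting the picture.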

spam-text-detector/
- README.md: This file
- requirements.txt: Python dependencies
- setup.py: Automated setup script
- train_models.py: Model training script
- test_spam_detection.py: Testing script
- AI Models
  - deep_learning_model.py: Deep learning implementation
  - main_dl.py: Enhanced FastAPI app
  - main.py: Original FastAPI app
- Frontend
  - templates/
    - index_dl.html: Enhanced interface
    - index.html: Original interface
  - static/
    - css/styles.css: Styling
    - js/main_dl.js: Enhanced JavaScript
    - js/main.js: Original JavaScript
- Data (you need to add)
  - mail_data.csv: SMS spam dataset
- Generated (after training)
  - models/
    - spam_detector_lstm.h5
    - spam_detector_ensemble.h5
    - tokenizer_lstm.pkl
    - text_classification.pkl
  - outputs/
    - training_history_dl.png
    - confusion_matrix_dl.png
    - model_comparison.png

| Model | Best For | Pros | Cons |
|---|---|---|---|
| Ensemble | Production use | Highest accuracy, robust | Slower inference |
| Deep Learning | Complex patterns | Context understanding | Requires more resources |
| Traditional ML | Fast deployment | Speed, interpretability | Lower accuracy |
# Test different models
# analyze_text is assumed to be a helper that wraps POST /api/analyze
models = ['traditional', 'deep_learning', 'ensemble']
test_text = "URGENT: Verify your account now!"
for model in models:
    result = analyze_text(test_text, model)
    print(f"{model:15}: {result['prediction']:8} ({result['confidence']:5.1f}%)")

# Deep Learning Configuration
SpamDetectorDL(
    max_features=10000,  # Vocabulary size
    max_length=100,      # Sequence length
    embedding_dim=128    # Embedding dimensions
)
# Training Parameters
epochs=30
batch_size=32
validation_split=0.2
early_stopping_patience=10
# Ensemble Weights
dl_weight = 0.7 # Deep learning model weight
traditional_weight = 0.3 # Traditional ML weight

# Optional: Turn off oneDNN custom operations (silences the related startup notice)
export TF_ENABLE_ONEDNN_OPTS=0
# Optional: Hide TensorFlow INFO and WARNING log messages
export TF_CPP_MIN_LOG_LEVEL=2

# Test all models with various samples
python test_spam_detection.py
# Expected output:
# ✅ Obvious spam detection: 80-95%
# ✅ Legitimate messages: 95-100%
# ⚠️ Subtle spam detection: 60-80%

# Test specific samples
test_samples = [
    "CONGRATULATIONS! You won $1000!",  # Should be: Spam
    "Hey, lunch tomorrow at 12pm?",  # Should be: Not Spam
    "Your account has been compromised!",  # Should be: Spam
    "Meeting moved to Monday",  # Should be: Not Spam
]
for text in test_samples:
    result = detector.predict(text)
    print(f"'{text}' → {result['prediction']} ({result['confidence']:.1f}%)")

# Retrain models if corrupted
python train_models.py --model both

# Reduce batch size in training
batch_size=16 # Instead of 32

import nltk
nltk.download('punkt')
nltk.download('stopwords')

# Check if port is in use (Windows)
netstat -an | findstr :8000
# Use different port
uvicorn main_dl:app --port 8001

# Retrain with more epochs
python train_models.py --model deep_learning --epochs 50
# Check dataset quality
python -c "import pandas as pd; print(pd.read_csv('mail_data.csv').info())"

| Method | Endpoint | Description | Parameters |
|---|---|---|---|
| GET | `/` | Web interface | - |
| POST | `/api/analyze` | Analyze text | `text`, `model_type` |
| GET | `/api/models/compare` | Compare models | `text` |
| GET | `/api/models/status` | Model availability | - |
| GET | `/api/demo-text` | Sample text | - |
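
For clients that prefer not to depend on `requests`, a call to `POST /api/analyze` can be prepared with the standard library. The sketch below only builds the request object and does not send it. `build_analyze_request` is a hypothetical helper, the base URL assumes the local server on port 8000, and the accepted `model_type` values are taken from the usage example earlier in this README; whether the server rejects other values is not documented here.

```python
import json
import urllib.request

# Values shown in the README's usage example; treated as the full set
# here only as an assumption.
ALLOWED_MODELS = ("ensemble", "deep_learning", "traditional")


def build_analyze_request(text, model_type="ensemble",
                          base_url="http://localhost:8000"):
    """Build (but do not send) a JSON POST request for /api/analyze."""
    if model_type not in ALLOWED_MODELS:
        raise ValueError(f"unknown model_type: {model_type!r}")
    if not text.strip():
        raise ValueError("text must not be empty")
    body = json.dumps({"text": text, "model_type": model_type}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of passing the object to `urllib.request.urlopen` and decoding the JSON body of the response.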
{
  "prediction": "Spam",
  "confidence": 89.3,
  "spam_probability": 89.3,
  "not_spam_probability": 10.7,
  "model_used": "Deep Learning (LSTM/CNN)",
  "urls": [
    {
      "domain": "suspicious-site.com",
      "trust_score": 25,
      "classification": "Suspicious",
      "risk_factors": ["Contains suspicious keywords"]
    }
  ]
}

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
# Install development dependencies
pip install -r requirements.txt
pip install pytest black flake8
# Run tests
pytest tests/
# Format code
black *.py
# Lint code
flake8 *.py

- Follow PEP 8 guidelines
- Use type hints where possible
- Add docstrings to functions
- Write unit tests for new features
This project is licensed under the MIT License - see the LICENSE file for details.
- Dataset: UCI ML Repository - SMS Spam Collection
- Frameworks: TensorFlow, Keras, FastAPI, scikit-learn
- Libraries: NLTK, pandas, numpy, matplotlib
- UI: Chart.js, Font Awesome, modern CSS
- Transformer Models: BERT/RoBERTa integration
- Multi-language Support: Detect spam in different languages
- Real-time Learning: Online learning capabilities
- Mobile App: React Native/Flutter implementation
- Browser Extension: Chrome/Firefox extension
- Email Integration: Gmail/Outlook plugins
- Advanced Analytics: Detailed reporting dashboard
- A/B Testing: Model performance comparison tools
⭐ Star this repository if you found it helpful!
🛡️ Happy Spam Detecting!