This project implements a machine learning pipeline that classifies news articles as true or fake. It combines traditional NLP techniques with several classifiers to detect misinformation.
- Text preprocessing: tokenization, lowercasing, punctuation removal
- Feature extraction using `CountVectorizer` and `TfidfVectorizer` with n-grams
- Classification models including:
  - Naive Bayes
  - Logistic Regression
  - Support Vector Machine (SVM)
  - Stochastic Gradient Descent (SGD)
  - Random Forest
  - Passive Aggressive Classifier
- Model evaluation with accuracy, F1-score, and confusion matrices
- K-Fold cross-validation for robustness
- Saving/loading models using pickle files
- Combining multiple models into a single pickle file for easy management
- Interactive command-line testing module with color-coded predictions
- Visualizations including bar charts and dynamic gauge charts (Truth-O-Meter)
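
As a rough illustration of how the preprocessing, feature extraction, classification, and cross-validation steps listed above can fit together, here is a minimal sketch using scikit-learn. The toy articles, the `"fake"`/`"true"` label names, the `preprocess` helper, and all hyperparameters are assumptions made for this example and are not taken from the project; only three of the listed classifiers are shown.

```python
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def preprocess(text):
    """Hypothetical helper: lowercase, drop punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


# Made-up toy data; a real run would load the project's dataset instead.
texts = [
    "Scientists confirm water found on the moon, agency reports.",
    "City council approves new budget for public schools.",
    "Central bank raises interest rates by a quarter point.",
    "Local hospital opens new wing after two years of construction.",
    "SHOCKING: celebrity reveals aliens secretly run the government!!!",
    "Miracle pill cures all diseases overnight, doctors hate it!",
    "You won't believe this one weird trick to become a billionaire!",
    "Secret document proves the earth is flat, insiders claim!!!",
]
labels = ["true", "true", "true", "true", "fake", "fake", "fake", "fake"]

# One pipeline per classifier: TF-IDF over unigrams and bigrams, then the model.
models = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVM": LinearSVC(),
}

cv = KFold(n_splits=4, shuffle=True, random_state=0)
results = {}
for name, clf in models.items():
    pipeline = make_pipeline(
        TfidfVectorizer(preprocessor=preprocess, ngram_range=(1, 2)),
        clf,
    )
    # Out-of-fold predictions: each article is predicted by a model
    # that did not see it during training.
    preds = cross_val_predict(pipeline, texts, labels, cv=cv)
    results[name] = {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, pos_label="fake", zero_division=0),
        "confusion": confusion_matrix(labels, preds, labels=["fake", "true"]),
    }

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.2f}, f1={scores['f1']:.2f}")
```

Wrapping the vectorizer and classifier in a single pipeline means the vocabulary is refit on each training fold, so no information from the held-out fold leaks into the features.
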
- Train models on the dataset or load pre-trained pickle files.
- Use the interactive `classify_news()` function to input any news article and get predictions from all models.
- Visualize results with helpful charts to understand prediction confidence.
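
The sketch below shows one way the combined pickle file and the color-coded prediction loop could work. It is not the project's actual `classify_news()` implementation: the `classify_news_sketch` function, the `all_models.pkl` file name, the toy training data, the `"fake"`/`"true"` labels, and the ANSI color choices are all assumptions for illustration.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# ANSI escape codes for colored terminal output (assumed color scheme).
GREEN, RED, RESET = "\033[92m", "\033[91m", "\033[0m"

# Made-up toy training data standing in for the real dataset.
train_texts = [
    "City council approves new budget for public schools.",
    "Central bank raises interest rates by a quarter point.",
    "Local hospital opens new wing after two years of construction.",
    "Miracle pill cures all diseases overnight, doctors hate it!",
    "You won't believe this one weird trick to become a billionaire!",
    "Secret document proves the earth is flat, insiders claim!!!",
]
train_labels = ["true", "true", "true", "fake", "fake", "fake"]

# Each entry bundles its own vectorizer, so the pickle is self-contained.
models = {
    "Naive Bayes": make_pipeline(
        CountVectorizer(ngram_range=(1, 2)), MultinomialNB()
    ),
    "Logistic Regression": make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000)
    ),
}
for model in models.values():
    model.fit(train_texts, train_labels)

# Save every fitted model into a single pickle file (hypothetical name).
bundle_path = os.path.join(tempfile.mkdtemp(), "all_models.pkl")
with open(bundle_path, "wb") as fh:
    pickle.dump(models, fh)

# Load the whole bundle back in one call.
with open(bundle_path, "rb") as fh:
    loaded_models = pickle.load(fh)


def classify_news_sketch(text, models):
    """Print a color-coded prediction per model and return them as a dict."""
    predictions = {}
    for name, model in models.items():
        label = str(model.predict([text])[0])
        color = GREEN if label == "true" else RED
        print(f"{name:<20} {color}{label.upper()}{RESET}")
        predictions[name] = label
    return predictions


article = "Officials confirmed the new school budget on Tuesday."
result = classify_news_sketch(article, loaded_models)
```

An interactive version could wrap the call in a loop that reads articles with `input()`. Because unpickling can execute arbitrary code, only load pickle files from sources you trust.
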
- Python 3.x
- Libraries: `scikit-learn`, `pandas`, `numpy`, `matplotlib`, `seaborn`, `plotly`, `nltk`

Install the dependencies with `pip install scikit-learn pandas numpy matplotlib seaborn plotly nltk`.