An exercise to practice Machine Learning and NLP, focused on building a Fake News classifier.
This exercise was a collaboration between Desirée Maestri, Jan Dirk and Jonathan Sada.
The objective of this project is to build a classifier able to distinguish between fake (0) and real (1) news headlines.
Once a model is trained, it is used to predict the labels for the data in the file `dataset/testing_data.csv`.
- `training_data.csv`: Dataset provided to train and test the model.
- `testing_data.csv`: Dataset used to produce the final answer of the exercise. All of its labels are set to 2; the goal is to replace them with 0 (fake) or 1 (real) according to the model predictions (see the baseline sketch after this list).
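For orientation, a minimal end-to-end sketch is shown below: it loads the two files, trains a simple TF-IDF + Logistic Regression baseline, and overwrites the placeholder labels with predictions. The column names (`label`, `title`) and the output path are assumptions and should be checked against the actual files; the project's selected model may differ.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Column names ("label", "title") are assumptions; check the actual files.
train = pd.read_csv("dataset/training_data.csv")
test = pd.read_csv("dataset/testing_data.csv")

# Simple TF-IDF + Logistic Regression baseline.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(train["title"], train["label"])

# Replace the placeholder labels (2) with the predicted 0 (fake) / 1 (real).
test["label"] = model.predict(test["title"])
test.to_csv("output/testing_data.csv", index=False)
```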
The project is split into several main files:
- `fake_news_classifier.ipynb`: Contains the step-by-step creation and comparison of the Machine Learning models (requires an environment with `requirements-notebook.txt` installed).
- `fine_tuning.py` and `fine-tuning-bert.py`: Train a fine-tuned model based on BERT and generate predictions (require an environment with `requirements-fine_tuning.txt` installed); a rough fine-tuning sketch follows this list.
- `FakeNewsClassifier.pdf`: Contains the presentation, technologies used, model comparison, fine-tuned model details and results.
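As a rough illustration of what the fine-tuning scripts do, the sketch below fine-tunes a BERT checkpoint with the Hugging Face Trainer. The base checkpoint (`bert-base-uncased`), hyperparameters, and column names are assumptions for illustration, not necessarily what `fine_tuning.py` uses.

```python
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumed columns "title" and "label"; the actual scripts may use different names/settings.
df = pd.read_csv("dataset/training_data.csv")
splits = Dataset.from_pandas(df).train_test_split(test_size=0.2, seed=42)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Pad/truncate headlines to a fixed length so the default collator can batch them.
    return tokenizer(batch["title"], truncation=True, padding="max_length", max_length=64)

splits = splits.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="result/bert-finetuned",
    num_train_epochs=2,
    per_device_train_batch_size=16,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=splits["train"], eval_dataset=splits["test"])
trainer.train()
print(trainer.evaluate())  # loss on the held-out split
trainer.save_model("result/bert-finetuned")
```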
The `output` folder contains CSV files with results from the different sections of the exercise, as well as the final answer:
- `improve_pred_results.csv`: Training and test metrics for all the combinations of classifiers and vectorizers tested.
- `n-grams_results.csv`: Training and test metrics for the different n-gram ranges tested with the two best models identified in the previous file.
- `logisticregression_model_validation.csv`: Predictions made by the selected model across iterations in which the words with the highest impact on the predictions were removed. This was done to validate that the model does not depend on a few high-coefficient words (see the sketch after this list).
- `pre_trained_models.csv`: Predictions obtained from different pre-trained fake news classifiers from Hugging Face.
- `alternative_data_cleanning.csv`: Prediction metrics comparing how different cleaning approaches affect the selected model.
- `testing_data.csv`: The resulting predictions of our model (the provided dataset with the labels updated).
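The word-removal validation behind `logisticregression_model_validation.csv` could look roughly like the sketch below: fit TF-IDF + Logistic Regression, find the words with the largest absolute coefficients, exclude them from the vocabulary, and re-score. The column names, split, and the choice of 20 words per round are illustrative assumptions.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumed columns "title" and "label".
df = pd.read_csv("dataset/training_data.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["title"], df["label"], test_size=0.2, random_state=42
)

banned = set()  # words removed so far
for iteration in range(3):
    # Re-fit the vectorizer each round, excluding the banned words from the vocabulary.
    vectorizer = TfidfVectorizer(stop_words=list(banned) or None)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vectorizer.fit_transform(X_train), y_train)

    acc = accuracy_score(y_test, clf.predict(vectorizer.transform(X_test)))
    print(f"iteration {iteration}: removed {len(banned)} words, accuracy {acc:.3f}")

    # Ban the 20 words with the largest absolute coefficients for the next round.
    features = vectorizer.get_feature_names_out()
    top = abs(clf.coef_[0]).argsort()[::-1][:20]
    banned.update(features[i] for i in top)
```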
The `result` folder contains the predictions made by the fine-tuned BERT model.
Pre-trained models used (see the usage sketch after the list):
- bert-tiny-finetuned-fake-news-detection by Manuel Romero
- Fake-News-Bert-Detect by Jiayi Fu
- distilbert_fake_news by yasmine-11
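Trying these classifiers only needs a `transformers` text-classification pipeline, roughly as below. The hub IDs are a best guess inferred from the model and author names and should be verified on Hugging Face; each model's output labels (e.g. LABEL_0/LABEL_1 vs. fake/real) also need to be mapped to 0/1 by hand.

```python
from transformers import pipeline

# Hub IDs are assumptions inferred from the model/author names; verify them on Hugging Face.
candidate_models = [
    "mrm8488/bert-tiny-finetuned-fake-news-detection",
    "jy46604790/Fake-News-Bert-Detect",
    "yasmine-11/distilbert_fake_news",
]

headline = "Scientists discover water on Mars"
for model_id in candidate_models:
    clf = pipeline("text-classification", model=model_id)
    print(model_id, clf(headline))
```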
This work is licensed under a Creative Commons Attribution 4.0 International License.
