🎯 TikTok Content Classifier: Opinion vs. Complaint

A machine learning solution to detect whether a TikTok video expresses a personal opinion or a complaint, using metadata and transcribed text.

📌 Objective

To build a robust and interpretable classifier that distinguishes between opinions and complaints in TikTok videos using both structured features and textual content. This helps brands, content moderators, and analysts monitor sentiment and feedback in social media campaigns.

📁 Dataset

Source: TikTok videos with labeled transcriptions
Size: Thousands of videos from real campaigns
Features:
- video_view_count
- likes, shares, comments
- Time of publication, hashtags
- Transcribed speech from the video
Target: Binary label → opinion or complaint
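
The mix of structured metadata and transcribed text described above could be fed to a single model roughly as follows. This is a minimal sketch, not the project's actual pipeline: the rows are invented for illustration, every column name except `video_view_count` is an assumption, and TF-IDF is just one plausible way to vectorize the transcripts.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Toy rows for illustration only; these are NOT from the real dataset.
# Column names other than video_view_count are assumed.
df = pd.DataFrame({
    "video_view_count": [1200, 50, 98000, 310, 45000, 75],
    "likes": [300, 2, 12000, 15, 5100, 4],
    "shares": [12, 0, 900, 1, 340, 0],
    "comments": [40, 1, 1500, 3, 620, 2],
    "transcript": [
        "i think this phone is great value",
        "my order never arrived and nobody answers",
        "in my opinion the new update looks nice",
        "the product broke after two days and i want a refund",
        "i personally prefer the older design",
        "support ignored my complaint for weeks",
    ],
    "label": ["opinion", "complaint", "opinion",
              "complaint", "opinion", "complaint"],
})

numeric_cols = ["video_view_count", "likes", "shares", "comments"]

# TF-IDF on the transcript column, numeric metadata passed through unchanged.
features = ColumnTransformer([
    ("text", TfidfVectorizer(), "transcript"),
    ("num", "passthrough", numeric_cols),
])

model = Pipeline([
    ("features", features),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])

X = df.drop(columns="label")
y = (df["label"] == "complaint").astype(int)  # 1 = complaint, 0 = opinion

model.fit(X, y)
predictions = model.predict(X)
```

Wrapping both feature types in one `Pipeline` keeps the text vectorizer inside cross-validation folds, which helps avoid leaking vocabulary from validation data into training.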

🤖 Model Performance

| Classifier | Accuracy | Recall (Complaints) | AUC |
|---|---|---|---|
| Random Forest | 78% | 86% | 0.83 |
| XGBoost ✅ | 81% | 100% | 0.84 |

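✔️ With cross-validation and strict feature cleaning, both models reached AUC ≈ 1.00, which points to strong signal in the data. That figure is well above the AUC values in the table (0.83 and 0.84), so the two results presumably come from different evaluation setups and should be compared with care.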
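
For readers less familiar with the three metrics reported here, the sketch below computes them from scratch in plain Python. The labels and scores are made up for illustration, and the 0.5 decision threshold is an assumption; this is not the evaluation code used in the project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of actual positives (here: complaints) that were predicted positive."""
    caught = sum(1 for t, p in zip(y_true, y_pred)
                 if t == positive and p == positive)
    actual = sum(1 for t in y_true if t == positive)
    return caught / actual


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = complaint, 0 = opinion.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.7]
y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5

print(accuracy(y_true, y_pred), recall(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy and recall depend on the chosen threshold, while AUC is computed from the raw scores, which is one reason the three numbers can tell different stories about the same model.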

🔍 Key Insights

The transcribed text is highly predictive: words like “claim”, “media”, and “forum” carried strong signal.
Removing dominant features such as `video_view_count` did not hurt performance, which suggests the model learns from more than a single variable.
Strict text preprocessing and removal of irrelevant columns improved clarity and reduced overfitting.
The approach should be adaptable to other social media platforms or languages.
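
The README does not spell out what "strict text preprocessing and irrelevant column removal" involved. As one plausible reading, the sketch below lowercases a transcript, strips links and punctuation, drops a few stopwords, and keeps only whitelisted fields of a row. The function names, the stopword list, and the field names are all hypothetical.

```python
import re

# A tiny illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "is", "it", "to", "of", "i", "my"}


def clean_transcript(text):
    """Lowercase, strip URLs and punctuation, then drop stopwords.

    Note: the letters-only filter is ASCII-only, so it would need
    adjusting for transcripts in other languages.
    """
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"[^a-z\s]", " ", text)      # keep ASCII letters only
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)


def drop_irrelevant(record, keep):
    """Keep only the whitelisted keys of a row represented as a dict."""
    return {k: v for k, v in record.items() if k in keep}


cleaned = clean_transcript("I filed a CLAIM on the forum!! See https://example.com")
row = drop_irrelevant(
    {"video_id": 1, "likes": 3, "transcript": cleaned},  # hypothetical fields
    keep={"likes", "transcript"},
)
```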

📊 Top Features (XGBoost without `video_view_count`)

A feature-importance plot shows that distributed features (likes, speech patterns, etc.) carry enough information to separate the two classes without `video_view_count`, so the result does not hinge on a single variable.
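
An ablation like the one described here could be run roughly as follows. This is a sketch under stated assumptions: the data is synthetic, the column names other than `video_view_count` are assumed, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example has fewer dependencies.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)  # synthetic labels: 1 = complaint, 0 = opinion

# Synthetic features; most columns carry some signal about y.
X = pd.DataFrame({
    "video_view_count": y * 2.0 + rng.normal(size=n),
    "likes": y * 1.0 + rng.normal(size=n),
    "shares": y * 1.0 + rng.normal(size=n),
    "comments": rng.normal(size=n),  # pure noise
})
X_ablated = X.drop(columns="video_view_count")


def mean_cv_auc(features):
    """Mean 5-fold cross-validated ROC AUC for a gradient-boosted model."""
    clf = GradientBoostingClassifier(random_state=0)
    return cross_val_score(clf, features, y, cv=5, scoring="roc_auc").mean()


auc_full = mean_cv_auc(X)
auc_ablated = mean_cv_auc(X_ablated)

# Feature importances of the model trained without video_view_count.
clf = GradientBoostingClassifier(random_state=0).fit(X_ablated, y)
importances = pd.Series(
    clf.feature_importances_, index=X_ablated.columns
).sort_values(ascending=False)

print(f"AUC with all features: {auc_full:.3f}")
print(f"AUC without video_view_count: {auc_ablated:.3f}")
print(importances)
```

If the ablated AUC stays close to the full one, the remaining features carry comparable signal; a sharp drop would instead indicate heavy reliance on the removed column.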

🔮 Next Steps

Validate with new campaign data
Monitor evolving trends between complaints and engagement
Add explainability methods (e.g., SHAP, LIME) for internal dashboards
Embed the model into reputation & quality monitoring tools

👤 Author

Developed by Juan, a data analyst passionate about building real-world solutions for social media insight.
