A Behavioral Machine Learning System for Modeling User Satisfaction Across Devices, Usage Types, and AI Models
This project implements a complete, production-grade machine learning pipeline for predicting user satisfaction with an AI assistant based on recorded behavioral patterns. It is designed to reflect the kinds of analytics and modeling workflows used inside real AI companies to improve user experience, identify dissatisfaction drivers, optimize product design, and uncover hidden behavioral trends.
Using a synthetic daily interaction dataset of 300 sessions, the system performs:
- Feature engineering from timestamps
- Behavioral signal extraction (device, usage category, model used)
- Supervised multi-class prediction (satisfaction rating 1–5)
- Model explainability using SHAP
- Evaluation through detailed error analysis
- Interactive analytics dashboard built in Streamlit
This repository demonstrates the full lifecycle of a modern ML product:
- Data ingestion
- Feature generation
- Modeling & hyperparameters
- Explainability
- UX-level visualization
- Deployment-ready scoring pipeline
The outcome is a behaviorally interpretable AI satisfaction model capable of shedding light on why users feel satisfied or frustrated when engaging with an AI assistant.
Repository layout:

```
ai-assistant-satisfaction-engine/
│
├── app.py                              # Streamlit dashboard
│
├── data/
│   ├── raw/
│   │   └── Daily_AI_Assistant_Usage_Behavior_Dataset.csv
│   └── processed/
│       ├── sessions_train.csv
│       └── sessions_test.csv
│
├── models/
│   └── satisfaction_pipeline.joblib    # Trained ML model
│
├── reports/
│   ├── metrics/
│   │   ├── metrics.json
│   │   └── classification_report.json
│   └── figures/
│       ├── confusion_matrix.png
│       ├── satisfaction_distribution.png
│       ├── per_class_f1.png
│       └── shap_summary.png
│
├── src/
│   ├── config.py
│   ├── data_prep.py
│   ├── features.py
│   ├── train_model.py
│   ├── evaluate.py
│   ├── explain.py
│   └── score_new_sessions.py
│
├── requirements.txt
├── LICENSE
└── README.md
```
The dataset contains 300 interaction sessions with 8 core features:
| Feature | Meaning |
|---|---|
| `timestamp` | When the session occurred |
| `device` | Desktop, Mobile, Tablet, Smart Speaker |
| `usage_category` | Coding, Productivity, Research, Writing, etc. |
| `prompt_length` | User prompt length |
| `session_length_minutes` | Engagement duration |
| `satisfaction_rating` | Target label (1–5) |
| `assistant_model` | GPT-4o, GPT-5, GPT-5.1, Mini, o1 |
| `tokens_used` | Tokens used by AI response |
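A minimal loading sketch (the path matches the repository layout; parsing `timestamp` up front is an assumption about what `src/data_prep.py` does):

```python
import pandas as pd

# Load the raw sessions and parse the timestamp column immediately.
df = pd.read_csv(
    "data/raw/Daily_AI_Assistant_Usage_Behavior_Dataset.csv",
    parse_dates=["timestamp"],
)

# Sanity checks grounded in the dataset description above.
assert len(df) == 300
assert df["satisfaction_rating"].between(1, 5).all()
```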
From timestamp, the pipeline extracts:
- `hour_of_day`
- `day_of_week` (0 = Monday)
- `is_weekend` (binary)
These features significantly impact user behavior and satisfaction (e.g., weekend sessions tend to have higher satisfaction).
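For instance, the derivation is a few lines of pandas (a sketch of what `src/features.py` presumably does):

```python
import pandas as pd

def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive hour_of_day, day_of_week, and is_weekend from the timestamp."""
    out = df.copy()
    ts = pd.to_datetime(out["timestamp"])
    out["hour_of_day"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek              # 0 = Monday
    out["is_weekend"] = (ts.dt.dayofweek >= 5).astype(int)
    return out
```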
The system uses a modular, reusable ML pipeline (a minimal sketch follows this list) built with:
- ColumnTransformer for preprocessing
- OneHotEncoder for categorical features
- StandardScaler for numerical features
- RandomForestClassifier with class balancing
- SHAP explainability layer
- Evaluation scripts
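A minimal sketch of how these pieces compose (column names follow the dataset table; hyperparameters here are illustrative, not the tuned values):

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

categorical = ["device", "usage_category", "assistant_model"]
numerical = ["prompt_length", "session_length_minutes", "tokens_used",
             "hour_of_day", "day_of_week", "is_weekend"]

preprocessor = ColumnTransformer([
    # Dense one-hot output (sklearn >= 1.2) keeps the SHAP step simple.
    ("cat", OneHotEncoder(handle_unknown="ignore", sparse_output=False), categorical),
    ("num", StandardScaler(), numerical),
])

pipeline = Pipeline([
    ("preprocess", preprocessor),
    # class_weight="balanced" offsets uneven rating frequencies.
    ("model", RandomForestClassifier(class_weight="balanced", random_state=42)),
])
```

Keeping preprocessing inside the pipeline means the serialized `.joblib` artifact carries everything needed to score raw sessions.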
The end-to-end flow:

```mermaid
flowchart TD
    A[Raw Dataset] --> B[Data Preprocessing]
    B --> C[Feature Engineering<br>time features, encoding]
    C --> D[Train/Test Split]
    D --> E[ML Pipeline<br>Preprocessor + RandomForest]
    E --> F[Model Training]
    F --> G[Evaluation<br>accuracy, F1, confusion matrix]
    F --> H[Explainability<br>SHAP]
    F --> I[Model Serialization<br>.joblib]
    I --> J[Streamlit Dashboard]
    J --> K[Session Scoring & Exploration]
    K --> L[Per-session SHAP Explanations]
```
Training is conducted using:

```bash
python -m src.train_model
```

This produces (a minimal sketch follows the list):
- Overall accuracy
- Per-class precision, recall, and F1
- Full classification report
- Serialized model pipeline
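Continuing the loading and pipeline sketches above, the core of `src.train_model` plausibly looks like this (the 80/20 stratified split is an assumption; paths follow the repository layout):

```python
import json

import joblib
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# df, add_time_features, and pipeline come from the earlier sketches.
X = add_time_features(df).drop(columns=["satisfaction_rating", "timestamp"])
y = df["satisfaction_rating"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

# Persist the fitted pipeline and the headline metrics.
joblib.dump(pipeline, "models/satisfaction_pipeline.joblib")
with open("reports/metrics/metrics.json", "w") as f:
    json.dump({"accuracy": accuracy_score(y_test, y_pred)}, f, indent=2)
with open("reports/metrics/classification_report.json", "w") as f:
    json.dump(classification_report(y_test, y_pred, output_dict=True), f, indent=2)
```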
Key observations from the evaluation:

- The model tends to confuse adjacent satisfaction levels (1→2, 3→4), which is expected when a classifier is applied to an ordinal target.
- It never jumps from very low to very high satisfaction.
- This shows the model has learned the relative ordering of satisfaction even where exact prediction is hard.
- Class 5 (high satisfaction) is easiest to predict → behaviorally consistent users.
- Class 1 (low satisfaction) is hardest → dissatisfaction is behaviorally diverse.
- Indicates natural human variability in negative feedback.
- The predicted rating distribution aligns well with the true distribution.
- Class 3 is slightly overpredicted, reflecting the RandomForest's tendency to damp uncertainty toward the middle of the scale.
- The model is calibrated but conservative.
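A minimal sketch of how `src.evaluate` could render these figures (continuing from the training sketch; exact styling is an assumption):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, f1_score, mean_absolute_error

# y_test and y_pred come from the training sketch above.
ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
plt.savefig("reports/figures/confusion_matrix.png", bbox_inches="tight")

# Per-class F1 highlights that class 5 is easy and class 1 is hard.
labels = sorted(y_test.unique())
plt.figure()
plt.bar([str(c) for c in labels],
        f1_score(y_test, y_pred, average=None, labels=labels))
plt.xlabel("Satisfaction rating")
plt.ylabel("F1")
plt.savefig("reports/figures/per_class_f1.png", bbox_inches="tight")

# MAE treats ratings as ordinal: adjacent-class errors cost only 1.
print("Rating MAE:", mean_absolute_error(y_test, y_pred))
```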
SHAP identifies the strongest global drivers of predicted satisfaction:

| Feature | Effect |
|---|---|
| `device_Smart Speaker` | Strong positive effect on satisfaction |
| `usage_category_Coding` | Highest-satisfaction usage category |
| `is_weekend` | Positive emotional bandwidth → higher ratings |
| `device_Mobile` | Consistently lowers satisfaction |
| `assistant_model_*` | Model quality heavily influences satisfaction |
This provides business actionability: prioritize certain devices, optimize mobile UX, and tailor model recommendations.
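The summary itself comes from `src.explain`; a sketch under the assumption of a `TreeExplainer` over the fitted forest (SHAP's multiclass output layout varies by version):

```python
import matplotlib.pyplot as plt
import shap

# Split the fitted pipeline into preprocessing and model stages.
preprocess = pipeline.named_steps["preprocess"]
forest = pipeline.named_steps["model"]

X_enc = preprocess.transform(X_test)       # dense, see sparse_output=False above
names = preprocess.get_feature_names_out()

explainer = shap.TreeExplainer(forest)
shap_values = explainer.shap_values(X_enc)  # one array per satisfaction class

shap.summary_plot(shap_values, X_enc, feature_names=names, show=False)
plt.savefig("reports/figures/shap_summary.png", bbox_inches="tight")
```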
The model reveals hidden psychological and behavioral patterns:
- Smart Speaker → most positive (relaxed environment)
- Mobile → lowest satisfaction (high interruption environment)
- Coding sessions are structured → high satisfaction
- Creative writing produces variable satisfaction → ambiguous task expectations
- Weekend sessions are significantly more positive.
- Suggests mood & available time impact perception of AI quality.
- Better model → higher satisfaction, even when controlling for behavior.
- Longer sessions → deeper interaction → more positive evaluations.
Run:
```bash
streamlit run app.py
```

Features (a minimal sketch follows the list):
- Upload your own session CSV
- Explore satisfaction distribution
- Filter by device, usage type, or model
- Inspect per-session decisions
- View SHAP explanations interactively
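A minimal `app.py` sketch covering upload, filtering, and the prediction view (the real dashboard also renders per-session SHAP; column names follow the dataset table):

```python
import joblib
import pandas as pd
import streamlit as st

st.title("AI Assistant Satisfaction Explorer")
pipeline = joblib.load("models/satisfaction_pipeline.joblib")

uploaded = st.file_uploader("Upload a session CSV", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded, parse_dates=["timestamp"])

    # Derive the same time features the model was trained on.
    df["hour_of_day"] = df["timestamp"].dt.hour
    df["day_of_week"] = df["timestamp"].dt.dayofweek
    df["is_weekend"] = (df["timestamp"].dt.dayofweek >= 5).astype(int)

    # Sidebar filter over one behavioral dimension (device).
    devices = st.sidebar.multiselect("Device", sorted(df["device"].unique()))
    if devices:
        df = df[df["device"].isin(devices)]

    df["predicted_satisfaction"] = pipeline.predict(df)
    st.bar_chart(df["predicted_satisfaction"].value_counts().sort_index())
```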
This dashboard mimics internal UX analytics tools used at large AI companies.
Use:
```bash
python -m src.score_new_sessions path/to.csv
```

The system adds (a sketch follows the list):
- Predicted satisfaction
- Probability distribution across all 5 classes
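A plausible core for `src/score_new_sessions.py` (the output filename is an assumption):

```python
import sys

import joblib
import pandas as pd

pipeline = joblib.load("models/satisfaction_pipeline.joblib")
sessions = pd.read_csv(sys.argv[1], parse_dates=["timestamp"])

# Recreate the engineered time features before scoring.
sessions["hour_of_day"] = sessions["timestamp"].dt.hour
sessions["day_of_week"] = sessions["timestamp"].dt.dayofweek
sessions["is_weekend"] = (sessions["timestamp"].dt.dayofweek >= 5).astype(int)

sessions["predicted_satisfaction"] = pipeline.predict(sessions)

# One probability column per rating class (1-5).
proba = pipeline.predict_proba(sessions)
for i, cls in enumerate(pipeline.classes_):
    sessions[f"proba_{cls}"] = proba[:, i]

sessions.to_csv("scored_sessions.csv", index=False)
```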
This makes the model suitable for:
- A/B testing
- Real-time inference
- User retention analysis
Requirements:

- Python 3.10+
- pandas, numpy
- scikit-learn
- matplotlib, seaborn
- SHAP
- Streamlit
- Joblib
To reproduce the pipeline end to end:

- Install dependencies
- Run `python -m src.data_prep`
- Run `python -m src.train_model`
- Run `python -m src.evaluate`
- Run `python -m src.explain`
- Launch the Streamlit app
Everything is fully deterministic with a fixed random state.
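A plausible shape for `src/config.py` (names here are assumptions; the point is one shared seed):

```python
from pathlib import Path

# Hypothetical central configuration: every script imports RANDOM_STATE so
# the train/test split, the forest, and any sampling are reproducible.
RANDOM_STATE = 42

DATA_DIR = Path("data")
RAW_DATA = DATA_DIR / "raw" / "Daily_AI_Assistant_Usage_Behavior_Dataset.csv"
MODEL_PATH = Path("models") / "satisfaction_pipeline.joblib"
```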
Even a good model has constraints:
- Small dataset (300 samples) limits generalization
- Satisfaction is subjective → inherently noisy
- Multi-class ordinal classification is challenging
- Time-series modeling not included
- No personalization (user-level data missing)
Planned extensions:

- Transform satisfaction prediction into ordinal regression
- Add LSTM or Time Series Transformers to capture temporal patterns
- Introduce personalization embeddings
- Add model calibration plots
- Improve dashboard with cohort analysis
- Deploy as FastAPI microservice
This project demonstrates a complete, production-grade approach to modeling AI assistant user satisfaction. It blends behavioral analytics, psychology-informed feature engineering, explainable ML, and interactive visualization to deliver a system that is both technically rigorous and insight-rich.
It stands as a strong portfolio showcase and a realistic foundation for real-world AI UX analytics.