import streamlit as st
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
# Load the three worksheets used on this page from the same workbook
file = 'KFC.xlsx'
df = pd.read_excel(file, sheet_name='Clean_data')
df2 = pd.read_excel(file, sheet_name='Addon')
df3 = pd.read_excel(file, sheet_name='Promotion')
st.subheader('Max Budget Distribution')
histo = px.histogram(
    df,
    x="budget",
    category_orders={
        "budget": [
            "below 100 THB",
            "100 - 199 THB",
            "200 - 299 THB",
            "300 THB and above",
        ]
    },
)
st.plotly_chart(histo)
st.write("""
This chart shows the distribution of customers' maximum budgets across four ranges. The 100–199 THB range is the most common, followed by 200–299 THB. Only a few customers have budgets of 300 THB or more, and even fewer have budgets below 100 THB.
Overall, customers cluster in the mid-range budgets, while very low and very high budgets are uncommon.
""")
st.divider()
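# The narrative above names the most and least common budget ranges by eye.
# A minimal sketch of how those claims could be checked against the data,
# assuming df["budget"] holds the four range labels used in the chart.
# `budget_share` and `BUDGET_ORDER` are hypothetical names added for
# illustration; they are not part of the original script.
from collections import Counter

BUDGET_ORDER = [
    "below 100 THB",
    "100 - 199 THB",
    "200 - 299 THB",
    "300 THB and above",
]


def budget_share(values, order=BUDGET_ORDER):
    """Return [(label, count, share)] for each label in `order`."""
    counts = Counter(values)
    # Only the labels listed in `order` contribute to the total.
    total = sum(counts[label] for label in order)
    return [
        (label, counts[label], counts[label] / total if total else 0.0)
        for label in order
    ]


# Possible usage (commented out so the page itself is unchanged):
# st.table(pd.DataFrame(budget_share(df["budget"]), columns=["budget", "count", "share"]))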
st.subheader('Budget vs Occupation - Boxplot')
fig_boxplot = px.box(
    df,
    x='occupation',
    y='budget_enc',
    color='occupation',
    category_orders={'occupation': ['student', 'staff', 'ta', 'others']},
    labels={'budget_enc': 'Budget', 'occupation': 'Occupation'},
    title='Budget Distribution by Occupation',
)
# Fix y axis labels to show text instead of numbers
fig_boxplot.update_yaxes(
    tickvals=[1, 2, 3, 4],
    ticktext=['Below 100', '100-199', '200-299', '300+'],
)
st.plotly_chart(fig_boxplot)
st.write("""
This boxplot shows how budget levels vary across different occupation groups: students, staff, teaching assistants (TA), and others.
For students, budgets are generally lower, with most values in the 100–199 and 200–299 ranges and a high outlier at 300+.
For staff, budgets are moderate, spanning 100–199 to 300+, with the median near the middle of that range.
The TA group shows the highest budgets, concentrated around 300+, with a wider spread that includes a low outlier near 200–299.
The others category varies widely, from 200–299 up to 300+, with the median closer to the lower end.
Overall, TAs appear to have the highest budgets, while students and others tend to have lower or more varied budgets.
""")
st.divider()
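# The boxplot commentary above compares medians across occupations.
# A rough sketch for reading those medians off the data, assuming
# `budget_enc` uses codes 1-4 in the same order as the tick labels above
# and contains no missing values. `median_budget_by_group` and
# `BUDGET_LABELS` are hypothetical names added for illustration.
from collections import defaultdict
from statistics import median_low

BUDGET_LABELS = {1: "Below 100", 2: "100-199", 3: "200-299", 4: "300+"}


def median_budget_by_group(groups, codes, labels=BUDGET_LABELS):
    """Map each group to the label of its lower-median budget code."""
    by_group = defaultdict(list)
    for group, code in zip(groups, codes):
        by_group[group].append(int(code))
    # median_low always returns an observed code, so the lookup cannot miss.
    return {g: labels[median_low(c)] for g, c in by_group.items()}


# Possible usage: median_budget_by_group(df["occupation"], df["budget_enc"])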
st.subheader('Ordering Method Histogram')
fig_histogram = px.histogram(df, x="orderMethod")
st.plotly_chart(fig_histogram)
st.write("""
This chart shows the number of orders for each ordering method. Kiosks account for the most orders, with the app slightly behind. The counter has the fewest orders, making it the least preferred option.
Overall, customers favor kiosk and app ordering over the traditional counter.
""")
st.divider()
st.subheader('Budget by Age Group')
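# Plot the encoded budget so the y axis is ordinal, then relabel the ticks
fig_boxplot = px.box(
    df,
    x="age",
    y="budget_enc",
    labels={"budget_enc": "Budget", "age": "Age group"},
)
fig_boxplot.update_yaxes(
    tickvals=[1, 2, 3, 4],
    ticktext=['Below 100', '100-199', '200-299', '300+'],
)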
st.plotly_chart(fig_boxplot)
st.write("""
This boxplot shows how budgets are distributed within each age group. For the 18-22 group, the median budget sits in the 100-199 range, though the wide box indicates that many values also fall in the 200-299 range. In contrast, the 28-35 group has a higher median and shifts toward the 300+ tier. The under 18 and above 35 groups contain far fewer data points, and their budgets are more tightly clustered in the lower to mid ranges than those of the young adult groups.
""")
st.divider()
st.subheader('Preferred Promotion Types')
fig_histogram = px.histogram(df3, x="Count", y="Promotion")
st.plotly_chart(fig_histogram)
st.write("""
This horizontal bar chart compares the total count of different promotion types to identify
which offers are most popular among customers. Discounts are by far the most preferred
promotion, followed by buy1get1 deals, which also show strong engagement.
Conversely, coupons and app_rewards have much lower counts, and "other"
promotions are negligible. This data suggests that direct price reductions and
multi-buy incentives are the primary drivers for customer participation.
""")
st.divider()
st.subheader('Order Type by Age')
crosstab = pd.crosstab(df['orderType'], df['age'])
fig, ax = plt.subplots()
crosstab.plot(kind='bar', stacked=True, ax=ax)
st.pyplot(fig)
st.write("""
This stacked bar chart displays the preference for different order types across various age groups.
The individual and promotion categories are the most popular choices overall, driven largely
by the 18-22 and 23-27 age demographics. By looking at the segments, it is clear that
the 18-22 group (blue) consistently makes up the largest portion of orders across almost all
types, while categories like "snack_sharing" see very low engagement across all ages.
This visualization helps identify which order styles resonate most with specific generational segments.
""")
st.divider()
st.subheader('Major Distribution')
df_major = df[df['major'] != 'Unknown']
# Count major
major_count = df_major['major'].value_counts().reset_index()
major_count.columns = ['major', 'count']
# Pie chart
fig = px.pie(major_count, names='major', values='count')
st.plotly_chart(fig)
st.write("""
This pie chart shows the share of students in each academic major, excluding records where the major is unknown.
global_academy makes up the vast majority at 74.5%,
followed by international_college at 10.8%. The remaining majors, such as engineering, IT,
and nursing, make up much smaller segments, so the chart gives a quick visual comparison
of which departments have the highest student concentration.
""")
st.divider()
st.subheader('Correlation Heatmap')
num_cols = ['flavorRating', 'serviceRating', 'age_enc', 'budget_enc', 'visitFrequency_enc']
fig, ax = plt.subplots()
sns.heatmap(df[num_cols].corr(), annot=True, cmap='Reds', ax=ax)
# ax.set_title("Correlation Heatmap")
st.pyplot(fig)
st.write("""
This correlation heatmap shows the pairwise correlations between the numeric variables, where 1.0 indicates a perfect positive correlation. The encoded age, budget, and visit frequency variables are very strongly correlated with one another (0.92 to 0.98), so higher age or budget tends to go with more frequent visits. flavorRating and serviceRating show a moderate positive correlation (0.58): customers who rate the food highly also tend to rate the service highly. Both ratings remain relatively independent of the demographic and budget factors.
""")
st.divider()
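# The heatmap commentary above quotes specific coefficients. A minimal sketch
# for listing the strongest pairs directly, assuming every column is numeric,
# non-constant, and free of missing values. `pearson` and `strong_pairs` are
# hypothetical helpers added for illustration; DataFrame.corr() in the script
# above computes the same Pearson coefficient by default.
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    xs, ys = list(xs), list(ys)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strong_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold, strongest first."""
    pairs = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(
        (p for p in pairs if abs(p[2]) >= threshold),
        key=lambda p: -abs(p[2]),
    )


# Possible usage: strong_pairs({c: df[c] for c in num_cols})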