03-Classification.qmd (22 additions & 22 deletions)
@@ -49,7 +49,7 @@ ax.set_ylabel('')
plt.show()
```

-Great, there seems to be an association. However, recall that our classification model is going to be making predictions of probability on a continuous scale of 0 to 1 before we classify it into two categories. Therefore, it makes sense to examine the relationship between BMI and empirical Hypertension probability. To do so, we will need to *bin* our data by small chunks of BMI values and calculate the empirical Hypertension probability for that bin. We plot the midpoint binned BMI value vs. empirical Hypertension probability for 20 bins:
+Great, there seems to be an association. However, recall that our classification model is going to be *making predictions of probability* on a continuous scale of 0 to 1 before we classify it into two categories. Therefore, it makes sense to examine the relationship between BMI and empirical Hypertension probability in our data exploration. To do so, we will need to *bin* our data into small chunks of BMI values and calculate the empirical Hypertension probability for each bin. We plot the midpoint of each binned BMI range vs. the empirical Hypertension probability for 20 bins:
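The binning step described above can be sketched as follows. This is a minimal illustration on synthetic data; the lesson's actual dataframe and its `BMI` and `Hypertension` columns are assumed, so the numbers here are not the lesson's.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the lesson's data: BMI values plus a 0/1
# hypertension outcome whose probability rises with BMI.
rng = np.random.default_rng(0)
df = pd.DataFrame({"BMI": rng.uniform(18, 45, size=2000)})
df["Hypertension"] = rng.random(2000) < (df["BMI"] - 15) / 60

# Cut BMI into 20 equal-width bins; the mean of a 0/1 outcome within
# each bin is the empirical probability of Hypertension for that bin.
df["bin"] = pd.cut(df["BMI"], bins=20)
binned = df.groupby("bin", observed=True)["Hypertension"].mean()

# Bin midpoints, ready to plot against the empirical probabilities.
midpoints = [interval.mid for interval in binned.index]
```

The same `pd.cut` / `groupby` pattern applies directly to the lesson's real dataframe.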
-However, we need to be mindful of the class imbalance we saw in the dataset at the beginning of the lesson. Recall that roughly 88% of our data is No Hypertension. If we had a classifier that *always* predicted No Hypertension, we would achieve an 88% accuracy rate, but this model is not particularly useful.
+However, we need to be mindful of the class imbalance we saw in the dataset at the beginning of the lesson. Recall that roughly 88% of our data is No Hypertension. If we had a classifier that *always* predicted No Hypertension, we would achieve an 88% accuracy rate, but this model is not particularly useful, and it raises the question of whether our model's 76% accuracy is actually impressive.
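As a quick arithmetic check of this baseline, here is the majority-class classifier on the lesson's rough 88/12 split (reconstructed with 880 and 120 synthetic labels; the exact counts in the real dataset will differ slightly):

```python
import numpy as np

# 0 = No Hypertension, 1 = Hypertension, in the lesson's rough 88/12 ratio.
y = np.array([0] * 880 + [1] * 120)

# The "classifier" that always predicts the majority class.
baseline_pred = np.zeros_like(y)

# Accuracy equals the majority-class proportion: 0.88.
baseline_acc = (baseline_pred == y).mean()
```

Any model we fit should be judged against this 88% floor, not against 50%.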

We can break down classification accuracy into four additional results, via a table called the **Confusion Matrix**:
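A minimal sketch of how such a matrix is produced with scikit-learn. Since the model's actual prediction vectors are not reproduced here, the label vectors below are reconstructed from the four counts reported later in this section (TN=1128, FP=24, FN=325, TP=15):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Rebuild true/predicted labels matching the reported counts:
# 1128 true negatives, 24 false positives, 325 false negatives, 15 true positives.
y_true = np.array([0] * 1128 + [0] * 24 + [1] * 325 + [1] * 15)
y_pred = np.array([0] * 1128 + [1] * 24 + [0] * 325 + [1] * 15)

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
```

In practice you would pass the model's own `y_pred` rather than reconstructing it.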
@@ -156,9 +156,9 @@ plt.show()

The top left hand corner is the number of True Negatives (1128), the top right hand corner is the number of False Positives (24), the bottom left corner is the number of False Negatives (325), and the bottom right corner is the number of True Positives (15).
-Our Sensitivity (accuracy of Hypertension events) is defined as: $\frac{TP}{TP+FN}$, which is 15/(15+325) = 4%
+Our **Sensitivity** (accuracy of Hypertension events) is defined as $\frac{TP}{TP+FN}$, which is $15/(15+325) \approx 4\%$.

-Our Specificity (accuracy of No Hypertension events) is defined as: $\frac{TN}{TN+FP}$, which is 1128/(1128+24) = 98%.
+Our **Specificity** (accuracy of No Hypertension events) is defined as $\frac{TN}{TN+FP}$, which is $1128/(1128+24) \approx 98\%$.
Therefore, we do a pretty terrible job of predicting the Hypertension cases!
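These two rates can be recomputed directly from the four confusion-matrix counts given above:

```python
# Counts from the confusion matrix in this section.
tn, fp, fn, tp = 1128, 24, 325, 15

# Sensitivity (recall on the positive class): TP / (TP + FN)
sensitivity = tp / (tp + fn)

# Specificity (recall on the negative class): TN / (TN + FP)
specificity = tn / (tn + fp)

print(round(sensitivity, 3), round(specificity, 3))
```

The imbalance between the two rates makes the problem concrete: the model almost never catches a Hypertension case, yet overall accuracy still looks respectable.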
@@ -176,23 +176,6 @@ disp.plot()
plt.show()
```

-ROC Curve
-
-```{python}
-
-from sklearn.metrics import RocCurveDisplay
-from sklearn.metrics import roc_curve, roc_auc_score