---
title: "Untitled"
format: html
editor: visual
---
## Quarto
Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see <https://quarto.org>.
## Running Code
When you click the **Render** button, Quarto generates a document that includes both the content and the output of the embedded code. You can embed code like this:
```{python}
import pandas as pd
import seaborn as sns
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from formulaic import model_matrix
import statsmodels.api as sm
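# Load the NHANES data
nhanes = pd.read_csv("classroom_data/NHANES.csv")
# Flag hypertension: average diastolic BP above 80 or average systolic BP above 130
nhanes['Hypertension'] = (nhanes['BPDiaAve'] > 80) | (nhanes['BPSysAve'] > 130)
# Compare the BMI distribution between the two groups
plt.clf()
sns.boxplot(x="Hypertension", y="BMI", data=nhanes)
plt.show()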
```
```{python}
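# Build the response y and the design matrix X from the formula
y, X = model_matrix("Hypertension ~ BMI", nhanes)
# Fit a logistic regression of Hypertension on BMI
logit_model = sm.Logit(y, X).fit()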
```
```{python}
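# print() is needed here: only the last expression in a cell is displayed automatically
print(logit_model.summary())
plt.clf()
# Fitted probabilities (blue) against the observed 0/1 outcomes (brown)
plt.scatter(X.BMI, logit_model.predict(), color="blue", label="Fitted Line")
plt.scatter(X.BMI, y, alpha=.3, color="brown", label="Data")
plt.axhline(y=0.5, color='r', linestyle='--', label='Prediction Cutoff')
plt.legend()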
plt.show()
```
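The fitted curve in the plot above is the logistic function applied to a linear predictor, so the 0.5 cutoff corresponds to a single BMI value. The block below is a minimal sketch of that relationship. The coefficients `intercept` and `slope` are hypothetical values chosen for illustration, not the estimates from the NHANES fit, and `predicted_probability` is an illustrative helper rather than part of `statsmodels`.

```python
import math


def predicted_probability(bmi, intercept, slope):
    """Logistic function applied to the linear predictor intercept + slope * bmi."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * bmi)))


# Hypothetical coefficients, NOT the estimates from the NHANES model
intercept, slope = -3.0, 0.125

# The predicted probability equals 0.5 exactly where the linear predictor is zero
boundary_bmi = -intercept / slope

print("BMI at the 0.5 cutoff:", boundary_bmi)
print("Probability at that BMI:", predicted_probability(boundary_bmi, intercept, slope))
```

With a positive slope, every BMI above `boundary_bmi` is classified as hypertensive at the 0.5 cutoff and every BMI below it is not.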
```{python}
from sklearn.metrics import (confusion_matrix, accuracy_score)
# Classify as hypertensive when the fitted probability exceeds 0.5
prediction_cut = [1 if x > 0.5 else 0 for x in logit_model.predict()]
cm = confusion_matrix(y, prediction_cut)
print("Confusion Matrix : \n", cm)
print('Accuracy = ', accuracy_score(y, prediction_cut))
```
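Now try a different cutoff. Lowering it from 0.5 to 0.1 classifies more observations as hypertensive, which reduces false negatives at the cost of more false positives.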
```{python}
# Classify as hypertensive when the fitted probability exceeds 0.1
prediction_cut = [1 if x > .1 else 0 for x in logit_model.predict()]
cm = confusion_matrix(y, prediction_cut)
print("Confusion Matrix : \n", cm)
# Unpack the 2x2 matrix: true negatives, false positives, false negatives, true positives
tn, fp, fn, tp = cm.ravel().tolist()
print('Accuracy = ', accuracy_score(y, prediction_cut))
```
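Accuracy alone can hide how the cutoff shifts errors between the two classes. As a rough sketch, the four counts of a confusion matrix can be turned into sensitivity and specificity as shown below. The labels and probabilities are made-up toy values, not NHANES results, and `confusion_counts` and `summarize` are hypothetical helpers written in plain Python for illustration.

```python
def confusion_counts(y_true, y_prob, cutoff):
    """Return (tn, fp, fn, tp) for 0/1 labels at the given probability cutoff."""
    tn = fp = fn = tp = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob > cutoff else 0
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 1 and predicted == 0:
            fn += 1
        elif truth == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def summarize(tn, fp, fn, tp):
    """Compute sensitivity, specificity, and accuracy from the four counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives that are caught
        "specificity": tn / (tn + fp),  # share of true negatives that are cleared
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
    }


# Made-up labels and predicted probabilities, for illustration only
toy_y = [0, 0, 0, 0, 1, 1, 1, 0]
toy_prob = [0.05, 0.2, 0.3, 0.6, 0.15, 0.4, 0.8, 0.08]

for cutoff in (0.5, 0.1):
    counts = confusion_counts(toy_y, toy_prob, cutoff)
    print("cutoff", cutoff, "counts", counts, summarize(*counts))
```

In this toy example the accuracy is the same at both cutoffs, while sensitivity rises and specificity falls as the cutoff drops.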
```{python}
from sklearn.metrics import roc_curve, roc_auc_score, RocCurveDisplay
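# Refit with Age added as a second predictor
y, X = model_matrix("Hypertension ~ BMI + Age", nhanes)
logit_model = sm.Logit(y, X).fit()
# print() is needed here: only the last expression in a cell is displayed automatically
print(logit_model.summary())
# Classify as hypertensive when the fitted probability exceeds 0.5
prediction_cut = [1 if x > 0.5 else 0 for x in logit_model.predict()]
cm = confusion_matrix(y, prediction_cut)
print("Confusion Matrix : \n", cm)
print('Accuracy = ', accuracy_score(y, prediction_cut))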
fpr, tpr, thresholds = roc_curve(y, logit_model.predict())
auc_score = roc_auc_score(y, logit_model.predict())
plt.figure(figsize=(8, 6))
RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=auc_score, estimator_name='Logistic Regression').plot(ax=plt.gca())
plt.title('ROC Curve for Logistic Regression')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.show()
```
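The area under the ROC curve has a direct interpretation: it is the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way on made-up toy data, not on the NHANES model. `auc_by_pairs` is a hypothetical helper for illustration and is far slower than `roc_auc_score` on real data.

```python
def auc_by_pairs(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and predicted probabilities, for illustration only
toy_y = [0, 0, 0, 0, 1, 1, 1, 0]
toy_prob = [0.05, 0.2, 0.3, 0.6, 0.15, 0.4, 0.8, 0.08]

toy_auc = auc_by_pairs(toy_y, toy_prob)
print("Pairwise AUC:", toy_auc)
```

An AUC of 0.5 corresponds to ranking no better than chance, and 1.0 to every positive case scoring above every negative case.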