doc/BookChapters/chapter1.dlog (10 additions, 0 deletions)
@@ -87,3 +87,13 @@ found info about 5 exercises
 
 *** warning: latex envir \begin{bmatrix} does not work well in Markdown. Stick to \[ ... \], equation, equation*, align, or align* environments in math environments.
 output in chapter1.ipynb
+Translating doconce text in chapter1.do.txt to ipynb
+*** replacing \bm{...} by \boldsymbol{...} (\bm is not supported by MathJax)
+found info about 5 exercises
+
+*** warning: latex envir \begin{bmatrix} does not work well in Markdown. Stick to \[ ... \], equation, equation*, align, or align* environments in math environments.
+
+*** warning: latex envir \begin{bmatrix} does not work well in Markdown. Stick to \[ ... \], equation, equation*, align, or align* environments in math environments.
+
+*** warning: latex envir \begin{bmatrix} does not work well in Markdown. Stick to \[ ... \], equation, equation*, align, or align* environments in math environments.
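The repeated warning above is triggered when a matrix environment is used as the outermost construct of a math block. A minimal sketch of the kind of fix the message suggests, assuming the matrix itself is meant to stay: nest the bmatrix inside one of the recommended outer environments (here \[ ... \]), and use \boldsymbol rather than \bm so MathJax can render it.

```latex
% Before (outer environment is bmatrix; renders poorly in Markdown/ipynb output):
% \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}

% After (bmatrix nested inside the recommended \[ ... \] display math):
\[
\boldsymbol{A} =
\begin{bmatrix}
  1 & 2 \\
  3 & 4
\end{bmatrix}
\]
```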
doc/BookChapters/chapter4.do.txt (32 additions, 32 deletions)
@@ -12,7 +12,7 @@ independent variables $x_i$. Linear regression resulted in
 analytical expressions for standard ordinary Least Squares or Ridge
 regression (in terms of matrices to invert) for several quantities,
 ranging from the variance and thereby the confidence intervals of the
-optimal parameters $\hat{\beta}$ to the mean squared error. If we can invert
+optimal parameters $\hat{\theta}$ to the mean squared error. If we can invert
 the product of the design matrices, linear regression gives then a
 simple recipe for fitting our data.
 
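The "simple recipe" referred to in this hunk is the closed-form least-squares solution obtained by inverting the product of the design matrices. The NumPy sketch below is an illustration in the updated $\theta$ notation, not code from the book; the synthetic data and the convention $\bm{y} \approx \bm{X}\bm{\theta}$ for an $n\times p$ design matrix are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))                     # n x p design matrix
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=n)   # linear model plus noise

# OLS estimate theta_hat = (X^T X)^{-1} X^T y, computed via a linear solve
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat)                                # close to theta_true
```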
@@ -37,7 +37,7 @@ failure etc.
 Logistic regression will also serve as our stepping stone towards
 neural network algorithms and supervised deep learning. For logistic
 learning, the minimization of the cost function leads to a non-linear
-equation in the parameters $\hat{\beta}$. The optimization of the
+equation in the parameters $\hat{\theta}$. The optimization of the
 problem calls therefore for minimization algorithms. This forms the
 bottle neck of all machine learning algorithms, namely how to find
 reliable minima of a multi-variable function. This leads us to the
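Because the equations in $\hat{\theta}$ are non-linear, the minimum has to be found iteratively. A hedged sketch of the simplest such minimization algorithm, plain gradient descent on a toy multi-variable cost with a hand-picked learning rate (not the book's implementation):

```python
import numpy as np

def gradient_descent(grad_C, theta_init, eta=0.1, n_iter=1000):
    """Plain gradient descent: theta <- theta - eta * grad C(theta)."""
    theta = np.asarray(theta_init, dtype=float)
    for _ in range(n_iter):
        theta = theta - eta * grad_C(theta)
    return theta

# Toy cost C(theta) = ||theta - 1||^2, whose gradient is 2 * (theta - 1)
theta_min = gradient_descent(lambda th: 2.0 * (th - 1.0), theta_init=[5.0, -3.0])
print(theta_min)    # converges towards [1, 1]
```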
@@ -86,11 +86,11 @@ We would then have our
 weighted linear combination, namely
 !bt
 \begin{equation}
-\bm{y} = \bm{X}^T\bm{\beta} + \bm{\epsilon},
+\bm{y} = \bm{X}^T\bm{\theta} + \bm{\epsilon},
 \end{equation}
 !et
 where $\bm{y}$ is a vector representing the possible outcomes, $\bm{X}$ is our
-$n\times p$ design matrix and $\bm{\beta}$ represents our estimators/predictors.
+$n\times p$ design matrix and $\bm{\theta}$ represents our estimators/predictors.
 
 
 The main problem with our function is that it takes values on the
@@ -186,7 +186,7 @@ We are now trying to find a function $f(y\vert x)$, that is a function which giv
 In standard linear regression with a linear dependence on $x$, we would write this in terms of our model
 !bt
 \[
-f(y_i\vert x_i)=\beta_0+\beta_1 x_i.
+f(y_i\vert x_i)=\theta_0+\theta_1 x_i.
 \]
 !et
 
@@ -291,19 +291,19 @@ plt.show()
 
 
 
-We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\beta$ in our fitting of the Sigmoid function, that is we define probabilities
+We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\theta$ in our fitting of the Sigmoid function, that is we define probabilities
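The two-parameter Sigmoid model in the changed line is, in its standard form, $p(y=1\vert x)=1/(1+e^{-(\theta_0+\theta_1 x)})$ with $p(y=0\vert x)=1-p(y=1\vert x)$. A small hedged sketch of those probabilities (an illustration, not the chapter's own code):

```python
import numpy as np

def p_y1(x, theta0, theta1):
    """Sigmoid probability p(y=1 | x) for the two-parameter model."""
    return 1.0 / (1.0 + np.exp(-(theta0 + theta1 * x)))

x = np.linspace(-5.0, 5.0, 11)
print(p_y1(x, theta0=0.0, theta1=1.0))          # p(y=1 | x)
print(1.0 - p_y1(x, theta0=0.0, theta1=1.0))    # p(y=0 | x)
```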
@@ ... @@
 This equation is known in statistics as the _cross entropy_. Finally, we note that just as in linear regression,
 in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression.
 
 
-The cross entropy is a convex function of the weights $\bm{\beta}$ and,
+The cross entropy is a convex function of the weights $\bm{\theta}$ and,
 therefore, any local minimizer is a global minimizer.
 
 
 Minimizing this
-cost function with respect to the two parameters $\beta_0$ and $\beta_1$ we obtain
+cost function with respect to the two parameters $\theta_0$ and $\theta_1$ we obtain
 Within a binary classification problem, we can easily expand our model to include multiple predictors. Our ratio between likelihoods is then with $p$ predictors
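As a hedged illustration of that minimization (a sketch for the two-parameter sigmoid model with made-up toy data, not the chapter's code): writing the cross entropy in the common form $C(\theta)=-\sum_i\left[y_i\log p_i+(1-y_i)\log(1-p_i)\right]$ with $p_i=1/(1+e^{-(\theta_0+\theta_1 x_i)})$, its derivatives are $\partial C/\partial\theta_0=\sum_i(p_i-y_i)$ and $\partial C/\partial\theta_1=\sum_i(p_i-y_i)x_i$, which a standard minimizer can use directly.

```python
import numpy as np
from scipy.optimize import minimize

def cross_entropy(theta, x, y):
    p = 1.0 / (1.0 + np.exp(-(theta[0] + theta[1] * x)))
    p = np.clip(p, 1e-12, 1.0 - 1e-12)   # numerical guard against log(0)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def cross_entropy_grad(theta, x, y):
    p = 1.0 / (1.0 + np.exp(-(theta[0] + theta[1] * x)))
    return np.array([np.sum(p - y), np.sum((p - y) * x)])

# Toy, slightly overlapping binary data so the minimum is finite
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

res = minimize(cross_entropy, x0=np.zeros(2), jac=cross_entropy_grad, args=(x, y))
print(res.x)    # fitted (theta0, theta1)
```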