Objective: Given an input \(\text{x}_i\in\mathbb{R}^d\), predict its class \(y_i\in\{0,1\}\) (e.g., male or female).
Main idea: classify \(\Leftrightarrow\) identify decision boundary.
Main assumption: the boundary (B) is linear.
Model: Given input \(\text{x}_i\), the chance that it belongs to class \(1\) is given by \[\mathbb{P}(Y_i=1|X=\text{x}_i)=\sigma(\color{blue}{\beta_0}+\sum_{j=1}^d\color{blue}{\beta_j}x_{ij}),\] where \(\color{blue}{\beta_0,\beta_1,\dots,\beta_d}\in\mathbb{R}\) are the key parameters to be estimated from the data, and \(\sigma(t)=1/(1+e^{-t}),\forall t\in\mathbb{R},\) is the sigmoid (logistic) function.
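As a quick illustration, here is a minimal sketch of this formula in Python; the parameter values and the input point below are made up for the example.

import numpy as np

def sigmoid(t):
    # Logistic function: maps any real t to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical parameters and input point (d = 2), for illustration only
beta0, beta = -1.0, np.array([-1.0, 2.0])
x = np.array([1.5, 2.0])

p1 = sigmoid(beta0 + x @ beta)   # P(Y = 1 | X = x), here about 0.82
print(p1)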
Binary Logistic Regression
Model intuition
Ex: Given \(\text{x}_0=[\text{h}_0,\text{w}_0]\in\mathbb{R}^2,\) for any candidate parameter \(\color{blue}{\vec{\beta}=[\beta_0,\beta_1,\beta_2]}\), \[\color{green}{z_0}=\color{blue}{\beta_0}+\color{blue}{\beta_1}\text{h}_0+\color{blue}{\beta_2}\text{w}_0\text{ is the signed relative distance from }\text{x}_0\to\text{ Boundary (B)}.\]
That is to say:
\(\color{green}{z_0}>0\Leftrightarrow \text{x}_0\) is above the boundary.
\(|\color{green}{z_0}|\) is large \(\Leftrightarrow\) \(\text{x}_0\) is far from the boundary.
A good boundary should be such that:
\(|\color{green}{z_0}|\) large \(\Rightarrow\)“certain about its class”.
\(|\color{green}{z_0}|\) small \(\Rightarrow\)“less certain about its class”.
Interpretation: \(\text{x}_1,\text{x}_2\) are located below the line \((B):-1-x_1+2x_2=0\) as \(z_1,z_2<0\) and are predicted to be in class \(\color{red}{0}\). On the other hand, \(\text{x}_3\) is located above the line (\(z_3>0\)) and is predicted to be in class \(\color{blue}{1}\).
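To make this concrete, here is a minimal sketch with made-up coordinates for \(\text{x}_1,\text{x}_2,\text{x}_3\) (the points are hypothetical; the boundary coefficients are the ones above):

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

beta0, beta = -1.0, np.array([-1.0, 2.0])   # boundary (B): -1 - x1 + 2*x2 = 0

# Hypothetical points: two below the line, one above it
points = np.array([[2.0, 0.5],    # x1
                   [3.0, 1.0],    # x2
                   [1.0, 2.0]])   # x3

z = beta0 + points @ beta          # signed relative distance to (B)
prob = sigmoid(z)                  # P(Y = 1 | X = x_i)
pred = (z > 0).astype(int)         # class 1 above the line, class 0 below

for i, (zi, pi, ci) in enumerate(zip(z, prob, pred), start=1):
    print(f"x{i}: z = {zi:+.2f}, P(class 1) = {pi:.2f}, class = {ci}")

Note how \(|z_i|\) controls how far \(\sigma(z_i)\) is from \(1/2\): the further a point is from (B), the more confident the prediction.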
Q4: Now, how do we find the best parameters \(\color{blue}{\beta_0,\dots,\beta_d}\)?
We will build a criterion for this, just as RSS does for linear regression.
Objective: search for \(\color{blue}{\beta_0}\in\mathbb{R},\color{blue}{\vec{\beta}}\in\mathbb{R}^d\) such that the model is best aligned with the data \({\cal D}\): \[p(y_i|\text{x}_i)\text{ is large for all }i\in\{1,\dots,n\}.\]
Conditional Likelihood Function: If the data are iid, one has \[\begin{align*}{L}(\color{blue}{\beta_0},\color{blue}{\vec{\beta}})&=\mathbb{P}(Y_1=y_1,\dots,Y_n=y_n|X_1=\text{x}_1,\dots,X_n=\text{x}_n)\\
&=\prod_{i=1}^np(y_i|\text{x}_i)\\
&=\prod_{i=1}^n\Big[p(1|\text{x}_i)\Big]^{y_i}\Big[p(0|\text{x}_i)\Big]^{1-y_i}\\
&=\prod_{i=1}^n\Big[\sigma(\color{blue}{\beta_0}+\text{x}_i^T\color{blue}{\vec{\beta}})\Big]^{y_i}\Big[1-\sigma(\color{blue}{\beta_0}+\text{x}_i^T\color{blue}{\vec{\beta}})\Big]^{1-y_i}.
\end{align*}\]
We search for parameters \(\color{blue}{\beta_0}\in\mathbb{R},\color{blue}{\vec{\beta}}\in\mathbb{R}^d\) minimizing the Cross-entropy, which is exactly the negative log-likelihood \(-\log L(\color{blue}{\beta_0},\color{blue}{\vec{\beta}})\): \[\text{CEn}(\color{blue}{\beta_0},\color{blue}{\vec{\beta}})=-\sum_{i=1}^n\Big[y_i\log\sigma(\color{blue}{\beta_0}+\text{x}_i^T\color{blue}{\vec{\beta}})+(1-y_i)\log\big(1-\sigma(\color{blue}{\beta_0}+\text{x}_i^T\color{blue}{\vec{\beta}})\big)\Big].\]
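For concreteness, here is a minimal NumPy sketch of this loss; X is any design matrix and y the corresponding 0/1 labels (nothing below is specific to a particular dataset).

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def cross_entropy(beta0, beta, X, y, eps=1e-12):
    # CEn = -sum_i [ y_i * log p(1|x_i) + (1 - y_i) * log p(0|x_i) ]
    p1 = np.clip(sigmoid(beta0 + X @ beta), eps, 1 - eps)  # p(1|x_i), kept away from 0 and 1
    return -np.sum(y * np.log(p1) + (1 - y) * np.log(1 - p1))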
😭 Unfortunately, the minimizing values \((\color{blue}{\widehat{\beta}_0},\color{blue}{\widehat{\vec{\beta}}})\) CANNOT be computed analytically.
😊 Fortunately, it can be numerically approximated!
We can use optimization algorithms such as Gradient Descent Algorithm to estimate the best \(\color{blue}{\hat{\beta}}\).
For more on Gradient Descent Algorithm for Logistic Regression, read here.
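As a rough sketch (under the same notation as above, with a made-up learning rate and iteration count rather than tuned values), one gradient-descent loop for this loss could look like:

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic_gd(X, y, lr=0.1, n_iter=5000):
    # Gradient descent on CEn. The gradient w.r.t. beta_j is
    # sum_i (sigma(z_i) - y_i) * x_ij, with x_i0 = 1 for the intercept.
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])   # prepend a column of ones for beta_0
    beta = np.zeros(d + 1)                 # [beta_0, beta_1, ..., beta_d]
    for _ in range(n_iter):
        grad = Xb.T @ (sigmoid(Xb @ beta) - y) / n   # averaged gradient (1/n stabilizes the step size)
        beta = beta - lr * grad                      # descent step
    return beta

In practice, standardizing the features and monitoring the loss between iterations make the step size easier to choose.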
Binary Logistic Regression
Summary
Logistic Regression Model
Main model: \(p(1|\text{x})=1/(1+e^{-\color{green}{z}})=1/(1+e^{-(\color{blue}{\beta_0}+\text{x}^T\color{blue}{\vec{\beta}})})\).
Interpretation:
The decision boundary is linear, defined by the coefficients \(\color{blue}{\beta_0}\) and \(\color{blue}{\vec{\beta}}\).
Probability of being in each class depends on the relative distance of that point to the boundary.
Works well when classes are linearly separable.
Objective: building a Logistic Regression model is equivalent to searching for parameters \(\color{blue}{\beta_0}\) and \(\color{blue}{\vec{\beta}}\) that minimize the Cross-entropy.
The loss cannot be minimized analytically but can be minimized numerically.
Application
Logistic Regression
Application on Auto-MPG
For our Auto-MPG dataset, we aim to predict origin using some characteristics of the cars.
Build intuition through visualization:
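For instance, a scatter plot of two quantitative columns, colored by origin, gives a first feel for how separable the classes are. This is a minimal sketch; it assumes df_car contains 'weight' and 'horsepower' columns.

import matplotlib.pyplot as plt

# Scatter two quantitative features, one color per origin class
for origin, group in df_car.groupby("origin"):
    plt.scatter(group["weight"], group["horsepower"], label=str(origin), alpha=0.6)
plt.xlabel("weight")
plt.ylabel("horsepower")
plt.legend(title="origin")
plt.show()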
We predict origin using all quantitative columns.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Building the model on the quantitative columns
X_train, X_test, y_train, y_test = train_test_split(
    df_car.select_dtypes(include="number"),
    df_car["origin"])
lgit = LogisticRegression()
lgit = lgit.fit(X_train, y_train)

# Prediction
y_pred = lgit.predict(X_test)

# Accuracy: fraction of test points whose class is correctly predicted
acc = np.mean(y_pred == y_test.to_numpy())
Accuracy = 0.78.
Here, accuracy is defined by \[\text{Accuracy}=\frac{\text{Num. correctly predicted}}{\text{Num. observations}}.\]
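The same number can be obtained directly from scikit-learn, either with accuracy_score or with the estimator's score method:

from sklearn.metrics import accuracy_score

acc = accuracy_score(y_test, y_pred)   # same as the manual computation above
acc = lgit.score(X_test, y_test)       # equivalently: mean accuracy on the test set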
Logistic Regression
Summary
We introduce basic concept of Logistic Regression Model: \[p(1|X=\text{x})=\frac{1}{(1+e^{-\color{blue}{\beta_0}-\text{x}^T\color{blue}{\vec{\beta}}})}.\]
The intuition of the model: the probability of being in class \(1\) depends on the relative distance from \(\text{x}\) to a linear boundary defined by \(\color{blue}{[\beta_0,\beta_1,\dots,\beta_d]}\).
The linear boundary assumption may be too restrictive in practice.
The performance of the model can be improved further, for example by enriching the inputs with nonlinear feature transformations so that the boundary need not be linear in the original features.