AMSI61AML
Criteria | Percentage |
---|---|
Attendance | 10% |
Participation & quiz | 10% |
Midterm Exam and/or Project | 15% + 15% |
Final Exam | 20% |
Final Project & Presentation | 30% |
“The field of study that gives computers the ability to learn (from data) without being explicitly programmed.”
“A computer program is said to learn
from experience \(E\) with respect to some class of tasks \(T\) and performance measure \(P\) if its performance at tasks in \(T\), as measured by \(P\), improves with experience \(E\).”
ML is a powerful tool in this era.

Week | Topic |
---|---|
1 | Naive Bayesian Classifier, LDA & QDA |
2 | Logistic Regression & Regularization |
3 | KNN & Decision Trees |
4 | Ensemble Learning |
5 | Model Selection |
6 | Hard Clustering |
7 | Probabilistic Clustering |
8 | Midterm Exam |
9 | Dimensionality Reduction: PCA, SVD |
10 | Association Rules |
11 | Markov Decision Process & Q-learning |
12 | Upper Confidence Bound (UCB) & Thompson sampling |
13 | Deep Learning |
14 | MLOps |
15-16 | ML for SDG & Future Trends |
17-18 | Final exam & Projects |
Bag of words of email \(i\); type: \(1=\) Spam & \(0=\) Nonspam.
 | make | address | all | num3d | our | over | remove | internet | order | mail | ... | charSemicolon | charRoundbracket | charSquarebracket | charExclamation | charDollar | charHash | capitalAve | capitalLong | capitalTotal | type
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
3588 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | ... | 0.000 | 0.000 | 0.000 | 0.0 | 0.0 | 0.0 | 1.000 | 1 | 5 | nonspam
3192 | 0.0 | 0.0 | 0.17 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.17 | ... | 0.267 | 0.802 | 0.118 | 0.0 | 0.0 | 0.0 | 4.808 | 20 | 601 | nonspam
2 rows × 58 columns
Distributions of make, address & capitalTotal.

import matplotlib.pyplot as plt
import seaborn as sns  # needed for sns.histplot below
_, ax = plt.subplots(1, 3, figsize = (12, 3.5))
sns.histplot(data=spam, x="make", ax=ax[0], binwidth=0.2, hue = "type")
ax[0].set_title("Distribution of 'make'")
sns.histplot(data=spam, x="address", ax=ax[1], binwidth=0.5, hue = "type")
ax[1].set_title("Distribution of 'address'")
sns.histplot(data=spam, x="capitalTotal", ax=ax[2], binwidth=700, hue = "type")
ax[2].set_title("Distribution of 'capitalTotal'")
⚠️ Values are too concentrated around \(0\)!
Try a log scale: the counts span too wide a range and are too dense around \(0\)…

_, ax = plt.subplots(1, 3, figsize = (12, 3.5))
sns.histplot(data=spam, x="make", ax=ax[0], binwidth=0.2, hue = "type")
ax[0].set_title("Distribution of 'make'")
ax[0].set_yscale("log")
sns.histplot(data=spam, x="address", ax=ax[1], binwidth=0.5, hue = "type")
ax[1].set_title("Distribution of 'address'")
ax[1].set_yscale("log")
sns.histplot(data=spam, x="capitalTotal", ax=ax[2], binwidth=700, hue = "type")
ax[2].set_title("Distribution of 'capitalTotal'")
ax[2].set_yscale("log")
✅ Better, isn’t it?
Using only (make, address, capitalTotal) \(\in\mathbb{R}^3\):

from scipy.stats import gaussian_kde
import numpy as np
x1 = [1.5, 4.2, 5050]
x2 = [1.5, 4.2, 9000]
# Given Y = 1
ker_make = gaussian_kde(spam.make[spam.type=="spam"])
ker_address = gaussian_kde(spam.address[spam.type=="spam"])
ker_capital = gaussian_kde(spam.capitalTotal[spam.type=="spam"])
den11 = [ker_make(x1[0]), ker_address(x1[1]), ker_capital(x1[2])]
den12 = [ker_make(x2[0]), ker_address(x2[1]), ker_capital(x2[2])]
n_spam = np.sum(spam.type=="spam")
n_non = spam.shape[0] - n_spam
pro11 = n_spam/spam.shape[0] * np.prod(den11)
pro12 = n_spam/spam.shape[0] * np.prod(den12)
# Given Y = 0
ker_make = gaussian_kde(spam.make[spam.type=="nonspam"])
ker_address = gaussian_kde(spam.address[spam.type=="nonspam"])
ker_capital = gaussian_kde(spam.capitalTotal[spam.type=="nonspam"])
den01 = [ker_make(x1[0]), ker_address(x1[1]), ker_capital(x1[2])]
den02 = [ker_make(x2[0]), ker_address(x2[1]), ker_capital(x2[2])]
pro01 = n_non/spam.shape[0] * np.prod(den01)
pro02 = n_non/spam.shape[0] * np.prod(den02)
# Normalize so that, for each email, the two class scores sum to 1 (posterior probabilities)
pro01, pro11 = pro01/(pro01+pro11), pro11/(pro01+pro11)
pro02, pro12 = pro02/(pro02+pro12), pro12/(pro02+pro12)
# Mark the two emails on the histograms drawn earlier (ax and fig come from that figure)
ax[0].vlines([x1[0]], ymin=[0], ymax=[3000], color='black', linestyle = "dashed")
ax[1].vlines([x1[1]], ymin=[0], ymax=[3000], color='black', linestyle = "dashed")
ax[2].vlines([x1[2], x2[2]], ymin=[0], ymax=[3000], color=['red', 'blue'], linestyle = "dashed")
display(fig)
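A quick look at the resulting posteriors (a minimal sketch, assuming pro11, pro12 and the two emails x1, x2 from the block above are still in scope):

```python
# Print the posterior probability of spam for the two emails
for name, x, p_spam in [("x1", x1, pro11), ("x2", x2, pro12)]:
    print(f"{name} = {x}: P(spam | x) = {float(p_spam):.3f}")
```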
Bayes’s Theorem
For any two events \(E,H\) with \(\mathbb{P}(E)>0,\) one has \[\begin{equation}\overbrace{\mathbb{P}(H|E)}^{\text{Posterior}}=\frac{\overbrace{\mathbb{P}(E|H)}^{\text{Likelihood}}\times\overbrace{\mathbb{P}(H)}^{\text{Prior}}}{\underbrace{\mathbb{P}(E)}_{\text{Marginal}}}.\end{equation}\]
For any email \(\text{x}=(x_1,\dots, x_d)\): \[\mathbb{P}(Y=1|X=\text{x})=\frac{\mathbb{P}^{\small\text{📚}}(X=\text{x}|Y=1)\times\mathbb{P}(Y=1)}{\mathbb{P}(X=\text{x})}.\]
\(\mathbb{P}(Y=1|X=\text{x})^{\small\text{📚}}\) allows us to classify email \(x\): \[\text{Email x} \text{ is a }\begin{cases}\text{spam}& \mbox{if }\mathbb{P}(Y=1|X=\text{x})\geq\delta\\ \text{nonspam}& \mbox{if }\mathbb{P}(Y=1|X=\text{x})<\delta\end{cases}\] for some \(\delta\in (0,1)\). A common choice is \(\delta=0.5\).
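In code the rule is just a threshold on the posterior; a minimal sketch with the common choice \(\delta=0.5\) (the probabilities passed in are made-up values):

```python
def classify(p_spam, delta=0.5):
    """Classify an email from its posterior probability of being spam."""
    return "spam" if p_spam >= delta else "nonspam"

print(classify(0.73))  # spam
print(classify(0.20))  # nonspam
```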
Key quantities in classification
\[\mathbb{P}(Y=1|X=\text{x})\propto \color{red}{\mathbb{P}(X=\text{x}|Y=1)}\times\color{green}{\mathbb{P}(Y=1)}.\]
\(\color{green}{\mathbb{P}(Y=1)}\) can be estimated by \(\frac{n(\text{spams})}{n(\text{emails})}\) ✅
\(\color{red}{\mathbb{P}(X=\text{x}|Y=1)}\) is more complicated (key to different models) ⚠️
Main Assumption of Naive Bayes
Within any class \(k\in\{1,0\}\), the components of \(X|Y=k\) are independent, i.e., \[\color{red}{\mathbb{P}(X=\text{x}|Y=k)}=\prod_{j=1}^d\mathbb{P}(X_j=x_j|Y=k).\]
Key quantities in Naive Bayes
\[\mathbb{P}(Y=1|X=\text{x})\propto \color{green}{\mathbb{P}(Y=1)}\color{red}{\prod_{j=1}^d\mathbb{P}(X_j=x_j|Y=1)},\]
Type of \(X_j\) | Distribution | Graphic |
---|---|---|
Qualitative | Bernoulli, Multinomial… | barplot, countplot |
Quantitative | Gaussian, Exponential… | displot, hist, density… |
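In scikit-learn this table roughly corresponds to different Naive Bayes estimators; a hedged sketch on made-up toy data (GaussianNB for quantitative inputs, CategoricalNB for integer-encoded qualitative inputs):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, CategoricalNB

y = np.array([1, 0, 1, 0])

# Quantitative inputs -> Gaussian class-conditional densities
X_num = np.array([[1.2, 0.3], [0.1, 2.5], [3.3, 0.2], [0.2, 1.8]])
print(GaussianNB().fit(X_num, y).predict([[1.0, 0.5]]))

# Qualitative inputs (integer-encoded categories) -> categorical counts
X_cat = np.array([[0, 2], [1, 0], [0, 1], [1, 2]])
print(CategoricalNB().fit(X_cat, y).predict([[0, 2]]))
```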
Key quantity
\[\mathbb{P}(Y=k|X=\text{x})\propto\mathbb{P}(Y=k)\prod_{j=1}^d\mathbb{P}(X_j=x_j|Y=k).\]
Classification rule
\[x\text{ belongs to class }k^*\text{ if }\mathbb{P}(Y=k^*|X=x)=\max_{1\leq k\leq M}\mathbb{P}(Y=k|X=\text{x}).\]
Pros
\(\bullet\) Simple and fast; can still perform well even when independence is violated.
Cons
\(\bullet\) Relies on the strong (naive) independence assumption.

Spam dataset

from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split
sns.set(style="white")
X_train1, X_test1, y_train1, y_test1 = train_test_split(spam.iloc[:,:57], spam.iloc[:,57], test_size = 0.2, random_state=42)
nb1 = GaussianNB()
nb1 = nb1.fit(X_train1, y_train1)
pred1 = nb1.predict(X_test1)
conf1 = confusion_matrix(y_test1, pred1)  # confusion_matrix expects (y_true, y_pred)
con_fig1 = ConfusionMatrixDisplay(conf1)
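A minimal evaluation sketch for the full model (assuming the objects defined above are in scope):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score

print("Accuracy (full model):", accuracy_score(y_test1, pred1))
con_fig1.plot()   # draw the confusion matrix
plt.show()
```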
Using only (make, address, capitalTotal) \(\in\mathbb{R}^3\):

X_train2, X_test2, y_train2, y_test2 = train_test_split(spam[["make","address", "capitalTotal"]], spam.iloc[:,57], test_size = 0.2, random_state=42)
nb2 = GaussianNB()
nb2 = nb2.fit(X_train2, y_train2)
pred2 = nb2.predict(X_test2)
conf2 = confusion_matrix(y_test2, pred2)
con_fig2 = ConfusionMatrixDisplay(conf2)
If \(95\%\) of emails were nonspam, always guessing nonspam would give \(0.95\) accuracy! Accuracy isn’t the right metric for imbalanced data ⚠️

import plotly.graph_objects as go
x = np.linspace(0.01, 1, 20)   # start just above 0 to avoid 0/0 in the harmonic mean
y = np.linspace(0.01, 1, 20)
z1 = [[2*x[i]*y[j]/(x[i]+y[j]) for j in range(len(y))] for i in range(len(x))]  # F1 = harmonic mean
z2 = [[(x[i]+y[j])/2 for j in range(len(y))] for i in range(len(x))]            # arithmetic mean
camera = dict(
eye=dict(x=1.7, y=-1.2, z=1.2)
)
fig = go.Figure(go.Surface(x = x,
y = y,
z = z1,
name = "F1-score",
colorscale = "Blues",
showscale = False))
fig.add_trace(go.Surface(x = x,
y = y,
z = z2,
name = "Mean",
colorscale = "Electric",
showscale = False))
fig.update_layout(scene = dict(
xaxis_title='Precision',
yaxis_title='Recall',
zaxis_title='Scores'),
title = dict(text="F1-score vs Mean",
y=0.9,
x=0.5,
font=dict(size = 30,
color = "#1C66B5")
),
scene_camera=camera,
width = 560,
height = 500)
fig.show()
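The F1-score is the harmonic mean \(F_1=\frac{2PR}{P+R}\) of precision \(P\) and recall \(R\), which the surface above contrasts with the arithmetic mean. A minimal sketch computing these metrics for the two Naive Bayes models (assuming pred1, pred2 and the test labels from before):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

for name, y_true, y_pred in [("full", y_test1, pred1), ("3-input", y_test2, pred2)]:
    p = precision_score(y_true, y_pred, pos_label="spam")
    r = recall_score(y_true, y_pred, pos_label="spam")
    f = f1_score(y_true, y_pred, pos_label="spam")
    print(f"{name:8s} precision = {p:.3f}, recall = {r:.3f}, F1 = {f:.3f}")
```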
\(\bullet\) ROC \(=\{(\)FPR\(_{\delta}\),TPR\(_{\delta}):\delta\in[0,1]\}\).
\(\bullet\) Better model = Larger Area Under the Curve (AUC).
from plot_metric.functions import BinaryClassification
from plotly.tools import mpl_to_plotly
# Visualisation with plot_metric
y1 = 1*(y_test1 == "spam")
# Predicted probabilities of the spam class for the two Naive Bayes models
pr1 = nb1.predict_proba(X_test1)[:, 1]
pr2 = nb2.predict_proba(X_test2)[:, 1]
bc1 = BinaryClassification(y1, pr1, labels=["nonspam", "spam"], seaborn_style="whitegrid")
bc2 = BinaryClassification(y1, pr2, labels=["nonspam", "spam"], seaborn_style="whitegrid")
# Figures
a = bc1.plot_roc_curve()
fig_full = plt.gcf()
pl_full = mpl_to_plotly(fig_full)
pl_full.update_layout(width=500, height=450,
title=dict(text="ROC Curve of Full model",
font=dict(size=25)),
xaxis_title = dict(font=dict(size=20, color = "red")),
yaxis_title = dict(text='True Positive Rate (Recall)', font=dict(size=20, color = "#EBB31D")),
template='plotly_white')
pl_full.show()
\(\bullet\) ROC \(=\{(\)FPR\(_{\delta}\),TPR\(_{\delta}):\delta\in[0,1]\}\).
\(\bullet\) Better model = Larger Area Under the Curve (AUC).
bc2 = BinaryClassification(y1, pr2, labels=["nonspam", "spam"], seaborn_style="whitegrid")
# Figures
b = bc2.plot_roc_curve()
fig_3 = plt.gcf()
pl_3 = mpl_to_plotly(fig_3)
pl_3.update_layout(width=500, height=450,
title=dict(text="ROC Curve of 3-input model",
font=dict(size=25)),
xaxis_title = dict(font=dict(size=20, color = "red")),
yaxis_title = dict(text='True Positive Rate (Recall)', font=dict(size=20, color = "#EBB31D")),
template='plotly_white')
pl_3.show()
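The same curves and AUC values can also be obtained directly with scikit-learn; a minimal sketch (assuming y1, pr1 and pr2 defined above):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

plt.figure(figsize=(5, 4.5))
for name, score in [("Full model", pr1), ("3-input model", pr2)]:
    fpr, tpr, _ = roc_curve(y1, score)   # y1 = 1 for spam, 0 for nonspam
    plt.plot(fpr, tpr, label=f"{name} (AUC = {roc_auc_score(y1, score):.3f})")
plt.plot([0, 1], [0, 1], "k--", label="Random guess")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Recall)")
plt.legend()
plt.show()
```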
Confusion matrix
ROC Curve & AUC
import numpy as np
from plotly.subplots import make_subplots
import plotly.graph_objs as go
# define mean and standard deviation
mu1, Sigma1 = 0, 3
# Simulate points
x1 = np.random.normal(mu1, Sigma1, 100)
# Plot points
fig = go.Figure(go.Scatter(x=x1,
y=[0 for i in range(len(x1))],
mode = "markers",
name = "Points/Observations",
showlegend = True,
marker = dict(size = 6)))
# Density
def density_gaussian1d(x, mu, sigma):
return 1/((2*np.pi*sigma ** 2) ** (1/2)) * np.exp(-1/2 * (x-mu) ** 2/ sigma ** 2)
x = np.linspace(-10, 10, 50)
y1 = np.array([density_gaussian1d(xi, mu1, Sigma1) for xi in x])
fig.add_trace(
go.Scatter(x=x,
y=y1,
mode="lines",
name = "Density/Probability",
showlegend = True))
fig.update_layout(title = dict(text="1D Gaussian Random Variables",
x = 0.5,
y = 0.98,
font=dict(size = 20,
color = "#1C66B5")),
width = 450,
height = 430,
yaxis_title=dict(text='Density'),
legend=dict(
yanchor="top",
y=0.95,
xanchor="left",
x=0.01))
import numpy as np
from plotly.subplots import make_subplots
import plotly.graph_objs as go
# define mean and covariance matrix
mu1, Sigma1 = [0, 0], [[1, 0.5], [0.5, 3]]
# Simulate points
x1 = np.random.multivariate_normal(mean=mu1, cov=Sigma1, size=100)
# Plot points
fig = go.Figure(go.Scatter3d(x=x1[:,0],
y=x1[:,1],
z=[0 for i in range(x1.shape[0])],
mode = "markers",
name = "Points/Observations",
showlegend = True,
marker = dict(size = 4)))
# Simulate points
# Density
def density_gaussian2d(x, mu, Sigma):
return 1/((2*np.pi) ** (len(x)/2) * np.sqrt(np.linalg.det(Sigma))) * np.exp(-1/2 * np.dot(np.dot(x-mu, np.linalg.inv(Sigma)), x-mu))
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
f1 = np.array([[density_gaussian2d(np.array([xi,yi]), mu1, Sigma1) for xi in x] for yi in y])
fig.add_trace(
go.Surface(x=x,
y=y,
z=f1,
name = "Density/Possibility",
opacity=0.3,
showlegend = True,
showscale=False))
fig.update_layout(title = dict(text="2D Gaussian Random Variables",
x = 0.5,
y = 0.98,
font=dict(size = 20,
color = "#1C66B5")),
width = 400,
height = 430,
legend=dict(
yanchor="top",
xanchor="center",
y = 0.9,
x = 0.5
))
fig.update_scenes(zaxis_title_text='Density')
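As a sanity check, the hand-written 2D density can be compared with scipy's implementation (a minimal sketch, reusing density_gaussian2d, mu1 and Sigma1 from the block above):

```python
import numpy as np
from scipy.stats import multivariate_normal

pt = np.array([0.5, -1.0])
print(density_gaussian2d(pt, np.array(mu1), np.array(Sigma1)))
print(multivariate_normal(mean=mu1, cov=Sigma1).pdf(pt))   # the two values should agree
```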
# define means and covariance matrices
mu1, Sigma1 = [0, 0], [[1, 0], [0, 3]]
mu2, Sigma2 = [5, 5], [[3, -1], [-1, 3]]
mu3, Sigma3 = [-2, 6], [[3, 1.5], [1.5, 1]]
mu4, Sigma4 = [6, 0], [[3, 0.1], [0.1, 0.25]]
x1 = np.random.multivariate_normal(mean=mu1, cov=Sigma1, size=300)
x2 = np.random.multivariate_normal(mean=mu2, cov=Sigma2, size=300)
x3 = np.random.multivariate_normal(mean=mu3, cov=Sigma3, size=300)
x4 = np.random.multivariate_normal(mean=mu4, cov=Sigma4, size=300)
# Save data for later
import pandas as pd
df_qda = pd.DataFrame({
"x1" : np.concatenate([x1[:,0], x2[:,0], x3[:,0], x4[:,0]]),
"x2" : np.concatenate([x1[:,1], x2[:,1], x3[:,1], x4[:,1]]),
"y" : np.repeat([1,2,3,4], 300)
})
# Plot points
fig0 = go.Figure(go.Scatter3d(x=x1[:,0],
y = x1[:,1],
z = [0] * len(x1[:,0]),
mode = "markers",
name = "Case 1",
showlegend = True,
marker = dict(size = 3)))
fig0.add_trace(
go.Scatter3d(x=x2[:,0],
y = x2[:,1],
z = [0] * len(x1[:,0]),
mode = "markers",
name = "Case 2",
showlegend = True,
marker = dict(size = 3)))
fig0.add_trace(
go.Scatter3d(x=x3[:,0],
y = x3[:,1],
z = [0] * len(x1[:,0]),
mode = "markers",
name = "Case 3",
showlegend = True,
marker = dict(size = 3)))
fig0.add_trace(
go.Scatter3d(x=x4[:,0],
y = x4[:,1],
z = [0] * len(x1[:,0]),
mode = "markers",
name = "Case 4",
showlegend = True,
marker = dict(size = 3)))
# Density
def density_gaussian2d(x, mu, Sigma):
return 1/((2*np.pi) ** (len(x)/2) * np.sqrt(np.linalg.det(Sigma))) * np.exp(-1/2 * np.dot(np.dot(x-mu, np.linalg.inv(Sigma)), x-mu))
x = np.linspace(-10, 15, 50)
y = np.linspace(-5, 12, 50)
f1 = np.array([[density_gaussian2d(np.array([xi,yi]), mu1, Sigma1) for xi in x] for yi in y])
f2 = np.array([[density_gaussian2d(np.array([xi,yi]), mu2, Sigma2) for xi in x] for yi in y])
f3 = np.array([[density_gaussian2d(np.array([xi,yi]), mu3, Sigma3) for xi in x] for yi in y])
f4 = np.array([[density_gaussian2d(np.array([xi,yi]), mu4, Sigma4) for xi in x] for yi in y])
fig0.add_trace(
go.Surface(x = x,
y = y,
z = f1,
name = "Density 1",
showlegend = True,
opacity=0.5,
showscale=False))
fig0.add_trace(
go.Surface(x=x,
y =y,
z=f2,
name = "Density 2",
opacity=0.5,
showlegend = True,
showscale=False))
fig0.add_trace(
go.Surface(x=x,
y =y,
z=f3,
name = "Density 3",
opacity=0.5,
showlegend = True,
showscale=False))
fig0.add_trace(
go.Surface(x=x,
y =y,
z=f4,
name = "Density 4",
showlegend = True,
opacity=0.5,
showscale=False))
camera = dict(
eye=dict(x=0, y=-1.2, z=1.5)
)
fig0.update_layout(title = dict(text="Gaussian models",
x = 0.4,
y = 0.9,
font=dict(size = 20,
color = "#1C66B5")),
scene_camera=camera,
width = 420,
height = 510)
fig0.show()
Recall key quantities
\[\mathbb{P}(Y=k|X=\text{x})\propto \color{red}{\mathbb{P}(X=\text{x}|Y=k)}\times\color{green}{\mathbb{P}(Y=k)}.\]
Main Assumption of QDA
For any class \(k\in\{1,\dots,M\}\): \[\color{red}{\mathbb{P}(X=\text{x}|Y=k)}={\cal N}_d(\mu_k,\Sigma_k),\] for some \(\mu_k\in\mathbb{R}^d\) and \(d\times d\)-matrix \(\Sigma_k\) (to be estimated).
🔑 Within any class, the shape of input \(X\) is assumed to be Gaussian.
Goal: Search for class \(k\) such that
\[\begin{align*} k=&\arg\max_{1\leq m\leq M}\mathbb{P}(Y=m|X=\text{x})\\ =&\arg\max_{1\leq m\leq M} \mathbb{P}(Y=m)\mathbb{P}(X=\text{x}|Y=m)\\ =&\arg\max_{1\leq m\leq M} \log\left(\mathbb{P}(Y=m)\mathbb{P}(X=\text{x}|Y=m)\right)\\ =&\arg\max_{1\leq m\leq M} \delta_m(\text{x}), \end{align*}\] where \(\delta_m(\text{x})\) measures the association of the input \(\text{x}\) with the class \(m\) and is defined by \[\delta_m(\text{x})=\color{green}{\log(\pi_m)}-\color{blue}{\frac{1}{2}\log(|\Sigma_m|)}-\color{red}{\frac{1}{2}(\text{x}-\mu_m)^t\Sigma_m^{-1}(\text{x}-\mu_m)}\] with \(\pi_m=\mathbb{P}(Y=m), \mu_m\in\mathbb{R}^d\) and matrix \(\Sigma_m\) to be estimated \(\forall m=1,\dots,M\).
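A minimal NumPy sketch of this rule with plug-in estimates (a hypothetical helper, not scikit-learn's implementation):

```python
import numpy as np

def qda_delta(x, pi_m, mu_m, Sigma_m):
    """Discriminant score delta_m(x) of one class."""
    diff = x - mu_m
    return (np.log(pi_m)
            - 0.5 * np.log(np.linalg.det(Sigma_m))
            - 0.5 * diff @ np.linalg.inv(Sigma_m) @ diff)

def qda_predict(x, X, y):
    """Assign x to the class with the largest plug-in discriminant score."""
    classes = np.unique(y)
    scores = []
    for m in classes:
        Xm = X[y == m]
        pi_m = len(Xm) / len(X)             # estimated prior
        mu_m = Xm.mean(axis=0)              # estimated class mean
        Sigma_m = np.cov(Xm, rowvar=False)  # estimated class covariance
        scores.append(qda_delta(x, pi_m, mu_m, Sigma_m))
    return classes[int(np.argmax(scores))]

# e.g. on the simulated data above:
# qda_predict(np.array([4.0, 4.0]), df_qda[["x1", "x2"]].to_numpy(), df_qda["y"].to_numpy())
```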
Definition: Decision Boundary
The decision boundary between two classes \(k\) and \(j\) is the set of inputs \(\text{x}\in\mathbb{R}^d\) satisfying: \[\mathbb{P}(Y=k|X=\text{x})=\mathbb{P}(Y=j|X=\text{x}).\]
Decision Boundary of QDA
Quadratic form of \(\text{x}\): \[\text{x}^tA\text{x}+v^t\text{x}+c=0,\] where \(A\) is a \(d\times d\) symmetric matrix, \(v\in\mathbb{R}^d\) and \(c\in\mathbb{R}\), all depending on \(\pi_k,\pi_j,\mu_k,\mu_j,\Sigma_k\) and \(\Sigma_j\).

Implementation\(^{\text{📚}}\)

for \(k\) in range(M): estimate \(\pi_k,\mu_k,\Sigma_k\) and compute \(\delta_k(\text{x})\);
return \(k\) with the largest value of \(\delta_k(\text{x})\).

Main Assumption of LDA
In LDA, we assume that the input \(X\) has the same covariance matrix within all classes, i.e., \[\Sigma_k=\Sigma, \forall k=1,\dots,M.\] In other words, \(X\) has the same shape for all classes.
# define means and covariance matrices
mu1, Sigma1 = [0, 0], [[2, 0], [0, 2]]
mu2, Sigma2 = [5, 5], [[3, 0], [0, 3]]
mu3, Sigma3 = [-2, 6], [[1, 0], [0, 1]]
mu4, Sigma4 = [6, 0], [[1.75, 0], [0, 1.75]]
x1 = np.random.multivariate_normal(mean=mu1, cov=Sigma1, size=300)
x2 = np.random.multivariate_normal(mean=mu2, cov=Sigma2, size=300)
x3 = np.random.multivariate_normal(mean=mu3, cov=Sigma3, size=300)
x4 = np.random.multivariate_normal(mean=mu4, cov=Sigma4, size=300)
# Save data for later
df_lda = pd.DataFrame({
"x1" : np.concatenate([x1[:,0], x2[:,0], x3[:,0], x4[:,0]]),
"x2" : np.concatenate([x1[:,1], x2[:,1], x3[:,1], x4[:,1]]),
"y" : np.repeat([1,2,3,4], 300)
})
# Plot points
fig1 = go.Figure(go.Scatter3d(x=x1[:,0],
y = x1[:,1],
z = [0] * len(x1[:,0]),
mode = "markers",
name = "Case 1",
showlegend = True,
marker = dict(size = 3)))
fig1.add_trace(
go.Scatter3d(x=x2[:,0],
y = x2[:,1],
z = [0] * len(x1[:,0]),
mode = "markers",
name = "Case 2",
showlegend = True,
marker = dict(size = 3)))
fig1.add_trace(
go.Scatter3d(x=x3[:,0],
y = x3[:,1],
z = [0] * len(x1[:,0]),
mode = "markers",
name = "Case 3",
showlegend = True,
marker = dict(size = 3)))
fig1.add_trace(
go.Scatter3d(x=x4[:,0],
y = x4[:,1],
z = [0] * len(x1[:,0]),
mode = "markers",
name = "Case 4",
showlegend = True,
marker = dict(size = 2)))
# Density
def density_gaussian2d(x, mu, Sigma):
return 1/((2*np.pi) ** (len(x)/2) * np.sqrt(np.linalg.det(Sigma))) * np.exp(-1/2 * np.dot(np.dot(x-mu, np.linalg.inv(Sigma)), x-mu))
x = np.linspace(-5, 10, 50)
y = np.linspace(-5, 10, 50)
f1 = np.array([[density_gaussian2d(np.array([xi,yi]), mu1, Sigma1) for xi in x] for yi in y])
f2 = np.array([[density_gaussian2d(np.array([xi,yi]), mu2, Sigma2) for xi in x] for yi in y])
f3 = np.array([[density_gaussian2d(np.array([xi,yi]), mu3, Sigma3) for xi in x] for yi in y])
f4 = np.array([[density_gaussian2d(np.array([xi,yi]), mu4, Sigma4) for xi in x] for yi in y])
fig1.add_trace(
go.Surface(x = x,
y = y,
z = f1,
name = "Density 1",
showlegend = True,
opacity=0.5,
showscale=False))
fig1.add_trace(
go.Surface(x=x,
y =y,
z=f2,
name = "Density 2",
opacity=0.5,
showlegend = True,
showscale=False))
fig1.add_trace(
go.Surface(x=x,
y =y,
z=f3,
name = "Density 3",
opacity=0.5,
showlegend = True,
showscale=False))
fig1.add_trace(
go.Surface(x=x,
y =y,
z=f4,
name = "Density 4",
showlegend = True,
opacity=0.5,
showscale=False))
camera = dict(
eye=dict(x=0, y=-1.2, z=1.5)
)
fig1 = fig1.update_layout(
title = dict(text=r'$\Sigma_k\text{ in LDA}$',
y=0.9,
x=0.25,
font=dict(size = 20,
color = "#1C66B5")
),
scene_camera=camera,
width = 320,
height = 250)
import ipywidgets as ipw
fig0.update_layout(title = dict(text=r'$\Sigma_k\text{ in QDA}$',
y=0.9,
x=0.25,
font=dict(size = 20,
color = "#1C66B5")
),
scene_camera=camera,
width = 320,
height = 250)
fig0 = go.FigureWidget(fig0)
fig1 = go.FigureWidget(fig1)
ipw.HBox([fig0, fig1])
Discriminant Function
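With a common covariance \(\Sigma\), the terms \(-\frac{1}{2}\log(|\Sigma|)\) and \(-\frac{1}{2}\text{x}^t\Sigma^{-1}\text{x}\) in \(\delta_m(\text{x})\) are the same for every class and can be dropped, so the discriminant becomes linear in \(\text{x}\): \[\delta_m(\text{x})=\log(\pi_m)+\text{x}^t\Sigma^{-1}\mu_m-\frac{1}{2}\mu_m^t\Sigma^{-1}\mu_m.\]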
Decision Boundary of LDA

Linear form of \(\text{x}\): \[\text{x}^tv+c=0,\] where \(v\in\mathbb{R}^d\) and \(c\in\mathbb{R}\), both depending on \(\pi_k,\pi_j,\mu_k,\mu_j\) and \(\Sigma\).

import matplotlib as mpl
from matplotlib import colors
import matplotlib.pyplot as plt
from sklearn.inspection import DecisionBoundaryDisplay
# Functions for ellipses and decision boundaries
def plot_ellipse(mean, cov, color, ax):
v, w = np.linalg.eigh(cov)
u = w[0] / np.linalg.norm(w[0])
angle = np.arctan(u[1] / u[0])
angle = 180 * angle / np.pi # convert to degrees
# filled Gaussian at 2 standard deviation
ell = mpl.patches.Ellipse(
mean,
2 * v[0] ** 0.5,
2 * v[1] ** 0.5,
angle=180 + angle,
facecolor=color,
edgecolor="black",
linewidth=2,
)
ell.set_clip_box(ax.bbox)
ell.set_alpha(0.4)
ax.add_artist(ell)
def plot_result(estimator, X, y, ax):
cmap = colors.ListedColormap(["tab:red", "tab:blue"])
DecisionBoundaryDisplay.from_estimator(
estimator,
X,
response_method="predict_proba",
plot_method="pcolormesh",
ax=ax,
cmap="RdBu",
alpha=0.3,
)
DecisionBoundaryDisplay.from_estimator(
estimator,
X,
response_method="predict_proba",
plot_method="contour",
ax=ax,
alpha=1.0,
levels=[0.5],
)
y_pred = estimator.predict(X)
X_right, y_right = X[y == y_pred], y[y == y_pred]
X_wrong, y_wrong = X[y != y_pred], y[y != y_pred]
ax.scatter(X_right[:, 0], X_right[:, 1], c=y_right, s=20, cmap=cmap, alpha=0.5)
ax.scatter(
X_wrong[:, 0],
X_wrong[:, 1],
c=y_wrong,
s=30,
cmap=cmap,
alpha=0.9,
marker="x",
)
ax.scatter(
estimator.means_[:, 0],
estimator.means_[:, 1],
c="yellow",
s=200,
marker="*",
edgecolor="black",
)
if isinstance(estimator, LDA):
covariance = [estimator.covariance_] * 2
else:
covariance = estimator.covariance_
plot_ellipse(estimator.means_[0], covariance[0], "tab:red", ax)
plot_ellipse(estimator.means_[1], covariance[1], "tab:blue", ax)
ax.set_box_aspect(1)
ax.spines["top"].set_visible(False)
ax.spines["bottom"].set_visible(False)
ax.spines["left"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.set(xticks=[], yticks=[])
fig, axs = plt.subplots(nrows=1, ncols=2, sharex="row", sharey="row", figsize=(8, 12))
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA
lda = LDA(solver="svd", store_covariance=True)
qda = QDA(store_covariance=True)
for ax_row, X, y in zip(
(axs,),
(df_qda[["x1", "x2"]].to_numpy()[:600,:], ),
(df_qda['y'].to_numpy()[:600], ),
):
lda.fit(X, y)
plot_result(lda, X, y, ax_row[0])
qda.fit(X, y)
plot_result(qda, X, y, ax_row[1])
axs[0].set_title("Decision boundary of LDA")
axs[1].set_title("Decision boundary of QDA")
plt.show()
Implementation\(^{\text{📚}}\)

for \(k\) in range(M): estimate \(\pi_k,\mu_k\) and the common \(\Sigma\), then compute \(\delta_k(\text{x})\);
return \(k\) with the largest value of \(\delta_k(\text{x})\).

Regularized DA (RDA) is about regularizing the covariance matrices\(^{\text{📚}}\)
mu1, Sigma1 = [0, 0], np.array([[1, 0], [0, 3]])
mu2, Sigma2 = [5, 5], np.array([[3, -1], [-1, 3]])
mu3, Sigma3 = [-2, 6], np.array([[3, 1.5], [1.5, 1]])
mu4, Sigma4 = [6, 0], np.array([[3, 0.1], [0.1, 0.25]])
df_alpha = np.row_stack([np.random.multivariate_normal(mean=mu1,
cov=Sigma1,
size=100),
np.random.multivariate_normal(mean=mu2,
cov=Sigma2,
size=100),
np.random.multivariate_normal(mean=mu3,
cov=Sigma3,
size=100),
np.random.multivariate_normal(mean=mu4,
cov=Sigma4,
size=100)])
df0 = df_alpha.copy()
Sigma0 = np.array([[1.25, 0], [0, 1.25]])
df_final = np.row_stack([np.random.multivariate_normal(mean=mu1,
cov=Sigma0,
size=100),
np.random.multivariate_normal(mean=mu2,
cov=Sigma0,
size=100),
np.random.multivariate_normal(mean=mu3,
cov=Sigma0,
size=100),
np.random.multivariate_normal(mean=mu4,
cov=Sigma0,
size=100)])
alpha_list = np.linspace(0, 1, 15)
alphas = [0] * 400
classes = np.repeat([int(i) for i in range(1,5)], 100)
for alpha in alpha_list[1:]:
temp = df0 + alpha * (df_final - df0)
df_alpha = np.row_stack([df_alpha, temp])
alphas = np.concatenate((alphas, [alpha] * 400))
classes = np.concatenate((classes, np.repeat([int(i) for i in range(1,5)], 100)))
df_alpha = np.column_stack([df_alpha, alphas, classes])
df_alpha = pd.DataFrame(df_alpha)
df_alpha.columns = ['x1', 'x2', 'alpha', 'class']
df_alpha['alpha'] = df_alpha['alpha'].round(3)
df_alpha['class'] = df_alpha['class'].astype(int)
df_alpha['class'] = df_alpha['class'].astype(str)
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import plotly.express as px
df_alpha["dummy_size"] = 1
fig = px.scatter(df_alpha, x="x1", y="x2", animation_frame="alpha", color="class", hover_data=["class"], size="dummy_size", size_max = 10)
fig.update_layout(title = dict(text=r'Transition of covariance',
y=1,
x=0.2,
font=dict(size = 30,
color = "#1C66B5")
),
width=500, height = 520,
transition = {'duration': 500})
fig.show()
Right covariances \(\approx\) right shapes!
\(\alpha\in[0,1]\) balances the trade-off between the per-class covariance \(\Sigma_k\) (QDA) and the common covariance \(\Sigma\) (LDA): \[\Sigma_k(\alpha)=\alpha\Sigma+(1-\alpha)\Sigma_k.\]
In practice, \(\alpha\) is tuned using cross-validation (covered later).
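A minimal NumPy sketch of this blend (a hypothetical helper, not a library routine):

```python
import numpy as np

def regularized_covariances(X, y, alpha):
    """Sigma_k(alpha) = alpha * pooled Sigma + (1 - alpha) * per-class Sigma_k."""
    classes = np.unique(y)
    per_class = {k: np.cov(X[y == k], rowvar=False) for k in classes}
    # Common covariance: class-size weighted average of the per-class estimates
    n = len(y)
    pooled = sum((np.sum(y == k) / n) * per_class[k] for k in classes)
    return {k: alpha * pooled + (1 - alpha) * per_class[k] for k in classes}

# e.g. on the simulated data above:
# regularized_covariances(df_qda[["x1", "x2"]].to_numpy(), df_qda["y"].to_numpy(), alpha=0.5)
```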
Normality of inputs within each class: \[\mathbb{P}(X=x|Y=k)={\cal N}(\mu_k,\Sigma_k).\]

Spam dataset

sns.set(style="white")
# Build LDA object & predict
lda = LDA(solver="svd", store_covariance=True)
lda1 = lda.fit(X_train1, y_train1)
pred1_lda = lda1.predict(X_test1)
conf1_lda = confusion_matrix(y_test1, pred1_lda)
con_fig1_lda = ConfusionMatrixDisplay(conf1_lda)
qda = QDA(store_covariance=True)
qda1 = qda.fit(X_train1, y_train1)
pred1_qda = qda1.predict(X_test1)
conf1_qda = confusion_matrix(y_test1, pred1_qda)
con_fig1_qda = ConfusionMatrixDisplay(conf1_qda)
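A quick side-by-side on the held-out test set (a sketch assuming the predictions computed above):

```python
from sklearn.metrics import accuracy_score

for name, pred in [("Naive Bayes", pred1), ("LDA", pred1_lda), ("QDA", pred1_qda)]:
    print(f"{name:12s} accuracy = {accuracy_score(y_test1, pred):.3f}")
```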
Using only (make, address, capitalTotal) \(\in\mathbb{R}^3\):

lda = LDA(solver="svd", store_covariance=True)
lda2 = lda.fit(X_train2, y_train2)
pred2_lda = lda2.predict(X_test2)
conf2_lda = confusion_matrix(y_test2, pred2_lda)
con_fig2_lda = ConfusionMatrixDisplay(conf2_lda)
qda = QDA(store_covariance=True)
qda2 = qda.fit(X_train2, y_train2)
pred2_qda = qda2.predict(X_test2)
conf2_qda = confusion_matrix(y_test2, pred2_qda)
con_fig2_qda = ConfusionMatrixDisplay(conf2_qda)
Coming in the TP 😁!
Pros
Cons