TP5 - Gaussian Mixture Model (GMM) & EM Algorithm
Exploratory Data Analysis & Unsupervised Learning
Course: PHAUK Sokkey, PhD
TP: HAS Sothea, PhD
Objective: In this lab, we’ll dive into the fascinating world of Gaussian Mixture Models (GMMs) and the Expectation-Maximization (EM) algorithm, both of which are fundamental concepts of unsupervised machine learning. GMMs can be viewed from various angles, such as density estimation and soft clustering. We’ll explore both perspectives and apply them to image segmentation, laying the groundwork for a broader understanding of generative models.
The Jupyter Notebook for this TP can be downloaded here: TP5-GMM_EM.
1. Gaussian Mixture Models
A. Perform GMM using `GaussianMixture` from `sklearn.mixture` on the Iris dataset with `n_components=5`.
Read this documentation and answer the following questions:
- Print the estimated parameters of each component.
- What does `score()` do in this module?
- Compute the `score`, AIC and BIC of the trained GMM.
B. Perform GMM on the Iris data, this time using `n_components = 1, 2, ..., 10`.

- Compute the `score`, AIC and BIC at each number of components.
- What is the optimal number of components?
# To do
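One possible sketch for question B: sweep `n_components` from 1 to 10 and record the score, AIC and BIC at each value. Selecting the \(K\) that minimizes BIC is a common convention, not the only valid criterion.

```python
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X = load_iris().data

results = []
for k in range(1, 11):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    # (K, mean log-likelihood, AIC, BIC)
    results.append((k, gmm.score(X), gmm.aic(X), gmm.bic(X)))

# BIC penalizes model complexity more heavily than AIC;
# here we pick the K with the smallest BIC
best_k = min(results, key=lambda r: r[3])[0]
print("optimal K by BIC:", best_k)
```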
C. With the optimal number of components from question B, perform GMM on the Iris data using each `covariance_type` option from the list [`'full'`, `'tied'`, `'diag'`, `'spherical'`]. In each case, compute the score associated with each covariance type. Comment.
# To do
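A sketch for question C. The value `K = 2` below is only a placeholder; substitute the optimal number of components you found in question B.

```python
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X = load_iris().data
K = 2  # placeholder: replace with the optimal K from question B

scores = {}
for cov in ["full", "tied", "diag", "spherical"]:
    gmm = GaussianMixture(n_components=K, covariance_type=cov,
                          random_state=0).fit(X)
    scores[cov] = gmm.score(X)  # mean log-likelihood under this covariance model
    print(f"{cov:>9}: score = {scores[cov]:.3f}, BIC = {gmm.bic(X):.1f}")
```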
D. Repeat questions B and C on the simulated data from the previous TP4. How do GMM’s results compare to K-means or hierarchical clustering?
# To do
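A hedged sketch for question D. Since the exact TP4 simulation settings are not reproduced here, `make_blobs` stands in for the simulated data; swap in your own data and compare the three methods, for instance with the adjusted Rand index against the true labels.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

# stand-in for the TP4 simulated data (parameters are illustrative)
X, y = make_blobs(n_samples=300, centers=3, cluster_std=1.2, random_state=0)

labels_gmm = GaussianMixture(n_components=3, random_state=0).fit_predict(X)
labels_km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
labels_hc = AgglomerativeClustering(n_clusters=3).fit_predict(X)

for name, lab in [("GMM", labels_gmm), ("KMeans", labels_km),
                  ("Hierarchical", labels_hc)]:
    print(f"{name:>12}: ARI = {adjusted_rand_score(y, lab):.3f}")
```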
2. EM Algorithm
The EM algorithm is used to estimate the parameters of the GMM, ensuring that the model fits the data as closely as possible by iteratively refining the parameters. It leverages the concept of latent variables (responsibilities) to handle the fact that the actual class labels for the data points are unknown.
This iterative optimization makes GMMs a powerful tool for tasks like clustering and density estimation.
A. Recall the steps of the EM algorithm for a GMM with \(K\) components.
# To do
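For reference, the standard EM updates for a \(K\)-component GMM on data \(x_1,\dots,x_n\) are as follows, where \(\gamma_{ik}\) denotes the responsibility of component \(k\) for point \(x_i\). Starting from initial values of the weights \(\pi_k\), means \(\mu_k\) and covariances \(\Sigma_k\), iterate:

E-step (compute responsibilities):
\[
\gamma_{ik} = \frac{\pi_k\,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K}\pi_j\,\mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
\]

M-step (update parameters), with \(N_k = \sum_{i=1}^{n}\gamma_{ik}\):
\[
\pi_k = \frac{N_k}{n},\qquad
\mu_k = \frac{1}{N_k}\sum_{i=1}^{n}\gamma_{ik}\,x_i,\qquad
\Sigma_k = \frac{1}{N_k}\sum_{i=1}^{n}\gamma_{ik}\,(x_i-\mu_k)(x_i-\mu_k)^{\top}
\]

Repeat until the log-likelihood stops improving (or a maximum number of iterations is reached).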
B. 1D EM Algorithm:

- Plot the density of the third column of the Iris dataset. From this density, what is the number of components?
- Write a function `EM1d(x, K=3, max_iter=100)` that takes a 1D data array `x`, the number of components `K`, and the maximum number of EM iterations `max_iter`. The function should return the responsibility matrix \(\Gamma\), and the `center` and `variance` of all \(K\) components.
- Apply your function to the third column of the Iris data with \(K=1,2,\dots,10\).
- Compute the `score`, AIC and BIC for each \(K\) (you may need to write your own function for that). What is the optimal number of components?
- Visualize your estimated density.
# To do
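One possible implementation of `EM1d` for question B. The quantile-based initialization and the variance floor are implementation choices, not part of the assignment; the demo call on the Iris third column with `K=2` is likewise illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

def EM1d(x, K=3, max_iter=100):
    """EM for a 1D Gaussian mixture. Returns (responsibilities, centers, variances)."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    # initialization: centers at spread-out quantiles, equal weights, global variance
    centers = np.quantile(x, np.linspace(0.1, 0.9, K))
    variances = np.full(K, x.var())
    weights = np.full(K, 1.0 / K)
    for _ in range(max_iter):
        # E-step: gamma[i, k] proportional to pi_k * N(x_i; mu_k, var_k)
        dens = (weights
                * np.exp(-0.5 * (x[:, None] - centers) ** 2 / variances)
                / np.sqrt(2 * np.pi * variances))
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, centers and variances
        Nk = gamma.sum(axis=0)
        weights = Nk / n
        centers = (gamma * x[:, None]).sum(axis=0) / Nk
        variances = (gamma * (x[:, None] - centers) ** 2).sum(axis=0) / Nk
        variances = np.maximum(variances, 1e-8)  # guard against collapsing components
    return gamma, centers, variances

# demo on the third column of Iris (petal length), K=2 for illustration
x = load_iris().data[:, 2]
gamma, centers, variances = EM1d(x, K=2)
print("centers:", centers)
print("variances:", variances)
```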
C. Image Segmentation.

- Load any image (not too large in resolution; it can be an MNIST image from the previous TP).
- Reshape it into a 1D array, then apply your `EM1d` function (or sklearn’s GMM) to that 1D pixel array with your desired number of components.
- Assign each pixel to a component and reshape the segmented image back into its original shape.
- Display the original and segmented images side by side. Comment.
# To do
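A self-contained sketch for question C using sklearn’s GMM. A small synthetic grayscale image stands in for a real one so the example runs anywhere; replace `img` with your own image (or an MNIST digit from the previous TP) and adjust `n_components`.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# toy image: dark background with a brighter square (stand-in for a real image)
rng = np.random.default_rng(0)
img = rng.normal(0.2, 0.05, size=(28, 28))
img[8:20, 8:20] += 0.6

pixels = img.reshape(-1, 1)                        # flatten to (n_pixels, 1)
gmm = GaussianMixture(n_components=2, random_state=0).fit(pixels)
labels = gmm.predict(pixels)                       # component index per pixel
segmented = gmm.means_[labels].reshape(img.shape)  # replace each pixel by its component mean

# Side-by-side display (uncomment; requires matplotlib):
# import matplotlib.pyplot as plt
# fig, ax = plt.subplots(1, 2)
# ax[0].imshow(img, cmap="gray"); ax[0].set_title("original")
# ax[1].imshow(segmented, cmap="gray"); ax[1].set_title("segmented")
# plt.show()
```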