# TP9 - Dimensional Reduction
Course: Advanced Machine Learning
Lecturer: Sothea HAS, PhD
Objective: Dimensional reduction is useful when dealing with high-dimensional datasets. It can also be used in clustering and data analysis. We will explore its potential for data compression and reconstruction, as well as its use as a preprocessing step in predictive models.
- The notebook of this TP can be downloaded here: TP9_Dimensional_Reduction.ipynb.
1. Fashion MNIST Dataset
We revisit the Fashion-MNIST dataset from the previous TP8.
A. Dimensional reduction with PCA
- Import the dataset into the Python environment and print the first 12 training items (3 rows and 4 columns) with titles corresponding to their actual item names (you can find the true label of each item here: https://www.kaggle.com/datasets/zalando-research/fashionmnist).
- Perform reduced/normalized PCA on the training input of this dataset.
- How many dimensions would you keep to retain \(90\%\) of the variation in the data?
- What's the percentage of variance explained by the first two dimensions?
- Visualize the data in a 2-dimensional space using PCA.
- Perform a clustering algorithm using all the PCs with accumulated variance of \(80\%\). Analyze the performance of the clustering algorithm (see the sketch after this list).
- Test whether a DNN trained on the original features performs better than one trained only on the PCs accounting for \(80\%\) of the total variance.
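A minimal sketch of the PCA steps above, assuming the Kaggle Fashion-MNIST training CSV is available locally as `fashion-mnist_train.csv` (the file name and path are assumptions). It loads the data, fits a normalized PCA, reports how many components retain \(90\%\) of the variance, plots the first two PCs, and runs k-means on the PCs covering \(80\%\) of the variance.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Assumed local path to the Kaggle Fashion-MNIST training CSV
train = pd.read_csv("fashion-mnist_train.csv")
X_train = train.drop(columns=["label"]).values.astype(float)
y_train = train["label"].values

# Reduced/normalized PCA: standardize each pixel, then fit PCA on all components
X_scaled = StandardScaler().fit_transform(X_train)
pca = PCA().fit(X_scaled)

# Number of components needed to retain 90% of the variance
cum_var = np.cumsum(pca.explained_variance_ratio_)
d90 = np.argmax(cum_var >= 0.90) + 1
print(f"Components for 90% variance: {d90}")
print(f"Variance explained by the first two PCs: {cum_var[1]:.2%}")

# 2-dimensional visualization with the first two principal components
X_pca = pca.transform(X_scaled)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y_train, s=2, cmap="tab10")
plt.xlabel("PC1"); plt.ylabel("PC2"); plt.title("Fashion-MNIST in the first two PCs")
plt.show()

# K-means on the PCs covering 80% of the variance, compared to the true labels
d80 = np.argmax(cum_var >= 0.80) + 1
clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X_pca[:, :d80])
print("ARI of k-means on 80%-variance PCs:", adjusted_rand_score(y_train, clusters))
```

The adjusted Rand index against the true labels is only one possible way to quantify the clustering performance; any of the cluster-evaluation metrics used in TP8 could be substituted.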
B. Dimensional reduction with \(t\)-SNE
- Visualize the data in a 2-dimensional space using \(t\)-SNE (see the sketch after this list).
- Perform a clustering algorithm on these embedded features. Analyze the performance of the clustering algorithm.
- Implement some predictive models using the embedded features as inputs to predict the item types and report their performances on the test data.
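A possible sketch for the \(t\)-SNE part, reusing `X_train` and `y_train` from the PCA sketch above. Scikit-learn's `TSNE` is expensive on the full training set, so a subsample is embedded here; the subsample size, perplexity, and the k-NN classifier are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Subsample to keep t-SNE tractable (the size is an arbitrary choice)
rng = np.random.default_rng(0)
idx = rng.choice(len(X_train), size=10_000, replace=False)
X_sub, y_sub = X_train[idx], y_train[idx]

# 2-dimensional t-SNE embedding of the raw pixel vectors
X_tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(X_sub)

plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y_sub, s=2, cmap="tab10")
plt.title("Fashion-MNIST: 2-dimensional t-SNE embedding")
plt.show()

# Clustering on the embedded features
clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X_tsne)
print("ARI of k-means on t-SNE features:", adjusted_rand_score(y_sub, clusters))

# A simple predictive model on the embedded features
Xe_tr, Xe_te, ye_tr, ye_te = train_test_split(X_tsne, y_sub, test_size=0.2, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(Xe_tr, ye_tr)
print("k-NN accuracy on embedded features:", knn.score(Xe_te, ye_te))
```

Note that `TSNE` has no `transform` method for unseen data, so this sketch evaluates on a held-out split of the already-embedded subsample; embedding the training and test images jointly, or using a parametric variant of \(t\)-SNE, are alternative ways to report performance on the actual test data.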
C. Dimensional reduction with Johnson-Lindenstrauss Lemma
- Project the images onto \(d=2, 5, 10\) dimensional spaces (call them `X_JL2`, `X_JL5` and `X_JL10`, respectively).
- Perform a clustering algorithm on these projected data. Analyze the performance of the clustering algorithm for each case (see the sketch after this list).
- Implement some predictive models using the projected features as inputs to predict the item types and report their performances on the test data.
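One way to realize the Johnson-Lindenstrauss projections is scikit-learn's random projection module. The sketch below uses `GaussianRandomProjection` on the same `X_train`/`y_train` as above; the variable names follow the exercise, while the clustering metric and the logistic-regression classifier are assumptions.

```python
from sklearn.random_projection import GaussianRandomProjection
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Random Gaussian projections onto d = 2, 5, 10 dimensions
projections = {}
for d in (2, 5, 10):
    proj = GaussianRandomProjection(n_components=d, random_state=0)
    projections[f"X_JL{d}"] = proj.fit_transform(X_train)

for name, X_proj in projections.items():
    # Clustering on the projected data, compared to the true labels
    clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X_proj)
    ari = adjusted_rand_score(y_train, clusters)

    # A simple predictive model on the projected features
    Xp_tr, Xp_te, yp_tr, yp_te = train_test_split(X_proj, y_train, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(Xp_tr, yp_tr)
    print(f"{name}: k-means ARI = {ari:.3f}, logistic accuracy = {clf.score(Xp_te, yp_te):.3f}")
```

For a proper comparison with the earlier sections, the fitted projection (`proj.transform`) should also be applied to the test images rather than splitting the training projection as done here.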
D. Dimensional reduction with Autoencoder
- Build an autoencoder to encode and reconstruct the item images using an architecture of your own design.
- Visualize some of the original, embedded and reconstructed images side by side.
- Perform a clustering algorithm on the latent representations produced by the network. Analyze the performance of the clustering algorithm for each case.
- Implement some selected predictive models using the latent encoded images as inputs to predict the item types and report their performances (see the sketch after this list).
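A minimal Keras autoencoder sketch for part D, assuming the pixel matrix `X_train` from above rescaled to \([0, 1]\). The layer sizes and the 32-dimensional latent space are arbitrary design choices, not a prescribed architecture.

```python
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers

# Rescale pixels to [0, 1] for a sigmoid reconstruction
X = X_train.astype("float32") / 255.0

latent_dim = 32  # arbitrary choice of latent dimension

# Encoder: 784 -> 128 -> latent_dim
encoder = keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(latent_dim, activation="relu"),
])

# Decoder: latent_dim -> 128 -> 784
decoder = keras.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(784, activation="sigmoid"),
])

autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.fit(X, X, epochs=10, batch_size=256, validation_split=0.1)

# Latent codes and reconstructions for a few images
codes = encoder.predict(X[:4])
recon = decoder.predict(codes)

# Original (top row) vs reconstructed (bottom row) images side by side
fig, axes = plt.subplots(2, 4, figsize=(8, 4))
for i in range(4):
    axes[0, i].imshow(X[i].reshape(28, 28), cmap="gray"); axes[0, i].axis("off")
    axes[1, i].imshow(recon[i].reshape(28, 28), cmap="gray"); axes[1, i].axis("off")
plt.show()
```

The `codes` matrix produced by `encoder.predict` over the full dataset is what the clustering and the predictive models of this part would take as input.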
- Conclude your findings.
References
- Hinton and Roweis (2002).
- Laurens van der Maaten, \(t\)-SNE Page.
- Satellite Images.
- van der Maaten and Hinton (2008), Visualizing Data using t-SNE.
- Bank et al. (2021), Autoencoders.
- Umberto Michelucci (2022), An Introduction to Autoencoders.