Objective: This TP covers an essential unsupervised learning method: Principal Component Analysis (PCA). PCA is a key dimensionality reduction technique that simplifies data while preserving its main patterns of variation. We will explore PCA from multiple perspectives in this TP.
The Jupyter Notebook for this TP can be downloaded here: TP7_PCA.ipynb.
1. Analyzing US Crime Dataset with PCA
The USArrests dataset, available on Kaggle, provides statistics on arrests for crimes including rape, assault, and murder in the 50 states of the United States in 1973.
For more information, read about the dataset here. We will use PCA to identify which U.S. state was the most dangerous and which was the safest in 1973.
A. Import the data and visualize each column to get a general sense of the dataset.
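One possible starting point for this step is sketched below, using pandas and matplotlib. The file name USArrests.csv, the helper names load_usarrests and plot_columns, and the assumption that the first CSV column holds the state names are all illustrative choices, not part of the TP; adapt them to the file you actually downloaded.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd


def load_usarrests(csv_path):
    """Read the CSV, assuming its first column holds the state names."""
    return pd.read_csv(csv_path, index_col=0)


def plot_columns(df):
    """Draw one histogram per numeric column and return the figure."""
    numeric = df.select_dtypes("number")
    n_cols = numeric.shape[1]
    fig, axes = plt.subplots(1, n_cols, figsize=(4 * n_cols, 3), squeeze=False)
    for ax, col in zip(axes[0], numeric.columns):
        ax.hist(numeric[col], bins=10)
        ax.set_title(col)
        ax.set_ylabel("Number of states")
    fig.tight_layout()
    return fig


# Hypothetical local path: point this at the downloaded Kaggle file.
csv_path = Path("USArrests.csv")
if csv_path.exists():
    df = load_usarrests(csv_path)
    print(df.head())      # first few rows
    print(df.describe())  # per-column summary statistics
    plot_columns(df)
    plt.show()
```

The summary printed by describe() is worth checking alongside the histograms: if the columns turn out to have very different ranges, that is something to keep in mind before applying PCA.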