{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# **TP6 - Principal Component Analysis (PCA)**\n",
"\n",
"Exploratory Data Analysis & Unsuperivsed Learning
\n",
"**Course: PHAUK Sokkey, PhD**
\n",
"**TP: HAS Sothea, PhD**\n",
"\n",
"-------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Objective**: In this lab, let's dive into an essential unsupervised learning method: Principal Component Analysis (PCA). PCA is a key technique for dimensionality reduction that simplifies data while preserving its crucial patterns. We will explore PCA from multiple perspectives in this TP.\n",
"\n",
"---------\n",
"\n",
"> **The `Jupyter Notebook` for this TP can be downloaded here: [TP6_PCA.ipynb](https://hassothea.github.io/EDA_ITC/TPs/TP6_PCA.ipynb)**.\n",
"\n",
"-------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Analyzing US Crime Dataset with PCA\n",
"\n",
"The `USArrests` data available in `Kaggle` provides statistics on arrests for crime including **rape**, **assault** and **murder** in 50 states of the United States in 1973. \n",
"\n",
"For information, read about the dataset [here](https://www.kaggle.com/datasets/halimedogan/usarrests). We will use PCA to identify which U.S. state was the most dangerous or the safest in 1973.\n",
"\n",
"**A.** Import the data and visualize each column to get a general sense of the dataset."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
 | Unnamed: 0 | Murder | Assault | UrbanPop | Rape |
---|---|---|---|---|---|
0 | Alabama | 13.2 | 236 | 58 | 21.2 |
1 | Alaska | 10.0 | 263 | 48 | 44.5 |
2 | Arizona | 8.1 | 294 | 80 | 31.0 |
3 | Arkansas | 8.8 | 190 | 50 | 19.5 |
4 | California | 9.0 | 276 | 91 | 40.6 |