{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# **TP8 - Corresponding Analysis (CA)**\n",
"\n",
"- Course: EDA & Unsuperivsed Learning
\n",
"- **M-DAS**
\n",
"- **Lecturer: HAS Sothea, PhD**\n",
"\n",
"-------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Objective**: Qualitative columns are often ignored in predictive models or analysis. It is important to notice that qualitative variables are as important as the quantitative ones when it comes to building predictive models or analyzing their connection within the dataset. In this TP, we will focus on identifying the associations between two qualitative variables.\n",
"\n",
"---------\n",
"\n",
"> **The `Jupyter Notebook` for this TP can be downloaded here: [TP8_CA.ipynb](https://hassothea.github.io/M1_EDA_ITC/TPs/TP8_CA.ipynb)**.\n",
"\n",
"-------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data loading and Preprocessing\n",
"\n",
"In this section, we will work with `Titanic` dataset ([**TP3**](https://hassothea.github.io/M1_EDA_ITC/TPs/TP3_Preprocessing.html)).\n",
"\n",
"**A.** Import the `Titanic` dataset from kaggle using: [Titanic dataset](https://www.kaggle.com/datasets/surendhan/titanic-dataset).\n",
"\n",
"- How many quantitative and qualitative variables are there in this dataset?\n",
"- Convert each column into its correct data type."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import kagglehub\n",
"\n",
"# Download latest version\n",
"path = kagglehub.dataset_download(\"surendhan/titanic-dataset\")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n", " | PassengerId | \n", "Survived | \n", "Pclass | \n", "Name | \n", "Sex | \n", "Age | \n", "SibSp | \n", "Parch | \n", "Ticket | \n", "Fare | \n", "Cabin | \n", "Embarked | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "892 | \n", "0 | \n", "3 | \n", "Kelly, Mr. James | \n", "male | \n", "34.5 | \n", "0 | \n", "0 | \n", "330911 | \n", "7.8292 | \n", "NaN | \n", "Q | \n", "
1 | \n", "893 | \n", "1 | \n", "3 | \n", "Wilkes, Mrs. James (Ellen Needs) | \n", "female | \n", "47.0 | \n", "1 | \n", "0 | \n", "363272 | \n", "7.0000 | \n", "NaN | \n", "S | \n", "
2 | \n", "894 | \n", "0 | \n", "2 | \n", "Myles, Mr. Thomas Francis | \n", "male | \n", "62.0 | \n", "0 | \n", "0 | \n", "240276 | \n", "9.6875 | \n", "NaN | \n", "Q | \n", "
3 | \n", "895 | \n", "0 | \n", "3 | \n", "Wirz, Mr. Albert | \n", "male | \n", "27.0 | \n", "0 | \n", "0 | \n", "315154 | \n", "8.6625 | \n", "NaN | \n", "S | \n", "
4 | \n", "896 | \n", "1 | \n", "3 | \n", "Hirvonen, Mrs. Alexander (Helga E Lindqvist) | \n", "female | \n", "22.0 | \n", "1 | \n", "1 | \n", "3101298 | \n", "12.2875 | \n", "NaN | \n", "S | \n", "