{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# **TP3 - Data Preprocessing**\n",
"\n",
"**Exploratory Data Analysis & Unsupervised Learning**\n",
"**M1-DAS**\n",
"**Lecturer: HAS Sothea, PhD**\n",
"\n",
"-------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Objective:** Preprocessing is an essential step in data-related tasks. In this TP, you will explore different challenges you may encounter when performing data preprocessing, and we will discuss reasonable solutions to these challenges.\n",
"\n",
"> **The `Jupyter Notebook` for this TP can be downloaded here: [TP3_Preprocessing.ipynb](https://hassothea.github.io/M1_EDA_ITC/TPs/TP3_Preprocessing.ipynb)**.\n",
"\n",
"-----------\n",
"\n",
"## 1. Titanic dataset\n",
"\n",
"The `Titanic` dataset contains information on the passengers aboard the RMS Titanic, which sank in $1912$. It includes details like age, gender, class, and survival status.\n",
"\n",
"I bet you have heard about or watched the `Titanic` movie at least once. How about we take a look at the real `Titanic` dataset available on Kaggle? For more information about the dataset and its columns, read [`Titanic dataset`](https://www.kaggle.com/datasets/surendhan/titanic-dataset). Let's import it into our Jupyter Notebook by running the following code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | 892 | 0 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | NaN | Q |\n",
"| 1 | 893 | 1 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0000 | NaN | S |\n",
"| 2 | 894 | 0 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | NaN | Q |\n",
"| 3 | 895 | 0 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 | NaN | S |\n",
"| 4 | 896 | 1 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 | NaN | S |\n",
"