{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# **Lab1: Introduction to Data Analysis**\n",
    "\n",
    "**Course**: **INF-604: Data Analysis** <br>\n",
    "**Lecturer**: **Sothea HAS, PhD**\n",
    "\n",
    "-----\n",
    "\n",
    "**Objective:**  You have already seen some elements of Data Analysis in the course. In this lab, we will take our first step into working with the main element of Data Analysis, which is the dataset. By the end of this lab, you will be able to import data into a Jupyter Notebook and perform some data manipulation.\n",
    "\n",
    "- The `notebook` of this `Lab` can be downloaded here: [Lab1_Introduction.ipynb](https://hassothea.github.io/Data_Analysis_AUPP/Labs/Lab1_Introduction.ipynb){target=\"_blank\"}.\n",
    "\n",
    "- Or you can work directly with `Google Colab` here: [Lab1_Introduction.ipynb](https://colab.research.google.com/drive/14L1fgW35_yZAW3BIsG-oGLxBO0lXANMO?usp=sharing){target=\"_blank\"}.\n",
    "\n",
    "\n",
    "-----\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Student's name: ...\n",
    "- Year: ...\n",
    "- Major: ...\n",
    "\n",
    "-----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **1. Data for Your Business**\n",
    "\n",
    "Imagine you want to start your own business, such as a coffee shop or a bookstore. What types of data do you think you need to gather to determine the potential success of your business? The following questions may help you think it through:\n",
    "\n",
    "- What is your plan for the business?\n",
    "\n",
    "- What information might you need to collect? What is the size of the data?\n",
    "\n",
    "- Where do you think you can find this information?\n",
    "\n",
    "- What might go wrong with the collected data?\n",
    "\n",
    "- What steps would you take to handle these problems?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Answer:`\n",
    "\n",
    "\n",
    "\n",
    "---------\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **2. Importing Some Data**\n",
    "\n",
    "\n",
    "There are many online data sources that you can explore, and one of the most popular is [`Kaggle`](https://www.kaggle.com/datasets/). In addition to datasets, `Kaggle` also hosts data competitions with prizes and offers courses to help you build your data skills.\n",
    "\n",
    "\n",
    "Here, we start our journey by exploring a dataset whose name you have probably heard before: [`Titanic`](https://www.kaggle.com/datasets/mahmoudsaadmohamed/titanic-dataset). You can download it from `Kaggle` using the following code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PassengerId  Survived  Pclass  \\\n",
       "0            1         0       3   \n",
       "1            2         1       1   \n",
       "2            3         1       3   \n",
       "3            4         1       1   \n",
       "4            5         0       3   \n",
       "\n",
       "                                                Name     Sex   Age  SibSp  \\\n",
       "0                            Braund, Mr. Owen Harris    male  22.0      1   \n",
       "1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   \n",
       "2                             Heikkinen, Miss. Laina  female  26.0      0   \n",
       "3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   \n",
       "4                           Allen, Mr. William Henry    male  35.0      0   \n",
       "\n",
       "   Parch            Ticket     Fare Cabin Embarked  \n",
       "0      0         A/5 21171   7.2500   NaN        S  \n",
       "1      0          PC 17599  71.2833   C85        C  \n",
       "2      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "3      0            113803  53.1000  C123        S  \n",
       "4      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
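    "# Uncomment the next line if kagglehub is not installed yet\n",
    "# %pip install kagglehub\n",
    "\n",
    "import os\n",
    "\n",
    "import kagglehub\n",
    "import pandas as pd\n",
    "\n",
    "# Download the latest version of the dataset; this returns the local folder path\n",
    "path = kagglehub.dataset_download(\"yasserh/titanic-dataset\")\n",
    "\n",
    "# Read the CSV file into a pandas DataFrame and preview the first five rows\n",
    "data = pd.read_csv(os.path.join(path, \"Titanic-Dataset.csv\"))\n",
    "data.head()"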
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "-------\n",
    "\n",
    "### **2.1. Overview of the data**\n",
    "\n",
    "\n",
    "Answer the following questions:\n",
    "\n",
    "**A.** How many rows and columns are there in this dataset?\n",
    "\n",
    "**B.** Explain the meaning of each column.\n",
    "\n",
    "**C.** Are there any missing values in this dataset? If so, how many rows contain at least one missing value? \n",
    "\n",
    "- What should you do with the `Cabin` column?\n",
    "\n",
    "- How would you drop rows with at least one missing value?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's find out!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "-------\n",
    "\n",
    "### **2.2. Information from a single column**\n",
    "\n",
    "\n",
    "**D.** How many male and female passengers were on the ship?\n",
    "\n",
    "**E.** How many passengers survived? How many did not?\n",
    "\n",
    "**F.** How many passengers were younger than 3 years old? How many were older than 60 years old?\n",
    "\n",
    "**G.** How many passengers embarked from each of the three ports?\n",
    "\n",
    "- `C`: Cherbourg, France.\n",
    "- `Q`: Queenstown, Ireland.\n",
    "- `S`: Southampton, England.\n",
    "\n",
    "**H.** How many passengers were in the 1st, 2nd and 3rd class?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# To do"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "-------------\n",
    "\n",
    "### **2.3. Information from multiple columns**\n",
    "\n",
    "**I.** How many 1st class passengers survived? How about 2nd and 3rd class?\n",
    "\n",
    "**J.** How many female passengers survived? How many male passengers did?\n",
    "\n",
    "**K.** How many people from each embarkation port survived?\n",
    "\n",
    "**L.** Was `Jack` on the ship? How about `Rose`? "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# **Further Reading**\n",
    "\n",
    "- `Pandas` Python library: https://pandas.pydata.org/docs/getting_started/index.html#getting-started\n",
    "\n",
    "- `10 Minutes to Pandas`: https://pandas.pydata.org/docs/user_guide/10min.html\n",
    "\n",
    "- `Some Pandas Lessons`: https://www.kaggle.com/learn/pandas"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}