Objective: This practical lab aims to equip you with the skills to visualize data effectively using a range of tools and techniques. You will learn how to interpret and represent data visually to communicate insights, trends, and patterns clearly and compellingly.
The Titanic dataset contains information on the passengers aboard the RMS Titanic, which sank in 1912. It includes details like age, gender, class, and survival status.
You have probably heard of or watched the movie Titanic at least once. How about taking a look at the real Titanic dataset available on Kaggle? For more information about the dataset and its columns, read the Titanic dataset page. Let’s import it into our Jupyter Notebook by running the following code.
import kagglehub

# Download latest version
path = kagglehub.dataset_download("surendhan/titanic-dataset")
# print("Path to dataset files:", path)

# Import data
import pandas as pd
data = pd.read_csv(path + "/titanic.csv")
data.head()
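|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892 | 0 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | NaN | Q |
| 1 | 893 | 1 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0000 | NaN | S |
| 2 | 894 | 0 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | NaN | Q |
| 3 | 895 | 0 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 | NaN | S |
| 4 | 896 | 1 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 | NaN | S |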
A. What are the dimensions of this dataset? How many quantitative and qualitative variables does it contain?
# To do
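One possible starting point, sketched on a small made-up table rather than the real data: `toy` and all of its values are hypothetical, and in the lab you would run the same calls on `data`. Note that the split by dtype is only a first guess. Which numeric columns are really identifiers or categories (the `coded_as_number` list here) is a judgment you have to make yourself.

```python
import pandas as pd

# Toy stand-in for `data`; the values are invented for illustration only.
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 2, 3],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 13.00, 8.05],
    "Embarked": ["S", "C", "S", "Q"],
})

# shape is a (rows, columns) tuple
n_rows, n_cols = toy.shape

# First guess from the storage type: numeric columns vs. everything else
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
other_cols = toy.select_dtypes(exclude="number").columns.tolist()

# Columns stored as numbers that actually act as IDs or categories
# (an assumption made for this sketch; pandas cannot infer it).
coded_as_number = ["PassengerId", "Survived", "Pclass"]

quantitative = [c for c in numeric_cols if c not in coded_as_number]
qualitative = other_cols + [c for c in numeric_cols if c in coded_as_number]

print(f"{n_rows} rows x {n_cols} columns")
print("Quantitative:", quantitative)
print("Qualitative:", qualitative)
```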
B. Are there any missing values? If so, you should analyze and handle them properly.
# To do
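A minimal sketch of one way to count and handle missing values, again on a hypothetical `toy` table with invented values. The choices shown (dropping a mostly empty column, filling a numeric column with its median, filling a categorical column with its most frequent value) are common options, not the only valid ones. You should justify whichever strategy you pick for the real data.

```python
import pandas as pd

# Toy stand-in for `data`; the values are invented for illustration only.
toy = pd.DataFrame({
    "Age": [22.0, None, 30.0, None, 40.0],
    "Cabin": [None, None, "C85", None, None],
    "Embarked": ["S", "C", None, "S", "S"],
})

# Number and share of missing values per column
missing = toy.isna().sum()
missing_share = toy.isna().mean()
print(pd.DataFrame({"missing": missing, "share": missing_share}))

cleaned = toy.copy()

# Option 1: drop a column that is mostly empty
cleaned = cleaned.drop(columns=["Cabin"])

# Option 2: fill a numeric column with its median
cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())

# Option 3: fill a categorical column with its most frequent value
cleaned["Embarked"] = cleaned["Embarked"].fillna(cleaned["Embarked"].mode()[0])
```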
C. Visualize the distribution of each interesting column. At this stage of the analysis, you should try to get an overall picture of each column of the dataset:
What’s the majority class of the passengers?
How old were they at the time of the incident?
Where did most of them embark?
# To do
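The three questions above could be approached with one bar chart per categorical column and a histogram for age. The sketch below uses a hypothetical `toy` table with invented values and pandas' built-in matplotlib plotting. The panel layout and bin count are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for `data`; the values are invented for illustration only.
toy = pd.DataFrame({
    "Pclass": [3, 1, 3, 2, 3, 1],
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
    "Embarked": ["S", "C", "S", "S", "Q", "C"],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Categorical column: count each level, then draw one bar per level
class_counts = toy["Pclass"].value_counts().sort_index()
class_counts.plot.bar(ax=axes[0], title="Passenger class", rot=0)

# Numeric column: histogram (missing ages are skipped automatically)
toy["Age"].plot.hist(bins=5, ax=axes[1], title="Age")

port_counts = toy["Embarked"].value_counts()
port_counts.plot.bar(ax=axes[2], title="Port of embarkation", rot=0)

fig.tight_layout()
```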
2. Bivariate/Multivariate Analysis
We are primarily interested in exploring the relationship between each column and the likelihood of passenger survival. The following questions will guide you through this exploration. In each question, try to give some comments on what you observe in the graphs.
A. Survival Analysis: How did the survival rates vary by gender and class?
Hint: Create bar charts or stacked bar charts showing the survival rates for different passenger classes and genders.
# To do
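Following the hint, a grouped bar chart can be built from a table of mean survival per class and gender. This is a sketch on a hypothetical `toy` table with invented values. It relies on `Survived` being coded 0/1, so that its mean within a group is the survival rate.

```python
import pandas as pd

# Toy stand-in for `data`; the values are invented for illustration only.
toy = pd.DataFrame({
    "Sex": ["male", "male", "female", "female",
            "male", "female", "male", "female"],
    "Pclass": [1, 3, 1, 3, 3, 1, 1, 3],
    "Survived": [0, 0, 1, 1, 0, 1, 1, 0],
})

# Mean of a 0/1 column = share of survivors in each (class, gender) group.
# unstack("Sex") moves gender into the columns: one bar color per gender.
rates = toy.groupby(["Pclass", "Sex"])["Survived"].mean().unstack("Sex")
print(rates)

ax = rates.plot.bar(rot=0)
ax.set_xlabel("Passenger class")
ax.set_ylabel("Survival rate")
```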
B. Fare and Survival: Is there a relationship between the fare paid and the likelihood of survival?
Hint: Create boxplots to analyze the fare distribution among survivors and non-survivors.
# To do
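A boxplot of fare split by survival status could look like the sketch below, drawn on a hypothetical `toy` table with invented values. The group medians are printed alongside the plot because they are easier to quote in your comments than values read off the chart.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for `data`; the values are invented for illustration only.
toy = pd.DataFrame({
    "Fare": [7.25, 71.28, 8.05, 26.00, 13.00, 53.10],
    "Survived": [0, 1, 0, 1, 0, 1],
})

fig, ax = plt.subplots()

# One box per survival status, drawn on the axes we created
toy.boxplot(column="Fare", by="Survived", ax=ax)
ax.set_xlabel("Survived (0 = no, 1 = yes)")
ax.set_ylabel("Fare")

# Numeric summary to accompany the plot
median_fare = toy.groupby("Survived")["Fare"].median()
print(median_fare)
```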
C. Family Size: How does family size (number of siblings/spouses and parents/children) impact the chances of survival?
# To do
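One common way to approach this is to derive a single family-size column first. The sketch below assumes the convention `FamilySize = SibSp + Parch + 1` (the passenger counts as one), which is an assumption of this example and not something the dataset defines. `toy` and its values are hypothetical.

```python
import pandas as pd

# Toy stand-in for `data`; the values are invented for illustration only.
toy = pd.DataFrame({
    "SibSp": [0, 1, 1, 0, 3, 0],
    "Parch": [0, 0, 2, 0, 2, 0],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# Assumed convention: siblings/spouses + parents/children + the passenger
toy["FamilySize"] = toy["SibSp"] + toy["Parch"] + 1

# Survival rate and group size per family size; small groups give
# unstable rates, so it helps to look at both numbers together.
by_size = toy.groupby("FamilySize")["Survived"].agg(rate="mean", n="size")
print(by_size)

ax = by_size["rate"].plot.bar(rot=0)
ax.set_xlabel("Family size")
ax.set_ylabel("Survival rate")
```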
D. Embarkation Points: How do survival rates differ based on the port of embarkation (C, Q, S)?
# To do
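For this question, a row-normalized cross-tabulation gives the share of survivors and non-survivors per port, which maps directly onto a stacked bar chart. This is a sketch on a hypothetical `toy` table with invented values.

```python
import pandas as pd

# Toy stand-in for `data`; the values are invented for illustration only.
toy = pd.DataFrame({
    "Embarked": ["C", "C", "Q", "S", "S", "S"],
    "Survived": [1, 0, 0, 1, 0, 0],
})

# normalize="index" makes each row (port) sum to 1
shares = pd.crosstab(toy["Embarked"], toy["Survived"], normalize="index")
print(shares)

# Stacked bars: every port has height 1, split by survival status
ax = shares.plot.bar(stacked=True, rot=0)
ax.set_xlabel("Port of embarkation")
ax.set_ylabel("Share of passengers")
ax.legend(title="Survived")
```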
E. Pclass and Age: How does passenger class correlate with age?
# To do
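Because `Pclass` is an ordered category, not a measurement, comparing the age distribution per class is usually more informative than a single correlation number. The sketch below does this on a hypothetical `toy` table with invented values.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for `data`; the values are invented for illustration only.
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3],
    "Age": [58.0, 40.0, 35.0, 29.0, 22.0, None, 18.0],
})

fig, ax = plt.subplots()

# One box of ages per passenger class (missing ages are ignored)
toy.boxplot(column="Age", by="Pclass", ax=ax)
ax.set_xlabel("Passenger class")
ax.set_ylabel("Age")

# Median age per class as a numeric companion to the plot
median_age = toy.groupby("Pclass")["Age"].median()
print(median_age)
```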
F. Gender and Age: How does age distribution differ between male and female passengers?
# To do
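Overlaid histograms with shared bins are one simple way to compare the two age distributions. The sketch below uses a hypothetical `toy` table with invented values. The 10-year bin width is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for `data`; the values are invented for illustration only.
toy = pd.DataFrame({
    "Sex": ["male", "female", "male", "female", "male", "female"],
    "Age": [22.0, 38.0, 54.0, 4.0, None, 27.0],
})

fig, ax = plt.subplots()

# Shared 10-year bins so the two groups are directly comparable
bins = list(range(0, 81, 10))

for sex, group in toy.groupby("Sex"):
    # Semi-transparent bars let both distributions stay visible
    ax.hist(group["Age"].dropna(), bins=bins, alpha=0.5, label=sex)

ax.set_xlabel("Age")
ax.set_ylabel("Number of passengers")
ax.legend(title="Sex")

# Count, mean, quartiles, etc. of age per gender
age_summary = toy.groupby("Sex")["Age"].describe()
print(age_summary)
```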
G. Age, Fare, Gender and Survival: Visualize the relationship among Age, Fare, Gender and Survival in a single graph.
# To do
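Four variables fit in one scatter plot if two go on the axes and the other two are encoded as color and marker shape. The sketch below is one such encoding on a hypothetical `toy` table with invented values. The specific colors and markers are arbitrary.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for `data`; the values are invented for illustration only.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
    "Fare": [7.25, 71.28, 7.93, 53.10, 51.86, 21.08],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Survived": [0, 1, 1, 1, 0, 0],
})

colors = {0: "tab:red", 1: "tab:green"}   # color encodes survival
markers = {"male": "o", "female": "^"}    # marker shape encodes gender

fig, ax = plt.subplots()

# One scatter call per (gender, survival) combination present in the data
for (sex, survived), group in toy.groupby(["Sex", "Survived"]):
    ax.scatter(
        group["Age"], group["Fare"],
        color=colors[survived], marker=markers[sex],
        label=f"{sex}, survived = {survived}",
    )

ax.set_xlabel("Age")
ax.set_ylabel("Fare")
ax.legend()
```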
H. Age, Fare, Class and Survival: Visualize the relationship among Age, Fare, Class and Survival in a single graph.
# To do
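With three class levels, small multiples (one panel per class, shared axes) are an alternative to marker shapes. The sketch below draws one figure with a panel per class and colors points by survival. `toy` and its values are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for `data`; the values are invented for illustration only.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 30.0],
    "Fare": [7.25, 71.28, 7.93, 53.10, 13.00, 26.00],
    "Pclass": [3, 1, 3, 1, 2, 2],
    "Survived": [0, 1, 1, 1, 0, 1],
})

colors = {0: "tab:red", 1: "tab:green"}   # color encodes survival
classes = sorted(toy["Pclass"].unique())

# One panel per class; shared axes keep the panels comparable
fig, axes = plt.subplots(1, len(classes), sharex=True, sharey=True,
                         figsize=(12, 3.5))

for ax, pclass in zip(axes, classes):
    subset = toy[toy["Pclass"] == pclass]
    for survived, group in subset.groupby("Survived"):
        ax.scatter(group["Age"], group["Fare"], color=colors[survived],
                   label=f"Survived = {survived}")
    ax.set_title(f"Class {pclass}")
    ax.set_xlabel("Age")
    ax.legend()

axes[0].set_ylabel("Fare")
fig.tight_layout()
```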
I. Based on your analysis, which variables appear to have the greatest impact on the likelihood of survival?