import pandas as pd  # Import pandas package
import seaborn as sns  # Package for beautiful graphs
import matplotlib.pyplot as plt  # Graph management

data = pd.read_csv(path_titanic + "/Titanic-Dataset.csv")  # Import it into Python
sns.set(style="whitegrid")  # Set grid background
data.drop(columns=['PassengerId']).head()
   Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
1  1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
2  1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3  1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
4  0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S
The first step in analyzing data is to understand the nature of each individual column.
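One way to get a first impression of each column's nature is to look at its dtype and its number of distinct values. The sketch below is only an illustration: it rebuilds a few columns of the five rows shown above by hand so that it runs without the CSV, and the `describe_columns` helper and its `max_levels` threshold are hypothetical choices, not part of pandas.

```python
import pandas as pd

# Some columns of the five rows displayed above, rebuilt by hand
# so the sketch runs without the CSV file.
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
    "Embarked": ["S", "C", "S", "S", "S"],
})

def describe_columns(df, max_levels=3):
    """Hypothetical helper: guess whether each column is qualitative or quantitative.

    A column is labeled qualitative if it is non-numeric or has at most
    `max_levels` distinct values. The threshold is an arbitrary assumption.
    """
    rows = []
    for col in df.columns:
        n_unique = df[col].nunique()
        is_numeric = pd.api.types.is_numeric_dtype(df[col])
        kind = "quantitative" if is_numeric and n_unique > max_levels else "qualitative"
        rows.append({"column": col, "dtype": str(df[col].dtype),
                     "n_unique": n_unique, "kind": kind})
    return pd.DataFrame(rows).set_index("column")

summary = describe_columns(sample)
print(summary)
```

On this small sample, `Survived`, `Pclass`, `Sex` and `Embarked` come out as qualitative, while `Age` and `Fare` come out as quantitative. A rule of thumb like this is only a starting point; the final call should rest on what each column means.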
What graph should we use to present qualitative data?
Countplot/Barplot: represents the count or proportion of each category with a bar.
Example:
import matplotlib.pyplot as plt
import seaborn as sns  # For graph

sns.set(style="whitegrid")  # set nice background
plt.figure(figsize=(5, 3))
ax = sns.countplot(data, x="Survived")  # create graph
ax.set_title("Barplot of Survived")  # add title
ax.bar_label(ax.containers[0])  # add number to bars
plt.show()  # Show graph
The same countplot can show proportions instead of counts by passing stat="proportion".
Example:
import matplotlib.pyplot as plt
import seaborn as sns  # For graph

sns.set(style="whitegrid")  # set nice background
plt.figure(figsize=(5, 3))
ax = sns.countplot(data, x="Survived", stat="proportion")  # bar height = share of rows
ax.set_title("Barplot of Survived")  # add title
ax.bar_label(ax.containers[0], fmt="%0.2f")  # add proportion to bars, two decimals
plt.show()  # Show graph
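The numbers printed on the bars can be cross-checked without plotting. The sketch below assumes only the `Survived` values of the five rows shown in the table preview, not the full dataset, and uses pandas' `value_counts` to compute the counts and proportions that the two plots would display for that sample.

```python
import pandas as pd

# Survived values of the five previewed rows only (not the full dataset).
survived = pd.Series([0, 1, 1, 1, 0], name="Survived")

# Bar heights for the default count plot.
counts = survived.value_counts().sort_index()

# Bar heights when stat="proportion" is used: each count divided by the total.
proportions = survived.value_counts(normalize=True).sort_index()

print(counts)
print(proportions.round(2))
```

With the full data, replacing `survived` by `data["Survived"]` should give the values shown on the bars.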