Introduction and Descriptive Statistics


Exploratory Data Analysis & Unsupervised Learning

     

Lecturers: Dr. HAS Sothea, Dr. PHAUK Sokkhey

About the course

🎯 Objective:

Equip you with

  • Essential EDA skills to handle and preprocess data and to uncover insights from it.
  • Key unsupervised learning techniques and how to apply them to real-world problems with proper interpretation.

📝 Grading Criteria

| Criteria   | Attendance | Labs & Quiz | Midterm Exam | Final Exam | Final Project |
| Percentage | 10%        | 20%         | 20%          | 25%        | 25%           |

💻 Programming:

Where to visit during the course

📋 Outline


  • Introduction to Exploratory Data Analysis

  • Univariate Analysis

  • Review of Descriptive Statistics

Motivation & Introduction


Motivation

  • In the 1960s, someone said: “Garbage in, garbage out” (GIGO)!

  • The phrase expresses the idea that, in any computation or field of study, low-quality input (data) produces faulty output (results)!

  • For example, when working with data, we may face problems such as:

    • Temperature records from the 1960s aren’t very useful for predicting tomorrow’s temperature.
    • Gender may be encoded as ‘Male’ and ‘Female’ in some rows but as ‘M’ and ‘F’ in others.
    • A female patient’s record was duplicated many times in the data.
    • Salaries are often missing in optional surveys…


Introduction

  • Exploratory Data Analysis (EDA) is all about getting curious about your data:
    • Finding out what the problems are.
    • What patterns you can find.
    • What relationships exist.
  • It’s the first step towards analysis and model building.
  • If done right, it can help you to
    • Clean the data without damaging the patterns.
    • Formulate further questions and areas for investigation.
    • Uncover aspects of your data that you wouldn’t have seen otherwise.
    • Select an informative set of inputs for predictive tasks…


Introduction

Goals of EDA

  • Depending on what you want to do with the data, EDA can take many forms.
  • It generally involves:
    • Inspecting the data.
    • Detecting issues.
    • Uncovering patterns.
    • Finding new aspects to investigate.
    • Preparing for model building:
      • Checking assumptions.
      • Selecting features.
      • Choosing an appropriate method.


Introduction

Different Goals of EDA, different methods

  • Just as the goals vary, so do the techniques.
  • The EDA process generally involves:
    • Data Preprocessing and Inspection
    • Data Visualization
    • Descriptive and Inferential Statistics


Introduction

Different Goals of EDA, different methods

  • Data inspection is an important first step of any analysis.
  • Technically, the main tasks of EDA are:
    • Handling missing values (the most common issue in almost every dataset)
    • Handling outliers
    • Removing duplicate data
    • Examining the data distribution
    • Encoding categorical variables
    • Normalizing and scaling
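To make these tasks concrete, here is a minimal pandas sketch on a small made-up table (the columns `gender` and `salary` and all values are hypothetical, not taken from the Titanic data). It harmonizes inconsistent labels, drops duplicate rows, fills a missing salary with the median, min-max scales the numeric column and one-hot encodes the categorical one. The median fill and min-max scaling are only one choice among several, and outlier handling is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with typical problems:
# mixed labels ('Male' vs 'M'), a duplicated row and a missing salary.
raw = pd.DataFrame({
    "gender": ["Male", "F", "Female", "M", "Female"],
    "salary": [1200.0, np.nan, 900.0, 1500.0, 900.0],
})

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # 1. Harmonize inconsistent category labels first,
    #    so that duplicates written differently are detected.
    out["gender"] = out["gender"].replace({"M": "Male", "F": "Female"})
    # 2. Remove exact duplicate rows.
    out = out.drop_duplicates().reset_index(drop=True)
    # 3. Fill missing salaries with the median (one simple strategy).
    out["salary"] = out["salary"].fillna(out["salary"].median())
    # 4. Min-max scale salary to the interval [0, 1].
    s = out["salary"]
    out["salary_scaled"] = (s - s.min()) / (s.max() - s.min())
    # 5. One-hot encode the categorical column.
    return pd.get_dummies(out, columns=["gender"])

clean = preprocess(raw)
print(clean)
```

Doing step 1 before step 2 matters here: before harmonization, no two rows are exact duplicates.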


Introduction

Descriptive & Inferential Statistics

  • Statistics is a branch of mathematics that deals with the collection, analysis, interpretation, presentation, and organization of data.
  • Descriptive statistics summarizes data with statistical values and graphs; in EDA it is needed for
    • Understanding the data distribution
    • Detecting abnormalities
    • Discovering patterns/trends…
  • Inferential statistics uses the analyzed sample data to draw conclusions about a larger population, for example through hypothesis testing.

1. Univariate Analysis

1.0. Motivation

Consider the Titanic dataset.

import pandas as pd                 # Import pandas package
import seaborn as sns               # Package for beautiful graphs
import matplotlib.pyplot as plt     # Graph management
path_titanic = "."                  # Hypothetical: folder containing Titanic-Dataset.csv
data = pd.read_csv(path_titanic + "/Titanic-Dataset.csv")  # Load the CSV into a DataFrame
sns.set(style="whitegrid")          # Set grid background
data.drop(columns=['PassengerId']).head()  # First 5 rows, without the ID column
|   | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| 0 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
  • The first step in analyzing data is understanding the nature of each individual column.
  • Some guiding questions:
    • Were there more female or male passengers?
    • Did many people survive?
    • Were there more elderly or young passengers?

1.1. Data Types

Quality vs Quantity

data[['Survived', 'Pclass', 'Age', 'Embarked']].head(5)    # Show 5 first rows
|   | Survived | Pclass | Age  | Embarked |
| 0 | 0        | 3      | 22.0 | S        |
| 1 | 1        | 1      | 38.0 | C        |
| 2 | 1        | 3      | 26.0 | S        |
| 3 | 1        | 1      | 35.0 | S        |
| 4 | 0        | 3      | 35.0 | S        |
  • Column Embarked is clearly different:
    • Performing \(+\), \(-\), \(\times\), \(\div\)… doesn’t make any sense!
    • Comparing \(<\), \(>\)… doesn’t make sense either!
  • Embarked is qualitative (categorical) data.
  • Age, on the other hand, is numerical:
    • Age \(50\) is older than \(30\).
    • Age \(20\) is \(5\) years younger than \(25\), i.e. \(25-20=5\).
  • Age is quantitative (numerical) data.
  • Q1: What about the other two columns?
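The storage type pandas assigns does not decide whether a column is qualitative or quantitative. A small sketch using the five rows shown above: `Survived` and `Pclass` are stored as integers even though they encode categories, so the split still needs human judgment. Converting them to the `category` dtype (an optional step assumed here for illustration, not part of the course code) makes the intent explicit; `Pclass` is made ordered because 1st class ranks above 2nd and 3rd.

```python
import pandas as pd

# The first five rows of four Titanic columns, as displayed above.
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Embarked": ["S", "C", "S", "S", "S"],
})

# pandas stores three of these columns as numbers...
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# ...but Survived (nominal) and Pclass (ordinal) are really categories.
typed = sample.astype({"Survived": "category", "Embarked": "category"})
typed["Pclass"] = pd.Categorical(typed["Pclass"], categories=[1, 2, 3], ordered=True)

quantitative_cols = typed.select_dtypes(include="number").columns.tolist()
print(numeric_cols)       # columns pandas stores as numbers
print(quantitative_cols)  # columns that are truly quantitative
```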

1.1. Data Types

Quality vs Quantity

1.1. Data Types

Challenge

data[['Sex', 'SibSp', 'Parch', 'Fare']].head()
|   | Sex    | SibSp | Parch | Fare    |
| 0 | male   | 1     | 0     | 7.2500  |
| 1 | female | 1     | 0     | 71.2833 |
| 2 | female | 0     | 0     | 7.9250  |
| 3 | female | 1     | 0     | 53.1000 |
| 4 | male   | 0     | 0     | 8.0500  |


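  • Q2: Determine the type of each of these columns.
| Column | Quantitative: Discrete | Quantitative: Continuous | Qualitative: Nominal | Qualitative: Ordinal |
| Sex    |                        |                          |                      |                      |
| SibSp  |                        |                          |                      |                      |
| Parch  |                        |                          |                      |                      |
| Fare   |                        |                          |                      |                      |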


  • Now, let’s take a closer look!

1.1.1. Qualitative Data

Statistical values

data[['Pclass', 'Survived', 'Embarked', 'Sex']].head()
|   | Pclass | Survived | Embarked | Sex    |
| 0 | 3      | 0        | S        | male   |
| 1 | 1      | 1        | C        | female |
| 2 | 3      | 1        | S        | female |
| 3 | 1      | 1        | S        | female |
| 4 | 3      | 0        | S        | male   |
  • What values should we use to describe qualitative data?

  • Absolute Frequency: Number of occurrences/counts of each category.

  • Relative Frequency: proportion/percentage of each category.

  • Mode: Category with highest frequency/count.

  • Example:
freq_tab = data[['Pclass']].value_counts().to_frame()   # absolute frequencies (column 'count')
freq_tab['proportion'] = data[['Pclass']].value_counts(normalize=True).round(2)  # relative frequencies
freq_tab.T                                                # categories as columns
| Pclass     | 3      | 1      | 2      |
| count      | 491.00 | 216.00 | 184.00 |
| proportion | 0.55   | 0.24   | 0.21   |

freq_tab = data[['Sex']].value_counts().to_frame()      # absolute frequencies
freq_tab['proportion'] = data[['Sex']].value_counts(normalize=True).round(2)  # relative frequencies
freq_tab.T                                                # categories as columns
| Sex        | male   | female |
| count      | 577.00 | 314.00 |
| proportion | 0.65   | 0.35   |
  • Q3: I dare you to take care of the other two columns 😏!
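All three summaries listed above, including the mode, can be computed with a few pandas calls. A minimal sketch on the five `Embarked` values shown earlier (a 5-row sample only, so the numbers say nothing about the full dataset): `value_counts()` gives absolute frequencies, `normalize=True` gives relative ones, and `mode()` returns the most frequent category. `mode()` returns a Series because several categories can tie.

```python
import pandas as pd

# Embarked values of the five rows displayed above (a tiny sample only).
embarked = pd.Series(["S", "C", "S", "S", "S"], name="Embarked")

freq = pd.DataFrame({
    "count": embarked.value_counts(),                     # absolute frequency
    "proportion": embarked.value_counts(normalize=True),  # relative frequency
})
mode = embarked.mode()  # Series: several categories can tie for the top count
print(freq.T)
print("Mode:", list(mode))
```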

1.1.1. Qualitative Data

Visualization

  • What graph should we use to present qualitative data?
  • Countplot/Barplot: Represent each count/proportion by a bar.
  • Example:
import matplotlib.pyplot as plt
import seaborn as sns  # For graph
sns.set(style="whitegrid") # set nice background
plt.figure(figsize=(5,3))
ax = sns.countplot(data, x="Survived") # create graph
ax.set_title("Barplot of Survived") # add title
ax.bar_label(ax.containers[0]) # add number to bars
plt.show() # Show graph

  • Example:
import matplotlib.pyplot as plt
import seaborn as sns  # For graph
sns.set(style="whitegrid") # set nice background
plt.figure(figsize=(5,3))
ax = sns.countplot(data,x="Survived", stat="proportion")
ax.set_title("Barplot of Survived") # add title
ax.bar_label(ax.containers[0], fmt="%0.2f") # number
plt.show() # Show graph

1.1.1. Qualitative Data

Visualization

  • Pie chart: Represent count/proportion by circular slices.
  • Example:
import matplotlib.pyplot as plt
import seaborn as sns  # For graph
sns.set(style="whitegrid") # set nice background
plt.figure(figsize=(6,4))
tab = data['Embarked'].value_counts() # Count passengers per port of embarkation
plt.pie(tab, labels=tab.index, autopct='%0.2f%%') # graph
plt.title("Pie chart of Embarked") # add title
plt.show() # Show graph

⚠️ Pie charts can be challenging to read with numerous categories. They’re harder to perceive when many categories have similar proportions.


1.1.1. Qualitative Data

Summary

1.1.2. Quantitative Data

Statistical values

data[['Age', 'Fare', 'SibSp', 'Parch']].head()
|   | Age  | Fare    | SibSp | Parch |
| 0 | 22.0 | 7.2500  | 1     | 0     |
| 1 | 38.0 | 71.2833 | 1     | 0     |
| 2 | 26.0 | 7.9250  | 0     | 0     |
| 3 | 35.0 | 53.1000 | 1     | 0     |
| 4 | 35.0 | 8.0500  | 0     | 0     |
  • What values should we use to describe quantitative data?

  • Quantiles: cut points that divide the data, sorted in ascending order, into contiguous intervals, each containing a given proportion of the observations.

Common types of quantiles:

  • Quartiles: The 25th (Q1), 50th (Q2 or median), and 75th (Q3) percentiles.

|      | min  | 25%   | 50%   | 75%  | max    |
| Fare | 0.00 | 7.91  | 14.45 | 31.0 | 512.33 |
| Age  | 0.42 | 20.12 | 28.00 | 38.0 | 80.00  |
  • Percentiles: Values that divide data into 100 equal parts.
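Quartiles and other percentiles can be read off with `Series.quantile`. A small sketch on the five `Age` values shown above (illustration only; the table above comes from the full column). pandas interpolates linearly between sorted values by default, so a quantile need not be an actual data point; the `interpolation` argument (e.g. `"lower"`, `"nearest"`) selects other conventions.

```python
import pandas as pd

# Age of the five passengers displayed above.
age = pd.Series([22.0, 38.0, 26.0, 35.0, 35.0], name="Age")

# Quartiles (25th, 50th, 75th percentiles) plus the 10th and 90th percentiles.
q = age.quantile([0.10, 0.25, 0.50, 0.75, 0.90])
print(q)
```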

Method to find Quartiles:

  • Sort the data in ascending order: \(X_1,...,X_n\).

  • If \(n\) is odd: \(\color{red}{Q_2}=X_{(n+1)/2}\), the “middle term”.
  • If \(n\) is even: \(\color{red}{Q_2}=\frac{X_{(n/2)}+X_{(n/2)+1}}{2}\), the “middle value” (average of the two middle terms).

  • \(Q_1\): the median of the lower half of the sorted data.
  • \(\color{green}{Q_3}\): the median of the upper half of the sorted data.
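The hand method above can be written as a short function. This is a sketch under one stated assumption: when \(n\) is odd, the median itself is excluded from both halves (textbooks differ on this point). Its results can differ from `numpy.percentile`, which by default interpolates linearly between sorted values; this is also why software output may not match a hand calculation exactly.

```python
import numpy as np

def median_sorted(xs):
    """Median of a sorted list: middle term (odd n) or mean of the two middle terms (even n)."""
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

def quartiles(data):
    """Return (Q1, Q2, Q3) with the median-of-halves method.

    Assumption: for odd n the median is excluded from both halves.
    """
    xs = sorted(data)
    n = len(xs)
    q2 = median_sorted(xs)
    lower = xs[: n // 2]        # lower half
    upper = xs[(n + 1) // 2 :]  # upper half (skips the median when n is odd)
    return median_sorted(lower), q2, median_sorted(upper)

odd = [13, 1, 9, 5, 11, 3, 7]
even = [8, 1, 7, 2, 6, 3, 5, 4]
print(quartiles(odd))                    # (3, 7, 11)
print(np.percentile(odd, [25, 50, 75]))  # linear interpolation: values 4.0, 7.0, 10.0
print(quartiles(even))                   # (2.5, 4.5, 6.5)
print(np.percentile(even, [25, 50, 75])) # values 2.75, 4.5, 6.25
```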

1.1.2. Quantitative Data

Statistical values


The median (\(\color{red}{Q_2}\)) is one measure of central tendency; another is the mean.

  • Mean: Average value of all data points:

\[\color{blue}{\overline{X}=\frac{1}{n}\sum_{i=1}^nX_i=\frac{X_1+\dots+X_n}{n}}.\]

Examples:

mean = data[['Age','Fare']].mean()\
                        .to_frame()
mean.columns = ['Mean']
mean.T
|      | Age       | Fare      |
| Mean | 29.699118 | 32.204208 |
  • The average age of passengers was around \(30\) years old.

  • On average, passengers paid approximately \(£32\) in fare.
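The mean formula translates directly into code. A sketch comparing the hand formula with pandas on the five `Age` values shown above, plus one hypothetical missing value: pandas' `mean()` skips `NaN` by default (`skipna=True`), so \(n\) is the number of non-missing values. This matters because `Age` has missing entries in the full dataset.

```python
import numpy as np
import pandas as pd

# Ages of the five passengers shown above, plus one hypothetical missing age.
age = pd.Series([22.0, 38.0, 26.0, 35.0, 35.0, np.nan])

observed = age.dropna()
mean_by_hand = observed.sum() / len(observed)  # (X_1 + ... + X_n) / n
mean_pandas = age.mean()                       # NaN skipped by default
print(mean_by_hand, mean_pandas)
print(age.mean(skipna=False))                  # NaN once missing values count
```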

1.1.2. Quantitative Data

Statistical values


Two main measures of dispersion:

  • Sample variance: roughly the average squared distance of the data points from the mean (dividing by \(n-1\) rather than \(n\)).

\[\color{blue}{\widehat{\sigma}^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\overline{X})^2}.\]

Examples:

var = data[['Age','Fare']].var()\
                        .to_frame()\
                        .round(3)
var.columns = ['Var']
var.T
|     | Age     | Fare     |
| Var | 211.019 | 2469.437 |
  • Large variance means that data points are widely spread out from the Mean.
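Note the \(n-1\) in the denominator: this is the sample variance. pandas' `var()` uses it by default (`ddof=1`), whereas `numpy.var` divides by \(n\) unless `ddof=1` is passed. A small sketch on the five `Age` values shown earlier:

```python
import numpy as np
import pandas as pd

age = pd.Series([22.0, 38.0, 26.0, 35.0, 35.0])  # five ages shown above

n = len(age)
xbar = age.mean()
var_by_hand = ((age - xbar) ** 2).sum() / (n - 1)  # sample variance formula

var_pandas = age.var()                     # pandas: ddof=1 by default
var_numpy_n = np.var(age.values)           # NumPy: ddof=0 by default (divides by n)
var_numpy_n1 = np.var(age.values, ddof=1)  # NumPy with the n-1 denominator
print(var_by_hand, var_pandas, var_numpy_n, var_numpy_n1)
```

With only five points the two conventions differ noticeably; for the full `Age` column the gap is tiny, but it is worth knowing which one a library uses.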

1.1.2. Quantitative Data

Statistical values


Two main measures of dispersion:

  • Sample standard deviation: the square root of the variance.

\[\color{blue}{\widehat{\sigma}=\sqrt{\widehat{\sigma}^2}=\sqrt{\frac{1}{n-1}\sum_{i=1}^n(X_i-\overline{X})^2}}.\]

Examples:

spread = data[['Age','Fare']].agg(['var', 'std'])  # variance and standard deviation of each column
spread
|     | Age        | Fare        |
| var | 211.019125 | 2469.436846 |
| std | 14.526497  | 49.693429   |
  • Large standard deviation (Std) means data points are spread out widely from the Mean.
  • Std has the same unit as \(X_i\).
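Because the standard deviation is the square root of an average of squared deviations, it carries the unit of the data, while the variance carries the squared unit. A quick sketch: re-expressing the five sample ages in months instead of years (a hypothetical unit change) multiplies the standard deviation by \(12\) but the variance by \(12^2=144\).

```python
import math
import pandas as pd

age_years = pd.Series([22.0, 38.0, 26.0, 35.0, 35.0])  # five ages shown above
age_months = age_years * 12                             # the same ages in months

std_y, std_m = age_years.std(), age_months.std()  # sample std (ddof=1)
var_y, var_m = age_years.var(), age_months.var()  # sample variance (ddof=1)

print(std_m / std_y)  # ratio of standard deviations
print(var_m / var_y)  # ratio of variances
print(math.isclose(std_y, math.sqrt(var_y)))  # std is the square root of var
```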

1.1.2. Quantitative Data

Statistical Summary


Statistical summary uses all key values to help us understand the data:

  • Where the data is concentrated (mean/median).
  • How spread out it is (var/std)…

Examples:

data[['Age','Fare']]\
        .describe().drop('count')  # for summary
|      | Age       | Fare       |
| mean | 29.699118 | 32.204208  |
| std  | 14.526497 | 49.693429  |
| min  | 0.420000  | 0.000000   |
| 25%  | 20.125000 | 7.910400   |
| 50%  | 28.000000 | 14.454200  |
| 75%  | 38.000000 | 31.000000  |
| max  | 80.000000 | 512.329200 |

1.1.2. Quantitative Data

Visualization: Boxplot


  • Boxplots summarize data with the quartiles and the range within which data normally fall.
  • The lower and upper edges of the box are \(Q_1\) and \(\color{green}{Q_3}\). The median \(\color{red}{Q_2}\) is the line inside the box.
  • Interquartile range: \(\text{IQR}=\color{green}{Q_3}-Q_1\), the width of the box, which covers the central \(50\%\) of the data.
  • Fences: \([Q_1-1.5\text{IQR},\color{green}{Q_3}+1.5\text{IQR}]\). If the data are normally distributed, about \(99.3\%\) of observations fall inside this range.
  • Data points that fall outside this range can be considered outliers (observations that deviate markedly from the usual ones).
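The fences are easy to compute by hand. A minimal sketch, assuming pandas' default (linear-interpolation) quartiles; the helper names `iqr_fences` and `iqr_outliers` and the toy values are hypothetical. Plugging in the Fare quartiles from the summary table above (\(Q_1\approx 7.91\), \(Q_3=31\)) gives an upper fence of \(31+1.5\times 23.09\approx 65.6\), consistent with the upper fence of about £65 quoted in the Fare boxplot interpretation.

```python
import pandas as pd

def iqr_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    s = pd.Series(values, dtype=float)
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(values, k=1.5):
    """Values lying strictly outside the fences."""
    s = pd.Series(values, dtype=float)
    low, high = iqr_fences(s, k)
    return s[(s < low) | (s > high)]

# Hypothetical toy data with one unusually large value.
toy = [2, 3, 3, 4, 5, 5, 6, 30]
print(iqr_fences(toy))              # fences around the bulk of the data
print(iqr_outliers(toy).tolist())   # [30.0]

# Fare quartiles taken from the summary table above.
q1_fare, q3_fare = 7.9104, 31.0
upper_fare = q3_fare + 1.5 * (q3_fare - q1_fare)
print(round(upper_fare, 2))         # 65.63
```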

import plotly.express as px
fig = px.box(data, x="Fare")
fig.update_layout(height=220, 
                  width=530,
                  title="Boxplot of Fare")
fig.show()

  • This boxplot tells us that:
    • Fares range from \(£0\) to a maximum of \(£512.33\).
    • \(Q_1=£7.9\): around \(25\%\) of passengers paid less than \(£7.9\) for their ticket.
    • \(\color{red}{Q_2}=£14.45\) (median): \(\approx 50\%\) paid less than \(£14.45\).
    • \(\color{green}{Q_3}=£31\): \(\approx 75\%\) paid less than \(£31\).
    • There are many outliers: passengers who paid more than the upper fence (\(\approx £65\)), up to the largest fare of \(£512.33\).

1.1.2. Quantitative Data

Visualization: Histogram

import plotly.express as px
fig = px.histogram(data, x="Age")
fig.update_layout(height=220, 
                  width=530, 
                  title="Histogram of Age")
fig.show()
  • A histogram is constructed by:
    • Dividing the range of the data into consecutive bins: \(B_1, \dots, B_N\).
    • The height of each bar represents the count of \(X_i\) values that fall within the corresponding bin.
  • It describes the frequency of observations within each bin range.

Mathematical definition of histogram

  • Define bins: \(B_1,\dots, B_N\).
  • For any \(x\), if \(x\in B_k\) for some \(k\), then

\[\text{hist}(x)=\sum_{i=1}^n\mathbb{1}_{\{X_i\in B_k\}}.\]
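The definition amounts to summing indicators. A sketch with hypothetical bin edges and toy ages, using NumPy's convention that every bin is half-open \([a,b)\) except the last, which also includes its right edge; under that convention the hand count matches `numpy.histogram`.

```python
import numpy as np

def hist_counts(x, edges):
    """Count how many X_i fall in each bin B_k = [edges[k], edges[k+1]).

    The last bin is closed on the right, matching numpy.histogram.
    """
    x = np.asarray(x, dtype=float)
    n_bins = len(edges) - 1
    counts = []
    for k in range(n_bins):
        lo, hi = edges[k], edges[k + 1]
        upper_ok = (x <= hi) if k == n_bins - 1 else (x < hi)
        counts.append(int(np.sum((x >= lo) & upper_ok)))  # sum of 1{X_i in B_k}
    return counts

def hist(value, x, edges):
    """hist(value): count of the bin containing `value` (0 if it lies in no bin)."""
    counts = hist_counts(x, edges)
    for k in range(len(counts)):
        last = k == len(counts) - 1
        if edges[k] <= value < edges[k + 1] or (last and value == edges[-1]):
            return counts[k]
    return 0

ages = [2, 5, 12, 18, 25, 33, 35, 39]  # hypothetical ages
edges = [0, 10, 20, 30, 40]            # four bins of width 10
print(hist_counts(ages, edges))        # [2, 2, 1, 3]
print(np.histogram(ages, bins=edges)[0])
print(hist(27, ages, edges))           # 27 lies in [20, 30), so 1
```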


For this example of Age:

  • Most passengers were between 16 and 52 years old.
  • There were more children younger than 10 years old than those around 10 years old.
  • There were fewer than 10 individuals in each age group older than 52 years old.

1.1.2. Quantitative Data

Visualization: Kernel Density Plot (KDE)

import plotly.figure_factory as ff
age = [data[['Age']].dropna().values.reshape(-1)]
group_labels = ['distplot']
fig = ff.create_distplot(age, group_labels=group_labels, bin_size=1.9)
fig.update_layout(height=220, 
                  width=530, 
                  title="Histogram and KDE of Age")
fig.show()
  • A kernel density plot is a smooth, continuous version of a histogram.
  • It estimates the density of the observations, i.e. how the relative frequency is spread over the range of values.
  • Unlike a histogram, it does not depend on where bin edges are placed; it depends instead on a bandwidth \(h\).

Mathematical definition of KDE

  • Let \(K\) be a smooth kernel function, for example the Gaussian kernel \(K(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\).
  • For a given bandwidth \(h>0\) and for any \(x\):

\[\text{kde}(x)=\frac{1}{nh}\sum_{i=1}^nK\Big(\frac{x-X_i}{h}\Big).\]


  • A kernel density plot conveys similar information to a histogram.
  • It is often discussed in probability and statistics classes.
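The KDE formula can be evaluated directly. A minimal sketch with the normalized Gaussian kernel and a hand-picked bandwidth; the toy ages and `h=5.0` are hypothetical, and plotting libraries choose \(h\) automatically with their own rules. A quick numerical check confirms that the estimate integrates to about \(1\), as a density should.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density K(u) = exp(-u^2 / 2) / sqrt(2*pi)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde(x, data, h):
    """kde(x) = 1/(n h) * sum_i K((x - X_i) / h), evaluated at each point of x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    u = (x[:, None] - data[None, :]) / h  # shape (len(x), n)
    return gaussian_kernel(u).sum(axis=1) / (len(data) * h)

ages = [2, 5, 12, 18, 25, 33, 35, 39]  # hypothetical ages
grid = np.linspace(-40, 80, 2401)      # wide grid covering the data
dens = kde(grid, ages, h=5.0)
dx = grid[1] - grid[0]
print(dens.sum() * dx)                 # Riemann sum of the density, close to 1
```

A larger `h` averages over wider neighbourhoods and gives a smoother curve; a smaller `h` follows individual points more closely.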

1.1.2. Quantitative Data

Summary

1.2. Real examples

Qualitative columns of Titanic Dataset

qual_var = ['Survived', 'Pclass', 'Sex']
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
fig = make_subplots(
    rows=3, cols=1, 
    specs=[[{"type": "bar"}], [{"type": "bar"}], [{"type": "bar"}]],
    subplot_titles=("Barplot of Survived", "Barplot of Pclass", "Barplot of Sex"))
for i, va in enumerate(qual_var):
    cnt = data[va].value_counts()
    if i == 0:
        fig.add_trace(
            go.Bar(x=list(cnt.index.astype(str)), y=list(cnt.values), name=va), col=1, row=i+1)
    else:
        fig.add_trace(
            go.Bar(x=list(cnt.index.astype(object)), y=list(cnt.values), name=va), col=1, row=i+1)
fig.update_layout(height=450, width=450)
fig.show()
fig = make_subplots(
    rows=3, cols=1, 
    specs=[[{"type": "pie"}], [{"type": "pie"}], [{"type": "pie"}]],
    subplot_titles=("Pie chart of Survived", "Pie chart of Pclass", "Pie chart of Sex"))
for i, va in enumerate(qual_var):
    cnt = data[va].value_counts()
    fig.add_trace(go.Pie(labels=list(cnt.index.astype(object)), values=list(cnt.values)), col=1, row=i+1)
fig.update_layout(height=450, width=450)
fig.show()

1.2. Real examples

Quantitative columns of Titanic Dataset

quan_var = ['Age', 'SibSp', 'Parch', 'Fare']
cols = ['#C96451', '#80C96F', '#6B7FDB', '#C07EDE']
fig = make_subplots(
    rows=2, cols=4,
    subplot_titles=("Distribution of Age", "Distribution of SibSp", "Distribution of Parch", "Distribution of Fare", "","","",""))
for i, va in enumerate(quan_var):
    fig.add_trace(
        go.Box(x=data[va].values, name=va, marker_color = cols[i]), col=i+1, row=1)
    fig.add_trace(
        go.Histogram(x=data[va].values, name=va, marker_color = cols[i]), col=i+1, row=2)
fig.update_layout(height=450, width=1000)
fig.show()

1.2. Real examples

Quantitative columns of Titanic Dataset

quan_var = ['Age', 'SibSp', 'Parch', 'Fare']
cols = ['#C96451', '#80C96F', '#6B7FDB', '#C07EDE']
fig = make_subplots(
    rows=2, cols=4,
    subplot_titles=("Distribution of Age", "Distribution of SibSp", "Distribution of Parch", "Distribution of Fare (log scale)", "","","",""))
for i, va in enumerate(quan_var):
    fig.add_trace(
        go.Box(x=data[va].values, name=va, marker_color = cols[i]), col=i+1, row=1)
    fig.add_trace(
        go.Histogram(x=data[va].values, name=va, marker_color = cols[i]), col=i+1, row=2)
fig.update_layout(height=450, width=1000)
fig.update_xaxes(type="log", row=1, col=4)
fig.update_yaxes(type="log", row=2, col=4)
fig.show()

1.3. More on Descriptive Statistics






🥳 Yeahhhh….









Party time… 🥂