Introduction to Data Analysis


INF-604: Data Analysis

Lecturer: Dr. Sothea HAS

About the course

  • Objective: Equip you with essential Data Analysis skills to uncover insights from data and make informed decisions.

  • Grading Criteria

Criteria                                          Percentage
Attendance                                        10%
Participation & quiz                              30%
Midterm Exam                                      30%
Final Project & Presentation / Practical labs     30%
  • Programming:

Where to visit

Just want to know you a bit 👇

Let’s see 🫣

Introduction to Data Analysis (DA)

📋 Outline

  • Motivation & Overview

  • Types of Data Analysis

  • Related Roles in Data Ecosystem

  • Key part: Data Analysis Process

  • The Importance of Data Analysis

  • Applications of Data Analysis

Motivation

Old Faithful dataset (\(272\) rows, \(2\) columns)

Code
import pandas as pd                 # Import pandas package
import seaborn as sns               # Package for beautiful graphs
import matplotlib.pyplot as plt     # Graph management
sns.set(style="whitegrid")          # Set grid background
# path = "https://gist.githubusercontent.com/curran/4b59d1046d9e66f2787780ad51a1cd87/raw/9ec906b78a98cf300947a37b56cfe70d01183200/data.tsv"                       # The data can also be found at this link
# path0 is assumed to be defined beforehand as the folder that contains faithful.csv
df0 = pd.read_csv(path0 + "/faithful.csv")   # Import it into Python
df0.head(5)                        # Show the first 5 rows
eruptions waiting
0 3.600 79
1 1.800 54
2 3.333 74
3 2.283 62
4 4.533 85

Code
plt.figure(figsize=(5,3.2))                          # Define figure size
sns.scatterplot(data=df0, x="waiting", y="eruptions")    # Create scatterplot
plt.title("Old Faithful data from Yellowstone National Park, US", fontsize=10)    # Title
plt.suptitle("Eruptions vs waiting times", fontsize=13, y=1)                 # Subtitle
plt.show()

  • The longer the wait, the longer the eruption tends to last.
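
To put a number on this visual impression, here is a minimal sketch that uses only the five rows printed above, not the full 272-row dataset, so the value is purely illustrative. It computes the Pearson correlation between the two columns; the variable names are chosen for this example only.

```python
import numpy as np

# The five rows shown by df0.head(5); the full dataset has 272 rows.
eruptions = np.array([3.600, 1.800, 3.333, 2.283, 4.533])  # eruption duration
waiting = np.array([79, 54, 74, 62, 85])                   # waiting time

# Pearson correlation: values close to +1 indicate a strong increasing linear trend.
r = np.corrcoef(waiting, eruptions)[0, 1]
print(f"Correlation on the 5 displayed rows: {r:.2f}")
```

A value close to 1 agrees with the scatterplot, but five points are far too few to draw a conclusion; the same call on the full columns of `df0` would be the real check.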

Marketing (\(200\) rows, \(4\) columns)

Code
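import pyreadr                                   # Reads R data files (.rda, .rds)
import pandas as pd
# path is assumed to be defined beforehand as the folder that contains marketing.rda
df = pyreadr.read_r(path + "/marketing.rda")     # Returns a dict-like object of data frames
df = df['marketing']                             # Extract the 'marketing' data frame
df.head(5)                                       # Show the first 5 rows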
youtube facebook newspaper sales
0 276.12 45.36 83.04 26.52
1 53.40 47.16 54.12 12.48
2 20.64 55.08 83.16 11.16
3 181.80 49.56 70.20 22.20
4 216.96 12.96 70.08 15.48

Code
import plotly.express as px
fig = px.scatter(df, x="youtube", y="sales", size_max=40)
fig.update_layout(title="Sales as a function of YouTube ads",
                  width=550, height=380)
Code
fig = px.scatter_3d(df, x="youtube", y="facebook", z="sales", size_max=40)
camera = dict(eye=dict(x=1, y=-1, z=1.2))
fig.update_layout(title="Sales as a function of Facebook and YouTube",
                  width=550, height=380,
                  scene_camera=camera)
Code
fig = px.scatter_3d(df, x="youtube", y="facebook", z="sales", 
                    size="newspaper", color="newspaper",
                    size_max=40)
camera = dict(eye=dict(x=1, y=-1, z=1.2))
fig.update_layout(title="Sales as a function of all ads",
                  width=550, height=380,
                  scene_camera=camera)
  • Objective: Leverage this data to boost sales.

Data Analysis Overview

  • Simply put, data analysis is the process of processing raw data and making sense of it.

  • It involves using:

    • Statistical methods: descriptive statistics, hypothesis tests, regression analysis…
    • Analytics tools: Python, Excel, Power BI…
    • Techniques to identify patterns and trends: Data modeling, machine learning algorithms…
  • It helps businesses understand past performance, customer behavior, and market trends, and it guides decisions about future actions.

  • Example: What’s the best ad strategy for boosting sales?

Types of Data Analysis

1. Descriptive Analysis

  • Summarizes past events and provides insights into them.
  • Answers “What happened during that time?”
  • Example: average expenses on ads, average sales…
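
As a small illustration, the sketch below rebuilds the five marketing rows printed earlier (the full table has 200 rows) and computes the average ad spending and average sales. The name `sample` exists only for this example.

```python
import pandas as pd

# First five rows of the marketing data, as printed by df.head(5).
sample = pd.DataFrame({
    "youtube":   [276.12, 53.40, 20.64, 181.80, 216.96],
    "facebook":  [45.36, 47.16, 55.08, 49.56, 12.96],
    "newspaper": [83.04, 54.12, 83.16, 70.20, 70.08],
    "sales":     [26.52, 12.48, 11.16, 22.20, 15.48],
})

# Descriptive analysis: summarize what happened, column by column.
averages = sample.mean()
print(averages.round(2))
```

On the full data, `df.describe()` would give the same kind of summary together with the spread, minimum, and maximum of each column.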

2. Diagnostic Analysis

  • Delves deeper into the reasons behind specific outcomes.
  • Answers “Why did it happen?”
  • Example: why did sales decrease or increase?
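
One common first step toward a "why" is to check which variables move together with the outcome. The sketch below, again limited to the five marketing rows printed earlier, ranks the ad channels by their correlation with sales. Correlation only hints at possible causes and does not prove them, and five rows are too few for a reliable ranking.

```python
import pandas as pd

# First five rows of the marketing data, as printed by df.head(5).
sample = pd.DataFrame({
    "youtube":   [276.12, 53.40, 20.64, 181.80, 216.96],
    "facebook":  [45.36, 47.16, 55.08, 49.56, 12.96],
    "newspaper": [83.04, 54.12, 83.16, 70.20, 70.08],
    "sales":     [26.52, 12.48, 11.16, 22.20, 15.48],
})

# Correlation of each ad channel with sales, strongest first.
corr_with_sales = (
    sample.corr()["sales"]
    .drop("sales")                  # a column's correlation with itself is always 1
    .sort_values(ascending=False)
)
print(corr_with_sales.round(2))
```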

3. Predictive Analysis

  • Forecasts future outcomes using historical data and trends.
  • Answers “What’s likely to happen next?”
  • Example: how will the next ads impact sales?
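
A minimal sketch of a prediction, assuming a straight-line relationship between YouTube spending and sales: it fits a least-squares line to the five marketing rows printed earlier and forecasts sales for a new budget. The value of `new_budget` is an assumption made for illustration, and a model fitted on five points should not be trusted in practice.

```python
import numpy as np

# YouTube spending and sales from the five rows printed by df.head(5).
youtube = np.array([276.12, 53.40, 20.64, 181.80, 216.96])
sales = np.array([26.52, 12.48, 11.16, 22.20, 15.48])

# Fit the line sales ≈ slope * youtube + intercept by least squares.
slope, intercept = np.polyfit(youtube, sales, deg=1)

# Forecast sales for a hypothetical new YouTube budget.
new_budget = 150.0  # assumed value, for illustration only
predicted = slope * new_budget + intercept
print(f"Predicted sales for a YouTube budget of {new_budget}: {predicted:.2f}")
```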

4. Prescriptive Analysis

  • Uses estimated outcomes to guide appropriate actions.
  • Answers “What’s the best course of action to take?”
  • Example: what is the most effective ad strategy?
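
To make the idea concrete, here is a toy sketch of choosing an action from a predictive model. Everything in it is hypothetical: the function `predicted_sales`, its coefficients, and `total_budget` are invented for illustration and are not fitted to the marketing data. The sketch assumes diminishing returns (a square-root response) and tries every split of the budget between two channels.

```python
# Hypothetical response model: sales grow with the square root of spending,
# so each extra unit of budget helps less than the previous one.
# The coefficients are invented for illustration, not estimated from data.
def predicted_sales(youtube: float, facebook: float) -> float:
    return 1.0 * youtube ** 0.5 + 2.0 * facebook ** 0.5

total_budget = 100  # assumed total ad budget

# Try every whole-unit split of the budget and keep the one with the
# highest predicted sales.
best_youtube = max(
    range(total_budget + 1),
    key=lambda y: predicted_sales(y, total_budget - y),
)
best_facebook = total_budget - best_youtube
print(f"YouTube: {best_youtube}, Facebook: {best_facebook}")
```

In a real study, the response model would first be estimated from the data (the predictive step), and the recommendation is only as good as that model.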

Summary


Data Analyst vs Data Scientist

  • Data analysis is central to the roles of both Data Analysts and Data Scientists.

Key parts of Data Analysis

1. Define the problem

  • What is the problem? What are the desired outcomes?
  • Which metric can be used to measure the quality of a solution?
  • Ex: What factors contribute to tourists’ happiness when visiting Angkor Wat?

2. Data Collection

  • Identify necessary data and data sources.
  • What tools to use? How should the data be organized?
  • Ex: Where should we collect the data: platforms like TripAdvisor, Google Reviews, and social media, or surveys?

3. Data Cleaning

  • Address quality issues to ensure accurate analysis.
  • Fix missing values, handle outliers, and standardize data…
  • Ex: How should we store the data and clean it while preserving its quality and information?
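
The three cleaning tasks named above can be sketched with pandas. The data frame `ratings` is hypothetical (invented tourist ratings on a 1 to 5 scale, with one missing value and one entry error), and the choices made here, treating out-of-range values as missing and filling gaps with the median, are only one reasonable option among several.

```python
import numpy as np
import pandas as pd

# Hypothetical survey ratings (scale 1 to 5): one missing value, one entry error.
ratings = pd.DataFrame({
    "visitor": ["a", "b", "c", "d", "e", "f"],
    "rating":  [4.0, 5.0, np.nan, 3.0, 50.0, 4.0],
})

# 1. Handle outliers: values outside the valid 1-5 scale are treated as missing.
ratings["rating"] = ratings["rating"].where(ratings["rating"].between(1, 5))

# 2. Fix missing values: fill them with the median of the valid ratings.
ratings["rating"] = ratings["rating"].fillna(ratings["rating"].median())

# 3. Standardize: rescale to mean 0 and standard deviation 1.
ratings["rating_std"] = (
    (ratings["rating"] - ratings["rating"].mean()) / ratings["rating"].std()
)
print(ratings)
```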

4. Data Analysis

  • Manipulate data to identify trends, correlations, patterns…
  • Examine the data and seek solutions to the problem.
  • Ex: What statistical values should we compute? Which graphs should we use? Which models should we try?

5. Interpretation

  • Deduce key messages/findings from analysis results.
  • Recognize the limitations of the analysis…
  • Ex: What factors most influence tourists’ satisfaction? This step answers the questions defined in step 1.

6. Visualization & presentation

  • Present the findings using suitable graphs, charts, maps, etc.
  • Convey the insights from the analysis effectively.
  • Ex: How should we present our findings? Which values or graphs will help our audience easily understand?

The Importance of Data Analysis

  • Analyzes past performance and uncovers trends, patterns, and insights within data.
  • Enables companies to make informed decisions.
  • Leads to better strategies, optimized operations, and enhanced customer satisfaction.
  • Saves time and resources.
  • Drives innovation and fosters growth by identifying new opportunities and areas for improvement.




  • This is the introduction of my PhD thesis:

Applications

Applications of Data Analysis

  • Customer Insights: Understand behavior & preferences.
  • Market Research: Spot trends and opportunities.
  • Healthcare: Improve patient outcomes using clinical data.
  • Supply Chain Management: Optimize inventory and logistics.
  • Fraud Detection: Identify fraudulent activities.
  • Marketing Campaigns: Personalize ad strategies.
  • Human Resources: Analyze employee performance.
  • Sales Forecasting: Predict future trends.

🥳 Yeahhhh….









Party time… 🥂