Objective: This practical lab aims to strengthen your skills in implementing simple and multiple linear regression on the marketing data used in the course.
An internet connection is required to load the data with the following code. We will refer to the resulting DataFrame simply as data.
import pandas as pd

# Load the marketing dataset directly from GitHub (pyreadr is not needed for a CSV file)
data = pd.read_csv("https://raw.githubusercontent.com/hassothea/Data_Analytics_AUPP/refs/heads/main/data/marketing.csv", sep=",")
data.head(5)
   youtube  facebook  newspaper  sales
0   276.12     45.36      83.04  26.52
1    53.40     47.16      54.12  12.48
2    20.64     55.08      83.16  11.16
3   181.80     49.56      70.20  22.20
4   216.96     12.96      70.08  15.48
1. Study correlation matrix
A. Compute the correlation matrix of this data using the pandas DataFrame.corr() method, i.e. data.corr(). Explain this correlation matrix (see slide 21).
# To do
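One possible way to fill in the cell above is sketched below. So that it runs offline, it rebuilds only the five rows printed by data.head(5) as a stand-in DataFrame; in the lab, drop that definition and use the full data loaded from GitHub. The ranking step is an illustrative addition that orders the inputs by the absolute value of their correlation with sales.

```python
import pandas as pd

# Stand-in: the five rows shown by data.head(5). In the lab, use the full `data` instead.
data = pd.DataFrame({
    "youtube": [276.12, 53.40, 20.64, 181.80, 216.96],
    "facebook": [45.36, 47.16, 55.08, 49.56, 12.96],
    "newspaper": [83.04, 54.12, 83.16, 70.20, 70.08],
    "sales": [26.52, 12.48, 11.16, 22.20, 15.48],
})

# Pearson correlation for every pair of columns (the default method of DataFrame.corr)
corr = data.corr()
print(corr.round(2))

# Rank the inputs by the absolute strength of their linear relationship with sales
strength = corr["sales"].drop("sales").abs().sort_values(ascending=False)
print(strength)
```

The diagonal is always 1 and the matrix is symmetric, so only the off-diagonal entries, and in particular the sales column, carry information for the explanation.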
B. Plot scatterplots of the following pairs:
- Facebook (x-axis) vs Sales (y-axis)
- Newspaper (x-axis) vs Sales (y-axis)

Add a title to each plot and give each axis a proper name.
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# To do
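A minimal sketch of the two scatterplots using matplotlib only (seaborn's sns.scatterplot or plotly's px.scatter would work just as well). It uses the five rows from data.head(5) as a stand-in so it runs on its own; the axis names "Facebook", "Newspaper" and "Sales" are simply the column names capitalized.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in: the five rows shown by data.head(5). In the lab, use the full `data` instead.
data = pd.DataFrame({
    "youtube": [276.12, 53.40, 20.64, 181.80, 216.96],
    "facebook": [45.36, 47.16, 55.08, 49.56, 12.96],
    "newspaper": [83.04, 54.12, 83.16, 70.20, 70.08],
    "sales": [26.52, 12.48, 11.16, 22.20, 15.48],
})

# One panel per input, each with a title and named axes
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, col, label in zip(axes, ["facebook", "newspaper"], ["Facebook", "Newspaper"]):
    ax.scatter(data[col], data["sales"], alpha=0.7)
    ax.set_title(f"{label} vs Sales")
    ax.set_xlabel(label)
    ax.set_ylabel("Sales")

fig.tight_layout()
plt.show()
```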
Key remark: The correlation matrix tells us a lot about which inputs are useful for building the model. If we were to build a model using only one input, we would use the one with the highest correlation (in absolute value) with the target. On the other hand, including several inputs that are highly correlated with each other can produce a poor model because of multicollinearity: the model has difficulty separating the individual effect of each input, so the coefficient estimates become unstable and unreliable. Multicollinearity also inflates the variance of the estimated coefficients, which makes the model harder to interpret and can lead to overfitting. Simply put, it muddies the waters.
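The remark above can be made quantitative with the variance inflation factor (VIF), a standard multicollinearity diagnostic that this lab does not itself require. The sketch below assumes the usual definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing input j on the other inputs; the helper name vif is hypothetical, and the five-row stand-in data only makes it runnable. A commonly quoted rule of thumb treats VIF values above roughly 5 to 10 as a warning sign.

```python
import numpy as np
import pandas as pd

# Stand-in: the five rows shown by data.head(5). In the lab, use the full `data` instead.
data = pd.DataFrame({
    "youtube": [276.12, 53.40, 20.64, 181.80, 216.96],
    "facebook": [45.36, 47.16, 55.08, 49.56, 12.96],
    "newspaper": [83.04, 54.12, 83.16, 70.20, 70.08],
    "sales": [26.52, 12.48, 11.16, 22.20, 15.48],
})


def vif(X: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
    regressing input j on all the other inputs (with an intercept) by OLS."""
    values = {}
    for col in X.columns:
        target = X[col].to_numpy(dtype=float)
        others = X.drop(columns=col).to_numpy(dtype=float)
        design = np.column_stack([np.ones(len(X)), others])  # intercept + other inputs
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        values[col] = 1.0 / (1.0 - r2)
    return pd.Series(values)


inputs = data[["youtube", "facebook", "newspaper"]]
print(vif(inputs).round(2))
```

A VIF of 1 means an input is uncorrelated with the others; the larger the value, the more its coefficient variance is inflated by the other inputs.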
2. Simple Linear Regression
A. We already used YouTube as an explanatory variable to predict Sales in the course.
Now, build an SLR model to predict Sales using Facebook.
from sklearn.linear_model import LinearRegression

# Prepare data X and y
# To do

# Build model
# To do

# Fit the model on the data
# To do
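One way to complete the three steps above with scikit-learn, sketched on the five-row stand-in data so it runs offline. The variable name slr_fb is illustrative. Note that X is selected with a list of column names so that it is 2-D, which is what LinearRegression.fit expects.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stand-in: the five rows shown by data.head(5). In the lab, use the full `data` instead.
data = pd.DataFrame({
    "youtube": [276.12, 53.40, 20.64, 181.80, 216.96],
    "facebook": [45.36, 47.16, 55.08, 49.56, 12.96],
    "newspaper": [83.04, 54.12, 83.16, 70.20, 70.08],
    "sales": [26.52, 12.48, 11.16, 22.20, 15.48],
})

# Prepare data: X is 2-D (n_samples, 1), y is 1-D
X = data[["facebook"]]
y = data["sales"]

# Build the model, then fit it on the data
slr_fb = LinearRegression()
slr_fb.fit(X, y)

print(f"intercept = {slr_fb.intercept_:.3f}, slope = {slr_fb.coef_[0]:.3f}")
```

The slope is the estimated change in sales for a one-unit increase in the Facebook input.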
Perform model diagnosis:
Compute \(R^2\) then explain the observed value.
Compute and plot residuals for this model. Conclude.
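For reference, the usual definitions behind these two diagnostics, writing \(\hat{y}_i\) for the fitted sales of observation \(i\), \(\bar{y}\) for the mean sales and \(n\) for the number of observations (notation assumed here; it may differ slightly from the slides):

```latex
e_i = y_i - \hat{y}_i, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

In simple linear regression with an intercept, \(R^2\) equals the squared correlation between the input and Sales, so it can be checked against the correlation matrix from Section 1.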
# Compute R-squared
# To do

# Graph
# To do
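A possible sketch of the diagnosis for the Facebook model: it refits the model so the block runs on its own (five-row stand-in data again), computes \(R^2\) with both model.score and sklearn.metrics.r2_score, and plots residuals against fitted values. A residual plot with no visible pattern around zero supports the linear model; curvature or a funnel shape suggests it is missing something.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Stand-in: the five rows shown by data.head(5). In the lab, use the full `data` instead.
data = pd.DataFrame({
    "youtube": [276.12, 53.40, 20.64, 181.80, 216.96],
    "facebook": [45.36, 47.16, 55.08, 49.56, 12.96],
    "newspaper": [83.04, 54.12, 83.16, 70.20, 70.08],
    "sales": [26.52, 12.48, 11.16, 22.20, 15.48],
})
X, y = data[["facebook"]], data["sales"]
slr_fb = LinearRegression().fit(X, y)

# R-squared: r2_score(y, y_hat) and model.score(X, y) give the same value
y_hat = slr_fb.predict(X)
r2 = r2_score(y, y_hat)
print(f"R^2 = {r2:.3f} (model.score: {slr_fb.score(X, y):.3f})")

# Residuals e_i = y_i - y_hat_i, plotted against the fitted values
residuals = y - y_hat
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(y_hat, residuals, alpha=0.7)
ax.axhline(0, color="red", linestyle="--")  # residuals should scatter randomly around 0
ax.set_title("Residuals vs fitted values (Sales ~ Facebook)")
ax.set_xlabel("Fitted sales")
ax.set_ylabel("Residual")
plt.show()
```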
B. Repeat question (A) but using newspaper as an input for SLR instead.
# Prepare data X and y
# To do

# Fit model
# To do
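The same recipe with Newspaper as the input, sketched on the five-row stand-in data. The dictionary r2_by_input is an illustrative addition that places the three single-input models side by side on \(R^2\), which makes the comparison with the YouTube model from the course direct.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stand-in: the five rows shown by data.head(5). In the lab, use the full `data` instead.
data = pd.DataFrame({
    "youtube": [276.12, 53.40, 20.64, 181.80, 216.96],
    "facebook": [45.36, 47.16, 55.08, 49.56, 12.96],
    "newspaper": [83.04, 54.12, 83.16, 70.20, 70.08],
    "sales": [26.52, 12.48, 11.16, 22.20, 15.48],
})

# Prepare data X (2-D) and y, then fit the newspaper-only model
X_news = data[["newspaper"]]
y = data["sales"]
slr_news = LinearRegression().fit(X_news, y)
print(f"intercept = {slr_news.intercept_:.3f}, slope = {slr_news.coef_[0]:.3f}")
print(f"R^2 = {slr_news.score(X_news, y):.3f}")

# Compare the three single-input models on training R^2
r2_by_input = {
    col: LinearRegression().fit(data[[col]], y).score(data[[col]], y)
    for col in ["youtube", "facebook", "newspaper"]
}
print(r2_by_input)
```

The residual analysis for the newspaper model follows exactly the same steps as for the Facebook model, with X_news in place of the Facebook input.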
3. Multiple Linear Regression
We already built an MLR model with two inputs during the course. Now, you will build one using all three inputs.
A. Build an MLR model using the three inputs.
# Prepare data
# To do

# Build model
# To do
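A sketch of the three-input model with scikit-learn. With only the five stand-in rows, four parameters are estimated from five observations, so the numbers themselves mean little; on the full data loaded at the top of the lab the same code applies unchanged.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stand-in: the five rows shown by data.head(5). In the lab, use the full `data` instead.
data = pd.DataFrame({
    "youtube": [276.12, 53.40, 20.64, 181.80, 216.96],
    "facebook": [45.36, 47.16, 55.08, 49.56, 12.96],
    "newspaper": [83.04, 54.12, 83.16, 70.20, 70.08],
    "sales": [26.52, 12.48, 11.16, 22.20, 15.48],
})

# Prepare data: all three inputs as columns of X
features = ["youtube", "facebook", "newspaper"]
X = data[features]
y = data["sales"]

# Build and fit the MLR model
mlr = LinearRegression().fit(X, y)

print(f"intercept = {mlr.intercept_:.3f}")
for name, coef in zip(features, mlr.coef_):
    # Estimated change in sales per unit of this input, the other inputs held fixed
    print(f"{name:>9}: {coef:.4f}")
```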
B. Perform model diagnosis as illustrated in the course (from slide 26). Interpret your findings and conclude.
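The exact checks on slide 26 are not reproduced here, so this sketch assumes a typical diagnosis: \(R^2\), adjusted \(R^2\) (which penalizes the number of inputs p, useful when comparing with the two-input model), a residuals-vs-fitted plot and a histogram of residuals. On the five-row stand-in only one degree of freedom is left, so adjusted \(R^2\) is not meaningful there; run it on the full data.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stand-in: the five rows shown by data.head(5). In the lab, use the full `data` instead.
data = pd.DataFrame({
    "youtube": [276.12, 53.40, 20.64, 181.80, 216.96],
    "facebook": [45.36, 47.16, 55.08, 49.56, 12.96],
    "newspaper": [83.04, 54.12, 83.16, 70.20, 70.08],
    "sales": [26.52, 12.48, 11.16, 22.20, 15.48],
})
features = ["youtube", "facebook", "newspaper"]
X, y = data[features], data["sales"]
mlr = LinearRegression().fit(X, y)

# R^2 and adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)
n, p = X.shape
r2 = mlr.score(X, y)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")

# Residual checks: no pattern against fitted values, roughly symmetric distribution
fitted = mlr.predict(X)
residuals = y - fitted
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(fitted, residuals, alpha=0.7)
axes[0].axhline(0, color="red", linestyle="--")
axes[0].set_title("Residuals vs fitted values")
axes[0].set_xlabel("Fitted sales")
axes[0].set_ylabel("Residual")
axes[1].hist(residuals, bins="auto")
axes[1].set_title("Distribution of residuals")
axes[1].set_xlabel("Residual")
axes[1].set_ylabel("Count")
fig.tight_layout()
plt.show()
```

If adding Newspaper barely changes \(R^2\) and lowers adjusted \(R^2\) relative to the two-input model, that is a sign the extra input adds little.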