Introduction to Timeseries Analysis

INF-604: Data Analysis

Lecturer: Dr. Sothea HAS

Outline

Introduction & Motivation
Visualization & Statistical Values
Timeseries Main Components & Decompositions
Real-world Examples

Introduction & Motivation

Non- vs timeseries data (Gapminder)

Non-timeseries data (2002)

Code

import numpy as np
import pandas as pd
from gapminder import gapminder
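# errors="ignore": GDP_Category only exists once it has been created (see the barplot code later on)
gapminder.query("year == 2002").drop(columns=["year", "continent", "GDP_Category"], errors="ignore").head(3)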

	country	lifeExp	pop	gdpPercap
10	Afghanistan	42.129	25268405	726.734055
22	Albania	75.651	3508512	4604.211737
34	Algeria	70.994	31287142	5288.040382

We are interested in:
- Distribution of individual columns (barplot, boxplot, histogram…)
- Relationship between columns (scatterplot, grouped barplots, color, shape, size…)
- Statistical values: means, min, max…
We are not interested in trends or evolution over time.

Timeseries data (Cambodia)

Code

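# errors="ignore": GDP_Category only exists once it has been created (see the barplot code later on)
gapminder.query("country == 'Cambodia'").drop(columns=["continent", "country", "GDP_Category"], errors="ignore").head(3)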

	year	lifeExp	pop	gdpPercap
216	1952	39.417	4693836	368.469286
217	1957	41.366	5322536	434.038336
218	1962	43.415	6083619	496.913648

Individual columns and the relationships between columns are still important.
Main interest: how do those columns and their relationships evolve over time?
Previous graphs and statistical values can still be used, but they should be interpreted differently!
More tools (graphs, values…) are required to understand how the series behaves as time evolves.

Introduction & Motivation

Non- vs timeseries data (Gapminder)

Non-timeseries data (2002)

How would you interpret this histogram?

Timeseries data (Cambodia)

How about this?

Introduction & Motivation

Definition

A time series is a sequence of data points organized in time order.
Usually, the time signal is sampled at equally spaced points in time.
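
As a minimal sketch of this definition, the snippet below builds a small daily series in pandas and checks that its time points are ordered and equally spaced. The temperature values and dates are made up for illustration only.

```python
import pandas as pd

# Six consecutive daily time stamps (hypothetical dates)
idx = pd.date_range("2024-01-01", periods=6, freq="D")

# Made-up temperature readings, one per time stamp
ts = pd.Series([21.5, 22.1, 21.8, 23.0, 22.7, 23.4], index=idx, name="temperature")

# Gaps between consecutive time stamps
steps = ts.index.to_series().diff().dropna()

# Equally spaced means that every gap is the same
equally_spaced = steps.nunique() == 1
print(ts.index.is_monotonic_increasing, equally_spaced)
```

If some observations were missing, `steps` would contain more than one distinct gap and `equally_spaced` would be False.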

Examples:
- Climate: temperature, humidity…
- Finance: stock prices, asset prices, exchange rate…
- E-Commerce: page views, new users, searches…
- Business: transactions, revenue, inventory levels…
- Natural language: texts, sentences…

Motivation

Understanding the nature and behavior of the timeseries.
Forecasting the future based on the historical data.

Visualization & Statistical Values

Visualization & Statistical values

Visualization

Quantitative: lineplot.

Code

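import matplotlib.pyplot as plt
import seaborn as sns
# df_climate: climate DataFrame with columns date, meantemp, humidity, wind_speed (assumed to be loaded beforehand)
sns.set(style="white")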
_, axs = plt.subplots(3, 1, figsize=(5,4.5))
sns.lineplot(df_climate.iloc[::12,:], x="date", y="meantemp", ax=axs[0])
axs[0].set_title("Mean temperature", fontsize=13)
axs[0].set_xticks([])
axs[0].set_ylabel("")
axs[0].set_xlabel("")

sns.lineplot(df_climate.iloc[::12,:], x="date", y="humidity", ax=axs[1])
axs[1].set_title("Humidity", fontsize=13)
axs[1].set_xticks([])
axs[1].set_ylabel("")
axs[1].set_xlabel("")

sns.lineplot(df_climate.iloc[::12,:], x="date", y="wind_speed", ax=axs[2])
axs[2].set_title("Wind speed", fontsize=13)
# axs[2].tick_params(axis='x', labelrotation=90, size=8)
plt.xticks(df_climate.iloc[::12,:].date[::10], rotation=45, size=8)
plt.tight_layout()
plt.show()

Qualitative: stacked barplot over time.

Code

import plotly.express as px
def cat_gdp(yearly_data):
    return pd.qcut(yearly_data, q=3, labels=['Developing', 'Emerging', 'Developed'])
df = gapminder
# Apply the function to each year
df['GDP_Category'] = df.groupby('year').apply(lambda x: cat_gdp(x.gdpPercap)).reset_index(level=0, drop=True)

df_asia = df.query("continent == 'Asia'")
# Count the Asian countries in each GDP category per year
df_agg = df_asia.groupby(['year', 'GDP_Category']).size().reset_index(name='Count')

# Create the stacked bar chart
fig = px.bar(
    df_agg, x='year', y='Count', 
    color='GDP_Category', barmode='stack',
    title="Evolution of Asian Countries' GDP from 1952 to 2007",
    labels={'Count': 'Number of Countries', 'year': 'Year'})

fig.update_layout(height=410, width=500)
fig.show()

Visualization & Statistical values

Statistical values: Autocorrelation

We’re interested in how the current value relates to the succeeding (later) values.
Consider the mean temperature and its lags:

	Temp	Lag1	Lag2	Lag3
0	10.000	15.833	12.250	16.667
1	15.833	12.250	16.667	15.600
2	12.250	16.667	15.600	19.000
3	16.667	15.600	19.000	22.333
4	15.600	19.000	22.333	24.143

Q1: If Temp and Lag1 are highly correlated, what does that mean?
A1: The current value is highly correlated with the next one, so today's value tells us a lot about tomorrow's.

Correlations of our example:

	Temp	Lag1	Lag2	Lag3
Temp	1.0	0.89	0.82	0.71
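
A lag table like the one above can be sketched with `shift`. The eight temperature values below are read off the table shown earlier; since only this short excerpt is used, the resulting correlations will differ from the ones reported for the full series. The name `df_lag` matches the one used in the plotting code of these slides, but how it was actually built is an assumption.

```python
import pandas as pd

# First eight mean temperatures, recovered from the lag table above
temp = pd.Series(
    [10.000, 15.833, 12.250, 16.667, 15.600, 19.000, 22.333, 24.143],
    name="Temp",
)

# Lag k holds the value observed k steps later (shift by -k)
df_lag = pd.DataFrame({"Temp": temp})
for k in (1, 2, 3):
    df_lag[f"Lag{k}"] = temp.shift(-k)

# Correlation of Temp with each lagged copy (rows with NaN are skipped pairwise)
corr_row = df_lag.corr().loc[["Temp"]]
print(df_lag.head())
print(corr_row.round(2))
```

The last rows of the lag columns are NaN because there is no later value to pull in.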

Autocorrelation at lag \(\color{blue}{k}\) of \((X_t)\): \[r_{\color{blue}{k}}=\frac{n}{n-\color{blue}{k}}\frac{\sum_{t=1}^{n-\color{blue}{k}}(X_t-\overline{X})(X_{t+\color{blue}{k}}-\overline{X})}{\sum_{t=1}^{n}(X_t-\overline{X})^2},\] where \(\overline{X}=\frac{1}{n}\sum_{t=1}^nX_t\) (average).
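Interpretation: \(r_k\) measures the correlation between the original timeseries and its \(k\)-lag timeseries. Without the correction factor \(\frac{n}{n-k}\) it always satisfies \(-1\leq r_k\leq 1\); with the factor, values at large lags can fall slightly outside this range.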
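
The formula for \(r_k\) can be translated almost term by term into NumPy. The sketch below is one possible implementation, tested on a synthetic noisy sine wave with period 20 (an assumption made purely for illustration); the function name `autocorr` is hypothetical.

```python
import numpy as np

def autocorr(x, k):
    """Lag-k autocorrelation with the n/(n-k) factor, as in the formula above."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if not 0 <= k < n:
        raise ValueError("k must satisfy 0 <= k < len(x)")
    d = x - x.mean()                      # deviations from the average
    num = np.sum(d[: n - k] * d[k:])      # sum over t = 1..n-k of products k steps apart
    den = np.sum(d ** 2)                  # sum over t = 1..n of squared deviations
    return n / (n - k) * num / den

# Synthetic series: sine wave with period 20 plus a little noise
rng = np.random.default_rng(0)
t = np.arange(200)
x = np.sin(2 * np.pi * t / 20) + 0.1 * rng.normal(size=t.size)

for k in (0, 10, 20):
    print(k, round(autocorr(x, k), 3))
```

Half a period apart (lag 10) the values move in opposite directions, so the autocorrelation is strongly negative; one full period apart (lag 20) it is strongly positive again.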

Visualization & Statistical values

Visualization: Correlogram/ACF plot

It shows the autocorrelation \(r_k\) as a function of the lag \(k\).
In Python:

Code

from statsmodels.graphics.tsaplots import plot_acf
_, ax = plt.subplots(2,1,figsize=(5, 3.65))
plot_acf(df_lag.Temp, lags=60, ax=ax[0])
ax[0].set_title('Correlogram for Mean Temperature', fontsize=13)
ax[0].set_xlabel('Lag')
ax[0].set_ylabel('Autocorrelation')

sns.lineplot(df_climate.iloc[::12,:].iloc[:60,:], x="date", y="meantemp", ax=ax[1])
ax[1].set_title("Mean temperature", fontsize=13)
ax[1].set_ylabel("")
ax[1].set_xlabel("")
plt.xticks(df_climate.iloc[::12,:].iloc[:60,:].date[::10], rotation=45, size=8)
plt.tight_layout()
plt.show()

Interpretation:
- The autocorrelation oscillates between positive and negative values, showing a periodic pattern in the temperature.
- Peaks and troughs in autocorrelation repeat approximately every 30 lags, indicating cycles in the data.
- Values outside the shaded region (the confidence band, 95% by default in plot_acf) indicate statistically significant autocorrelation, i.e. a strong relationship between temperatures separated by those lags.
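
The reading of the correlogram described above can be mimicked numerically. The sketch below uses a synthetic series with a known period of 30 (not the climate data), computes the sample autocorrelation without the \(\frac{n}{n-k}\) factor, and uses the simple white-noise band \(\pm 1.96/\sqrt{n}\) as a stand-in for the shaded region; the band drawn by `plot_acf` may be computed differently.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (without the n/(n-k) factor)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d ** 2)
    return np.array([np.sum(d[: len(x) - k] * d[k:]) / denom for k in range(max_lag + 1)])

# Synthetic periodic series: period 30, amplitude 10, unit-variance noise
rng = np.random.default_rng(1)
n = 300
t = np.arange(n)
series = 10 * np.sin(2 * np.pi * t / 30) + rng.normal(scale=1.0, size=n)

r = sample_acf(series, max_lag=60)

# Approximate 95% band under the white-noise assumption
band = 1.96 / np.sqrt(n)
significant_lags = np.flatnonzero(np.abs(r[1:]) > band) + 1

# First peak after the first trough: search lags 15..45
peak_lag = 15 + int(np.argmax(r[15:46]))
print("band:", round(band, 3), "peak lag:", peak_lag)
```

The lag of the first peak estimates the length of one cycle, which is how the "every 30 lags" statement above is read off the plot.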

Visualization & Statistical values

Visualization: Correlogram/ACF plot

Consider more examples:

Their correlograms:

Visualization & Statistical values

Visualization: Correlogram/ACF plot

Another example: Meta stock

	Date	Open	High	Low	Close
0	2012-05-18	42.05	45.00	38.00	38.23
1	2012-05-21	36.53	36.66	33.00	34.03
2	2012-05-22	32.61	33.59	30.94	31.00
3	2012-05-23	31.37	32.50	31.36	32.00
4	2012-05-24	32.95	33.21	31.77	33.03
5	2012-05-25	32.90	32.95	31.11	31.91
6	2012-05-29	31.48	31.69	28.65	28.84
7	2012-05-30	28.70	29.55	27.86	28.19
8	2012-05-31	28.55	29.67	26.83	29.60
9	2012-06-01	28.89	29.15	27.39	27.72
10	2012-06-04	27.20	27.65	26.44	26.90
11	2012-06-05	26.70	27.76	25.75	25.87

Timeseries Main Components & Decompositions

Main components & Decompositions

Three main components:
- \(T_t\): Trend-cycle component
- \(S_t\): Seasonal component
- \(R_t\): Remainder
Two main decompositions:
- Additive decomposition: \[X_t=T_t+S_t+R_t,\] with \(R_t\sim{\cal N}(0,\sigma^2),\sigma>0\).
- Multiplicative decomposition: \[X_t=T_t\times S_t\times R_t,\] with \(R_t\sim{\cal N}(1,\sigma^2),\sigma>0\).
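
To make the additive model concrete, here is a hand-rolled sketch of a classical additive decomposition on synthetic data: the trend is estimated with a centred moving average over one period, the seasonal component with the average detrended value at each position in the cycle, and the remainder is what is left. This illustrates the idea only and is not claimed to reproduce the internals of `seasonal_decompose`; the series and all names are assumptions.

```python
import numpy as np

period = 12
n = 10 * period
t = np.arange(n)
rng = np.random.default_rng(42)

# Synthetic additive series: X_t = T_t + S_t + R_t
true_trend = 0.5 * t
true_seasonal = 5 * np.sin(2 * np.pi * t / period)
x = true_trend + true_seasonal + rng.normal(scale=0.5, size=n)

# 1) Trend: centred moving average spanning one period (half weights at both ends)
w = np.r_[0.5, np.ones(period - 1), 0.5] / period
core = np.convolve(x, w, mode="valid")  # length n - period
pad = np.full(period // 2, np.nan)      # the ends cannot be estimated
trend = np.r_[pad, core, pad]

# 2) Seasonal: mean detrended value per position in the cycle, centred at zero
detrended = x - trend
seasonal_means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
seasonal_means -= seasonal_means.mean()
seasonal = np.tile(seasonal_means, n // period)

# 3) Remainder: what trend and seasonality do not explain
remainder = x - trend - seasonal
print("remainder std:", round(float(np.nanstd(remainder)), 3))
```

By construction, trend + seasonal + remainder gives back the original series wherever the trend is defined.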

(\(+\)) Decomposition of Meta stock:

Code

from statsmodels.tsa.seasonal import seasonal_decompose
ts = df_fb[['Close']].values[::10]
decomposition = seasonal_decompose(
    ts, model='additive', 
    period=12)
seasonal, trend, residual = decomposition.seasonal, decomposition.trend, decomposition.resid

plt.figure(figsize=(5, 5))
plt.subplot(411)
plt.plot(ts, 'r', label='Original')
plt.legend()
plt.subplot(412)
plt.plot(trend, label='Trend')
plt.legend()
plt.subplot(413)
plt.plot(seasonal, label='Seasonal')
plt.legend()
plt.subplot(414)
plt.plot(residual, label='Residual')
plt.legend()
plt.tight_layout()
plt.show()

Main components & Decompositions

Three main components:
- \(T_t\): Trend-cycle component
- \(S_t\): Seasonal component
- \(R_t\): Remainder
Two main decompositions:
- Additive decomposition: \[X_t=T_t+S_t+R_t,\] with \(R_t\sim{\cal N}(0,\sigma^2),\sigma>0\).
- Multiplicative decomposition: \[X_t=T_t\times S_t\times R_t,\] with \(R_t\sim{\cal N}(1,\sigma^2),\sigma>0\) ✅.

(\(\times\)) Decomposition of Meta stock:

Code

decomposition = seasonal_decompose(
    ts, model='multiplicative', 
    period=12)
seasonal, trend, residual = decomposition.seasonal, decomposition.trend, decomposition.resid

plt.figure(figsize=(5, 5))
plt.subplot(411)
plt.plot(ts, 'r', label='Original')
plt.legend()
plt.subplot(412)
plt.plot(trend, label='Trend')
plt.legend()
plt.subplot(413)
plt.plot(seasonal, label='Seasonal')
plt.legend()
plt.subplot(414)
plt.plot(residual, label='Residual')
plt.legend()
plt.tight_layout()
plt.show()

Real-world Examples

Nvidia stock price

Real-world Examples

Nvidia stock price

Log-transformation
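
One common reason for a log-transformation, presumably the motivation here, is that it turns a multiplicative structure \(X_t=T_t\times S_t\times R_t\) into an additive one, \(\log X_t=\log T_t+\log S_t+\log R_t\), and turns exponential growth into a straight line. The sketch below checks this on synthetic components (not on the actual stock data).

```python
import numpy as np

t = np.arange(1, 121)
rng = np.random.default_rng(0)

# Synthetic multiplicative components
trend = np.exp(0.03 * t)                           # exponential growth
seasonal = 1 + 0.1 * np.sin(2 * np.pi * t / 12)    # fluctuates around 1
remainder = rng.normal(loc=1.0, scale=0.02, size=t.size)  # noise around 1

x = trend * seasonal * remainder

# After taking logs the three components simply add up
log_x = np.log(x)
additive = np.log(trend) + np.log(seasonal) + np.log(remainder)
print(np.allclose(log_x, additive))

# The log of the exponential trend is linear: constant step of 0.03
print(np.allclose(np.diff(np.log(trend)), 0.03))
```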

Real-world Examples

Nvidia stock price

ACF Plot

Slow decay in the ACF (i.e., high autocorrelation even at high lags) indicates a strong trend (non-stationarity).
No repeating spikes at fixed lags (e.g., lag 12 for monthly data) means there is no evidence of seasonality.
Stock prices are usually very complex and cannot be described precisely.
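
The "slow decay" pattern can be reproduced with a simulated random walk, a simple non-stationary series often used as a toy model for prices (this is an illustration, not the Nvidia data). Its ACF stays high over many lags, while the ACF of its step-to-step changes drops to roughly zero immediately.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (without the n/(n-k) factor)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d ** 2)
    return np.array([np.sum(d[: len(x) - k] * d[k:]) / denom for k in range(max_lag + 1)])

# Random walk: cumulative sum of independent steps
rng = np.random.default_rng(7)
steps = rng.normal(size=1000)
walk = np.cumsum(steps)

acf_walk = sample_acf(walk, max_lag=40)            # decays slowly
acf_diff = sample_acf(np.diff(walk), max_lag=40)   # changes: close to zero after lag 0

print("walk, lags 1/10:", acf_walk[[1, 10]].round(2))
print("diff, lags 1/10:", acf_diff[[1, 10]].round(2))
```

Differencing (or working with returns) is one usual way to remove such a trend before looking for seasonality.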

Real-world Examples

Nvidia stock price

Decompositions

🥳 Yeahhhh….

Let’s Party… 🥂