Advanced Machine Learning


     

Lecturer: Dr. HAS Sothea

Code: AMSI61AML

Deep Neural Network

Content

  • Introduction & Brief History

  • World of Approximation

  • Neural Networks

  • Optimization

  • Applications

Introduction

A Deep Neural Network (DNN), or Multilayer Perceptron (MLP), is a type of ML model built to simulate the complex decision-making power of the human brain 🧠.

It is the backbone that powers many of the recent Artificial Intelligence (AI) applications in our lives today.

History

Early Foundations

| Year | Development |
|------|-------------|
| 1943 | Walter Pitts and Warren McCulloch created the first computer model based on neural networks, using “threshold logic” to mimic the thought process. |
| 1960s | Henry J. Kelley developed the basics of a continuous backpropagation model, and Stuart Dreyfus simplified it using the chain rule. |

Development of Algorithms

| Year | Development |
|------|-------------|
| 1965 | Alexey Ivakhnenko and Valentin Lapa developed early deep learning algorithms using polynomial activation functions. |
| 1980s | Geoffrey Hinton and colleagues revived neural networks by demonstrating effective training using backpropagation. |

AI Winters and Resurgence

| Year | Development |
|------|-------------|
| 1970s | The first AI winter occurred due to unmet expectations, leading to reduced funding and research. |
| 1980s | Despite the AI winter, research continued, leading to significant advancements in neural networks and deep learning. |

Modern Era

| Year | Development |
|------|-------------|
| 1990s | Development of convolutional neural networks (CNNs) by Yann LeCun and others for image recognition. |
| 2006 | Geoffrey Hinton and colleagues introduced deep belief networks, which further advanced deep learning techniques. |
| 2012 | AlexNet, a deep convolutional neural network, won the ImageNet competition, showcasing the power of deep learning in computer vision. |
| 2016 | AlphaGo by DeepMind defeated a human Go champion, demonstrating the potential of deep learning in complex games. |
| Present | Deep learning continues to evolve, with applications in natural language processing, speech recognition, autonomous vehicles, and more. |

Key Milestones

| Year | Key Model/Development |
|------|-----------------------|
| 1943 | Pitts and McCulloch’s neural network model. |
| 1960s | Kelley’s backpropagation model and Dreyfus’s chain-rule simplification. |
| 1980s | Hinton’s backpropagation revival & Recurrent Neural Networks (RNNs). |
| 1990s | LeCun’s Convolutional Neural Networks (CNNs). |
| 2006 | Deep belief networks. |
| 2012 | AlexNet’s ImageNet win. |
| 2016 | AlphaGo’s victory. |
| 2017 | “Attention Is All You Need” (the Transformer, the key architecture behind ChatGPT). |

World of Approximations

Approximation

  • Approximation is the process of finding a value that is close to the true value of a quantity, but not exactly equal to it. It is often used when an exact value is difficult to obtain or not necessary.
  • Ex: In 1683, Jacob Bernoulli discovered \(e=2.718...\) from compound interest.

Suppose I put \(\$ 1\) into a savings account:

| Interest per year | Times compounded (n) | Total |
|-------------------|----------------------|-------|
| \(100\%\) | 1 | \(1+1\) |
| \(100\%\) | 2 | \((1+1/2)^2\) |
| \(100\%\) | 3 | \((1+1/3)^3\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(100\%\) | \(n\) | \((1+1/n)^n\) |

The compounded amount \(\to e\) as \(n\) becomes very large, i.e., \[\lim_{n\to \infty}\Big(1+\frac{1}{n}\Big)^n=e=2.71828182...\] With \(100\%\) interest per year compounded every second, my \(\$ 1\) yields nearly \(\$ e=\$ 2.71828...\) at the end of the year.
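A quick numerical check (a minimal sketch using only the Python standard library, not part of the original slides): computing \((1+1/n)^n\) for growing \(n\) shows the compounded amount approaching \(e\).

import math

# Compound interest on $1 at 100% per year, compounded n times
for n in [1, 2, 3, 12, 365, 10_000, 1_000_000]:
    print(f"n = {n:>9}: total = {(1 + 1/n) ** n:.6f}")

print(f"e            = {math.e:.6f}")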

Approximation

Taylor expansion

If \(f:\mathbb{R}\to\mathbb{R}\) is infinitely differentiable \(f\in C^{\infty}\), i.e., \(f',f'',f''',...\) exist, then

\[\forall x,a\in\mathbb{R}: f(x)=\sum_{n=0}^{\infty}\frac{f^{(n)}(a)}{n!}(x-a)^n.\]

  • Example:
    • \(e^x=1+x+\frac{x^2}{2!}+\frac{x^3}{3!}+\frac{x^4}{4!}+\dots\)
    • \(\sin(x)=x-\frac{x^3}{3!}+\frac{x^5}{5!}-\frac{x^7}{7!}+\dots\)
    • \(\cos(x)=1-\frac{x^2}{2!}+\frac{x^4}{4!}-\frac{x^6}{6!}+\dots\)
  • Why does it matter?
  • In practice, even though we are interested in \(e^x\), we only compute \(x,x^2,x^3,...,x^p\) for some large enough degree \(p\in\mathbb{N}\) (see the sketch below).
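A sketch of this truncation idea (illustrative, not from the slides): compare the degree-\(p\) Taylor polynomial of \(e^x\) around \(a=0\) with math.exp.

import math

def exp_taylor(x, p=10):
    # Truncated Taylor series of e^x around a = 0, up to degree p
    return sum(x**n / math.factorial(n) for n in range(p + 1))

x = 1.5
for p in [2, 4, 8, 12]:
    approx = exp_taylor(x, p)
    print(f"p = {p:2d}: {approx:.8f}  (error = {abs(approx - math.exp(x)):.2e})")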

Approximation

The Role of Models

  • As data/ML practitioners, we are interested in the relationship between input \(X\) and the target \(y\), called \(\color{red}{f}\).
  • Models approximate this relationship \(\color{red}{f}\).

\[\underbrace{\begin{bmatrix}x_{11} & x_{12} & \dots & x_{1d}\\ x_{21} & x_{22} & \dots & x_{2d}\\ x_{31} & x_{32} & \dots & x_{3d}\\ \vdots & \vdots & \ddots & \vdots\\ x_{n1} & x_{n2} & \dots & x_{nd}\\ \end{bmatrix}}_{\text{Input }X}\xrightarrow[]{\color{red}{f}} \underbrace{\begin{bmatrix}y_1\\ y_2\\ y_3\\ \vdots\\ y_n \end{bmatrix}}_{\text{target }y}\]

Model: Multilayer perceptron

  • Deep Neural Networks (DNNs)/Multilayer Perceptrons (MLP) are computational models inspired by the human brain.

  • Input layer: vector of individual inputs \(\color{green}{\text{x}_i}\in\mathbb{R}^d\).
    • It takes the inputs from the dataset.
    • The inputs should be preprocessed (scaled, encoded, transformed, etc.) before being passed to this layer.
  • Hidden layer: Governed by the equations:
    \[\begin{align*}\color{green}{z_0}&=\color{green}{\text{x}}\in\mathbb{R}^d\\ \color{green}{z_k}&=\sigma_k(\color{blue}{W_k}\color{green}{z_{k-1}}+\color{blue}{b_k})\text{ for }k=1,...,L-1. \end{align*}\] where,
    • \(\color{blue}{W_k}\) is a matrix of size \(\ell_{k}\times\ell_{k-1}\)
    • \(\color{blue}{b_k}\) is a bias vector of size \(\ell_k\)
    • \(\sigma_k\) is a nonlinear activation function applied point-wise (element-wise).
  • Output layer: Returns the predictions: \[\color{blue}{\hat{y}}=\sigma_L(\color{blue}{W_L}\color{green}{z_{L-1}}+\color{blue}{b_L}).\]
  • Loss function: measures the difference between predictions and the real targets.
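A minimal NumPy sketch of the feedforward equations above (the layer sizes, random weights, ReLU hidden activations, and sigmoid output are illustrative assumptions, not trained values):

import numpy as np

rng = np.random.default_rng(0)
d, layer_sizes = 4, [8, 8, 1]                 # input dim and layer widths (illustrative)

# Random (untrained) weight matrices W_k of shape (l_k, l_{k-1}) and bias vectors b_k
Ws = [rng.normal(size=(l, prev)) for prev, l in zip([d] + layer_sizes[:-1], layer_sizes)]
bs = [np.zeros(l) for l in layer_sizes]

relu = lambda z: np.maximum(0, z)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

z = rng.normal(size=d)                        # z_0 = x
for W, b in zip(Ws[:-1], bs[:-1]):            # hidden layers: z_k = sigma_k(W_k z_{k-1} + b_k)
    z = relu(W @ z + b)
y_hat = sigmoid(Ws[-1] @ z + bs[-1])          # output layer prediction
print(y_hat)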

Model: Multilayer perceptron

Input Layer: sensory organs of the network

  • It plays the role of our senses: 👀, 👂, 👃, 👅, 👊 …
  • The input data are fed directly into the input layer.
  • Let’s use our Kaggle Heart Disease dataset.
| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 392 | 51 | 1 | 2 | 110 | 175 | 0 | 1 | 123 | 0 | 0.6 | 2 | 0 | 2 | 1 |
| 960 | 52 | 0 | 2 | 136 | 196 | 0 | 0 | 169 | 0 | 0.1 | 1 | 0 | 2 | 1 |
| 888 | 60 | 0 | 0 | 150 | 258 | 0 | 0 | 157 | 0 | 2.6 | 1 | 2 | 3 | 0 |
| 741 | 41 | 0 | 2 | 112 | 268 | 0 | 0 | 172 | 1 | 0.0 | 2 | 0 | 2 | 1 |
| 287 | 71 | 0 | 1 | 160 | 302 | 0 | 1 | 162 | 0 | 0.4 | 2 | 2 | 2 | 1 |
  • Input: \(\text{x}_1=\) [52.0, 1.0, 0.0, 125.0, 212.0, 0.0, 1.0, 168.0, 0.0, 1.0, 2.0, 2.0, 3.0].
  • Target: \(y_1=\) 0.
  • Q1: What should be done in the preprocessing step?
    • Remove missing values if there are any.
    • Encode categorical variables: one-hot encoding (e.g., OneHotEncoder or pd.get_dummies).
    • Scale the quantitative variables: StandardScaler.
import pandas as pd
from sklearn.model_selection import train_test_split 
from sklearn.preprocessing import StandardScaler 

data = data.dropna()   # drop missing values (if any)
quan_vars = ['age','trestbps','chol','thalach','oldpeak']
qual_vars = ['sex','cp','fbs','restecg','exang','slope','ca','thal']
for i in quan_vars:
  data[i] = data[i].astype('float')
for i in qual_vars:
  data[i] = data[i].astype('category')
data = pd.get_dummies(data, columns=qual_vars, drop_first=True)  # One-hot encoding
y = data['target']
X = data.drop('target', axis=1) 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) # Train-test split
scaler = StandardScaler() # Scaling inputs
X_train = scaler.fit_transform(X_train) 
X_test = scaler.transform(X_test)
  • Input: \(\text{x}_1=\) [-0.586, -0.779, -1.935, -1.019, -0.211, 0.655, -0.43, 1.606, -0.293, -0.414, -0.981, -0.122, -0.726, -0.959, 1.095, -0.523, -0.383, 3.625, -0.132, -0.263, 0.934, -0.814].
  • Target: \(y_1=\) 0.
  • Q2: What’s the size of the input layer?

Model: Multilayer perceptron

Input Layer: sensory organs of the network

  • Let’s build an MLP using Keras.
  • We first create an Input layer of size \(d=22\) (the number of columns of X_train after one-hot encoding).
from keras.models import Sequential 
from keras.layers import Dense, Input

# Dimension of the data
n, d = X_train.shape   # rows & columns

# Initiate the MLP model
model = Sequential()

# Add an input layer
model.add(Input(shape=(d,)))
  • Given trainable weights \(\color{blue}{W_1}\) of size \(\ell_1\times d\) and bias \(\color{blue}{b_1}\in\mathbb{R}^{\ell_1}\), the input \(\color{green}{\text{x}}\in\mathbb{R}^d\) is transformed by the first hidden layer as \[\begin{align*} \color{green}{z_1}&=\sigma_1(\color{blue}{W_1}\color{green}{\text{x}} + \color{blue}{b_1})\\ &=\sigma_1\begin{pmatrix} \color{blue}{\begin{bmatrix} w_{11} & w_{12} & \dots & w_{1d}\\ \vdots & \vdots & \ddots & \vdots\\ w_{\ell_11} & w_{\ell_12} & \dots & w_{\ell_1d}\\ \end{bmatrix}}\color{green}{\begin{bmatrix} x_1\\ \vdots\\ x_d \end{bmatrix}}+ \color{blue}{\begin{bmatrix} b_1\\ \vdots\\ b_{\ell_1} \end{bmatrix}} \end{pmatrix} \end{align*}\]

Model: Multilayer perceptron

Hidden/output Layer: brain 🧠 / Action 🏃🏻‍♂️

  • Let’s add two hidden layers of size \(32\) each to our existing network.
  • Then add an output layer with a sigmoid activation to predict the probability \(\color{blue}{\hat{y}}\) of heart disease (the target column).
# Add hidden layer of size 32
model.add(Dense(32, activation='relu'))

# Add another hidden layer of size 32
model.add(Dense(32, activation='relu'))

# Add one last layer (output) of size 1
model.add(Dense(1, activation='sigmoid'))
  • With trainable weights \(\color{blue}{W_2, W_3}\) and biases \(\color{blue}{b_2,b_3}\), the feedforward path: \[\begin{align*} \color{green}{z_2}&=\sigma_2(\color{blue}{W_2}\color{green}{z_1} + \color{blue}{b_2})\in\mathbb{R}^{32}\\ \color{blue}{\hat{y}}&=\sigma_3(\color{blue}{W_3}\color{green}{z_2} + \color{blue}{b_3})\in\mathbb{R} \end{align*}\]
  • What is the dimension of each parameter?
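One way to check the answer (a sketch assuming the model built above): print the shape of each layer's parameters in Keras. Note that Keras stores the kernel as the transpose, i.e., with shape \(\ell_{k-1}\times\ell_k\).

# Print the shapes of each layer's trainable parameters: [kernel, bias]
for layer in model.layers:
    print(layer.name, [w.shape for w in layer.get_weights()])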

Model: Multilayer perceptron

Activation functions: \(\sigma(\cdot)\) ╭╯

  • In the feedforward path, we use matrix multiplications (\(\color{blue}{W_j}\)’s) and additions (\(\color{blue}{b_j}\)’s).
  • These operations are linear.
  • Without non-linear components, the network is just a linear (regression) model.
  • These non-linear components are called activation functions.
  • They are the key ingredient that makes the network powerful!
  • Types of activation functions \(\sigma_j(.)\):

\[\begin{align*} \text{Sigmoid}(z)&=1/(1+e^{-z})\text{ for }z\in\mathbb{R}\\ \text{Softmax}(z)&=(e^{z_1},\dots,e^{z_d})/\sum_{k=1}^de^{z_k},\text{ for }z\in\mathbb{R}^d\\ \color{red}{\text{ReLU}(z)}&\color{red}{=\max(0,z)\text{ for }z\in\mathbb{R}}\\ \text{Tanh}(z)&=\tanh(z)\text{ for }z\in\mathbb{R}\\ \text{Leaky ReLU}(z)&=\begin{cases}z,&\mbox{if }z>0\\ \alpha z,&\mbox{if }z\leq 0\end{cases}. \end{align*}\]
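These formulas translate directly into code. Below is a minimal NumPy sketch (the Leaky ReLU slope \(\alpha=0.01\) is an illustrative default; subtracting the max in Softmax is a standard numerical-stability trick):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))        # subtract the max for numerical stability
    return e / e.sum()

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(z))
print(softmax(z).round(3))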

Model: Multilayer perceptron

Loss function: true \(y\) vs prediction \(\color{blue}{\hat{y}}\)

  • Given weights \(\color{blue}{W_j}\)’s and biases \(\color{blue}{b_j}\)’s of the network, the feedforward network can produce prediction \(\hat{y}\).
  • To measure how good the network is, we compare the prediction \(\color{blue}{\hat{y}}\) to the real target \(y\).
  • Loss function quantifies the difference between the predicted output and the actual target.
  • Regression losses:
    • \(\ell_2(y_i,\color{blue}{\hat{y}_i})=(y_i-\color{blue}{\hat{y}_i})^2\): Squared loss.
    • \(\ell_1(y_i,\color{blue}{\hat{y}_i})=|y_i-\color{blue}{\hat{y}_i}|\): Absolute loss.
    • \(\ell_{\text{rel}}(y_i,\color{blue}{\hat{y}_i})=|\frac{y_i-\color{blue}{\hat{y}_i}}{y_i}|\): Relative loss.
  • Classification losses:
    • \(\text{CEn}(y_i,\color{blue}{\hat{y}_i})=-\sum_{j=1}^My_{ij}\log(\color{blue}{\hat{y}_{ij}})\): Cross-Entropy.
    • \(\text{Hinge}(y_i,\color{blue}{\hat{y}_i})=\max\{0,1-\sum_{j=1}^My_{ij}\color{blue}{\hat{y}_{ij}}\}\): Hinge loss.
    • \(\text{KL}(y_i,\color{blue}{\hat{y}_i})=\sum_{j=1}^My_{ij}\log(y_{ij}/\color{blue}{\hat{y}_{ij}})\): Kullback-Leibler (KL) Divergence.
  • Q3: What are the key parameters of the network?
  • A3: All weights \(\color{blue}{W_j}\)’s and biases \(\color{blue}{b_j}\)’s.
  • Q4: How to find the suitable values of these parameters?
  • A4: The loss function guides the network toward better and better states: we use the loss (our mistakes) to adjust all the key parameters, gradually improving the network.
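As a small illustration (a sketch with made-up numbers, not taken from the slides), the squared loss and the cross-entropy above can be computed directly with NumPy:

import numpy as np

# Regression: mean squared loss over two examples
y, y_hat = np.array([4.0, 2.5]), np.array([3.6, 2.9])
squared_loss = np.mean((y - y_hat) ** 2)

# Classification: cross-entropy with one-hot targets y_ij and predicted probabilities
Y = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])        # true classes (one-hot)
P = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])        # predicted class probabilities
cross_entropy = -np.mean(np.sum(Y * np.log(P), axis=1))

print(squared_loss, cross_entropy)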

Model: Multilayer perceptron

Feedforward Neural Networks By Hand

👉 Jupyter notebook: Feedforward NN by hand.

Model: Multilayer perceptron

Why is it powerful?

  • Roughly speaking, it can approximate any reasonably complex input-output relationship to any desired level of precision! (For more, read about the Universal Approximation Theorem (UAT); see also DeepMind.)

Model: Multilayer perceptron

Why is it powerful?

Let’s see what it means: 👉 Jupyter notebook: Universal Approximation Theorem.

Backpropagation: Gradient-based

Optimization in Keras

  • We set up the optimization method for our existing network as follows:
# Import optimizers: we use SGD here (Adam is another common choice)
from keras.optimizers import Adam, SGD

# Set up optimizer, loss and metric for our model
model.compile(optimizer=SGD(learning_rate=0.01), loss='binary_crossentropy', metrics=['accuracy'])
  • Let’s have a look at our model:
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┑━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
β”‚ dense (Dense)                   β”‚ (None, 32)             β”‚           736 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ dense_1 (Dense)                 β”‚ (None, 32)             β”‚         1,056 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ dense_2 (Dense)                 β”‚ (None, 1)              β”‚            33 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
 Total params: 1,825 (7.13 KB)
 Trainable params: 1,825 (7.13 KB)
 Non-trainable params: 0 (0.00 B)
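The parameter counts in the summary can be verified by hand: a Dense layer with \(\ell_{k-1}\) inputs and \(\ell_k\) units has \(\ell_k\times\ell_{k-1}\) weights plus \(\ell_k\) biases, so \[(22\times 32+32)+(32\times 32+32)+(32\times 1+1)=736+1056+33=1825.\]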

Training & Learning Curves

  • A few important hyperparameters:
    • batch_size: the size \(b\) of each minibatch.
    • epochs: number of times that the network passes through the entire training dataset.
    • validation_split: a fraction of the training data for validation during model training. We can keep track of the model state during training by measuring the loss on this validation data, especially for preventing overfitting.
  • The network yields a test accuracy \(=\) 0.941.
  • This is better than what we could achieve with Logistic Regression (see our TP2).
  • Tuning the hyperparameters would push its performance even further.
import plotly.graph_objects as go

# Training the network
history = model.fit(X_train, y_train, epochs=200, batch_size=32, validation_split=0.1, verbose=0)

# Extract loss values 
train_loss = history.history['loss']
val_loss = history.history['val_loss'] 

# Plot the learning curves 
epochs = list(range(1, len(train_loss) + 1))
fig1 = go.Figure(go.Scatter(x=epochs, y=train_loss, name="Training loss"))
fig1.add_trace(go.Scatter(x=epochs, y=val_loss, name="Validation loss"))
fig1.update_layout(title="Training and Validation Loss", 
                   width=510, height=250,
                   xaxis=dict(title="Epoch", type="log"),
                   yaxis=dict(title="Loss"))
fig1.show()
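As a hedged sketch (not shown above), the test metric quoted earlier can be obtained with model.evaluate, which returns the loss and the accuracy metric configured in model.compile; the exact value depends on the training run:

# Evaluate on the held-out test set: returns [loss, accuracy]
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.3f}")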

Diagnostics with Learning Curves

  • The learning curves above can be used to assess the state of our model during and after training.
    • The training loss generally keeps decreasing, since it is measured on the same data the model is fitted to.
    • The drop of the validation loss indicates the generalization capability of the model at that state.
    • The model starts to overfit the training data when the validation curve starts to increase.
    • We should stop the training process when we observe this change in the validation curve (see the EarlyStopping sketch below).
  • The learning curves can also reveal other aspects of the network and the data, including:
    • When the model underfits the data or requires more training epochs
    • When the learning rate (\(\eta\)) is too large
    • When the model cannot generalize well to the validation set
    • When it converges properly
    • When the validation data is not representative enough
    • When the validation data is too easy to predict…
  • These are helpful resources for understanding the above properties:

Neural Network Playground
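One standard way to stop training once the validation curve turns, as suggested above, is Keras’ EarlyStopping callback. A minimal sketch (patience=10 is an illustrative choice):

from keras.callbacks import EarlyStopping

# Stop once val_loss has not improved for 10 consecutive epochs; keep the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
history = model.fit(X_train, y_train, epochs=200, batch_size=32,
                    validation_split=0.1, callbacks=[early_stop], verbose=0)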

Summary

Pros

  • Versatility: DNNs can be used for a wide range of tasks including classification, regression, and even function approximation.
  • Non-linear Problem Solving: They can model complex relationships and capture non-linear patterns in data, thanks to their non-linear activation functions.
  • Flexibility: MLPs can have multiple layers and neurons, making them highly adaptable to various problem complexities.
  • Training Efficiency: With advancements like backpropagation, training MLPs has become efficient and effective.
  • Feature Learning: MLPs can automatically learn features from raw data, reducing the need for manual feature extraction.

Cons

  • Computational Complexity: They can be computationally intensive, especially with large datasets and complex architectures, requiring significant processing power and memory.
  • Overfitting: MLPs can easily overfit to training data, especially if they have too many parameters relative to the amount of training data.
  • Black Box Nature: The internal workings of an MLP are not easily interpretable, making it difficult to understand how specific decisions are made.
  • Requires Large Datasets: Effective training of MLPs often requires large amounts of data, which might not always be available.
  • Hyperparameter Tuning: MLPs have several hyperparameters (e.g., learning rate, number of hidden layers, number of neurons per layer) that need careful tuning, which can be time-consuming and challenging.
  • Architecture Design: Choosing the right architecture can be challenging as well.

🥳 It’s party time 🥂









📋 View party menu here: Party 3 Menu.

🫠 Download party invitation here: Party 3 Invitation Letter.