Lab - Deep Learning

ITM-370: Data Analytics
Lecturer: HAS Sothea, PhD


Objective: The goal of this practical lab is to introduce you to building Multilayer Perceptron (MLP) models. After this, you will be familiar with how to get started with a model architecture and with the influence of the activation function, learning rate, batch size, and so on. We will explore these concepts using the classic handwritten digit MNIST dataset, which can be imported from the keras module.

The Jupyter Notebook for this Lab can be downloaded here: Lab_Deep_Learning.ipynb.

Or you can work with this notebook in Google Colab here: Lab_Deep_Learning.ipynb.


1. Importing the MNIST dataset

You can import the MNIST dataset from keras as follows.

from keras.datasets import mnist

(X_train, label_train), (X_test, label_test) = mnist.load_data()

# Visualize

import matplotlib.pyplot as plt

plt.imshow(X_train[0,:,:])
plt.axis("off")
plt.show()

A. What are the dimensions of X_train and X_test?

  • Can you guess what label_train and label_test are?

  • How many distinct values do these labels take?

# To do
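
A minimal sketch for answering these questions, assuming X_train, X_test, label_train and label_test were loaded as above (the printed shapes are the expected MNIST values):

import numpy as np

# Dimensions of the image arrays
print(X_train.shape)  # expected: (60000, 28, 28)
print(X_test.shape)   # expected: (10000, 28, 28)

# label_train / label_test hold the digit (0-9) drawn in each image
print(label_train[:10])

# Distinct label values and how many there are
print(np.unique(label_train))       # 0, 1, ..., 9
print(len(np.unique(label_train)))  # 10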

B. Visualize the first 8 images of the training data in two rows.

import matplotlib.pyplot as plt
# To do
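
One possible way to display the first 8 training images on two rows is with matplotlib subplots (a sketch; the grayscale colormap and titles are optional choices):

import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 4, figsize=(8, 4))
for i, ax in enumerate(axes.flat):
    ax.imshow(X_train[i, :, :], cmap="gray")
    ax.set_title(f"Label: {label_train[i]}")
    ax.axis("off")
plt.tight_layout()
plt.show()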

C. Reshape the training and testing data from 3-dimensional to 2-dimensional arrays using x.reshape(), i.e., \(n\times 28\times 28\to n\times 784\), where each row is the \(784\)-dimensional vector of one image.

# To do
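
A sketch of the reshaping step, flattening each \(28\times 28\) image into a \(784\)-dimensional row vector while keeping the number of images unchanged:

X_train = X_train.reshape(X_train.shape[0], 28 * 28)
X_test = X_test.reshape(X_test.shape[0], 28 * 28)
print(X_train.shape, X_test.shape)  # expected: (60000, 784) (10000, 784)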

D. Scale the training and testing input pixels to take values between \(0\) and \(1\). This helps the training process converge more quickly.

# To do
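
Since the raw pixels are integers between 0 and 255, one simple way to rescale them to \([0, 1]\) is to divide by 255 (a sketch, assuming the data were already reshaped in part C):

X_train = X_train.astype("float32") / 255.0
X_test = X_test.astype("float32") / 255.0
print(X_train.min(), X_train.max())  # expected: 0.0 1.0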

2. Model

Here, we build an MLP model with the following architecture and hyperparameters:

A. Designing a network

  • Network layers: \([d, 32, 32, M]\), where \(d\) is the input dimension and \(M\) is the number of classes. What are the values of \(d\) and \(M\)?
  • Activation function: ReLU on the hidden layers and Softmax on the output layer.
  • Optimization method: Adam
  • Learning rate: \(0.001\)
  • Loss: ‘sparse_categorical_crossentropy’ to be minimized.
  • Metric: [‘accuracy’] is used to keep track of the model’s performance during the training process.
  • Minibatch size: \(b=32\) (the size of the subset used to approximate the full loss function during training)
  • Number of epochs: epochs = 50.
  • Validation size: 0.2, used to monitor the model’s state during training.

Let me set up all these for you.

from keras.models import Sequential
from keras.layers import Input, Dense


d = 784 # Input dimensions
M = 10 # Number of classes

model = Sequential()
# input layer
model.add(Input(shape=(d,)))

# Hidden layers
# Add hidden layer of size 32
model.add(Dense(32, activation='relu'))

# Add another hidden layer of size 32
model.add(Dense(32, activation='relu'))

# Add the output layer of size M = 10
model.add(Dense(10, activation='softmax'))
# Compiling the model: set up the optimization, loss and metric
from keras.optimizers import Adam, SGD
lr = 0.001
model.compile(
    optimizer=Adam(learning_rate=lr), 
    loss='sparse_categorical_crossentropy', 
    metrics=['accuracy'])

# Fit the model
b = 32
n_epoch = 50
val_size = 0.2
history = model.fit(X_train, label_train, epochs=n_epoch, batch_size=b, validation_split=val_size)

B. Learning curves can be extracted from the history object.

  • Create learning curves as shown in slide 29.

  • What information do you get from the curves?

  • Compute the testing accuracy.

  • Visualize images that are wrongly predicted.

# To do
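
A possible sketch covering the learning curves, the test accuracy and the misclassified images, assuming the model and history objects defined above (the keys 'accuracy' and 'val_accuracy' correspond to the metric name passed to model.compile):

import numpy as np
import matplotlib.pyplot as plt

# Learning curves: loss and accuracy on the training and validation sets
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history.history["loss"], label="train loss")
ax1.plot(history.history["val_loss"], label="validation loss")
ax1.set_xlabel("Epoch")
ax1.legend()
ax2.plot(history.history["accuracy"], label="train accuracy")
ax2.plot(history.history["val_accuracy"], label="validation accuracy")
ax2.set_xlabel("Epoch")
ax2.legend()
plt.show()

# Testing accuracy
test_loss, test_acc = model.evaluate(X_test, label_test, verbose=0)
print("Test accuracy:", test_acc)

# Images whose predicted class differs from the true label
probs = model.predict(X_test)
preds = np.argmax(probs, axis=1)
wrong = np.where(preds != label_test)[0]

fig, axes = plt.subplots(2, 4, figsize=(8, 4))
for ax, idx in zip(axes.flat, wrong[:8]):
    ax.imshow(X_test[idx].reshape(28, 28), cmap="gray")
    ax.set_title(f"true {label_test[idx]} / pred {preds[idx]}")
    ax.axis("off")
plt.tight_layout()
plt.show()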

C. The network’s architecture and hyperparameters above were chosen somewhat arbitrarily. Now it is your turn to study the impact of these values.

  • Increase the number of neurons within each hidden layer.

  • Change the activation function.

  • Change the learning rate.

  • Change the optimization algorithm.

  • Change the minibatch size \(b\).

In each case, compute the test performance of the model.

# To do
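
As a sketch of how such experiments can be organized, the helper below rebuilds and retrains the network for one choice of hyperparameters and returns its test accuracy (the function build_and_evaluate and its arguments are illustrative, not part of the lab):

from keras.models import Sequential
from keras.layers import Input, Dense
from keras.optimizers import Adam, SGD

def build_and_evaluate(hidden=32, activation="relu", lr=0.001,
                       optimizer_class=Adam, batch_size=32, epochs=10):
    # Rebuild the [784, hidden, hidden, 10] network with the given settings
    model = Sequential([
        Input(shape=(784,)),
        Dense(hidden, activation=activation),
        Dense(hidden, activation=activation),
        Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=optimizer_class(learning_rate=lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_train, label_train, epochs=epochs, batch_size=batch_size,
              validation_split=0.2, verbose=0)
    _, test_acc = model.evaluate(X_test, label_test, verbose=0)
    return test_acc

# Example: wider hidden layers, tanh activation, SGD with a larger learning rate
print(build_and_evaluate(hidden=64, activation="tanh", lr=0.01,
                         optimizer_class=SGD, batch_size=64))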

Further readings