Information

Author(s) Pierre Dupont, Benoit Ronval
Deadline 12/04/2026 23:00:00
Submission limit No limitation


A4.2 - Deep Learning: feedforward networks

During this assignment, you will build neural networks with the help of the Keras Python library, which is built on top of the TensorFlow library. There are two ways to use TensorFlow:

  • Install it on your machine using pip. We recommend TensorFlow version 2.20.0 (which is the version run by Inginious).
  • Use an interactive environment such as Kaggle Notebook (where TensorFlow is already installed).

If you do not have a local GPU or do not want to install TensorFlow locally, we strongly advise you to use Kaggle, where you can access a GPU for 30 hours per week.

If you decide to use Kaggle, you first need to complete all of the following steps to gain access to the GPU:

  1. Create a Kaggle account and log in.
  2. Go to your account settings (icon in the upper right corner).
  3. Add a display name and complete the phone verification process.

You can create a new notebook on Kaggle by clicking on Code (in the left panel), then New Notebook. Once in the notebook, verify that you are using the GPU by clicking on Settings then Accelerator, and making sure that the GPU is selected.

IMPORTANT: Do not forget to stop your session when you are not working on your code, otherwise you will continue to consume your GPU hours (30h/week)!

In the notebook, you can upload files (such as the dataset) by following these steps:

  1. Select Upload in the right panel.
  2. Choose a name for the uploaded file, and select Private for the visibility.
  3. Click on Create at the bottom left.
  4. The file should now be visible in the Input section. You can copy the path to this file by hovering over it and selecting Copy file path.
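
For example, once you have copied a file path from the Input section, you can use it directly in your code. The dataset name below is a placeholder; replace the whole path with the one you copied.

import numpy as np

# Placeholder path: replace "your-dataset-name" with the path copied from the
# Input section of your own notebook.
data = np.load("/kaggle/input/your-dataset-name/p4_2026_svhn_train.npz")
print(data.files)  # lists the arrays stored in the .npz archive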

You can save any file created by your code (such as the fitted models) by following these steps:

  1. Go to Save version -> Quick save (in version type).
  2. In Advanced settings, choose whether to save the output always or only for this version.
  3. Click on Save.
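
As an illustration, any file your code writes to the notebook's working directory (/kaggle/working on Kaggle) is what ends up in the Output section once you save a version; the filename below is only an example.

from pathlib import Path

# Example file: after saving a version, it will appear in the notebook's Output.
Path("/kaggle/working/example.txt").write_text("this file will appear in Output")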

When you re-open the same notebook, you can add the previously saved files as input:

  1. On the right panel, select Add Input and select the previously saved notebook.
  2. The saved files should now be visible in the Input section.

IMPORTANT: the files created on Kaggle are TEMPORARY and will be deleted once you close the notebook! Check this documentation for additional information about this, including alternative procedures.

Your task is to tackle an image recognition problem, for which deep learning has proven to be particularly effective. You will analyze Street View images of house numbers taken from the SVHN dataset, from which we selected a sample for this project. The goal is to predict which digit, between 0 and 9, appears at the center of each image. Each image is a \(32 \times 32 \times 3\) RGB image, and the dataset contains 50,000 train images and 10,000 test images. To make the task more challenging, other digits may be present in the image besides the one at the center!

Sample images from the dataset: https://inginious.info.ucl.ac.be/course/LINFO2262/A4-2/house_numbers.png

The data can be downloaded from Moodle. Once downloaded in a local directory on your disk (or loaded in Kaggle), you can load the data using the following instructions:

import numpy as np
import tensorflow as tf
from tensorflow import keras

x_train = np.load("p4_2026_svhn_train.npz")["images"].transpose(3, 2, 0, 1)
y_train = np.load("p4_2026_svhn_train.npz")["labels"].squeeze()
x_test = np.load("p4_2026_svhn_test.npz")["images"].transpose(3, 2, 0, 1)
y_test = np.load("p4_2026_svhn_test.npz")["labels"].squeeze()

# Convert string labels to integer indices, then one-hot encode them
labels = sorted(list(set(y_train)))
y_train = keras.utils.to_categorical([labels.index(x) for x in y_train])
y_test = keras.utils.to_categorical([labels.index(x) for x in y_test])

After executing the above instructions, x_train and x_test contain the multi-dimensional arrays of the train and test images, respectively. The last 3 lines transform the labels (0, 1, ..., 9) contained in y_train and y_test into binary vectors using one-hot encoding (i.e. label 0 is represented by the binary vector [1,0,0,0,0,0,0,0,0,0], label 1 by [0,1,0,0,0,0,0,0,0,0], etc.).
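
Continuing from the loading code above, a quick sanity check is to print the array shapes: the first dimension should be 50,000 for the train arrays and 10,000 for the test arrays, and each image contains \(32 \times 32 \times 3 = 3072\) values in total.

# Sanity check on the loaded arrays (x_* hold the images, y_* the one-hot labels)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)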

It is now time to build your first neural network, using the Sequential class of Keras.


Question 1: A first linear neural network

For this question, we ask you to build and to train a neural network with the following specifications:

  • The network contains 2 layers:

    • A flatten layer, which flattens the \(32 \times 32 \times 3\) RGB images into 1D vectors of size \(3072\). This layer has no parameters to train; it just reshapes the input data.
    • A dense output layer with a softmax activation function, so that it can predict the target categorical (i.e. class) variable. The kernel and bias initializers are set to RandomNormal.
  • The network loss will be the categorical cross-entropy loss. You should also specify that the categorical accuracy will be used as the metric to evaluate the model.

  • The network optimizer is the Adam optimizer (an optimized version of the gradient descent procedure) with a learning rate of \(10^{-5}\).

We are here essentially training 10 linear models and then applying a softmax on them. This is not yet a deep neural network.

Implement your neural network in the variable model. Just define and compile the network; don't fit it on the training data (you don't have access to the training set on Inginious, so fitting would generate an error).
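
As an illustration of the Keras calls involved (a minimal sketch, not necessarily the exact model the grader expects), such a network could be defined and compiled as follows. The input shape assumes samples of shape \(32 \times 32 \times 3\); adjust it to x_train.shape[1:] if your arrays use a different axis order.

from tensorflow import keras

# Minimal sketch: Flatten followed by a softmax output layer, with RandomNormal
# initializers and the Adam optimizer (learning rate 1e-5).
init = keras.initializers.RandomNormal()

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),   # adjust to x_train.shape[1:] if needed
    keras.layers.Flatten(),           # 32 * 32 * 3 = 3072 values per image
    keras.layers.Dense(10, activation="softmax",
                       kernel_initializer=init, bias_initializer=init),
])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    loss="categorical_crossentropy",
    metrics=["categorical_accuracy"],
)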

Question 2: A first linear neural network: model fitting

Fit your model from question 1 on the train data (with the default batch size, i.e. 32). Run 100 epochs to fit your model.

Once your neural network is fitted, save it in a .keras file using the save function of Keras and upload it below.
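
For reference, fitting and saving could look like the sketch below, assuming model is the compiled network from question 1 and the data were loaded as shown earlier (the filename is just an example). When batch_size is not specified, Keras uses the default value of 32.

# Fit for 100 epochs with the default batch size (32), then save the model.
history = model.fit(x_train, y_train, epochs=100)
model.save("q2_model.keras")  # example filename; upload this file below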


Max file size: 4.8 MiB
Allowed extensions: .keras
Question 3: A first linear neural network: performance

How many trainable parameters are contained in the whole network you just built? What is the measured test accuracy of the model you fitted in question 2?

Report your answer in the format: number_param, test_acc (use decimal notation with at least 3 digits for the accuracies, e.g. 0.748, not %).

You must first validate the two previous questions before getting feedback for this one. Note that this question depends on the specific model you uploaded in the previous question. If you change it, you may need to update your answer to this question accordingly.
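
If helpful, both quantities can be obtained locally with standard Keras calls, assuming the test data were loaded as shown earlier and using whatever filename you chose in question 2.

from tensorflow import keras

# Reload the model saved in question 2 (use your own filename).
model = keras.models.load_model("q2_model.keras")

# summary() reports the number of trainable parameters of the whole network.
model.summary()

# evaluate() returns the loss and the categorical accuracy on the test set.
test_loss, test_acc = model.evaluate(x_test, y_test)
print(test_acc)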

Question 4: A non-linear network

Test question [15 points]: This question will be graded after the deadline. You will only receive credit for this question if you answered questions 1 to 3 correctly.

Build a new model by adding a layer before the output layer of your neural net from question 1 (all other elements should be the same as in question 1). This additional layer must be a dense layer with a tanh activation function and should contain 256 units. Its kernel and bias initializers are set to RandomNormal.

Implement your neural network in the variable model. Just define and compile the network; don't fit it on the training data.

Question 5: A non-linear network: model fitting

Test question [10 points]

Fit your model from question 4 on the train data. Run 100 epochs to fit your model. Use the default batch size value as before (i.e. 32).

Once your neural network is fitted, save it in a .keras file using the save function of Keras and upload it below.


Max file size: 23.8 MiB
Allowed extensions: .keras
Question 6: A non-linear network: performance

Test question [10 points]

How many trainable parameters are contained in the whole network you built in question 4? What is the measured test accuracy of the model as fitted in question 5?

Report your answer in the format: number_param, test_acc (use decimal notation for the accuracies, not %).

Question 7: A non-linear network: activation functions

Test question [15 points]

Besides a tanh activation function, other non-linear functions can be implemented in a hidden layer. Let's consider the ReLU activation: use the exact same network architecture as in the previous question, but with ReLU instead of tanh activation in the hidden layer. Which one performs better?

Since there is a lot of randomness involved, different training runs for the same network might yield different results. To get more robust results, perform 10 distinct runs for each model and report the average test accuracies.

Train each model during 100 epochs with default batch size value (i.e. 32).

Report the mean test accuracy of both networks using the format: tanh_mean_acc, relu_mean_acc (use decimal notation, not %).
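
A possible skeleton for this averaging procedure is sketched below. It assumes the data were loaded as shown earlier and that build_model reproduces the architecture from question 4 with a configurable hidden activation; each call to build_model re-initializes the weights, so the runs are independent.

import numpy as np
from tensorflow import keras

def build_model(activation):
    # Sketch assuming the architecture from question 4: one hidden dense layer
    # of 256 units, RandomNormal initializers, and the same compilation settings.
    init = keras.initializers.RandomNormal()
    model = keras.Sequential([
        keras.Input(shape=x_train.shape[1:]),
        keras.layers.Flatten(),
        keras.layers.Dense(256, activation=activation,
                           kernel_initializer=init, bias_initializer=init),
        keras.layers.Dense(10, activation="softmax",
                           kernel_initializer=init, bias_initializer=init),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
                  loss="categorical_crossentropy",
                  metrics=["categorical_accuracy"])
    return model

def mean_test_accuracy(activation, n_runs=10, epochs=100):
    # Average the test accuracy over several independent training runs.
    accs = []
    for _ in range(n_runs):
        model = build_model(activation)
        model.fit(x_train, y_train, epochs=epochs, verbose=0)
        _, acc = model.evaluate(x_test, y_test, verbose=0)
        accs.append(acc)
    return np.mean(accs)

print(mean_test_accuracy("tanh"), mean_test_accuracy("relu"))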

Question 8: Multiple choice

Test question [10 points]

Based on your observations from the previous questions, and possibly some extra experiments, what can you conclude?

Select all valid statements.

Question 9: Hidden units

Test question [15 points]: This question will be graded after the deadline.

Take the neural network with ReLU activation you just defined. Let's study the impact of the number of units in the hidden layer.

Compare the mean accuracy over 10 runs of this network with 256 units (which you already computed in question 7) to those obtained with models with 64 and 1024 hidden units.

Perform 10 distinct runs (training + testing) for each model and average the results. Use 100 epochs to fit each model with default batch size value (i.e. 32).

Don't change anything in your network besides the number of hidden units.

Report the mean test accuracies of the three models using the format: mean_test_acc_64, mean_test_acc_256, mean_test_acc_1024

Question 10: Adding Hidden Layers

Test question [15 points]

We will now investigate whether we can improve the classification accuracy by adding extra layers to the model.

Compare (locally, on your machine) the models from question 9 to models having 2 or 3 hidden layers. For example, the (64, 256) configuration denotes a network with 64 units in the first hidden layer and 256 units in the second hidden layer. Evaluate the following configurations to determine which one has the best test accuracy:

  0. (64, 64)
  1. (64, 256)
  2. (256, 64)
  3. (256, 256)
  4. (1024, 256)
  5. (256, 1024)
  6. (64, 64, 64)
  7. (256, 256, 256)

Keep the same hyper-parameters (e.g. learning rate, ReLU non-linearities, ...) in your network and play only with the number of hidden layers and the number of hidden units.

Hint: There are several configurations (in terms of number of hidden layers and number of hidden units), so it would require a large amount of computation to perform 10 runs for each one. First get some results by averaging performance over only 3 runs (but of course consider several epochs of training and, ideally, monitor convergence). Afterwards, perform 10 runs to discriminate between the most promising models, trained for 100 epochs with the default batch size (i.e. 32).
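
One way to explore these configurations without duplicating code is to build the network from a tuple of hidden-layer sizes, as in the sketch below (same assumptions as before about initializers, optimizer, loss, and the loaded data).

from tensorflow import keras

def build_from_config(hidden_units):
    # hidden_units is a tuple such as (64, 256) or (256, 256, 256); all other
    # settings are kept identical to the previous questions.
    init = keras.initializers.RandomNormal()
    layers = [keras.Input(shape=x_train.shape[1:]), keras.layers.Flatten()]
    for units in hidden_units:
        layers.append(keras.layers.Dense(units, activation="relu",
                                         kernel_initializer=init,
                                         bias_initializer=init))
    layers.append(keras.layers.Dense(10, activation="softmax",
                                     kernel_initializer=init,
                                     bias_initializer=init))
    model = keras.Sequential(layers)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
                  loss="categorical_crossentropy",
                  metrics=["categorical_accuracy"])
    return model

# Example: the number of parameters of a given configuration (all trainable here)
print(build_from_config((256, 64)).count_params())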

Once you have identified the best configuration in terms of test accuracy, report your answer below using the following format: config, test_acc, num_params where

  • config is a configuration index (e.g. 0 for \((64, 64)\), 1 for \((64, 256)\), ...) as listed above.
  • test_acc is the test accuracy of this model trained for 100 epochs (use at least 3 digits in decimal format),
  • num_params is the number of trainable parameters of this model.

Example of a well-formatted (but not necessarily correct) answer, assuming \((256, 64)\) would be the best configuration: 2, 0.335, 123456.

Question 11: Adding Hidden Layers (continued)

Test question [10 points]

Based on the experiments you made for the previous questions, select all valid statements.