Information

Author(s) Pierre Dupont, Benoit Ronval
Deadline 12/04/2026 23:00:00
Submission limit: No limitation


A4.1 - Deep Learning: a theoretical example

This task will be partially graded after the deadline

  • You will get real-time feedback for questions 1 to 3.
  • They do not count towards the final grade of the task, but you must answer them correctly; otherwise the remaining questions will not be graded.
  • Questions 4 to 15 will be graded after the deadline.

During the second assignment, you saw that the XOR problem is not linearly separable in the input space. This problem, however, becomes separable when mapped to a new space using an appropriate kernel. An alternative way to tackle it is to use a neural network. The goal of this task is to train such a network to solve the 2D XOR problem, defined by the following 4 data points:

\begin{equation*} \mathbf{x}_1 = \left[\begin{matrix} 0 \\ 0 \end{matrix}\right],\ y_1 = 0 \qquad \qquad \mathbf{x}_2 = \left[\begin{matrix} 0 \\ 1 \end{matrix}\right],\ y_2 = 1 \qquad \qquad \mathbf{x}_3 = \left[\begin{matrix} 1 \\ 0 \end{matrix}\right],\ y_3 = 1 \qquad \qquad \mathbf{x}_4 = \left[\begin{matrix} 1 \\ 1 \end{matrix}\right],\ y_4 = 0 \end{equation*}
[Figure: the four XOR data points in the input space (https://inginious.info.ucl.ac.be/course/LINFO2262/A4-1/space.png)]

We aim to produce a neural network that is able to classify those four data points accurately (in other words, we will test our classifier on this training data).

The neural network we will use contains one hidden layer with 2 nodes:

[Figure: the network architecture, with one hidden layer of 2 nodes (https://inginious.info.ucl.ac.be/course/LINFO2262/A4-1/network.png)]

The parameters of our hidden layer are described by the weight matrix and the bias vector:

\begin{equation*} \mathbf{W}_{\mathbf{x} \rightarrow \mathbf{h}} = \left[ \begin{matrix} w_{x_1 \rightarrow h_1} & w_{x_1 \rightarrow h_2} \\ w_{x_2 \rightarrow h_1} & w_{x_2 \rightarrow h_2} \\ \end{matrix} \right] \qquad\qquad \mathbf{b}_{\mathbf{h}} = \left[ \begin{matrix} b_{h_1} \\ b_{h_2} \\ \end{matrix} \right] \end{equation*}

For the output layer, we have the weight vector \(\mathbf{w}_{\mathbf{h} \rightarrow y} = \left[\begin{matrix} w_{h_1 \rightarrow y} \\ w_{h_2 \rightarrow y} \end{matrix}\right]\) and the bias \(b_y\).

The non-linear activation function of the hidden layer is a ReLU, while the output layer includes a sigmoid function. Throughout this task, we will use a learning rate \(\eta = 0.4\) and no regularization (\(\lambda = 0\)).
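
For concreteness, here is a minimal NumPy sketch of this forward pass (a sketch in our own notation, not a reference implementation; the parameters are passed as arguments and follow the layout of \(\mathbf{W}_{\mathbf{x} \rightarrow \mathbf{h}}\) defined above):

    import numpy as np

    def relu(a):
        # Hidden-layer activation: element-wise max(0, a)
        return np.maximum(0.0, a)

    def sigmoid(z):
        # Output-layer activation: maps the pre-activation to (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, W_xh, b_h, w_hy, b_y):
        # Rows of W_xh are indexed by the inputs x_i and its columns by the
        # hidden units h_j, so the hidden pre-activation is W_xh.T @ x + b_h.
        h = relu(W_xh.T @ x + b_h)       # hidden representation
        y_hat = sigmoid(w_hy @ h + b_y)  # predicted probability of class 1
        return h, y_hat

    # The four XOR examples and their labels.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0])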


Question 1: Forward propagation

Suppose we have the following assignment of the parameters:

\begin{equation*} \mathbf{W}_{\mathbf{x} \rightarrow \mathbf{h}} = \left[ \begin{matrix} 0.65 & 1.35 \\ 0.9 & 1.05 \\ \end{matrix} \right] \qquad \mathbf{b}_{\mathbf{h}} = \left[ \begin{matrix} 0.75 \\ -1.95 \\ \end{matrix} \right] \qquad \mathbf{w}_{\mathbf{h} \rightarrow y} = \left[ \begin{matrix} 0.95 \\ -2.67 \\ \end{matrix} \right] \qquad b_y = -0.4 \end{equation*}

Propagate each example \(\mathbf{x}_i\) through the neural network. What are the values of the outputs \(\hat{y}_i\)?

Give your answer using the format: y_1, y_2, y_3, y_4

When rounding, give at least 3 decimals.

Question 2: Classification

How many examples are correctly classified with the current network?
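
(Assuming the usual decision rule: an example is classified as 1 when \(\hat{y}_i \geq 0.5\) and as 0 otherwise.)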

Question 3: Loss

What is the value of the cross-entropy loss \(L(\mathbf{\hat{y}},\mathbf{y})\) computed from the 4 examples using this network? This cross-entropy loss is the cost function \(J\) of the network, which depends on the network parameters. Report below the contribution of the first example \((\hat{y}_1, y_1)\) to this loss, together with the total loss \(L(\mathbf{\hat{y}},\mathbf{y})\).

Note. In the lecture slides, \(\log\) denotes the natural logarithm (i.e. \(\ln\)).
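
For reference, the standard binary cross-entropy (assuming it matches the lecture's definition) gives, for each example \(i\), the contribution and the total:

\begin{equation*} L_i = -\big( y_i \ln \hat{y}_i + (1 - y_i) \ln (1 - \hat{y}_i) \big) \qquad\qquad L(\mathbf{\hat{y}},\mathbf{y}) = \sum_{i=1}^{4} L_i \end{equation*}

(Here the total is summed over the 4 examples; if the lecture averages instead, divide by 4.)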

Give your answer using the format: L_1, L

When rounding, give at least 3 decimals.

Question 4: Back-propagation

In order to update the weights of our model, we will back-propagate one example, namely \(\mathbf{x}_1 = \left[\begin{matrix} 0 \\ 0 \end{matrix}\right],\ y_1 = 0\).

First, we need to compute the output-layer gradient \(\nabla_{\hat{y}} J\). What is its value when considering this example?
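
As a reminder of the first step of the chain rule, differentiating the cross-entropy with respect to \(\hat{y}\) for a single example gives:

\begin{equation*} \nabla_{\hat{y}} J = -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}} = \frac{\hat{y} - y}{\hat{y}(1 - \hat{y})} \end{equation*}

which, for \(y_1 = 0\), reduces to \(1 / (1 - \hat{y}_1)\).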

When rounding, give at least 3 decimals.

Question 5: Back-propagation (continued)

Consequently, what are the values of \(\nabla_{\mathbf{w}_{\mathbf{h} \rightarrow y}} J\) and \(\nabla_{b_{y}} J\), i.e. the gradients of the weight vector and the bias of the last layer?
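
Writing \(z = \mathbf{w}_{\mathbf{h} \rightarrow y}^\top \mathbf{h} + b_y\) so that \(\hat{y} = \sigma(z)\), the chain rule combines \(\nabla_{\hat{y}} J\) with \(\sigma'(z) = \hat{y}(1 - \hat{y})\), which simplifies nicely:

\begin{equation*} \frac{\partial J}{\partial z} = \hat{y} - y \qquad\qquad \nabla_{\mathbf{w}_{\mathbf{h} \rightarrow y}} J = (\hat{y} - y)\, \mathbf{h} \qquad\qquad \nabla_{b_y} J = \hat{y} - y \end{equation*}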

Give your answer using the format: grad_w_1, grad_w_2, grad_b

When rounding, give at least 3 decimals.

Question 6: Back-propagation (continued)

Next, we need to back-propagate the gradient to the hidden layer.

What is the value of \(\nabla_{\mathbf{h}} J\)?
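
Since each hidden unit reaches the output only through \(z = \mathbf{w}_{\mathbf{h} \rightarrow y}^\top \mathbf{h} + b_y\), the chain rule gives:

\begin{equation*} \nabla_{\mathbf{h}} J = \frac{\partial J}{\partial z}\, \mathbf{w}_{\mathbf{h} \rightarrow y} = (\hat{y} - y)\, \mathbf{w}_{\mathbf{h} \rightarrow y} \end{equation*}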

Give your answer using the format: grad_h_1, grad_h_2

When rounding, give at least 3 decimals.

Question 7: Back-propagation (continued)

What are the values of \(\nabla_{\mathbf{W}_{\mathbf{x} \rightarrow \mathbf{h}}} J\) and \(\nabla_{b_\mathbf{h}} J\), i.e. the gradients of the weight matrix and the bias vector of the first layer?
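
Writing \(\mathbf{a} = \mathbf{W}_{\mathbf{x} \rightarrow \mathbf{h}}^\top \mathbf{x} + \mathbf{b}_\mathbf{h}\) for the hidden pre-activations, the ReLU lets the gradient through only where \(a_j > 0\) (taking its derivative to be 0 at 0):

\begin{equation*} \boldsymbol{\delta}_\mathbf{h} = \nabla_{\mathbf{h}} J \odot \mathbf{1}[\mathbf{a} > 0] \qquad\qquad \nabla_{\mathbf{W}_{\mathbf{x} \rightarrow \mathbf{h}}} J = \mathbf{x}\, \boldsymbol{\delta}_\mathbf{h}^\top \qquad\qquad \nabla_{\mathbf{b}_\mathbf{h}} J = \boldsymbol{\delta}_\mathbf{h} \end{equation*}

where the outer product \(\mathbf{x}\, \boldsymbol{\delta}_\mathbf{h}^\top\) matches the layout of \(\mathbf{W}_{\mathbf{x} \rightarrow \mathbf{h}}\) (rows indexed by inputs, columns by hidden units).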

Give your answer using the format: grad_w_11, grad_w_12, grad_w_21, grad_w_22, grad_b_1, grad_b_2

When rounding, give at least 3 decimals.

Question 8: Weight update

Using your previous answers, you can update the weights of the neural network. What are the new parameters \(\mathbf{w}_{\mathbf{h} \rightarrow y}\) and \(b_{y}\) of the last layer?
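
Each parameter follows the usual gradient-descent step with \(\eta = 0.4\) and no regularization, for instance:

\begin{equation*} \mathbf{w}_{\mathbf{h} \rightarrow y} \leftarrow \mathbf{w}_{\mathbf{h} \rightarrow y} - \eta\, \nabla_{\mathbf{w}_{\mathbf{h} \rightarrow y}} J \qquad\qquad b_y \leftarrow b_y - \eta\, \nabla_{b_y} J \end{equation*}

The same rule applies to the first-layer parameters in Question 9.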

Give your answer using the format: w_1, w_2, b

When rounding, give at least 3 decimals.

Question 9: Weight update (continued)

What are the new parameters \(\mathbf{W}_{\mathbf{x} \rightarrow \mathbf{h}}\) and \(\mathbf{b}_\mathbf{h}\) of the first layer?

Give your answer using the format: w_11, w_12, w_21, w_22, b_1, b_2

When rounding, give at least 3 decimals.

Note: be careful with the indexing for the w_ij values. It must match the original indexing given at the start of the task.

Question 10: Forward propagation

Now, we will evaluate whether our model has improved thanks to back-propagation.

Propagate each example \(\mathbf{x}_i\) through the neural network.

How are these examples (approximately) represented in the hidden space?

Question 11: Forward propagation (continued)

Are they linearly separable?

Question 12: Forward propagation (continued)

What are the values of the outputs \(\hat{y}_i\)?

Give your answer using the format: y_1, y_2, y_3, y_4

When rounding, give at least 3 decimals.

Question 13: Classification

How many examples are correctly classified with the updated network?

Question 14: Loss

What is the cross-entropy loss with the updated network? Report below the contribution of the first example \((\hat{y}_1, y_1)\) to this new loss and the total new loss \(L(\mathbf{\hat{y}},\mathbf{y})\) computed from the 4 examples.

Give your answer using the format: L_1, L

When rounding, give at least 3 decimals.

Question 15: Conclusion

Check all the valid statements.

Question 16: Public Resources

If you have used public resources to answer any of the questions above, please cite them all below.

If such a resource is a generative AI, please share link(s) to the chat(s) (e.g. https://chatgpt.com/share/698dac4c-f32c-800e-af13-f9ee2ebfdb0b).
You can generate such link(s) by clicking on "share" or "partager".

Otherwise, specify None in the field below.