Information

Author(s) Pierre Dupont, Benoit Ronval
Deadline 12/04/2026 23:00:00
Submission limit: No limitation


A4.1 - Deep Learning: a theoretical example

This task will be partially graded after the deadline

  • You will get real-time feedback for questions 1 to 3.
  • They do not count towards the final grade of the task, but you must answer them correctly; otherwise the remaining questions will not be graded.
  • Questions 4 to 15 will be graded after the deadline.

During the second assignment, you saw that the XOR problem is not linearly separable in the input space. This problem, however, becomes separable when mapped to a new space using an appropriate kernel. An alternative way to tackle it is to use a neural network. The goal of this task is to train such a network to solve the 2D XOR problem, defined by the following 4 data points:

\begin{equation*} \mathbf{x}_1 = \left[\begin{matrix} 0 \\ 0 \end{matrix}\right],\ y_1 = 0 \qquad \qquad \mathbf{x}_2 = \left[\begin{matrix} 0 \\ 1 \end{matrix}\right],\ y_2 = 1 \qquad \qquad \mathbf{x}_3 = \left[\begin{matrix} 1 \\ 0 \end{matrix}\right],\ y_3 = 1 \qquad \qquad \mathbf{x}_4 = \left[\begin{matrix} 1 \\ 1 \end{matrix}\right],\ y_4 = 0 \end{equation*}
[Figure: the four XOR data points in the input space (https://inginious.info.ucl.ac.be/course/LINFO2262/A4-1/space.png)]

We aim to produce a neural network that is able to classify those four data points accurately (in other words, we will test our classifier on this training data).

The neural network we will use contains one hidden layer with 2 nodes:

[Figure: the network architecture, with one hidden layer of 2 nodes (https://inginious.info.ucl.ac.be/course/LINFO2262/A4-1/network.png)]

The parameters of our hidden layer are described by the weight matrix and the bias vector:

\begin{equation*} \mathbf{W}_{\mathbf{x} \rightarrow \mathbf{h}} = \left[ \begin{matrix} w_{x_1 \rightarrow h_1} & w_{x_1 \rightarrow h_2} \\ w_{x_2 \rightarrow h_1} & w_{x_2 \rightarrow h_2} \\ \end{matrix} \right] \qquad\qquad \mathbf{b}_{\mathbf{h}} = \left[ \begin{matrix} b_{h_1} \\ b_{h_2} \\ \end{matrix} \right] \end{equation*}

For the output layer, we have the weight vector \(\mathbf{w}_{\mathbf{h} \rightarrow y} = \left[\begin{matrix} w_{h_1 \rightarrow y} \\ w_{h_2 \rightarrow y} \end{matrix}\right]\) and the bias \(b_y\).

The non-linear activation function of the hidden layer is a ReLU, while the output layer includes a sigmoid function. Throughout this task, we will use a learning rate \(\eta = 0.4\) and no regularization (\(\lambda = 0\)).
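
For concreteness, here is a minimal NumPy sketch of this forward pass (a sketch in our own notation, not a reference implementation; the parameters are passed as arguments and follow the layout of \(\mathbf{W}_{\mathbf{x} \rightarrow \mathbf{h}}\) defined above):

    import numpy as np

    def relu(a):
        # Hidden-layer activation: element-wise max(0, a)
        return np.maximum(0.0, a)

    def sigmoid(z):
        # Output-layer activation: maps the pre-activation to (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, W_xh, b_h, w_hy, b_y):
        # Rows of W_xh are indexed by the inputs x_i and its columns by the
        # hidden units h_j, so the hidden pre-activation is W_xh.T @ x + b_h.
        h = relu(W_xh.T @ x + b_h)       # hidden representation
        y_hat = sigmoid(w_hy @ h + b_y)  # predicted probability of class 1
        return h, y_hat

    # The four XOR examples and their labels.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0])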


Question 1: Forward propagation

Suppose we have the following assignment of the parameters:

\begin{equation*} \mathbf{W}_{\mathbf{x} \rightarrow \mathbf{h}} = \left[ \begin{matrix} 0.65 & 1.35 \\ 0.9 & 1.05 \\ \end{matrix} \right] \qquad \mathbf{b}_{\mathbf{h}} = \left[ \begin{matrix} 0.75 \\ -1.95 \\ \end{matrix} \right] \qquad \mathbf{w}_{\mathbf{h} \rightarrow y} = \left[ \begin{matrix} 0.95 \\ -2.67 \\ \end{matrix} \right] \qquad b_y = -0.4 \end{equation*}

Propagate each example \(\mathbf{x}_i\) through the neural network. What are the values of the outputs \(\hat{y}_i\)?

Give your answer using the format: y_1, y_2, y_3, y_4

When rounding, give at least 3 decimals.

Question 2: Classification

How many examples are correctly classified with the current network?
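
(Assuming the usual decision rule: an example is classified as 1 when \(\hat{y}_i \geq 0.5\) and as 0 otherwise.)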

Question 3: Loss

What is the value of the cross-entropy loss \(L(\mathbf{\hat{y}},\mathbf{y})\) computed from the 4 examples using this network? This cross-entropy loss is the cost function \(J\) of the network, which depends on the network parameters. Report below the contribution of the first example \((\hat{y}_1, y_1)\) to this loss, together with the total loss \(L(\mathbf{\hat{y}},\mathbf{y})\).

Note. In the lecture slides, \(\log\) denotes the natural logarithm (i.e. \(\ln\)).
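
For reference, the standard binary cross-entropy (assuming it matches the lecture's definition) gives, for each example \(i\), the contribution and the total:

\begin{equation*} L_i = -\big( y_i \ln \hat{y}_i + (1 - y_i) \ln (1 - \hat{y}_i) \big) \qquad\qquad L(\mathbf{\hat{y}},\mathbf{y}) = \sum_{i=1}^{4} L_i \end{equation*}

(Here the total is summed over the 4 examples; if the lecture averages instead, divide by 4.)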

Give your answer using the format: L_1, L

When rounding, give at least 3 decimals.

Question 4: Back-propagation

In order to update the weights of our model, we will back-propagate one example, namely \(\mathbf{x}_1 = \left[\begin{matrix} 0 \\ 0 \end{matrix}\right],\ y_1 = 0\).

First, we need to compute the output-layer gradient \(\nabla_{\hat{y}} J\). What is its value when considering this example?
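
As a reminder of the first step of the chain rule, differentiating the cross-entropy with respect to \(\hat{y}\) for a single example gives:

\begin{equation*} \nabla_{\hat{y}} J = -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}} = \frac{\hat{y} - y}{\hat{y}(1 - \hat{y})} \end{equation*}

which, for \(y_1 = 0\), reduces to \(1 / (1 - \hat{y}_1)\).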

When rounding, give at least 3 decimals.

Question 5: Back-propagation (continued)

Consequently, what are the values of \(\nabla_{\mathbf{w}_{\mathbf{h} \rightarrow y}} J\) and \(\nabla_{b_{y}} J\), i.e. the gradients of the weight vector and the bias of the last layer?
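
Writing \(z = \mathbf{w}_{\mathbf{h} \rightarrow y}^\top \mathbf{h} + b_y\) so that \(\hat{y} = \sigma(z)\), the chain rule combines \(\nabla_{\hat{y}} J\) with \(\sigma'(z) = \hat{y}(1 - \hat{y})\), which simplifies nicely:

\begin{equation*} \frac{\partial J}{\partial z} = \hat{y} - y \qquad\qquad \nabla_{\mathbf{w}_{\mathbf{h} \rightarrow y}} J = (\hat{y} - y)\, \mathbf{h} \qquad\qquad \nabla_{b_y} J = \hat{y} - y \end{equation*}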

Give your answer using the format: grad_w_1, grad_w_2, grad_b

When rounding, give at least 3 decimals.

Question 6: Back-propagation (continued)

Next, we need to back-propagate the gradient to the hidden layer.

What is the value of \(\nabla_{\mathbf{h}} J\)?
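
Since each hidden unit reaches the output only through \(z = \mathbf{w}_{\mathbf{h} \rightarrow y}^\top \mathbf{h} + b_y\), the chain rule gives:

\begin{equation*} \nabla_{\mathbf{h}} J = \frac{\partial J}{\partial z}\, \mathbf{w}_{\mathbf{h} \rightarrow y} = (\hat{y} - y)\, \mathbf{w}_{\mathbf{h} \rightarrow y} \end{equation*}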

Give your answer using the format: grad_h_1, grad_h_2

When rounding, give at least 3 decimals.

Question 7: Back-propagation (continued)

What are the values of \(\nabla_{\mathbf{W}_{\mathbf{x} \rightarrow \mathbf{h}}} J\) and \(\nabla_{b_\mathbf{h}} J\), i.e. the gradients of the weight matrix and the bias vector of the first layer?
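
Writing \(\mathbf{a} = \mathbf{W}_{\mathbf{x} \rightarrow \mathbf{h}}^\top \mathbf{x} + \mathbf{b}_\mathbf{h}\) for the hidden pre-activations, the ReLU lets the gradient through only where \(a_j > 0\) (taking its derivative to be 0 at 0):

\begin{equation*} \boldsymbol{\delta}_\mathbf{h} = \nabla_{\mathbf{h}} J \odot \mathbf{1}[\mathbf{a} > 0] \qquad\qquad \nabla_{\mathbf{W}_{\mathbf{x} \rightarrow \mathbf{h}}} J = \mathbf{x}\, \boldsymbol{\delta}_\mathbf{h}^\top \qquad\qquad \nabla_{\mathbf{b}_\mathbf{h}} J = \boldsymbol{\delta}_\mathbf{h} \end{equation*}

where the outer product \(\mathbf{x}\, \boldsymbol{\delta}_\mathbf{h}^\top\) matches the layout of \(\mathbf{W}_{\mathbf{x} \rightarrow \mathbf{h}}\) (rows indexed by inputs, columns by hidden units).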

Give your answer using the format: grad_w_11, grad_w_12, grad_w_21, grad_w_22, grad_b_1, grad_b_2

When rounding, give at least 3 decimals.

Question 8: Weight update

Using your previous answers, you can update the weights of the neural network. What are the new parameters \(\mathbf{w}_{\mathbf{h} \rightarrow y}\) and \(b_{y}\) of the last layer?
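
Each parameter follows the usual gradient-descent step with \(\eta = 0.4\) and no regularization, for instance:

\begin{equation*} \mathbf{w}_{\mathbf{h} \rightarrow y} \leftarrow \mathbf{w}_{\mathbf{h} \rightarrow y} - \eta\, \nabla_{\mathbf{w}_{\mathbf{h} \rightarrow y}} J \qquad\qquad b_y \leftarrow b_y - \eta\, \nabla_{b_y} J \end{equation*}

The same rule applies to the first-layer parameters in Question 9.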

Give your answer using the format: w_1, w_2, b

When rounding, give at least 3 decimals.

Question 9: Weight update (continued)

What are the new parameters \(\mathbf{W}_{\mathbf{x} \rightarrow \mathbf{h}}\) and \(\mathbf{b}_\mathbf{h}\) of the first layer?

Give your answer using the format: w_11, w_12, w_21, w_22, b_1, b_2

When rounding, give at least 3 decimals.

Note: be careful with the indexing for the w_ij values. It must match the original indexing given at the start of the task.

Question 10: Forward propagation

Now, we will evaluate whether our model has improved thanks to back-propagation.

Propagate each example \(\mathbf{x}_i\) through the neural network.

How are these examples (approximately) represented in the hidden space?

Question 11: Forward propagation (continued)

Are they linearly separable?

Question 12: Forward propagation (continued)

What are the values of the outputs \(\hat{y}_i\)?

Give your answer using the format: y_1, y_2, y_3, y_4

When rounding, give at least 3 decimals.

Question 13: Classification

How many examples are correctly classified with the updated network?

Question 14: Loss

What is the cross-entropy loss with the updated network? Report below the contribution of the first example \((\hat{y}_1, y_1)\) to this new loss and the total new loss \(L(\mathbf{\hat{y}},\mathbf{y})\) computed from the 4 examples.

Give your answer using the format: L_1, L

When rounding, give at least 3 decimals.

Question 15: Conclusion

Check all the valid statements.

Question 16: Public Resources

If you have used public resources to answer any of the questions above, please cite them all below.

If such a resource is a generative AI, please share link(s) to the chat(s) (e.g. https://chatgpt.com/share/698dac4c-f32c-800e-af13-f9ee2ebfdb0b).
You can generate such link(s) by clicking on "share" or "partager".

Otherwise, specify None in the field below.