A2.1 - Linear Discriminants and SVMs: theory - [LINFO2262] Machine Learning: classification and evaluation

Question 1: Separating hyperplanes

We consider a binary classification problem in $\mathbb{R}^2$ . The training set is made of 5 examples described in the table below. Each training example is represented by one row in this table. The two first columns give the coordinates of each example while the last column is the class label. We are interested in a SVM classifier built from this training set. We consider in particular the maximal margin hyperplane solution in the original input space.

$x_1$	$x_2$	Class label
0	3	$+$
3	0	$-$
2	1	$-$
0	2	$+$
3	3	$+$

Represent graphically the 5 training examples in $\mathbb{R}^2$ and compute the maximal margin hyperplane. You are not supposed to compute this decision boundary by solving the full optimization problem on paper. You are supposed to reason geometrically in order to determine the solution.

What is the equation of the maximal margin hyperplane (this can also be computed easily from the geometry of the problem)?

Use the same format as in the following example (mind the spaces, and round to 1 decimal each coefficient): + 1.0 x1 - 5.0 x2 - 6.0 = 0

Question 2: Separating hyperplanes (continued)

For the classifier defined in the previous question, what is the predicted class of the following test example?

$\begin{equation*} x_1 = 1 \qquad x_2 = 2 \end{equation*}$

$-$

$+$

Question 3: Separating hyperplanes (continued)

Which are the support vectors for the model estimated in question 1?

$(3,3)$

$(0,2)$

$(3,0)$

$(0,3)$

$(2,1)$

Question 4: Separating hyperplanes (continued)

Which of the following hyperplanes perfectly separate the training data without possibly being the maximal margin solution?

Select all correct answers.

$2 x_1 - x_2 - 2 = 0$

$x_1 = 1.5$

$2 x_1 - x_2 - 4 = 0$

$x_2 = 1.5$

$2 x_1 + 3 x_2 - 8 = 0$

$x_1 - 2 x_2 + 1 = 0$

Question 5: Separating hyperplanes (continued)

Which additional training example(s) would result in the two classes no longer being separable by a hyperplane?

Here you should consider such additional training example one at a time, on top of the original 5 training examples from question 1.

Select all good answers.

$(-1,0)$ class $+$

$(2,0)$ class $+$

$(4,3)$ class $+$

$(2,0)$ class $-$

$(1,2)$ class $-$

$(-1,0)$ class $-$

$(1,2)$ class $+$

$(4,3)$ class $-$

Question 6: The kernel trick

We consider an input space made of real vectors in $\mathbb{R}^2$ (two-dimensional real vectors) and a polynomial kernel $k : \mathbb{R}^2 \times \mathbb{R}^2 \rightarrow \mathbb{R}$ defined as follows:

$\begin{equation*} k(\mathbf{x}_i, \mathbf{x}_j) = (\langle \mathbf{x}_i, \mathbf{x}_j \rangle + 1)^3 \quad \text{with} \quad \mathbf{x}_i, \mathbf{x}_j \in \mathbb{R}^2 \end{equation*}$

What mathematical projection $\phi$ would map an input vector $\mathbf{x} = \left[ \begin{matrix} x_1 \\ x_2 \end{matrix} \right]$ to this new feature space?

Select all the elements that are equal to a dimension of $\phi(\mathbf{x})$ .

$\sqrt{3}x_2^2$

$\sqrt{3}x_1^2$

$\sqrt{12}x_1^2x_2^2$

$\sqrt{3}x_2$

$\sqrt{3}x_1^2x_2$

$x_1^3$

$\sqrt{3}x_1$

$\sqrt{6}x_1^2x_2$

$\sqrt{6}x_1x_2$

$\sqrt{3}x_1x_2^2$

$\sqrt{6}x_1x_2^2$

1

$x_2^3$

Question 7: The kernel trick (continued)

Consider the 2D XOR problem:

$\begin{equation*} \mathbf{x}_1 = \left[\begin{matrix} -1 \\ -1 \end{matrix}\right],\ y_1 = 0 \qquad \qquad \mathbf{x}_2 = \left[\begin{matrix} -1 \\ 1 \end{matrix}\right],\ y_2 = 1 \qquad \qquad \mathbf{x}_3 = \left[\begin{matrix} 1 \\ -1 \end{matrix}\right],\ y_3 = 1 \qquad \qquad \mathbf{x}_4 = \left[\begin{matrix} 1 \\ 1 \end{matrix}\right],\ y_4 = 0 \end{equation*}$

Select all valid affirmations.

All of the dimensions of the new feature space (from Q6) are needed to make the data linearly separable.

With an appropriate kernel, the data can be mapped to a 1-dimensional feature space where it is linearily separable.

Some of the dimensions of the new feature space (from Q6) are not contributing to make the data linearily separable.

This set of examples is separable in the new feature space defined by the above kernel (from Q6).

This set of examples is linearly separable in the input space.

Question 8: The kernel trick (continued)

Consider the following kernel:

$\begin{equation*} k(\mathbf{x}_i, \mathbf{x}_j) = (\langle \mathbf{x}_i, \mathbf{x}_j \rangle)^3 \quad \text{with} \quad \mathbf{x}_i, \mathbf{x}_j \in \mathbb{R}^2 \end{equation*}$

What is the corresponding projection?

Select all the elements that are part of the projection.

$\sqrt{3} x_1^2 x_2$

$1$

$\sqrt{6}x_1x_2^2$

$\sqrt{3} x_2$

$\sqrt{3} x_1$

$\sqrt{3} x_2^2$

$\sqrt{3} x_1 x_2^2$

$x_1^3$

$\sqrt{3} x_1^2$

$\sqrt{6} x_1 x_2$

$x_2^3$

$\sqrt{12}x_1^2x_2^2$

$\sqrt{6}x_1^2x_2$

Question 9: The kernel trick (continued)

Is the XOR data from question 7 linearly separable in the feature space defined by this kernel?

Yes

No

Question 10: Non-linear discriminants

We consider the following training set of points represented in a one dimensional space $\mathbb{R}$ :

https://inginious.info.ucl.ac.be/course/LINFO2262/A2-1/q10.png

What is the form of a linear discriminant lying in such a 1D space $\mathbb{R}$ ?

Select all valid affirmations.

The linear discriminant $g(x) = x - 5.5$ classifies the input data as best as possible.

The data is linearly separable in the input space.

The best linear discriminant (without any kernel) has an accuracy of 80% on this data

The linear discriminant $g(x) = x - 3$ classifies the input data as best as possible.

The linear discriminant $g(x) = - x + 3$ classifies the input data as best as possible.

Question 11: Non-linear discriminants (continued)

Suppose one would consider the following kernel: $k(x_i, x_j ) = (\langle x_i, x_j \rangle + 1)^2$ and one would look for a maximal margin hyperplane in the feature space induced by this kernel. With a proper choice of the regularization constant ( $C = 100$ ), a solver for the dual problem on this training set returns the following $\alpha$ values, according to the order of the training points in the above table (see question 10).

$\begin{equation*} \alpha_1 = 0 \qquad \alpha_2 = 2.5 \qquad \alpha_3 = 0 \qquad \alpha_4 = 7.333 \qquad \alpha_5 = 4.833 \end{equation*}$

What is the equation of the discriminant function $g(x)$ (in the input space) which is used by this model to classify any new point $x \in \mathbb{R}$ .

Use the same format as in the following example (mind the spaces, and round to 2 decimals each coefficient): + 2.54 x^2 - 5.45 x - 6.67

Question 12: Non-linear discriminants (continued)

Use the discriminant function found in the previous question on the 1D data.

Select all valid affirmations

The discriminant perfectly separates the data.

The discriminant corresponds to 2 decision boundaries in the 1D space.

The decision boundaries, given by the roots of the discriminant, are approximately $x = 2.42$ and $x = 5.58$ .

The discriminant is a convex function.

Author(s)	Pierre Dupont, Benoit Ronval
Deadline	09/03/2025 23:00:00
Submission limit	No limitation

Information

Sign in