
Information

Authors: Pierre Dupont, Benoit Ronval
Deadline: 23/02/2025 23:00:00
Submission limit: unlimited


A1.1 - Decision Tree Learning: theory

This task will be graded after the deadline


Question 1: Attribute selection

Suppose you have the following training set with four boolean attributes x1, x2, x3 and x4 and a boolean output y.

[Training set: https://inginious.info.ucl.ac.be/course/LINFO2262/A1-1/q1_B.png]

What is the tree learned by CART (without any pruning mechanism) from this training set?

You should be able to construct it from a general understanding of the algorithm, without explicitly computing every step.

Question 2: Attribute selection (continued)

Is there another binary decision tree that perfectly classifies the same training examples and is shallower than the one produced by CART?

Question 3: Drop of impurity

Suppose you have a training set with 20 positive and 20 negative examples.

Compute the drop of impurity for the following two splits, performed at the root of the tree:

  • [4+, 14-] (left)  [16+, 6-] (right)
  • [8+, 18-] (left)  [12+, 2-] (right)

Give your answer in the following format: drop_first, drop_second

When rounding, give at least 3 decimals. For example, 0.452.
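
For reference, the drop of impurity of a split is the impurity of the parent node minus the weighted average impurity of its children. Below is a minimal Python sketch of this computation, assuming the Gini index (CART's classical impurity measure):

    # Drop of impurity: parent impurity minus the weighted impurity of the
    # children, assuming the Gini index as the impurity measure.

    def gini(pos, neg):
        """Gini impurity of a node with `pos` positive and `neg` negative examples."""
        p = pos / (pos + neg)
        return 2 * p * (1 - p)  # equivalent to 1 - p**2 - (1 - p)**2

    def drop_of_impurity(parent, left, right):
        """Each argument is a (pos, neg) pair of class counts."""
        n_left, n_right = sum(left), sum(right)
        n = n_left + n_right
        return (gini(*parent)
                - n_left / n * gini(*left)
                - n_right / n * gini(*right))

    # The two candidate splits of the [20+, 20-] root:
    print(round(drop_of_impurity((20, 20), (4, 14), (16, 6)), 3))
    print(round(drop_of_impurity((20, 20), (8, 18), (12, 2)), 3))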

Question 4: Drop of impurity (continued)

Based on your answer in the previous question, which split will be chosen by CART?

Beware: you will only receive credit for this question if you answered the previous one correctly.

Question 5: Drop of impurity (continued)

Suppose now that mistakes on the positive examples are about 5 times as costly as mistakes on the negative ones. One way of dealing with such a cost imbalance is to replicate each positive example 5 times. The splits become:

  • [20+, 14-] (left)  [80+, 6-] (right)
  • [40+, 18-] (left)  [60+, 2-] (right)

What is their drop of impurity?

Give your answer in the following format: drop_first, drop_second

When rounding, give at least 3 decimals. For example, 0.452.
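
Under the same Gini assumption, the replication trick only changes the class counts fed to the drop_of_impurity helper sketched above (the root now holds [100+, 20-]):

    # Reusing gini / drop_of_impurity from the previous sketch; each positive
    # count is multiplied by 5, so the parent becomes (100, 20).
    print(round(drop_of_impurity((100, 20), (20, 14), (80, 6)), 3))
    print(round(drop_of_impurity((100, 20), (40, 18), (60, 2)), 3))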

Question 6: Drop of impurity (continued)

Consequently, which split will be performed by CART?

Beware: you will only receive credit for this question if you answered the previous one correctly.

Question 7: Continuous attributes

Consider a classification problem based only on 2 continuous attributes (the instance space is the plane R²). CART incorporates these attributes by defining threshold-based boolean attributes. In the induced tree, each node corresponds to a particular decision boundary splitting the examples into two regions.

  • What is the shape (in the instance space) of the decision boundaries learned by CART?
  • Into how many regions is the instance space divided before pruning?
  • Does it depend on the attribute values of the training examples?
  • Does it depend on the number of classes?

Select all valid sentences
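
As background on how such threshold-based tests can be generated, here is a hypothetical sketch; the midpoint placement of candidate thresholds is an assumption for illustration, not something stated in this question:

    # Sketch: candidate thresholds for a continuous attribute, placed at the
    # midpoints between consecutive distinct values seen in the training set.
    # Each candidate t yields a boolean test of the form x_j <= t.

    def candidate_thresholds(values):
        distinct = sorted(set(values))
        return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

    # Hypothetical attribute values, for illustration only:
    print(candidate_thresholds([0.3, 1.1, 1.1, 2.4, 3.0]))  # [0.7, 1.75, 2.7]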