
Information

Author Pierre Dupont, Benoit Ronval
Deadline 23/02/2025 23:00:00
Submission limit No limit


A1.1 - Decision Tree Learning: theory

This task will be graded after the deadline


Question 1: Attribute selection

Suppose you have the following training set with four boolean attributes x1, x2, x3 and x4 and a boolean output y.

https://inginious.info.ucl.ac.be/course/LINFO2262/A1-1/q1_B.png

What is the tree learned by CART (without any pruning mechanism) from this training set?

You should be able to construct it from your general understanding of the algorithm, without explicitly computing every step.

Question 2: Attribute selection (continued)

Is there another binary decision tree which would perfectly classify the same training examples and would not be as deep as the one proposed by CART?

Question 3: Drop of impurity

Suppose you have a training set with 20 positive and 20 negative examples.

Compute the drop of impurity for the two following splits, performed at the root of the tree:

  • [4+,14-] (left)  [16+,6-] (right)
  • [8+,18-] (left)  [12+,2-] (right)

Give your answer in the following format: drop_first, drop_second

When rounding, give at least 3 decimals. For example, 0.452.
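The drop of impurity can be checked numerically. The sketch below assumes the Gini index, CART's standard impurity measure; the function names (`gini`, `impurity_drop`) are illustrative, not part of the assignment.

```python
def gini(pos, neg):
    """Gini impurity of a node holding pos positive and neg negative examples."""
    n = pos + neg
    p = pos / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def impurity_drop(left, right):
    """Drop of impurity of a binary split; left/right are (pos, neg) counts.

    Parent impurity minus the size-weighted average of the children's impurities.
    """
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    parent = gini(left[0] + right[0], left[1] + right[1])
    return parent - (n_left / n) * gini(*left) - (n_right / n) * gini(*right)

# The two candidate splits of the 20+/20- root node from the question:
drop_first = impurity_drop((4, 14), (16, 6))
drop_second = impurity_drop((8, 18), (12, 2))
print(round(drop_first, 3), round(drop_second, 3))
```

CART picks the split with the largest drop, so comparing the two printed values answers the next question as well.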

Question 4: Drop of impurity (continued)

Based on your answer in the previous question, which split will be chosen by CART?

Beware: you will only receive credit for this question if you answered the previous one correctly.

Question 5: Drop of impurity (continued)

Suppose now that mistakes on the positive examples are about 5 times as costly as mistakes on the negative ones. One way of dealing with such a cost imbalance is to replicate each positive example 5 times. The splits become:

  • [20+,14-] (left)  [80+,6-] (right)
  • [40+,18-] (left)  [60+,2-] (right)

What is their drop of impurity?

Give your answer in the following format: drop_first, drop_second

When rounding, give at least 3 decimals. For example, 0.452.
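The computation is unchanged; only the counts differ, since the replicated root now holds 100 positives and 20 negatives. A minimal sketch, again assuming the Gini index (the helper names are illustrative):

```python
def gini(pos, neg):
    """Gini impurity of a node holding pos positive and neg negative examples."""
    n = pos + neg
    p = pos / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def impurity_drop(left, right):
    """Drop of impurity of a binary split; left/right are (pos, neg) counts."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    parent = gini(left[0] + right[0], left[1] + right[1])
    return parent - (n_left / n) * gini(*left) - (n_right / n) * gini(*right)

# Same two splits, with each positive example replicated 5 times:
drop_first = impurity_drop((20, 14), (80, 6))
drop_second = impurity_drop((40, 18), (60, 2))
print(round(drop_first, 3), round(drop_second, 3))
```

Note that replicating the positives can change which split wins, which is the point of the follow-up question.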

Question 6: Drop of impurity (continued)

Consequently, which split will be performed by CART?

Beware: you will only receive credit for this question if you answered the previous one correctly.

Question 7: Continuous attributes

Consider a classification problem based only on 2 continuous attributes (the instance space is the plane R2). CART incorporates these attributes by defining threshold-based boolean attributes. In the induced tree, each node corresponds to a particular decision boundary splitting the examples into two regions. What is the shape (in the instance space) of the decision boundaries learned by CART? Into how many regions is the instance space divided before pruning? Does it depend on the attribute values of the training examples? Does it depend on the number of classes?

Select all valid sentences
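To build intuition for threshold-based splits, here is a hand-built depth-2 tree on two continuous attributes. The thresholds (0.5, 0.3, 0.7) are made-up illustration values, not taken from the assignment; the point is that each leaf corresponds to an axis-aligned rectangular region of the plane, and the number of regions equals the number of leaves.

```python
def tree_region(x1, x2):
    """Return the leaf (region) index reached by the point (x1, x2).

    Each internal node tests one attribute against a fixed threshold,
    exactly as CART's threshold-based boolean attributes do.
    """
    if x1 <= 0.5:
        return 0 if x2 <= 0.3 else 1  # left subtree splits on x2 at 0.3
    else:
        return 2 if x2 <= 0.7 else 3  # right subtree splits on x2 at 0.7

# Moving a point within one rectangle never changes the leaf it reaches,
# so the decision boundaries are axis-parallel line segments.
regions = {tree_region(x1, x2)
           for x1 in (0.1, 0.9) for x2 in (0.1, 0.9)}
```

This 4-leaf tree carves the plane into 4 rectangles; a deeper tree would simply carve finer rectangles, with the learned thresholds depending on the training examples.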