
Information

Author Pierre Dupont, Benoit Ronval
Deadline 23/02/2025 23:00:00
Submission limit No limit


A1.1 - Decision Tree Learning: theory

This task will be graded after the deadline


Question 1: Attribute selection

Suppose you have the following training set with four boolean attributes x1, x2, x3 and x4 and a boolean output y.

https://inginious.info.ucl.ac.be/course/LINFO2262/A1-1/q1_B.png

What is the tree learned by CART (without any pruning mechanism) from this training set?

You should be able to construct it from your general understanding of the algorithm, without explicitly computing every step.

Question 2: Attribute selection (continued)

Is there another binary decision tree which would perfectly classify the same training examples and would not be as deep as the one proposed by CART?

Question 3: Drop of impurity

Suppose you have a training set with 20 positive and 20 negative examples.

Compute the drop of impurity for the two following splits, performed at the root of the tree:

  • [4+,14-] (left)  [16+,6-] (right)
  • [8+,18-] (left)  [12+,2-] (right)

Give your answer in the following format: drop_first, drop_second

When rounding, give at least 3 decimals. For example, 0.452.
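The drop of impurity can be checked numerically. The sketch below assumes the Gini index, CART's standard impurity measure; the function names (`gini`, `impurity_drop`) are illustrative, not part of the assignment.

```python
def gini(pos, neg):
    """Gini impurity of a node holding pos positive and neg negative examples."""
    n = pos + neg
    p = pos / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def impurity_drop(left, right):
    """Drop of impurity of a binary split; left/right are (pos, neg) counts.

    Parent impurity minus the size-weighted average of the children's impurities.
    """
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    parent = gini(left[0] + right[0], left[1] + right[1])
    return parent - (n_left / n) * gini(*left) - (n_right / n) * gini(*right)

# The two candidate splits of the 20+/20- root node from the question:
drop_first = impurity_drop((4, 14), (16, 6))
drop_second = impurity_drop((8, 18), (12, 2))
print(round(drop_first, 3), round(drop_second, 3))
```

CART picks the split with the largest drop, so comparing the two printed values answers the next question as well.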

Question 4: Drop of impurity (continued)

Based on your answer in the previous question, which split will be chosen by CART?

Beware: you will only receive credit for this question if you answered the previous one correctly.

Question 5: Drop of impurity (continued)

Suppose now that mistakes on the positive examples are about 5 times as costly as mistakes on the negative ones. One way of dealing with such a cost imbalance is to replicate each positive example 5 times. The splits become:

  • [20+,14-] (left)  [80+,6-] (right)
  • [40+,18-] (left)  [60+,2-] (right)

What is their drop of impurity?

Give your answer in the following format: drop_first, drop_second

When rounding, give at least 3 decimals. For example, 0.452.
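The computation is unchanged; only the counts differ, since the replicated root now holds 100 positives and 20 negatives. A minimal sketch, again assuming the Gini index (the helper names are illustrative):

```python
def gini(pos, neg):
    """Gini impurity of a node holding pos positive and neg negative examples."""
    n = pos + neg
    p = pos / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def impurity_drop(left, right):
    """Drop of impurity of a binary split; left/right are (pos, neg) counts."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    parent = gini(left[0] + right[0], left[1] + right[1])
    return parent - (n_left / n) * gini(*left) - (n_right / n) * gini(*right)

# Same two splits, with each positive example replicated 5 times:
drop_first = impurity_drop((20, 14), (80, 6))
drop_second = impurity_drop((40, 18), (60, 2))
print(round(drop_first, 3), round(drop_second, 3))
```

Note that replicating the positives can change which split wins, which is the point of the follow-up question.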

Question 6: Drop of impurity (continued)

Consequently, which split will be performed by CART?

Beware: you will only receive credit for this question if you answered the previous one correctly.

Question 7: Continuous attributes

Consider a classification problem based only on 2 continuous attributes (the instance space is the plane R2). CART incorporates these attributes by defining threshold-based boolean attributes. In the induced tree, each node corresponds to a particular decision boundary splitting the examples into two regions. What is the shape (in the instance space) of the decision boundaries learned by CART? Into how many regions is the instance space divided before pruning? Does it depend on the attribute values of the training examples? Does it depend on the number of classes?

Select all valid sentences
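To build intuition for threshold-based splits, here is a hand-built depth-2 tree on two continuous attributes. The thresholds (0.5, 0.3, 0.7) are made-up illustration values, not taken from the assignment; the point is that each leaf corresponds to an axis-aligned rectangular region of the plane, and the number of regions equals the number of leaves.

```python
def tree_region(x1, x2):
    """Return the leaf (region) index reached by the point (x1, x2).

    Each internal node tests one attribute against a fixed threshold,
    exactly as CART's threshold-based boolean attributes do.
    """
    if x1 <= 0.5:
        return 0 if x2 <= 0.3 else 1  # left subtree splits on x2 at 0.3
    else:
        return 2 if x2 <= 0.7 else 3  # right subtree splits on x2 at 0.7

# Moving a point within one rectangle never changes the leaf it reaches,
# so the decision boundaries are axis-parallel line segments.
regions = {tree_region(x1, x2)
           for x1 in (0.1, 0.9) for x2 in (0.1, 0.9)}
```

This 4-leaf tree carves the plane into 4 rectangles; a deeper tree would simply carve finer rectangles, with the learned thresholds depending on the training examples.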