Information

Author(s) Pierre Dupont
Deadline 22/03/2026 23:00:00
Submission limit No limitation

A3.1 - Performance Assessment: review quiz on probability and statistics

This task will be graded after the deadline


Question 1:

Let a decision tree model \(M\) be used to classify 1000 independent test examples. We consider the following events:

  • \(A\): \(M\) classifies the third test example as positive;
  • \(B\): \(M\) classifies the third test example as negative.

The model \(M\) is fixed and deterministic. In both cases \(A\) and \(B\), we consider the same third test example.

Select all valid answers.

Question 2:

A sigmoid kernel SVM is estimated on a validation set to have a 98% classification accuracy. When this model is run to classify a stream of new test examples, what is the expected number of examples to be processed until the first classification error appears (include the first error in the total number of examples required)?
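
One way to model this question, as a sketch under stated assumptions: suppose each new example is misclassified independently with a fixed probability \(q\), and suppose the validation accuracy equals the true accuracy (so \(q = 1 - 0.98\)). Neither assumption is guaranteed by the question. The symbols \(q\) and \(N\) are introduced here for illustration. The number \(N\) of examples processed up to and including the first error then follows a geometric distribution:

\[
P(N = k) = (1 - q)^{k-1}\, q, \qquad k = 1, 2, \dots
\qquad\Longrightarrow\qquad
E[N] = \sum_{k \ge 1} k\,(1 - q)^{k-1}\, q = \frac{1}{q}.
\]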

Question 3:

How do you expect this number to vary according to the actual set of test examples? Report the standard deviation of your estimator.

When rounding, give at least 3 decimals.
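
A minimal sketch of how the spread could be computed, reading "this number" as the count of examples up to and including the first error and assuming it is geometrically distributed with a per-example error probability q (an assumed symbol). The standard deviation of such a geometric variable is sqrt(1 - q) / q. The function name first_error_mean_std is hypothetical.

```python
import math

def first_error_mean_std(q):
    """Mean and standard deviation of the number of examples up to and
    including the first error, for a per-example error probability q."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    mean = 1.0 / q                   # geometric mean: 1 / q
    std = math.sqrt(1.0 - q) / q     # geometric std: sqrt(1 - q) / q
    return mean, std

# Assumption: the true error rate equals 1 - 0.98 = 0.02.
mean, std = first_error_mean_std(0.02)
print(f"mean = {mean:.3f}, std = {std:.3f}")
```

The standard deviation is almost as large as the mean, so under these assumptions the position of the first error is expected to vary widely from one test stream to another.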

Question 4:

What is the expected number of examples to be processed until the 7th classification error is observed (include all 7 errors in the total number of examples required)?
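
Under the same independence assumption, the wait for the r-th error is a sum of r independent geometric waits, so its mean is r / q (a negative binomial waiting time). The sketch below checks that closed form against a small Monte Carlo simulation. The function names, the number of runs, and the seed are illustrative choices, not part of the assignment.

```python
import random

def examples_until_rth_error(r, q, rng):
    """Count examples drawn until r errors have occurred (errors included)."""
    count = 0
    errors = 0
    while errors < r:
        count += 1
        if rng.random() < q:   # this example is misclassified
            errors += 1
    return count

def simulated_mean(r, q, runs=2000, seed=0):
    """Average waiting time over several simulated test streams."""
    rng = random.Random(seed)
    total = sum(examples_until_rth_error(r, q, rng) for _ in range(runs))
    return total / runs

# Assumption: error probability q = 0.02 (i.e. 98% accuracy holds on new data).
closed_form = 7 / 0.02                 # r / q
estimate = simulated_mean(7, 0.02)     # should land close to closed_form
print(closed_form, estimate)
```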

Question 5: The Birthday Paradox
For simplicity, one assumes that each calendar year always includes 365 days and that everybody's birthday can fall on any day with the same probability.
If you meet a person at random in the street, there is a \(1/365 \approx 0.3 \%\) chance that you share the same birthday (not necessarily in the same year).

What is the minimal number \(n\) of people one has to welcome in a room, such that the probability that at least 2 of them share the same birthday would be \(\ge 50 \%\)?

Note that the day a person is born is assumed to be an independent random event from any other birthday in the room. For instance, this is not a twins party!

A flawed argument would conclude that one needs \(n \ge 183\) people, since \(183/365 \approx 0.501 \ge 0.5\).

What is the correct minimal value for \(n\)?
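
A common way to approach this is through the complement: the probability that all \(n\) birthdays are distinct is the product of \((365 - i)/365\) for \(i = 0, \dots, n-1\), and the probability of at least one shared birthday is one minus that product. The sketch below follows this idea under the uniformity and independence assumptions stated above. The function names are hypothetical.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_distinct *= (days - i) / days
    return 1.0 - p_distinct

def minimal_group_size(threshold=0.5, days=365):
    """Smallest n whose shared-birthday probability reaches the threshold."""
    n = 1
    while prob_shared_birthday(n, days) < threshold:
        n += 1
    return n

n_min = minimal_group_size()
print(n_min, prob_shared_birthday(n_min))
```

The probability grows much faster than \(n/365\) because every pair of people in the room is a potential match, and the number of pairs grows quadratically with \(n\).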

Question 6: Normal Approximation to the Binomial Distribution

The Binomial Distribution defines the probability of \(k\) successes among \(n\) independent trials, with a probability of success of each trial fixed to \(p\).

Implement a function using the Normal Approximation to compute this Binomial probability.
To do so, you must rely on the implementation of the Normal Distribution included in scipy.
You must account for the fact that a normally distributed random variable is defined by a continuous density, whereas the Binomial is a discrete distribution.
You may want to revisit the course slides in this regard.
Your function must return a float equal to the normal approximation of the Binomial probability.

Replace pass with your own Python code below.
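
For orientation, here is a minimal sketch of one standard approach, the continuity correction: the discrete value \(k\) is mapped to the interval \([k - 0.5, k + 0.5]\) under a normal density with mean \(np\) and standard deviation \(\sqrt{np(1-p)}\). The function name and argument order are hypothetical, since the original template is not shown here. To stay dependency-free, this sketch uses statistics.NormalDist from the Python standard library rather than scipy. The exercise itself requires scipy, where scipy.stats.norm.cdf(x, loc=mu, scale=sigma) plays the same role.

```python
import math
from statistics import NormalDist

def binomial_normal_approx(k, n, p):
    """Normal approximation of P(X = k) for X ~ Binomial(n, p),
    using a continuity correction of +/- 0.5 around k."""
    mu = n * p                                # Binomial mean
    sigma = math.sqrt(n * p * (1.0 - p))      # Binomial standard deviation
    normal = NormalDist(mu, sigma)            # stand-in for scipy.stats.norm
    return normal.cdf(k + 0.5) - normal.cdf(k - 0.5)

# Compare with the exact Binomial probability on one example.
exact = math.comb(100, 50) * 0.5 ** 100
approx = binomial_normal_approx(50, 100, 0.5)
print(exact, approx)
```

The approximation is usually considered reasonable when both \(np\) and \(n(1-p)\) are large enough. It degrades for small \(n\) or for \(p\) close to 0 or 1.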

Question 7: Public Resources
If you used any public resources to answer the questions above, please cite all of them below.

If such a resource is a generative AI, please share the link(s) to the chat(s) (e.g. https://chatgpt.com/share/698dac4c-f32c-800e-af13-f9ee2ebfdb0b).
You can generate such link(s) by clicking on "share" or "partager".

Otherwise, specify None in the field below.