A5 - A Machine Learning Competition - [LINFO2262] Machine Learning: classification and evaluation

A5 - A Machine Learning Competition

This final (and longer) project will give you the opportunity to have some fun while competing with other students to solve a real and difficult prediction task inspired by a biomedical research topic. Your job will be twofold:

build the best possible classification model on a training set to predict the (undisclosed) class labels on a given test set,
predict the actual classification performance your model will have on the test set.

Design choices

The "training" set is referred here in a broad sense, that is the fraction of the dataset on which the class labels are disclosed. It is up to you to decide what to do with this labeled set and, for instance, whether you split it (once or several times, possibly recursively) into actual training versus some validation fraction. More generally, this project is intended to be open as it is the case for a real task. Only the outcome matters! Many design choices are left open and you must specify them. Here is a non-exhaustive list of things you might have to consider.

Do you need to pre-process, to filter out, to normalize, etc., the available data? (look at the data,... LOOK AT THE DATA!)
How are you going to address the fact that some feature values could be numerical (either int or float) while others could be categorical or possibly boolean? Are there specific feature values only observed in the test but not in the training?
Should you consider all the available features? Should you define new or additional features from the existing ones? Should the newly defined features be crafted by hand, or generated automatically by other methods?
Which methodology are you going to use to learn a model, to fix its possible hyper-parameters and to predict its classification performance on the test set?
Which learning algorithm do you consider? You are free to choose the one you estimate more appropriate to the task. It needs not be (but it might be, of course) a learning algorithm that has been presented in the course. You can even consider several learning algorithms and/or produce and combine several models as long as your final prediction defines a unique label for each test example (ensemble methods? bagging?...).
What else?... The sky (and computing power) is the limit!

Competition performance metric

Since the dataset needs not be perfectly balanced in terms of class priors, the chosen test performance metric of a classifier is defined as the balanced classification rate (BCR). The BCR computes the classification rate for each class and reports the arithmetic average of those rates over all classes. In a binary classification context, BCR is simply the average between specificity and sensitivity. More generally, with \(C\) classes:

\begin{equation*} BCR = \frac{1}{C} \left(\sum_{i=1}^C \frac{TP_i}{n_i} \right) \end{equation*}

where:

\(n_i\) denotes the number of examples from class \(i\) included in the test set,
\(TP_i\) denotes the number of correctly classified examples from class \(i\) in the test set,
\(C\) denotes the number of classes.

Note that \(\frac{TP_i}{n_i}\) is also denoted below as \(p_i\) and simply refers to the proportion of correctly classified test examples from class \(i\).

Note also that a trivial classifier predicting all examples as belonging to the same class has a test \(BCR = \frac{1}{C}\), no matter how the class priors are distributed (whenever the single predicted class is never actually observed in the test set, BCR = 0 assuming \(\frac{0}{0}\) = 0.), while a perfect classifier has \(BCR=1\) (what is the expected BCR of a classifier predicting uniformly at random among the classes? Does it depend on the class priors?).

Since your task is not only to produce a model with the best possible BCR on the test set but also to predict how well your model is going to perform on this test set, we distinguish between the true test \(BCR\) of your model and your predicted test \(\widehat{BCR}\) on this test set. The actual competition performance metric \(P\) (according to which you will be ranked and graded) is as follows:

\begin{equation*} P = BCR - \Delta(BCR) \cdot \left[1-\exp \left( \frac{-\Delta(BCR)}{\sigma}~\right)\right] \end{equation*}

where:

\(\Delta(BCR) = | BCR - \widehat{BCR} |\)

\(\sigma = \frac {1} {C} \sqrt{\sum_{i=1}^C \frac{p_i(1-p_i)}{n_i}}\), where \(p_i\) is the proportion of correctly classified test examples from class \(i\) and \(n_i\) is the number of test examples from class \(i\).

This competition performance metric is directly inspired from the International Performance Prediction Challenge WCCI 2006. The larger \(P\) the better. To maximize \(P\) you need to get the best possible test \(BCR\) and to make sure that your predicted \(\widehat{BCR}\) is as close as possible to the actual test \(BCR\) of your model. Any deviation between both is penalized (by \(- \Delta(BCR)\)) while the influence of such penalty is limited in the region of uncertainty where \(\Delta(BCR)\) is commensurate with \(\sigma\), the error bar on your true test BCR.

Here is a typical plot of the performance metric \(P\) as a function of the predicted \(\widehat{BCR}\) for a true test \(BCR\) being equal to 70 %.

https://inginious.info.ucl.ac.be/course/LINFO2262/A5/compet_metric.png

The only thing you will need to provide for us to compute \(P\) is your predicted \(\widehat{BCR}\) and the predicted class labels of the test examples. From this and knowing the undisclosed true class labels, we will compute the actual \(BCR\) of your model, its estimated error bar \(\sigma\) and the resulting \(P\). The highest \(P\) among all submissions will define the winner of the competition (and earn our congratulations!).

Note that you will not know your \(P\) score before the closing of the competition, not even your rank among the other competitors. This is the game!

The prediction task

You are a data analyst and machine learning expert in a chemical lab and you have been tasked to study the toxicity of different chemical products based on the molecules that compose them and how present they are.

More precisely, your task is to decide if a given product is toxic (positive) or not toxic (negative).

https://inginious.info.ucl.ac.be/course/LINFO2262/A5/p5_2026_chemical.png

Generated with ChatGPT on 11th February 2026.

You are provided with a partially annotated dataset. The train split contains 3,000 examples and the test split contains 1,000 examples. Each feature corresponds to the quantity of the corresponding molecule in the product. They are obtained through a long, tidious, noisy, and difficult process (that is not relevant for this project).

Each data split includes 1,024 input float features.

The output class to be predicted:

The label variable (categorical) indicates the toxicity of the product: positive, negative.

The data, available on Moodle, is split into 3 files:

A5_2026_train.csv which contains the 1,024 input features of 3,000 examples to train and validate your models.

A5_2026_train_labels.csv which contains the corresponding classes of the 3,000 training (and validation) examples and their index.

A5_2026_test.csv which contains the 1,024 input features of 1000 test examples. It is your task to predict one class for each test example.

Once these files are stored on your local disk in the current working directory of a python session, the data can easily be loaded in pandas dataframes as follows.

import pandas as pd

x_train = pd.read_csv("A5_2026_train.csv")
y_train = pd.read_csv("A5_2026_train_labels.csv", index_col=0)
x_test = pd.read_csv("A5_2026_test.csv")

Paste this command into your terminal:

The password to connect is

Question 1: Prediction

Upload a .csv file containing your predictions on the test set. The first line is a comma (for the index column) and the name of column (i.e. label). Each following lines must contain the index (the corresponding line in the test set, [0, 999]) and the predicted class for the corresponding test example.

Here is an example of correct format:

,label
0,positive
1,positive
2,negative
3,negative
4,positive
...
999,negative

Assuming the predicted classes of the test examples are stored in a variable y_pred (that is, a pandas.Series with y_pred.name == 'label'), one can easily produce such a .csv file with the following code:

import pandas as pd

y_pred.to_csv('predictions.csv', index=True)

Max file size: 1.0 MiB
Allowed extensions: .csv

Question 2: Expected BCR

Give your expected BCR for the test set (range between 0 and 1).

For example: 0.5

Question 3: Code

Please copy-paste the code you used for this competition.

It must contain at least:

the complete code for training your final model, including how you pre-processed the data;

the code for predicting the test classes with your final model;

and how you have computed the expected test BCR.

Your code should execute without errors when downloaded and run locally (please use relative paths when loading your train or test data, e.g. ./A5_2026_test.csv).

Warning: unlike in previous assignments, Inginious does not execute your code at the time of submission, i.e. it will not yield any warning in case of syntactic or runtime errors.

Important note

Please add comments to your code to explain the different parts. You must also cite any public external resources (including AI tools, if any) you have been using. Add comments in this regard inside your code, by sharing a link to the AI chat (if any) you've been using. Copying the url to the AI chat does not work! To correctly share the conversation with AI tools, you should use the share option present in most of these tools and put the obtained link in a comment of your code.

# Note that, to be evaluated in this project, you **must** replace this comment by your own code
# that you have used to produce the answers to the previous questions.

Question 4: Report

Upload a report explaining your design choices, which algorithms/software you have used to perform the classification, and the exact protocol you followed to evaluate your model(s) and to produce your final prediction on the test set. Add any information you deem relevant (e.g. dealing with large number of features, comparison between approaches, etc.). Note that you can also include plots (e.g. cross-validated classification results when fine-tuning meta-parameters) but then you should analyze and comment the plots to tell the reader which conclusions you draw from it. You should target a report of 2 or 3 pages (about 1000 - 2000 words) of informative content.

Warning

Failure to submit an informative report would imply a penalty of up to 20 % of this project grade.

Max file size: 11.4 MiB
Allowed extensions: .pdf

Author(s)	Pierre Dupont, Benoît Ronval
Deadline	10/05/2026 23:00:00
Abgabenlimit	No limitation

Information

Einloggen