
Interpretation of accuracy score on subset of data points

Data Science Asked by Jake Morris on January 9, 2021

I have a multi-class problem that I am building a classifier for. I have N total data points I would like to predict on. If I instead predict on n < N data points and get an accuracy score, is there a way I can say (with some degree of confidence) what I think the same model’s accuracy score would be on the remaining data points?

Can somebody point me to an article discussing this, or suggest a formula to research?

2 Answers

Usually when working with classification problems, one tries to have 3 subsets of data:

  • A training set: this subset is usually the biggest one and can take up to ~80% of the available data. It is used to train the chosen algorithm, using the known labels of each data sample.
  • A validation set: this subset is much smaller. It will typically be ~5-10% of the available data. It is used to evaluate the performance of the algorithm trained on the training set. Typically, one will tune the parameters of the algorithm in order to reach the best performance on the validation set.
  • A testing set: this subset is of the same size order as the validation set or bigger. Very important: it should NEVER be used for training purposes! Once the model is trained and tuned using the training and validation sets, the testing set allows one to measure the accuracy (or any other performance metric) on unseen data. If the model generalizes well, the score will be close to the one seen on the validation set, often a tiny bit worse.

In order for this to work properly, it is important that all the subsets are representative of the available data. For example, the proportion of each class should be approximately the same across the subsets.

In light of this broadly used process, we can see that most algorithms are tuned and tested on only a fraction of the available data. As long as the test set is balanced in a way similar to the training and validation sets, and it has not been used at all during the training/tuning of the model, there is no reason why the performance score measured on the n test samples would not generalize well to the remaining N - n samples.
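
For illustration, here is a minimal sketch of such a stratified 80/10/10 split with scikit-learn; the dataset, classifier, and split proportions are placeholder assumptions, not part of the original answer:

    # Hypothetical example: stratified 80/10/10 train/validation/test split.
    # The dataset and classifier are placeholders.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)

    # First split off the test set (10%), keeping class proportions with stratify=y.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.10, stratify=y, random_state=0)

    # Then split the remainder into training (~80% overall) and validation (~10% overall).
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=1/9, stratify=y_rest, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("validation accuracy:", accuracy_score(y_val, model.predict(X_val)))
    print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

If you also want to attach a confidence statement to the accuracy measured on n held-out samples (the formula the question asks about), one standard option, offered here as an editorial suggestion rather than part of this answer, is a binomial confidence interval such as p ± 1.96 * sqrt(p * (1 - p) / n), where p is the observed accuracy.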

Answered by Eskapp on January 9, 2021

Use cross-validation: split the data into K subsets, then train and test K times, using a different subset for validation each time. The average cross-validation score is generally a better estimate of the model's performance on unseen data than a single score from the standard 80/10/10 split that you'd use when training, validating, and testing your final model.

Many machine learning libraries, such as Python's scikit-learn, have a module for this.
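
As a rough illustration (the dataset and classifier below are placeholders, not something the answer specifies), scikit-learn's cross_val_score does the K-fold training and scoring in one call:

    # Hypothetical example: 5-fold cross-validation with scikit-learn.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)

    # Trains and evaluates the model 5 times, each time holding out a different fold.
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print("fold accuracies:", scores)
    print("mean accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))

The spread of the fold scores also gives a rough sense of how much the accuracy estimate might vary on new data.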

Answered by Jay Speidell on January 9, 2021
