
Interpreting classification report scores

Cross Validated Asked by MXavier on December 29, 2021

I have been working on a multi-class classification problem with four classes in total. I used a random forest classifier, performed cross validation, and obtained accuracies around 97% along with the classification report below. I want to know whether my model is fine to go with, given this classification report. Every score in the report is above 93% (see the image below). I have read many articles so far, and as far as I understand, my model seems to be okay. Could you please help me interpret these scores? I would be very grateful for guidance on whether my model is overfitting or not.

[Image: classification report]
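For context, the setup is roughly like the following sketch (scikit-learn is assumed; X and y stand in for the actual features and labels):

    # Minimal sketch (assumed setup): cross-validated accuracy plus a
    # classification report for a 4-class random forest.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score, cross_val_predict
    from sklearn.metrics import classification_report

    clf = RandomForestClassifier(n_estimators=200, random_state=0)

    # Mean accuracy over 5 folds (the ~97% figure mentioned above).
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

    # Out-of-fold predictions give per-class precision/recall/F1.
    y_pred = cross_val_predict(clf, X, y, cv=5)
    print(classification_report(y, y_pred))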

One Answer

You have a bit of class imbalance going on: your largest class is about 50% larger than the others. You may also notice that your model gets that class right almost every time, and because there are so many examples of it, that class could be inflating your overall scores.

I would downsample that class to roughly the size of the others and rerun (a sketch follows below). If the scores still come out well, you might be all set. Validating your model on some previously unseen data after fitting should tell you what you need to know.
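A rough sketch of that downsampling step, assuming the data lives in a pandas DataFrame named df with a label column (both names are placeholders):

    # Downsample the largest class to the size of the next-largest one,
    # then refit and re-score exactly as before on `balanced`.
    import pandas as pd

    counts = df["label"].value_counts()
    majority = counts.idxmax()
    target_n = counts.drop(majority).max()

    balanced = pd.concat([
        df[df["label"] == majority].sample(n=target_n, random_state=0),
        df[df["label"] != majority],
    ])

    # If precision/recall stay high on the balanced data, the original
    # scores were not just an artifact of class size.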

Edit:

Your model may not be overfitting, but it is biased towards class 4. It is the prediction that is most often correct, and as a result it is also the source of most of your bad classifications (as seen in the confusion matrix). If predicting class 4 when it should be class 2 means someone won't get a life-saving medical treatment, or costs your company a million dollars every time, then the model is no good. If, however, having a small percentage of your predictions be wrong in this way is acceptable, you should be fine.

It is entirely possible that your classes are distinct enough that classifying them correctly is trivial for the model, in which case very low error is plausible. It wouldn't hurt to hold out some percentage of your data as validation data for post-modeling scoring (see the sketch below); this would tell you the true story. If validation looks good, you can deploy the model. If your prediction quality starts slipping in the future, it is likely due to data drift, where the distribution of values in your features no longer matches the one seen in training, at which point you need to retrain the model.
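A minimal sketch of that hold-out check, again assuming scikit-learn with X and y as placeholders for the actual data:

    # Fit on the training split only, then score on data the model
    # has never seen.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_train, y_train)

    # If these per-class scores remain high, the model is in good shape.
    print(classification_report(y_val, clf.predict(X_val)))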

Answered by Chris on December 29, 2021
