SVM overfitting with consistent validation results

Asked on Data Science by Louis Ryan, January 1, 2021

I have imbalanced data (1,400 samples, of which 250 are positive) for a binary classification problem, and I am running an SVM grid search that optimises for precision. I have tried stratified, shuffled k-fold cross-validation with k = 3, 4, 5, 6, 7, and 8, and in every case the precision is higher on the training folds than on the validation folds. (By "every case" I mean every configuration in the search that returns anything worth using: while I don't care much about recall, I can't accept a model that only returns 5 true positives.)
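For reference, a minimal sketch of the kind of setup described, assuming scikit-learn; the data, the parameter grid, and the chosen k are placeholders, not the actual ones used:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Stand-in data with roughly the described shape and class balance
# (1400 samples, ~250 positive); replace with the real features and labels.
X, y = make_classification(n_samples=1400, n_features=20, weights=[0.82],
                           random_state=0)

# Hypothetical parameter grid; the actual search space is not given in the question.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.001]}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # one of k = 3..8
search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid,
    scoring="precision",      # optimise for precision on the positive class
    cv=cv,
    return_train_score=True,  # keep per-fold training scores for comparison
)
search.fit(X, y)
```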

I typically get around 90% training precision and 60% validation precision, and the standard deviation of the validation precision across folds never exceeds 5% for any value of k.
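That train/validation comparison can be read off `cross_validate` for a fixed parameter set (again a sketch with placeholder data and hypothetical "best" parameters):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.svm import SVC

# Placeholder data mirroring the described class balance (1400 samples, ~250 positive).
X, y = make_classification(n_samples=1400, n_features=20, weights=[0.82],
                           random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(
    SVC(C=10, gamma="scale"),  # hypothetical winner of the grid search
    X, y,
    scoring="precision",
    cv=cv,
    return_train_score=True,
)

# A large gap between these two means, with a small std on the validation side,
# is the pattern described above.
print("train precision: %.2f +/- %.2f"
      % (scores["train_score"].mean(), scores["train_score"].std()))
print("valid precision: %.2f +/- %.2f"
      % (scores["test_score"].mean(), scores["test_score"].std()))
```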

This seems to go against my intuition of what overfitting is: the gap between training and validation precision suggests overfitting, yet the validation results are very consistent across folds. My next steps will be to build an ensemble over 6 or 7 undersampled datasets (sketched below), review my feature space (reduce dimensionality or try other feature combinations), or try a different model altogether.
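A minimal sketch of what I mean by the undersampling ensemble (hand-rolled here; `BalancedBaggingClassifier` from the imbalanced-learn package does essentially the same thing):

```python
import numpy as np
from sklearn.svm import SVC

def fit_undersample_ensemble(X, y, n_models=7, random_state=0):
    """Fit one SVM per balanced subset: all positives plus an
    equally sized random draw of negatives."""
    rng = np.random.default_rng(random_state)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    models = []
    for _ in range(n_models):
        sub = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])
        models.append(SVC(kernel="rbf").fit(X[sub], y[sub]))
    return models

def predict_majority(models, X):
    """Hard majority vote over the ensemble members' predictions."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Any evaluation of this ensemble would still sit inside the same stratified cross-validation loop as above, so the precision comparison stays like for like.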

Can someone explain what might be going on here, and suggest other ways I could remedy it?
