How can I measure the reliability of the specificity of a model with very small train, test, and validation datasets?

Data Science, asked on September 25, 2021

Stats newbie here. I have a small dataset of 646 samples on which I've trained a reasonably performant model (~99% test and validation accuracy). To complicate things a little, it's a binary classification problem and the classes are somewhat imbalanced.

Here is my confusion matrix on the training data:

[[387   1]
 [  1  73]]

on testing data:

[[74  1]
 [ 0 10]]

on validation data:

[[85  1]
 [ 0 13]]

1. Training Specificity: 0.986
2. Testing Specificity: 0.909
3. Validation Specificity: 0.928
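
For reference, a minimal sketch that reproduces these figures, assuming the matrices are laid out with predicted labels as rows and actual labels as columns, and with the minority class treated as the negative class (the reported values match under this layout):

import numpy as np

# Assumed layout: rows = predicted label, columns = actual label,
# with the minority (second) class treated as the negative class.
def specificity(cm):
    tn = cm[1, 1]  # actual negatives predicted as negative
    fp = cm[0, 1]  # actual negatives predicted as positive
    return tn / (tn + fp)

train_cm = np.array([[387, 1], [1, 73]])
test_cm = np.array([[74, 1], [0, 10]])
val_cm = np.array([[85, 1], [0, 13]])

for name, cm in [("train", train_cm), ("test", test_cm), ("val", val_cm)]:
    print(name, round(specificity(cm), 3))
# train 0.986, test 0.909, val 0.929 (0.928 above is the truncated value)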

My reading is that the testing and validation specificity is noticeably lower than the training specificity. However, given that only one sample is missed in each of the testing and validation sets, what is my real-world specificity? Is there a better measure of generalizability? Is there something akin to a p-value that quantifies the reliability of the specificity estimate given the size of the negative class?
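
For concreteness, one candidate for such a measure is a binomial confidence interval on the specificity, treating each actual negative as an independent Bernoulli trial. A minimal sketch, assuming statsmodels and the test-set counts implied above (10 of 11 actual negatives classified correctly):

from statsmodels.stats.proportion import proportion_confint

# Counts implied by the test confusion matrix above (assumption:
# 10 actual negatives classified correctly out of 11 total).
correct_negatives = 10
total_negatives = 11

low, high = proportion_confint(correct_negatives, total_negatives,
                               alpha=0.05, method="wilson")
print(f"95% Wilson interval for specificity: [{low:.2f}, {high:.2f}]")
# roughly [0.62, 0.98]: with only 11 negatives, the 0.909 point
# estimate carries a great deal of uncertainty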

Thanks!

One Answer

The real-world data is effectively your test dataset. The data should be divided so that the model sees the training and validation portions more than once during development, while the test portion is seen only once, at final evaluation. If the model is robust enough, it will then perform well even on the test dataset. The underlying assumption is that the test data is as close as possible to real-world data.
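
A minimal sketch of that split, assuming scikit-learn and synthetic stand-in data (stratified so the class imbalance is preserved in every partition):

import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data for illustration (assumption: 646 samples, ~15% minority)
rng = np.random.default_rng(0)
X = rng.normal(size=(646, 10))
y = (rng.random(646) < 0.15).astype(int)

# Hold the test set out once; train/validation may be revisited freely
# during model development (e.g., for tuning or cross-validation).
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_dev, y_dev, test_size=0.15, stratify=y_dev, random_state=0)

Stratifying both splits matters here: with so few minority-class samples, an unstratified split can easily leave a partition with almost no negatives.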

Answered by Chaitanya Bapat on September 25, 2021
