What should I do to test the confidence of my deep learning model?

Data Science Asked on January 24, 2021

I’ve recently fine-tuned a deep learning model (BERT) for a sentiment classification task, using an 80/10/10 train/validation/test split. After running several experiments, I’ve gotten a decent model that I’d eventually like to put into production. Before doing so, however, I want to design an experiment that tests the robustness/reliability/confidence of the model. What are some experiments that can be conducted to test the robustness, reliability, or confidence of this model or its predictions?

For example, are there statistically sound principles behind calculating the standard error for binary predictions on a new datapoint?

One Answer

For binary predictions, it is standard to evaluate models using their ROC and precision-recall (PRC) curves. Scalar metrics are also useful, notably MCC (Matthews correlation coefficient), which is arguably the most holistic single-number summary for binary classification.
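As a minimal sketch (assuming scikit-learn is available; the arrays `y_true` and `y_prob` are hypothetical stand-ins for your test labels and the model's positive-class probabilities):

```python
import numpy as np
from sklearn.metrics import (
    roc_auc_score,
    average_precision_score,
    matthews_corrcoef,
)

# Hypothetical predictions from the fine-tuned model on the test set.
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_prob = np.array([0.2, 0.8, 0.6, 0.4, 0.9, 0.1, 0.3, 0.7])
y_pred = (y_prob >= 0.5).astype(int)  # threshold probabilities at 0.5

print("ROC AUC:", roc_auc_score(y_true, y_prob))           # area under the ROC curve
print("PR AUC: ", average_precision_score(y_true, y_prob)) # area under the PR curve
print("MCC:    ", matthews_corrcoef(y_true, y_pred))       # Matthews correlation coefficient
```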

Using these metrics, you should evaluate the model via cross-validation. For deep models that take significant time to train, a single round of k-fold cross-validation is often sufficient; if time permits, repeated k-fold cross-validation gives a more stable estimate.
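A minimal sketch of stratified k-fold cross-validation (assuming scikit-learn; `fine_tune_and_score` is a hypothetical placeholder for your own routine that fine-tunes BERT on the training fold and returns a metric such as MCC on the held-out fold):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def fine_tune_and_score(train_idx, test_idx):
    # Hypothetical: fine-tune the model on texts[train_idx], labels[train_idx],
    # then return a metric (e.g. MCC) computed on texts[test_idx], labels[test_idx].
    return np.random.rand()  # stand-in score, for illustration only

texts = np.array([f"example text {i}" for i in range(100)])  # toy data
labels = np.random.randint(0, 2, size=100)                   # toy binary labels

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = [
    fine_tune_and_score(train_idx, test_idx)
    for train_idx, test_idx in skf.split(texts, labels)
]
print("mean score:", np.mean(scores), "+/-", np.std(scores))
```

Reporting the mean and standard deviation across folds gives a rough sense of how much the metric varies with the particular split, which speaks to the question about the reliability of the estimates.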

Lastly, while not always possible, many consider evaluation on a completely separate dataset to be one of the best indicators of reliability. Splitting a single dataset still risks "leaking" a common bias into both the training and testing sets. When using two or more datasets, biases that the model learns from the training dataset are unlikely to manifest in a completely different dataset, giving a more objective evaluation that better mimics a production environment. Such biases include, among other things, data acquisition methods and preprocessing/data cleaning methods.

Answered by Benji Albert on January 24, 2021
