How to estimate the accuracy on a large dataset?

Data Science Asked by ken wang on December 2, 2020

I have a deep learning model that was handed over by a former colleague.
For some reason, the train/dev sets are missing.

I want to classify my dataset into 100 categories.
The dataset is extremely imbalanced.
It contains tens of millions of records.

First, I ran the model and got predictions for the whole dataset.

Then, I sampled 100 records per predicted category, which gave me a test set of 10,000 records.

Next, I labeled the ground truth for each record in the test set, calculated precision, recall, and F1 for each category, and computed micro-F1 and macro-F1.

How can I estimate accuracy or other metrics on the whole dataset? Is it correct to use the weighted sum of each category's precision as the estimate, where each weight is that category's share of the predictions on the whole dataset?

One Answer

Accuracy has a specific meaning in classification: the number of data points whose predicted label exactly matches the actual label, divided by the total number of data points.
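As a minimal sketch of that definition, the function below counts exact matches between two equal-length label lists. The name `accuracy` and its arguments are illustrative, not taken from any particular library.

```python
def accuracy(predicted, actual):
    """Fraction of data points whose predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one labeled data point is required")
    # Count positions where the predicted label exactly matches the actual one.
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(predicted)
```

For example, `accuracy(["cat", "dog", "cat", "bird"], ["cat", "dog", "bird", "bird"])` has three matches out of four data points.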

To calculate accuracy, you need the actual label for each data point. Data points without actual labels cannot be used in the analysis.
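For the labeled sample described in the question, one possible way to extrapolate to the whole dataset is the weighted sum of per-category precision. This works because overall accuracy equals the sum, over predicted categories, of the share of predictions in that category times the fraction of those predictions that are correct. The sketch below assumes the labeled records were drawn uniformly at random within each predicted category. The names `estimate_accuracy`, `pred_counts`, and `sample` are hypothetical.

```python
from collections import Counter


def estimate_accuracy(pred_counts, sample):
    """Estimate whole-dataset accuracy from a sample stratified by predicted label.

    pred_counts: dict mapping category -> number of records the model assigned
                 to that category on the whole dataset (hypothetical input).
    sample:      list of (predicted_label, true_label) pairs for the hand-labeled
                 records, assumed to be drawn uniformly within each predicted category.
    """
    total = sum(pred_counts.values())
    if total == 0:
        raise ValueError("pred_counts must contain at least one prediction")

    n_sampled = Counter(pred for pred, _ in sample)
    n_correct = Counter(pred for pred, true in sample if pred == true)

    estimate = 0.0
    for category, count in pred_counts.items():
        if count == 0:
            continue  # a category that is never predicted contributes nothing
        if n_sampled[category] == 0:
            raise ValueError(f"no labeled sample for predicted category {category!r}")
        # Per-category precision, measured on the labeled sample.
        precision = n_correct[category] / n_sampled[category]
        # Weight is the category's share of predictions on the whole dataset.
        estimate += (count / total) * precision
    return estimate
```

With only 100 labeled records per category, each per-category precision is itself an estimate, so the result carries sampling error. Recall is harder to estimate this way because the sample was drawn by predicted label rather than by true label.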

Answered by Brian Spiering on December 2, 2020
