
How to determine whether a data size is statistically sufficient?

Cross Validated · Asked by 1111ktq on December 27, 2021

I have a question about the sample size needed for a probability of default (PD) model.
For each consumer I have a binary indicator of whether the client defaults (1 = default, 0 = current), and I have each consumer's occupation (e.g., doctor, student). So I can estimate the probability of default for each occupation as #default clients / #total clients, and then continue with my analysis.

The problem is that the occupations have very different sample sizes (e.g., 1000 clients are students, but only 50 are doctors). How can I tell whether a group has enough data points to reliably estimate the probability of default for that occupation?

For example, only 1 of the 50 doctors in my database defaulted; does this 2% correctly reflect the default behavior of that occupation? If a group is too small, I don't want to include it in my further analysis. What is the minimum sample size per group for me to be confident in the estimate?
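
To make the question concrete, here is a minimal sketch of what I mean, in Python with scipy (my own illustration; the Wilson score interval is one standard way to quantify the uncertainty of a binomial proportion, and the student counts are hypothetical). The interval width is what I would like to bound when deciding whether a group is large enough:

```python
from math import sqrt
from scipy.stats import norm

def wilson_interval(defaults, n, alpha=0.05):
    """Wilson score confidence interval for a binomial proportion."""
    z = norm.ppf(1 - alpha / 2)
    p = defaults / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Counts from my example: 1 default among 50 doctors; the 20 defaults
# among 1000 students are a hypothetical comparison group.
for label, d, n in [("doctor", 1, 50), ("student", 20, 1000)]:
    lo, hi = wilson_interval(d, n)
    print(f"{label}: PD = {d/n:.1%}, 95% CI = ({lo:.1%}, {hi:.1%})")
```

For the doctors, the observed 2% comes with a 95% interval of roughly 0.4% to 10%, so the point estimate alone seems to say very little.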

Much Appreciated!
