How to build an unbiased predictive ML model when the event of interest is rare compared to the total number of records?

Question

I am trying to build a model that predicts communication loss for a wireless device. For now I am using RandomForestClassifier with Device and Location as the features. Both the train score and the test score are 99%, so I am fairly sure the model is giving a biased result. One reason might be that records of communication loss events are very few compared to records with no communication loss. Some people have advised me that it might not be possible to build a prediction model in this situation, but I would like more suggestions or advice on whether there is anything I can do about it.

lcrmorin · Answer

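1) Your data appear to be imbalanced, and you should look into that. If loss events make up only about 1% of the records, a model that always predicts "no loss" would also score about 99% accuracy, so accuracy alone tells you little. Common techniques include oversampling the minority class, but you might have a bigger problem here.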

2) It is unclear whether you have enough information for what you are trying to achieve (device type and location don't seem to be enough).

3) Based on the two preceding remarks, you have to acknowledge that you are unlikely to be able to predict the date of an event. "Would a human be able to guess that?" is a good question to ask yourself when you try to apply ML to a problem. If the answer is no, then an ML solution is unlikely to work. You can run statistical analyses to find out whether some device types or locations are more susceptible to loss, but you won't be able to get the exact time and location of a loss.
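
To make the three points above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the records are synthetic, the device types ("A", "B", "C"), locations, and loss rates are made up, and `oversample_minority` is a hypothetical helper, not part of any library. It shows that an "always no loss" baseline reaches very high accuracy with zero recall, how to compute a loss rate per device type, and what naive random oversampling of the minority class looks like.

```python
import random
from collections import Counter, defaultdict

# Hypothetical synthetic records: (device_type, location, lost).
# lost == 1 means a communication loss was recorded.
random.seed(0)
records = []
for _ in range(10000):
    device = random.choice(["A", "B", "C"])
    location = random.choice(["north", "south"])
    # Assumed loss rates, for illustration only: device "C" fails more often.
    p_loss = 0.03 if device == "C" else 0.005
    records.append((device, location, int(random.random() < p_loss)))

labels = [lost for _, _, lost in records]

# Point 1: a baseline that always predicts "no loss" gets high accuracy
# on imbalanced data while never catching a single loss event.
baseline_pred = [0] * len(labels)
accuracy = sum(p == y for p, y in zip(baseline_pred, labels)) / len(labels)
true_positives = sum(p == 1 and y == 1 for p, y in zip(baseline_pred, labels))
recall = true_positives / max(1, sum(labels))

# Point 3: a simple statistical summary, the loss rate per device type.
counts = defaultdict(lambda: [0, 0])  # device -> [losses, total]
for device, _, lost in records:
    counts[device][0] += lost
    counts[device][1] += 1
loss_rate = {d: losses / total for d, (losses, total) in counts.items()}


def oversample_minority(rows, rng):
    """Duplicate random minority rows until both classes have equal counts.

    Assumes the minority class (lost == 1) is non-empty and smaller than
    the majority class. Apply this to the training split only, never to
    the test split, or the evaluation will be misleading.
    """
    minority = [r for r in rows if r[2] == 1]
    majority = [r for r in rows if r[2] == 0]
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra


balanced = oversample_minority(records, random.Random(0))
class_counts = Counter(r[2] for r in balanced)

print(f"baseline accuracy: {accuracy:.3f}, baseline recall: {recall:.3f}")
print("loss rate per device:", {d: round(r, 4) for d, r in sorted(loss_rate.items())})
print("class counts after oversampling:", dict(class_counts))
```

If your real model's recall on loss events is not clearly better than this baseline's, the 99% score is not telling you much. Oversampling only helps when the features actually carry signal about losses, which is the concern raised in point 2.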
