TransWikia.com

Cross validation for unbalanced dataset using Orange data mining tool

Data Science Asked by Emma Bartholomeeusen on January 12, 2021

I am using the Orange data mining tool to build and analyze models (decision tree, ANN, …) that predict customer churn. As this is an imbalanced class problem (10% churn, 90% not churn), I need to oversample within the cross validation. However, I cannot work out how to implement this on my own. Is there anyone with some Orange knowledge who could help me?

Thank you!

One Answer

Orange does not have over/undersampling. Our reasoning is that if you model a problem with a 10% positive class, then you should not train the model on a 50:50 class distribution, because it would not reflect real life. However, the Logistic Regression and Random Forest widgets in Orange have an option to balance the class distribution, which takes the class distribution into account when building the model.
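To give a feel for what such a balancing option can do, here is a minimal sketch of one common weighting scheme, the "balanced" heuristic used by scikit-learn, where each class gets the weight n_samples / (n_classes * class_count). That Orange's option works exactly this way is an assumption on my part, and the function name and the churn labels below are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Illustrative helper, not part of Orange's API.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels matching the 10% / 90% split from the question.
labels = ["churn"] * 10 + ["no churn"] * 90

weights = balanced_class_weights(labels)
print(weights)

# Summed over all rows, both classes end up with the same total weight,
# so the learner treats them as equally important without adding rows.
totals = {cls: weights[cls] * count for cls, count in Counter(labels).items()}
print(totals)
```

Because the rows themselves are not duplicated, the test folds in cross validation keep the original 10:90 distribution, and only the training loss is reweighted. In scikit-learn the same scheme is selected with class_weight="balanced" on LogisticRegression and RandomForestClassifier.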

Answered by vijolica on January 12, 2021
