
HR employee attrition modeling: how to build a balanced sample

Data Science Asked by Nimrod Ets on December 16, 2020

I have dataset 1 (stayers), consisting of 1,500 records of HR demographic data (11 features) on employees who are currently with the company. Dataset 2 (leavers) contains 180 records with the same features, describing people who voluntarily left the company.

My aim is to identify within dataset 1 who is at risk of leaving the company.

Question: what would be a good approach to building a training data set?
I am thinking about some kind of train_test_split.
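
For reference, a minimal sketch of what a stratified train_test_split would look like on data of this shape. The feature matrix here is random stand-in data (not the real HR records), and the 80/20 split and variable names are assumptions for illustration only:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 1500 stayers (label 0) and 180 leavers (label 1), 11 features.
# The values are random placeholders, not real HR data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1680, 11))
y = np.concatenate([np.zeros(1500, dtype=int), np.ones(180, dtype=int)])

# stratify=y keeps the stayer/leaver ratio the same in both parts,
# so the small leaver class is not lost from either split by chance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

train_leaver_rate = y_train.mean()
test_leaver_rate = y_test.mean()
print(len(y_train), len(y_test), train_leaver_rate, test_leaver_rate)
```

This keeps the original imbalance (about 11% leavers) in both parts; it does not balance the classes by itself.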

My current thinking is to split the stayers (dataset 1) into 8 groups of roughly 180 records each, then combine each group individually with the complete dataset 2 (leavers), so that every combination is roughly balanced.

On each of these combinations I fit a logistic regression, use it to predict attrition (yes/no) for the remaining stayers, and then compare all the resulting models.
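
To make the procedure concrete, here is a rough sketch of it using scikit-learn. The data is synthetic (the leavers are drawn with a slightly shifted mean purely so the model has something to learn), and averaging the predicted probabilities across models is one assumed way of "comparing" them, not necessarily the best one:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for the two datasets (11 features each).
rng = np.random.default_rng(0)
X_stay = rng.normal(0.0, 1.0, size=(1500, 11))   # dataset 1: stayers
X_leave = rng.normal(0.3, 1.0, size=(180, 11))   # dataset 2: leavers

# Shuffle the stayers and cut them into 8 groups of about 187 records.
n_groups = 8
shuffled = rng.permutation(len(X_stay))
groups = np.array_split(shuffled, n_groups)

risk_sum = np.zeros(len(X_stay))
counts = np.zeros(len(X_stay))

for idx in groups:
    # One roughly balanced training set: one stayer group plus all leavers.
    X_train = np.vstack([X_stay[idx], X_leave])
    y_train = np.concatenate([
        np.zeros(len(idx), dtype=int),
        np.ones(len(X_leave), dtype=int),
    ])
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Score only the stayers this model did NOT see during training.
    rest = np.setdiff1d(shuffled, idx)
    risk_sum[rest] += model.predict_proba(X_stay[rest])[:, 1]
    counts[rest] += 1

# Each stayer is scored by the 7 models that were not trained on them.
mean_risk = risk_sum / counts
top_10_at_risk = np.argsort(mean_risk)[::-1][:10]
print(top_10_at_risk)
```

Because each model is trained on a roughly 50/50 mix, the resulting probabilities are relative to that mix rather than to the real attrition rate, so they are probably better read as a ranking of stayers than as calibrated risks.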

What do you think? Any glaring pitfalls or risks in my approach?
