TransWikia.com

Statistical data vs. precise data

Cross Validated · Asked on December 11, 2021

I have a dataset in which I assign to each person the descriptive statistics of the geographical zone that person belongs to. For example, for a given zone I can collect the level of education, income, interests and other information at an aggregate level; consequently, all the people in a specific zone have the same statistical attributes.
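To make the setup concrete, here is a minimal sketch of how zone-level aggregates end up as per-person features: a plain lookup from each person's zone into a table of zone statistics. The zone names, feature names and values are hypothetical, chosen only for illustration.

```python
# Hypothetical zone-level aggregates (names and values are illustrative only).
ZONE_STATS = {
    "zone_a": {"median_income": 31000.0, "pct_degree": 0.22},
    "zone_b": {"median_income": 45000.0, "pct_degree": 0.41},
}

# Hypothetical individuals: only the zone is known for each person.
PEOPLE = [
    {"person_id": 1, "zone": "zone_a"},
    {"person_id": 2, "zone": "zone_a"},
    {"person_id": 3, "zone": "zone_b"},
]

FEATURES = ("median_income", "pct_degree")


def attach_zone_features(people, zone_stats, features=FEATURES):
    """Return one feature vector per person, copied from that person's zone."""
    rows = []
    for person in people:
        stats = zone_stats[person["zone"]]
        rows.append(tuple(stats[name] for name in features))
    return rows


X = attach_zone_features(PEOPLE, ZONE_STATS)
print(X)
# People 1 and 2 live in the same zone, so their feature vectors are identical.
print(X[0] == X[1])  # True
# The number of distinct feature vectors equals the number of zones used.
print(len(set(X)))  # 2
```

However many people the dataset contains, the model only ever sees as many distinct inputs as there are zones.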

I want to use this kind of dataset to train a binary classifier. Is it possible to achieve good performance with such data? Are there specific techniques for working with aggregate statistical data, instead of precise individual data, for people who belong to different geographical zones?

The main problems I have faced with this approach are the following:

  1. All the people in a given zone have the same descriptive data, so I can’t capture differences in the phenomenon between people in the same zone
  2. I don’t know the actual values for any individual person, so the features are not that person’s real data
  3. I suspect that using descriptive (aggregate) data adds noise to my model
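The first problem has a concrete, measurable consequence: any model whose inputs are only zone-level features must give everyone in a zone the same score, so its ROC AUC is capped by the best possible ordering of zones, and that best ordering is simply ranking zones by their observed positive rate. A minimal sketch of that ceiling, assuming per-zone counts of positives and negatives are available; the zone names and counts below are hypothetical.

```python
from itertools import combinations


def auc_ceiling(zone_counts):
    """Best ROC AUC reachable, on the given data, by any model whose inputs
    are only zone-level features.

    zone_counts maps zone -> (positives, negatives). Such a model must give
    every person in a zone the same score, so (positive, negative) pairs
    inside one zone are always tied and count as 1/2. For each pair of zones,
    the best the model can do is rank the zone with the higher positive rate
    first.
    """
    total_pos = sum(p for p, _ in zone_counts.values())
    total_neg = sum(n for _, n in zone_counts.values())
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative")

    # Within-zone pairs: always tied, each worth 1/2.
    concordant = sum(0.5 * p * n for p, n in zone_counts.values())
    # Across zones: keep whichever ordering wins more pairs.
    # p_i * n_j > p_j * n_i exactly when zone i has the higher positive rate.
    for (p_i, n_i), (p_j, n_j) in combinations(zone_counts.values(), 2):
        concordant += max(p_i * n_j, p_j * n_i)
    return concordant / (total_pos * total_neg)


# Hypothetical counts: (positives, negatives) per zone.
ZONES = {"zone_a": (10, 90), "zone_b": (30, 70), "zone_c": (20, 80)}
print(round(auc_ceiling(ZONES), 4))  # 0.6389
```

If this ceiling, computed on the real per-zone counts, is already close to the AUC a model achieves, then switching classifiers cannot help much; only features that vary between people within a zone can raise it. Computed on a held-out set's counts, the same function bounds the held-out AUC of any zone-level model on that set.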

I’ve tried logistic regression, but it performs poorly: the area under the ROC curve (AUC) is around 0.65. The dataset is imbalanced, but that has never been a big deal for me, since the models I built in the past performed quite well; so I assume the crucial point is the kind of data I assigned to each person in the dataset. I don’t have access to individual-level data, so I can use only geographical/statistical data.
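Because everyone in a zone shares one feature vector, a logistic regression fitted on individual rows has exactly the same log-likelihood, for any coefficients, as a binomial model written with one term per zone using that zone's (positives, negatives) counts. A minimal sketch that checks this numerically; the zones, feature values and coefficients are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def individual_loglik(rows, weights, bias):
    """Bernoulli log-likelihood over individual rows of (features, label)."""
    total = 0.0
    for x, y in rows:
        p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def grouped_loglik(zones, weights, bias):
    """Same model, written as one binomial term per zone.

    zones: list of (features, positives, negatives).
    """
    total = 0.0
    for x, pos, neg in zones:
        p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
        total += pos * math.log(p) + neg * math.log(1.0 - p)
    return total


# Hypothetical zones: (standardized zone features, positives, negatives).
ZONES = [((-1.0, 0.5), 3, 7), ((0.2, -0.3), 5, 5), ((1.1, 0.9), 8, 2)]

# Expand the zones into one row per person, as in the individual-level dataset.
rows = [(x, 1) for x, pos, neg in ZONES for _ in range(pos)]
rows += [(x, 0) for x, pos, neg in ZONES for _ in range(neg)]

WEIGHTS, BIAS = (0.8, -0.4), 0.1  # arbitrary illustrative coefficients
print(math.isclose(individual_loglik(rows, WEIGHTS, BIAS),
                   grouped_loglik(ZONES, WEIGHTS, BIAS)))  # True
```

In other words, adding more people from the same zones adds weight to each zone's term but no new feature variation, so the fitted model is effectively learned from one data point per zone.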
