TransWikia.com

Can I expect good results with low-correlation attributes?

Data Science Asked on October 5, 2020

This was a question I saw in an interview for a data scientist position:

"Here is the following correlation heatmap that i got from my attributes. Regarding the correlation of each feature with the dependent variable (target/class), it is noticeable that correlations are not very expressive.

[Image: correlation heatmap of the attributes and the target]

Still, I would like to know whether I can expect good results from a classification model trained on this dataset. Also, what further investigation can I do (if I shouldn't look at correlation only)?"

2 Answers

It's a general question, so there are more than a few things you can do.
That said, what is stopping you from training a basic classifier and investigating the results?
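A minimal sketch of that first step, assuming scikit-learn is installed. The synthetic data from `make_classification` is only a stand-in for the interviewer's dataset, which we don't have; the point is to compare a quick model against a majority-class baseline under cross-validation before reading too much into the heatmap.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset (assumption: binary target, 8 features).
X, y = make_classification(n_samples=500, n_features=8, n_informative=3, random_state=0)

# Baseline that always predicts the most frequent class.
baseline = DummyClassifier(strategy="most_frequent")
# A quick, shallow model to see whether the features carry any signal at all.
model = DecisionTreeClassifier(max_depth=4, random_state=0)

baseline_acc = cross_val_score(baseline, X, y, cv=5).mean()
model_acc = cross_val_score(model, X, y, cv=5).mean()

print(f"majority-class baseline: {baseline_acc:.3f}")
print(f"decision tree:           {model_acc:.3f}")
```

If the model clearly beats the baseline, the features are informative regardless of what the correlation heatmap suggests.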

Some ideas:

  • Use the Predictive Power Score (PPS) to keep investigating your data
  • Check for non-linear correlation between the features
  • Investigate feature importance
  • Use dimensionality reduction
  • Check for class imbalance
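For the non-linear dependence idea, one option is mutual information, sketched below with scikit-learn's `mutual_info_classif`; this is a related measure swapped in for illustration, not the Predictive Power Score itself. The data are hypothetical: `signal` determines the class through `|signal| > 1`, so its Pearson correlation with the target is close to zero, while `noise` is unrelated. The last lines cover the imbalance check.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 1000

# Hypothetical features: `signal` drives the class non-linearly, `noise` is irrelevant.
signal = rng.normal(size=n)
noise = rng.normal(size=n)
y = (np.abs(signal) > 1).astype(int)
X = np.column_stack([signal, noise])

# Pearson correlation of each feature with the target (linear dependence only).
pearson = [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]

# Mutual information also captures non-linear dependence (estimated in nats).
mi = mutual_info_classif(X, y, random_state=0)

# Class balance: number of samples per class label.
class_counts = np.bincount(y)

for name, r, m in zip(["signal", "noise"], pearson, mi):
    print(f"{name:>6}: Pearson r = {r:+.3f}, mutual information = {m:.3f}")
print("class counts:", class_counts)
```

Here `signal` looks useless in a correlation heatmap but has clearly the highest mutual information, and the class counts show a roughly 2:1 imbalance that would matter when choosing a metric.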

Correct answer by Sahar Milis on October 5, 2020

Correlation does not affect your model if you use decision trees in a classification problem.

In the theory of decision-tree models, you don't need to look at correlation or check for multicollinearity, because the splits in a decision tree are chosen by entropy/information gain. Correlation only measures linear dependence. The same applies when the dataset is highly correlated: you can still get very good results with decision trees, because there you don't need to drop correlated features or apply dimensionality reduction (unless you have to for other reasons).

You may not get very good results with linear models such as multiclass logistic regression, or with a multiclass neural network. For these models, preprocessing such as dimensionality reduction can have a large influence on accuracy.
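A sketch of that contrast, assuming scikit-learn and NumPy and using the same kind of hypothetical `|x| > 1` target. Logistic regression on the raw feature can do little better than predicting the majority class, a shallow decision tree does well, and adding a hand-made `x**2` feature (a transformation chosen for this toy data, standing in for preprocessing in general) lets the linear model catch up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
signal = rng.normal(size=1000)
y = (np.abs(signal) > 1).astype(int)  # hypothetical non-linear target

X_raw = signal.reshape(-1, 1)                     # the single raw feature
X_squared = np.column_stack([signal, signal**2])  # raw feature plus a hand-made x^2

scores = {
    "logistic regression, raw x": cross_val_score(
        LogisticRegression(max_iter=1000), X_raw, y, cv=5).mean(),
    "logistic regression, x and x^2": cross_val_score(
        LogisticRegression(max_iter=1000), X_squared, y, cv=5).mean(),
    "decision tree, raw x": cross_val_score(
        DecisionTreeClassifier(max_depth=3, random_state=0), X_raw, y, cv=5).mean(),
}

for name, acc in scores.items():
    print(f"{name:<32} {acc:.3f}")
```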

I asked a similar question, but about highly correlated features: "Decision-tree regression to avoid multicollinearity for regression model?"

In your case, I would say that if we use decision trees, the low correlations should not noticeably matter. However, we should verify this with the permutation importance of the features and check for polynomial dependencies. Of course, you should also ask the interviewer more questions about the question and its goal to get more background information. This is very important in interviews.
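Permutation importance can be sketched with scikit-learn's `sklearn.inspection.permutation_importance`, assuming a decision tree fitted on hypothetical data in which one feature (`signal`) drives the class non-linearly and the other (`noise`) is irrelevant. Importance is the drop in held-out accuracy when a column is shuffled, so it reflects non-linear usefulness that a correlation heatmap would miss.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 1000

# Hypothetical data: class depends on |signal|, noise carries no information.
signal = rng.normal(size=n)
noise = rng.normal(size=n)
y = (np.abs(signal) > 1).astype(int)
X = np.column_stack([signal, noise])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Shuffle each column 10 times on the held-out set and record the accuracy drop.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

for name, mean, std in zip(["signal", "noise"],
                           result.importances_mean, result.importances_std):
    print(f"{name:>6}: {mean:.3f} +/- {std:.3f}")
```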

Answered by martin on October 5, 2020

© 2024 TransWikia.com. All rights reserved.