TransWikia.com

Measure correlation for categorical vs continuous variable

Data Science Asked on August 28, 2021

Given a categorical variable that depends on continuous variables, I would like to know how to check whether these continuous variables explain the categorical one.

So:

Y = categorical
X1 = continuous
X2 = continuous
X3 = continuous

I’d start with a correlation, but which one? I’ve seen "How to get correlation between two categorical variable and a categorical variable and continuous variable?", but that question is about whether differences between categorical variables explain a continuous variable, so I think it’s a different topic?

Tool suggestions in R or Python are welcome as well.

Edit: I’m not sure whether "categorical" is the right term here. The values of $ Y $ are $ 0, 1, 2, 3 $, but I could also use $ A, B, C, D $. They represent classes that rate how clean a room is.

2 Answers

By saying you want to "explain Y by X", it sounds like you are trying to build a classifier F that maps X values to the expected Y: F(X) --> Y. If so, you don't necessarily have to look for a "correlation". There are various methods to build such a classifier: you can use logistic regression, SVMs, neural networks, etc.
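As a rough illustration of such a classifier F, here is a minimal sketch of multinomial (softmax) logistic regression fitted by plain batch gradient descent with NumPy. The data are synthetic and hypothetical: Y (classes 0-3) is assumed to be driven by X1 and X2, while X3 is pure noise. The names fit_softmax and predict are illustrative, not from any library; in practice you would more likely use sklearn.linear_model.LogisticRegression in Python or nnet::multinom in R.

```python
import numpy as np

def fit_softmax(X, y, n_classes, lr=0.5, n_iter=2000):
    """Multinomial logistic regression fitted by plain batch gradient descent."""
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])           # prepend an intercept column
    W = np.zeros((d + 1, n_classes))               # one weight column per class
    Y = np.eye(n_classes)[y]                       # one-hot targets, shape (n, K)
    for _ in range(n_iter):
        logits = Xb @ W
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)              # softmax probabilities
        W -= lr * Xb.T @ (P - Y) / n                   # gradient of mean cross-entropy
    return W

def predict(W, X):
    """Return the most probable class for each row of X."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.argmax(Xb @ W, axis=1)

# Hypothetical data: three continuous measurements and a 0-3 cleanliness class
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
score = 2 * X[:, 0] + X[:, 1]                      # X3 plays no role here
y = np.digitize(score, [-1.5, 0.0, 1.5])           # classes 0, 1, 2, 3

W = fit_softmax(X, y, n_classes=4)
accuracy = np.mean(predict(W, X) == y)
print(f"training accuracy: {accuracy:.2f}")
print("weights (rows: intercept, X1, X2, X3):")
print(np.round(W, 2))
```

In this setup the weight row for X3 should come out noticeably smaller than the row for X1, which is one informal way to see which continuous variables help explain Y.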

Besides, if it makes more sense to you, you can always first discretize the continuous variables into categorical variables and then also use other methods such as decision trees, Naive Bayes and more.
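A minimal sketch of the discretize-then-classify route, assuming NumPy: each continuous column is cut into four equal-frequency (quartile) bins, and a count-based categorical Naive Bayes with Laplace smoothing is fitted on the bin indices. The data and the helper names (quantile_bins, fit_categorical_nb, predict_nb) are hypothetical; scikit-learn offers ready-made versions of both steps (KBinsDiscretizer and CategoricalNB).

```python
import numpy as np

def quantile_bins(x, n_bins=4):
    """Discretize a continuous column into n_bins equal-frequency bins coded 0..n_bins-1."""
    inner_edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(x, inner_edges)

def fit_categorical_nb(Xcat, y, n_bins, n_classes, alpha=1.0):
    """Count-based Naive Bayes on binned features, with Laplace smoothing alpha."""
    log_prior = np.log(np.bincount(y, minlength=n_classes) / len(y))
    # log_lik[j, c, b] = log P(feature j falls in bin b | class c)
    log_lik = np.empty((Xcat.shape[1], n_classes, n_bins))
    for j in range(Xcat.shape[1]):
        for c in range(n_classes):
            counts = np.bincount(Xcat[y == c, j], minlength=n_bins) + alpha
            log_lik[j, c] = np.log(counts / counts.sum())
    return log_prior, log_lik

def predict_nb(log_prior, log_lik, Xcat):
    """Pick the class with the highest log prior plus summed per-feature log likelihoods."""
    scores = np.tile(log_prior, (Xcat.shape[0], 1))
    for j in range(Xcat.shape[1]):
        scores += log_lik[j][:, Xcat[:, j]].T
    return scores.argmax(axis=1)

# Hypothetical data: Y in {0, 1, 2, 3} driven by X1 and X2, X3 is noise
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = np.digitize(2 * X[:, 0] + X[:, 1], [-1.5, 0.0, 1.5])

Xcat = np.column_stack([quantile_bins(X[:, j]) for j in range(X.shape[1])])
log_prior, log_lik = fit_categorical_nb(Xcat, y, n_bins=4, n_classes=4)
accuracy = np.mean(predict_nb(log_prior, log_lik, Xcat) == y)
print(f"training accuracy on binned features: {accuracy:.2f}")
```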

Correct answer by Oren Razon on August 28, 2021

So you want to explain one ordinal variable Y (the cleanliness class) by 1 to n interval/continuous variables X. What is the best way to do it?

Correlation

Spearman rank-order correlation is the right approach for correlations involving ordinal variables, even if one of the variables is continuous. Some sources, however, recommend coding the continuous variable as an ordinal one as well (via binning, e.g. a 0-100 variable coded as 0-25, 26-50, 51-75, 76-100) and using that in the correlation, which is also a valid approach.
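Spearman's rho is the Pearson correlation computed on ranks, with tied values (an ordinal Y with only four levels has many) given the average of the ranks they span. Below is a minimal standard-library sketch with hypothetical example values; bin_0_100 is a hypothetical helper illustrating the 0-25 / 26-50 / 51-75 / 76-100 binning. For real work, scipy.stats.spearmanr in Python or cor(x, y, method = "spearman") in R compute the same coefficient.

```python
from bisect import bisect_left
from statistics import mean

def average_ranks(values):
    """Rank values 1..n; tied values all get the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1        # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(x, y):
    """Spearman's rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def bin_0_100(v):
    """Hypothetical helper: code 0-25 -> 0, 26-50 -> 1, 51-75 -> 2, 76-100 -> 3."""
    return bisect_left([25, 50, 75], v)

# Hypothetical example: one continuous measurement and the 0-3 cleanliness class
x1 = [0.5, 1.2, 0.9, 2.3, 3.1, 2.8, 4.0, 5.5]
y = [0, 0, 1, 1, 2, 2, 3, 3]
print(round(spearman(x1, y), 3))                                  # 0.927
print([bin_0_100(v) for v in (0, 25, 26, 50, 51, 75, 76, 100)])  # [0, 0, 1, 1, 2, 2, 3, 3]
```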

Regression

In most regression models we can treat ordinal variables as continuous and will probably be okay. Regression models have several key advantages over correlations for your question: they can handle multiple predictors at once, and they also quantify the magnitude of each predictor's influence.
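To make this concrete for the question's setup, here is a minimal NumPy sketch that treats the 0-3 class codes of Y as a continuous response and regresses them on X1, X2 and X3 by ordinary least squares. The data are synthetic and hypothetical. Standardizing the predictors first is a choice made here so that the coefficient sizes are roughly comparable as "magnitude of influence". In R the equivalent fit would be lm(y ~ x1 + x2 + x3); if treating Y as continuous feels too strong, MASS::polr fits an ordinal (proportional-odds) logistic regression instead.

```python
import numpy as np

# Hypothetical data: Y in {0, 1, 2, 3} driven mostly by X1, less by X2, not by X3
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 3))
y = np.digitize(2 * X[:, 0] + X[:, 1], [-1.5, 0.0, 1.5]).astype(float)

# Standardize predictors so coefficient sizes are on a comparable scale
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
A = np.column_stack([np.ones(n), Xs])          # intercept + X1, X2, X3

# Ordinary least squares, treating the 0-3 codes as a continuous response
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ coef
r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

for name, b in zip(["intercept", "X1", "X2", "X3"], coef):
    print(f"{name:>9}: {b:+.3f}")
print(f"R^2: {r2:.3f}")
```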

What you always have to do

To use ordinal variables in a correlation or a regression, you always have to label-encode them, which means A, B, C, D become 0, 1, 2, 3 (keeping their natural order).
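A minimal sketch of that encoding in plain Python, assuming A < B < C < D is the intended order of the cleanliness classes (the question does not say which end is cleaner). An explicit mapping is used rather than an automatic encoder so that the order is stated, not implied; scikit-learn's LabelEncoder, for example, assigns codes in sorted order, which happens to match for A-D but not for arbitrary labels.

```python
# Assumed order of the cleanliness classes (hypothetical: A -> 0 ... D -> 3)
CLEANLINESS_ORDER = ["A", "B", "C", "D"]
CODE = {label: code for code, label in enumerate(CLEANLINESS_ORDER)}

def label_encode(labels):
    """Map ordinal labels to integer codes 0..3, preserving the stated order."""
    return [CODE[label] for label in labels]

print(label_encode(["C", "A", "D", "B", "A"]))  # [2, 0, 3, 1, 0]
```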

Answered by Fnguyen on August 28, 2021
