TransWikia.com

Is there a way to use cor function with factor variables without creating dummy variables? (R)

Cross Validated Asked by Charles Orlando on November 18, 2021

I have a dataset with several categorical predictors with varying factor levels. Is there a way to generate a correlation matrix from this data without having to create a bunch of dummy variables?

I’m using multiple linear regression to predict a continuous variable (sales). The predicted values are surprisingly accurate, and plotting predicted against observed values gives a nearly diagonal line.

I thought that was all I needed to worry about, but in researching, I found I should also plot predicted vs residuals to test for homoscedasticity. I did that and found out I was violating it.

I was looking for a way to resolve this and found a post that said I should use a robust method for computing the covariance matrix. That is why I want to use the cor() function, though I’m not sure that’s actually the right way to go about this.

And here are the actual graphs:

[Figure: Predicted vs Actual]

[Figure: Predicted vs Residual]

One Answer

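You're going to want the sandwich package for the heteroscedasticity-robust covariance matrix, along with the lmtest package for redoing the coefficient tests with it. The model itself is not refit; only the standard errors and the tests based on them change.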

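# lm() dummy-codes factor predictors such as race automatically
fit <- lm(sales ~ race + age + ...)

# Package names must be quoted; installation is only needed once
install.packages("sandwich")
install.packages("lmtest")
library(sandwich)
library(lmtest)

# Coefficient tests using White's heteroscedasticity-consistent covariance
coeftest(fit, vcov = vcovHC(fit, type = "HC"))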

Type "HC" (equivalently "HC0") is White's original estimator. The default in vcovHC is "HC3"; the reason for this is given in the documentation (?vcovHC).
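For intuition about what the robust covariance matrix actually is, here is a minimal base-R sketch of the HC0 (White) sandwich formula on simulated data. The data and the names bread, meat and vcov_hc0 are made up for illustration. In practice you would just call vcovHC(), whose type = "HC0" result should agree with this hand computation.

```r
# Simulated data (hypothetical): the error spread grows with age,
# so the homoscedasticity assumption is violated by construction
set.seed(1)
n     <- 200
age   <- runif(n, 20, 60)
race  <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
sales <- 10 + 0.5 * age + 2 * (race == "b") + rnorm(n, sd = 0.1 * age)

fit <- lm(sales ~ race + age)

# HC0 sandwich: (X'X)^-1  X' diag(e^2) X  (X'X)^-1
X     <- model.matrix(fit)    # lm() has already dummy-coded the factor here
e     <- residuals(fit)
bread <- solve(crossprod(X))  # (X'X)^-1
meat  <- crossprod(X * e)     # X' diag(e^2) X; each row of X is scaled by its residual
vcov_hc0 <- bread %*% meat %*% bread

# Classical vs. robust standard errors, side by side
se_classical <- sqrt(diag(vcov(fit)))
se_robust    <- sqrt(diag(vcov_hc0))
print(cbind(estimate = coef(fit), se_classical, se_robust))
```

The coefficient estimates are the same in both columns of standard errors; only the uncertainty attached to them changes. Note also that model.matrix(fit) shows the dummy columns R built for the factor, so nothing has to be created by hand.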

Answered by Fabian August on November 18, 2021
