Cross Validated Asked by CBGodbole on January 1, 2022

I am performing multiple linear regression to predict a score (the dependent variable) from several categorical variables. My dependent variable has a skewed distribution with a large number of zero values but no negative values.

Can I use Box-Cox transformation in this scenario?

I tried to run it in R, but got the error message –

"Error in boxcox.default(linreg1) : response variable must be positive"

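The standard one-parameter Box-Cox transformation requires a strictly positive response, which is why you get that error. The two-parameter version shifts the response by a constant before transforming, so it can handle zeros. As far as I know, boxcoxfit() in the geoR package supports this through its lambda2 argument.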
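To make the role of the shift concrete, here is a minimal sketch of the two-parameter (shifted) Box-Cox transform in plain Python. It is not the geoR implementation: the function name `boxcox_shifted`, the sample scores, and the values `lam=0.5` and `shift=1.0` are all illustrative assumptions. In practice both parameters would be estimated from the data.

```python
import math

def boxcox_shifted(y, lam, shift=0.0):
    """Two-parameter Box-Cox: apply the power transform to (y + shift).

    Hypothetical helper for illustration only.
    """
    z = y + shift
    if z <= 0:
        # Same restriction that triggers the R error: the shifted
        # response must be strictly positive.
        raise ValueError("y + shift must be strictly positive")
    if lam == 0:
        return math.log(z)          # limiting case as lam -> 0
    return (z ** lam - 1.0) / lam

# Made-up, zero-heavy scores (assumption, not data from the question)
scores = [0, 0, 0, 1, 2, 5, 12, 40]

# With shift=0 a zero score is rejected; a positive shift makes
# every value transformable.
transformed = [boxcox_shifted(y, lam=0.5, shift=1.0) for y in scores]
```

With `shift=0.0` the call fails on the first zero score, which mirrors the "response variable must be positive" message.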

You can also address the skewness with other transformations, such as:

- Square root transformation. However, the square root is often not strong enough to deal with high levels of skewness.
- log(x+1) transformation, which is widely used and is defined at zero.
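The two transformations in the list can be compared with a short sketch. It is written in plain Python with made-up scores (an assumption, not the asker's data). It shows that both keep zeros at zero and that log(x+1) compresses large values more strongly than the square root.

```python
import math

# Made-up, zero-heavy scores (assumption, not data from the question)
scores = [0, 0, 0, 1, 2, 5, 12, 40]

# Square root: defined at zero, mild compression of the right tail
sqrt_scores = [math.sqrt(y) for y in scores]

# log(x + 1): also defined at zero, stronger compression of the right tail
log1p_scores = [math.log1p(y) for y in scores]

# Values on the log1p scale can be mapped back with expm1
recovered = [math.expm1(v) for v in log1p_scores]
```

In R, the same transforms are available as sqrt() and log1p(), with expm1() for the back-transformation.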

Also, I don't understand why you are transforming the dependent variable. I agree with @dave about the normality assumption in regression.

Answered by Vivek on January 1, 2022

