Box-Cox data transformation to enable linear regression

Cross Validated Asked by CBGodbole on January 1, 2022

I am performing multiple linear regression to predict a score (the dependent variable) from several categorical variables. My dependent variable has a skewed distribution with a large number of zero values but no negative values.
Can I use a Box-Cox transformation in this scenario?

I tried to run it in R, but got this error message:
"Error in boxcox.default(linreg1) : response variable must be positive"

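The standard (one-parameter) Box-Cox transformation requires a strictly positive response, which is why boxcox() rejects your data: it contains zeros. The two-parameter version shifts the response by a constant before transforming, so it can handle zeros. boxcoxfit() in the geoR package supports this through its lambda2 argument.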
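As a rough sketch of one way past the error in the question: boxcox() from MASS refuses a response containing zeros, so a common workaround is to add a constant before estimating lambda. The data below is simulated, and the names `d`, `group`, `score`, and the shift of 1 are assumptions made for illustration. The estimated lambda depends on the shift chosen.

```r
library(MASS)

# Simulated stand-in data (hypothetical): a skewed, non-negative score
# with many zeros, predicted from one categorical variable.
set.seed(1)
d <- data.frame(
  group = factor(sample(c("a", "b", "c"), 200, replace = TRUE))
)
d$score <- rpois(200, lambda = 1.5)^2

# boxcox(score ~ group, data = d) would stop with
# "response variable must be positive" because of the zeros.
# Shifting by 1 (an arbitrary choice) makes every value strictly positive.
d$score_shifted <- d$score + 1
bc <- boxcox(score_shifted ~ group, data = d, plotit = FALSE)

# Lambda with the highest profile log-likelihood on the default grid
lambda_hat <- bc$x[which.max(bc$y)]

# Apply the Box-Cox transform with the estimated lambda
if (abs(lambda_hat) < 1e-8) {
  d$score_bc <- log(d$score_shifted)
} else {
  d$score_bc <- (d$score_shifted^lambda_hat - 1) / lambda_hat
}

fit <- lm(score_bc ~ group, data = d)
```

The default grid runs from -2 to 2 in steps of 0.1, so `lambda_hat` is only as fine as that grid. Pass a denser `lambda` sequence if you need more resolution.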

However, you can solve your problem of skewness with other transformations like:

1. Square root transformation. It is defined at zero, but it is often not strong enough to deal with high levels of skewness.
2. log(x+1) transformation, which is widely used and is also defined at zero (log1p(x) in R).
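To see how much each option reduces skewness, you could compare a simple skewness measure before and after transforming. This is only a sketch on simulated data. `skewness()` here is a small helper defined on the spot (not a package function), and `score` is a made-up stand-in for your variable.

```r
# Moment-based sample skewness; hypothetical helper defined for this example
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Simulated stand-in for a skewed, non-negative score with many zeros
set.seed(42)
score <- rpois(500, lambda = 1.5)^2

# Skewness of the raw score and of each transformed version
skew_by_transform <- c(
  raw   = skewness(score),
  sqrt  = skewness(sqrt(score)),
  log1p = skewness(log1p(score))
)
print(round(skew_by_transform, 2))
```

Values closer to zero indicate a more symmetric distribution. With a large spike at zero, no monotone transformation will remove that spike, so some asymmetry may remain.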

Also, I don't understand why you are transforming the dependent variable. I agree with @dave about the normality assumption in regression. Note that it applies to the model's errors, not to the marginal distribution of the dependent variable.
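Before deciding on any transformation, it may be worth looking at the residuals of the untransformed model, since that is where the normality assumption applies. A minimal sketch on simulated data (the names `d`, `group`, and `score` are assumptions, not your actual variables):

```r
# Simulated stand-in data (hypothetical)
set.seed(7)
d <- data.frame(
  group = factor(sample(c("a", "b", "c"), 200, replace = TRUE))
)
d$score <- rpois(200, lambda = 1.5)^2

# Fit the regression on the untransformed score
fit_raw <- lm(score ~ group, data = d)
res <- residuals(fit_raw)

# Normal Q-Q plot of the residuals: points far from the line suggest non-normality
qqnorm(res)
qqline(res)

# Formal test of residual normality (valid for sample sizes from 3 to 5000)
sw <- shapiro.test(res)
print(sw)
```

With moderately large samples the test will flag even small departures, so the Q-Q plot is usually the more informative of the two.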

Answered by Vivek on January 1, 2022
