TransWikia.com

How do zeroes impact regression estimates?

Cross Validated Asked on December 11, 2021

Assume I am estimating a simple cross-sectional regression model. What happens if a large portion of the cross sections consists of zeros only? That is, both the dependent and the independent variables are zero.

Do these cross-sections affect the estimation at all, or are they excluded from it?

One Answer

Yes, they will definitely have an impact on your model. The simple simulation below demonstrates this. Feel free to fiddle with the code on your own.

Adding the zeros changes the slope, the intercept, and the MSE. It can also give you the illusion of a better fit, because the (0, 0) points can end up with very small residuals. If the center of your data is not near (0, 0), they can be high-leverage points too.

You need to consider where your zeros come from. Are they true zeros? Or are they censored values? What to do depends on your question of interest.

library(ggplot2)
library(gridExtra)

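# Specify model parameters
set.seed(1)      # arbitrary seed (not in the original) so the run is reproducible

n_nonzero = 20   # number of ordinary observations
n_zero = 20      # number of (0, 0) observations to add

beta0 = 5        # true intercept
beta1 = 0.2      # true slope

mean_x = 5       # mean of x for the non-zero observations
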
# Generate data without zeros
x = rnorm(n_nonzero, mean_x, 1)
y = beta0 + beta1 * x + rnorm(n_nonzero, 0, 1)

dat_no_zeros = data.frame(y = y, x = x)
# Shuffle the rows (this does not affect the fit)
dat_no_zeros = dat_no_zeros[sample(nrow(dat_no_zeros)),]
# Plot the points with a fitted regression line
p_no_zeros = ggplot(dat_no_zeros, aes(x=x, y=y)) +
  geom_point() +
  geom_smooth(method='lm', formula=y ~ x) +
  ggtitle('Without zeros')

# Add zeros to above data
x2 = c(x, rep(0, n_zero))
y2 = c(y, rep(0, n_zero))

dat_with_zeros = data.frame(y2 = y2, x2 = x2)
# Shuffle the rows (this does not affect the fit)
dat_with_zeros = dat_with_zeros[sample(nrow(dat_with_zeros)),]
# Plot the points with a fitted regression line
p_with_zeros = ggplot(dat_with_zeros, aes(x=x2, y=y2)) +
  geom_point() +
  geom_smooth(method='lm', formula=y ~ x) +
  ggtitle('With zeros')

grid.arrange(p_no_zeros, p_with_zeros, ncol=2)
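
The plotting code above shows the change visually. To put numbers on the claim that slope, intercept, and error all change, here is a minimal base-R sketch that fits both models and tabulates the results. It repeats the same data-generating setup so it can run on its own; the seed, the helper `summarise_fit`, and the object names `fit_no_zeros`, `fit_with_zeros`, and `comparison` are my own illustrative choices, not part of the original code.

```r
# Self-contained numeric comparison (seed and helper names are illustrative)
set.seed(42)

n_nonzero = 20
n_zero = 20
beta0 = 5
beta1 = 0.2

# Non-zero observations, generated as in the simulation above
x = rnorm(n_nonzero, mean = 5, sd = 1)
y = beta0 + beta1 * x + rnorm(n_nonzero, mean = 0, sd = 1)

# The same observations with (0, 0) rows appended
x2 = c(x, rep(0, n_zero))
y2 = c(y, rep(0, n_zero))

fit_no_zeros = lm(y ~ x)
fit_with_zeros = lm(y2 ~ x2)

# Hypothetical helper: intercept, slope, residual standard error, R-squared
summarise_fit = function(fit) {
  s = summary(fit)
  c(intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    resid_se  = s$sigma,
    r_squared = s$r.squared)
}

comparison = rbind(no_zeros   = summarise_fit(fit_no_zeros),
                   with_zeros = summarise_fit(fit_with_zeros))
print(round(comparison, 3))
```

In a setup like this one, the fit with zeros should show an intercept pulled toward 0, a much steeper slope, and a far higher R-squared, even though the relationship among the non-zero points has not changed. That is the "illusion of a better fit" mentioned above.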

[Figure: two side-by-side scatter plots with fitted regression lines, titled 'Without zeros' and 'With zeros']
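
On the question of where the zeros come from: if they are true zeros that belong to a separate regime, one possible approach (an assumption on my part, not something prescribed above) is to fit only the non-zero rows, or equivalently to add an indicator for the all-zero rows. The sketch below uses base R only; the column `is_zero` and the object names are hypothetical. If the zeros are censored values instead, a censored-regression model (for example a Tobit model) would be the more natural tool, which is not shown here.

```r
# Sketch: treating all-zero rows as a separate group (names are illustrative)
set.seed(42)

x = rnorm(20, mean = 5, sd = 1)
y = 5 + 0.2 * x + rnorm(20, mean = 0, sd = 1)
dat = data.frame(x = c(x, rep(0, 20)),
                 y = c(y, rep(0, 20)))

# Hypothetical flag marking rows where both variables are zero
dat$is_zero = as.numeric(dat$x == 0 & dat$y == 0)

fit_pooled  = lm(y ~ x, data = dat)                     # zeros treated as ordinary data
fit_subset  = lm(y ~ x, data = dat[dat$is_zero == 0, ]) # zeros dropped
fit_flagged = lm(y ~ x + is_zero, data = dat)           # zeros absorbed by an indicator

# Compare intercept and slope across the three fits
print(rbind(pooled  = coef(fit_pooled)[c("(Intercept)", "x")],
            subset  = coef(fit_subset)[c("(Intercept)", "x")],
            flagged = coef(fit_flagged)[c("(Intercept)", "x")]))
```

Because the indicator can fit the (0, 0) rows exactly, the intercept and slope of the flagged model match those of the subset model, while the pooled model differs. Which of these is appropriate depends on what the zeros mean in your data.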

Answered by Carter on December 11, 2021
