
How to model variance of a heteroskedastic dataset

Cross Validated

Asked by Yilei Huang on November 6, 2021

I have a dataset (scatter plot shown below) where the x-axis corresponds to the observed value and the y-axis to the true value (yeah, sort of flipped from the convention where y is the observed value). I am not interested in estimating y per se; instead, I want to estimate the variance of y given x.

It might be hard to see from the scatter plot, but the variance of y is much larger for bigger x than for smaller x; however, I think that's simply because the range of x is quite large. Therefore, even though the variance of y for smaller x is smaller in absolute terms, in relative terms the variance is much larger for smaller x.

I can imagine binning the samples by x and then estimating the sample variance of y within each bin (a rough sketch of this is shown below the plot). However, that seems quite ad hoc, and I wonder if there is a more principled way to estimate the variance of y given x. Also, for practical reasons, samples in certain ranges of x (as you can see from the plot, roughly from x = 1200 to x = 1500) are sparse. I wonder if this could be taken into account, too.

[Scatter plot: observed value on the x-axis vs. true value on the y-axis; the spread of y grows with x, and data are sparse roughly between x = 1200 and x = 1500.]
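
For reference, the ad hoc binning idea mentioned above might look something like the following in R. The data frame dat with columns x and y, and the choice of 20 equal-width bins, are assumptions of this sketch, not part of the question:

    # Bin x into equal-width intervals and compute the sample variance of y per bin
    breaks <- seq(min(dat$x), max(dat$x), length.out = 21)  # 20 bins; arbitrary choice
    bins   <- cut(dat$x, breaks, include.lowest = TRUE)
    tapply(dat$y, bins, var)                                # NA where a bin has fewer than 2 points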

One Answer

You can do this using weighted least squares. The variance function is inversely proportional to the weight function, which you have to specify. You can try different weight (variance) functions and compare their fits using likelihood-based statistics. You can also do this using the gamlss package in R, which models the log of the variance (scale) parameter as a linear function of the X variables.
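
To make that concrete, here is a minimal sketch of both suggestions in R. The data frame dat with columns x and y, and the choice of Var(y|x) proportional to x^2 for the weights, are assumptions of the sketch, not part of the original answer:

    library(gamlss)

    # Weighted least squares: if Var(y|x) is assumed proportional to x^2,
    # the weights are 1/x^2. Other weight (variance) functions can be tried
    # and the resulting fits compared with likelihood-based statistics.
    fit_wls <- lm(y ~ x, data = dat, weights = 1 / x^2)

    # gamlss with a normal response: the mean and the log of the scale
    # parameter are each modeled as linear functions of x.
    fit_gamlss <- gamlss(y ~ x, sigma.formula = ~ x, family = NO, data = dat)

    # Estimated standard deviation of y at each observed x; square it for the variance.
    sigma_hat <- predict(fit_gamlss, what = "sigma", type = "response")

Comparing AIC(fit_gamlss) across different sigma.formula choices is one way to apply the likelihood-based comparison mentioned above.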

Answered by BigBendRegion on November 6, 2021
