Cross Validated Asked on December 21, 2020

Suppose I have i.i.d. samples $X_1, \cdots, X_n$ from some unknown distribution $F$ and I wish to estimate the mean $\mu=\mu(F)$ of that distribution and I insist that the estimator be *unbiased* – i.e., $\mathbb{E}[T(X_1, \cdots, X_n)] = \mu$.

The canonical estimator is the sample mean $\overline{X} = \frac{1}{n} \sum_{i=1}^n X_i$. This is always unbiased and for many families of distributions, such as Gaussians, it is optimal or near-optimal in terms of variance.

However, the sample mean is not *robust*. In particular, the sample mean can change arbitrarily if a single $X_i$ is changed. This means it has a breakdown point of 0.

A more robust estimator is the sample median. Changing a few data points will not, for most samples, significantly change the median. This has a breakdown point of 0.5, which is the highest possible.
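To make the contrast concrete, here is a minimal sketch using only the Python standard library. The data values and the helper name `corrupt_one` are made up for illustration; the point is only that replacing one observation by an arbitrarily large value drags the mean arbitrarily far, while the median moves at most to a neighbouring order statistic.

```python
from statistics import mean, median

def corrupt_one(sample, value):
    """Return a copy of `sample` with its first entry replaced by `value`."""
    return [value] + list(sample[1:])

# Hypothetical, well-behaved sample (illustrative numbers only).
clean = [1.8, 2.1, 1.9, 2.0, 2.2, 2.05, 1.95]

# Replace a single observation by a gross outlier.
dirty = corrupt_one(clean, 1e9)

# How far each estimator moved because of that one corrupted point.
mean_shift = abs(mean(dirty) - mean(clean))
median_shift = abs(median(dirty) - median(clean))

print(f"mean moved by   {mean_shift:.3g}")
print(f"median moved by {median_shift:.3g}")
```

Making the outlier larger increases `mean_shift` without bound, whereas `median_shift` stays the same.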

For Gaussian data, the sample median has higher variance than the sample mean (by a factor of $\pi/2$). However, for other distributions, such as the Laplace distribution or Student’s $t$-distribution, the median actually has lower variance than the mean.

Furthermore, the median is always unbiased if the distribution is symmetric (about its mean). Many natural distributions are symmetric, but many are not, such as the following examples.

My question is: **Are there robust and unbiased estimators for the means of natural asymmetric distributions?** By robust I simply mean a non-zero breakdown point and by natural I mean something from the above list or similar (just not a concocted example). I can’t find any examples. I would be particularly interested in the Binomial case.

This is not an unbiased estimate, but it is consistent (you can let the bias approach zero as the sample size grows).

You can take a trimmed sample (remove the highest and lowest values) and use the mean of the trimmed sample as the estimate.

In the case of a known distribution, you might use an appropriate scaling to make the estimate less biased (or not biased at all); otherwise, the bias will just decrease as you take larger samples.
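A trimmed mean along these lines might be sketched as follows. The function name `trimmed_mean` and the parameter `k` (number of observations dropped from each end) are my own choices for illustration, not something specified above. Dropping `k` values per side tolerates up to `k` arbitrarily bad points, so the finite-sample breakdown point is roughly `k / n`.

```python
from statistics import mean

def trimmed_mean(sample, k=1):
    """Mean of the sample after dropping the k smallest and k largest values."""
    if k < 0 or 2 * k >= len(sample):
        raise ValueError("need 0 <= k < len(sample) / 2")
    ordered = sorted(sample)
    # Keep only the central len(sample) - 2k order statistics.
    return mean(ordered[k:len(ordered) - k])

# One gross outlier is removed by trimming a single value per side.
print(trimmed_mean([1, 2, 3, 4, 100], k=1))
```

For an asymmetric distribution this estimate is still biased at any fixed sample size, since the trimming removes more mass from the long tail than from the short one.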

Answered by Sextus Empiricus on December 21, 2020

As already said by whuber, one way to answer your question is to de-bias your estimator. If the robust estimator is biased, maybe you can subtract the theoretical bias (according to a parametric model). There is some work that tries to do that, or to subtract an approximation of the bias (I don't remember a reference, but I could search for one if you are interested). For instance, think about the empirical median in an exponential model: we can compute its expectation and then subtract the resulting bias. If you want I can do the computation; it is rather simple. This becomes more difficult if the estimator is more complicated than the median, and it works only in parametric models.
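For the exponential example the computation can be sketched as follows. For odd $n = 2m+1$ and i.i.d. exponential data with mean $\mu$, the sample median is the order statistic $X_{(m+1)}$, and the standard formula for exponential order statistics gives $\mathbb{E}[X_{(m+1)}] = \mu \sum_{j=m+1}^{2m+1} \frac{1}{j}$. Because the bias is proportional to the unknown $\mu$, the sketch below corrects by rescaling rather than by subtracting; that is my reading of the idea, not necessarily the exact construction the answer had in mind. The function names are hypothetical, and only odd $n$ is handled.

```python
from fractions import Fraction
from statistics import median

def median_scale_factor(n):
    """E[sample median] / mu for n i.i.d. exponential draws (odd n only)."""
    if n < 1 or n % 2 == 0:
        raise ValueError("this sketch only covers odd n")
    m = (n - 1) // 2
    # Partial harmonic sum 1/(m+1) + ... + 1/(2m+1), kept exact with fractions.
    return sum(Fraction(1, j) for j in range(m + 1, n + 1))

def debiased_median_mean(sample):
    """Rescaled median: unbiased for the mean if the data really are exponential."""
    return median(sample) / float(median_scale_factor(len(sample)))

print(median_scale_factor(3))               # factor for n = 3
print(debiased_median_mean([1.0, 2.0, 3.0]))
```

Rescaling by a constant leaves the breakdown point of the median unchanged, but the unbiasedness holds only under the assumed exponential model. As $n$ grows the factor tends to $\ln 2$, the ratio of median to mean for the exponential distribution.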

A maybe less ambitious question is whether we can construct a consistent robust estimator. This we can do but we have to be careful of what we call robust.

If your definition of robust is having a non-zero asymptotic breakdown point, then already we can prove that this is impossible. Suppose that your estimator is called $T_n$ and it converges to $\mathbb{E}[X]$. $T_n$ has a non-zero breakdown point, which means that a proportion $\varepsilon>0$ of the data can be arbitrarily bad and nonetheless $T_n$ will not be arbitrarily large. But this can't be, because in the limit, if a proportion of the data consists of outliers, this translates to: with probability $1-\varepsilon$, $X$ is sampled from the target distribution $P$, and with probability $\varepsilon$, $X$ is arbitrary. This makes $\mathbb{E}[X]$ arbitrary as well (if you want me to put it formally, I can), which contradicts the non-zero asymptotic breakdown point of $T_n$.

Finally, to conclude on this, we can take the non-asymptotic point of view: say that we don't care about the asymptotic breakdown point, and that what matters is either the non-asymptotic breakdown point (something like a breakdown point of $1/\sqrt{n}$) or being efficient on heavy-tailed data.

In this case, there are robust and consistent estimators of $\mathbb{E}[X]$. For instance, we can use Huber's estimator with a parameter that goes to infinity, or we can use the median-of-means estimator with a number of blocks that tends to infinity. References for this line of thought are "Challenging the empirical mean and empirical variance: A deviation study" by Olivier Catoni or "Sub-Gaussian mean estimators" by Devroye et al. (these references come from the theoretical community; they may be complicated if you are not familiar with empirical processes and concentration inequalities).
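As a rough illustration of the median-of-means idea, here is a minimal sketch. The function name and the choice to use consecutive equal-size blocks (discarding any leftover observations) are simplifying assumptions of mine; the cited papers treat the block count and its growth rate much more carefully.

```python
from statistics import mean, median

def median_of_means(sample, n_blocks):
    """Split into n_blocks consecutive blocks, average each, take the median."""
    n = len(sample)
    if not 1 <= n_blocks <= n:
        raise ValueError("need 1 <= n_blocks <= len(sample)")
    block_size = n // n_blocks  # leftover observations at the end are dropped
    block_means = [
        mean(sample[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]
    return median(block_means)

# The single outlier only spoils one block mean, which the median ignores.
print(median_of_means([1, 2, 3, 4, 5, 1000], n_blocks=3))
```

With `n_blocks=1` this reduces to the ordinary sample mean, and with `n_blocks` equal to the sample size it reduces to the sample median. The estimate stays bounded as long as fewer than half of the blocks contain a corrupted point, so the tolerated fraction of outliers shrinks if the number of blocks grows more slowly than the sample size. The sketch assumes the observations are in random order, as they would be for i.i.d. data.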

Answered by TMat on December 21, 2020
