# Can we estimate the mean of an asymmetric distribution in an unbiased and robust manner?

Cross Validated, asked on December 21, 2020

Suppose I have i.i.d. samples $X_1, \ldots, X_n$ from some unknown distribution $F$ and I wish to estimate the mean $\mu = \mu(F)$ of that distribution, and I insist that the estimator be unbiased, i.e., $\mathbb{E}[T(X_1, \ldots, X_n)] = \mu$.

The canonical estimator is the sample mean $\overline{X} = \frac{1}{n} \sum_{i=1}^n X_i$. This is always unbiased, and for many families of distributions, such as Gaussians, it is optimal or near-optimal in terms of variance.

However, the sample mean is not robust. In particular, the sample mean can change arbitrarily if a single $X_i$ is changed. This means it has a breakdown point of 0.

A more robust estimator is the sample median. Changing a few data points will not, for most samples, significantly change the median. This has a breakdown point of 0.5, which is the highest possible.

For Gaussian data, the sample median has higher variance than the sample mean (asymptotically, by a factor of $\pi/2$). However, for other distributions, such as the Laplace distribution or Student's $t$-distribution, the median actually has lower variance than the mean.
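A quick Monte Carlo check of these variance claims (a sketch assuming NumPy; the sample size and replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 501, 4000  # odd n so the median is a single order statistic

# Gaussian: asymptotically, var(median)/var(mean) -> pi/2 ~ 1.57
g = rng.normal(size=(reps, n))
ratio_gauss = np.var(np.median(g, axis=1)) / np.var(np.mean(g, axis=1))

# Laplace: the ordering flips -- the median (the MLE here) beats the mean
l = rng.laplace(size=(reps, n))
ratio_laplace = np.var(np.median(l, axis=1)) / np.var(np.mean(l, axis=1))

print(f"Gaussian var(median)/var(mean) ~ {ratio_gauss:.2f}")
print(f"Laplace  var(median)/var(mean) ~ {ratio_laplace:.2f}")
```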

Furthermore, the median is always an unbiased estimate of the mean if the distribution is symmetric (about its mean). Many natural distributions are symmetric, but many, such as the Binomial with $p \neq 1/2$ or the exponential, are not.

My question is: Are there robust and unbiased estimators for the means of natural asymmetric distributions? By robust I simply mean a non-zero breakdown point, and by natural I mean a distribution that arises in practice (not a concocted example). I can't find any examples. I would be particularly interested in the Binomial case.

This is not an unbiased estimate, but it is consistent (you can let the bias approach zero as the sample size grows).

You can take a trimmed sample (remove the highest and lowest values) and use the mean of the trimmed sample as the estimate.

In the case of a known distribution, you might use an appropriate scaling to make the estimate less biased (or not biased at all); otherwise, the bias will simply decrease as the trimmed fraction shrinks with growing sample size.
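A minimal sketch of the trimming idea (assuming NumPy; the trim fraction and the exponential test distribution are illustrative choices, not part of the answer):

```python
import numpy as np

def trimmed_mean(x, trim=0.1):
    """Mean after discarding the lowest and highest `trim` fraction.

    The breakdown point is roughly `trim`; for an asymmetric F the
    estimate is biased, but the bias vanishes if `trim` -> 0 as n grows.
    """
    x = np.sort(np.asarray(x))
    k = int(trim * len(x))
    return x[k:len(x) - k].mean()

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=10_000)  # true mean 2.0
est = trimmed_mean(sample, trim=0.05)  # biased low: the heavy right tail is cut

# robustness: gross outliers land in the trimmed region and barely move the estimate
corrupt = sample.copy()
corrupt[:100] = 1e9
est_corrupt = trimmed_mean(corrupt, trim=0.05)
```

Here the 5% trim clips the exponential's right tail, so the estimate sits below the true mean of 2.0; a distribution-specific rescaling, as the answer suggests, could remove that bias.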

Answered by Sextus Empiricus on December 21, 2020

As whuber already said, one way to answer your question is to de-bias your estimator. If the robust estimator is biased, maybe you can subtract its theoretical bias (computed under a parametric model); there is some work that tries to do exactly this, or to correct by an approximation of the bias (I don't remember a reference, but I could search for one if you are interested). For instance, think about the empirical median in an exponential model. We can compute its expectation exactly and then correct for it; if you want, I can show the computations, which are rather simple. This becomes more difficult if the estimator is more complicated than the median, and it only works in parametric models.
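To make the exponential example concrete (a sketch assuming NumPy; the function name and the sample sizes are my own choices): for $X_i \sim \mathrm{Exp}$ with mean $\theta$ and odd $n = 2m+1$, the sample median is the $(m{+}1)$-th order statistic, whose expectation is exactly $\theta \sum_{i=m+1}^{n} 1/i$, so dividing by that constant gives an exactly unbiased estimator of the mean with breakdown point $1/2$, valid only within this parametric model:

```python
import numpy as np

def exp_mean_from_median(x):
    """Unbiased mean estimate for an Exponential sample via its median.

    For odd n = 2m + 1, E[sample median] = theta * sum_{i=m+1}^{n} 1/i,
    so dividing the median by that constant de-biases it exactly.
    """
    x = np.asarray(x)
    n = len(x)
    assert n % 2 == 1, "use an odd sample size so the median is one order statistic"
    m = n // 2
    c = sum(1.0 / i for i in range(m + 1, n + 1))  # -> ln 2 as n grows
    return np.median(x) / c

rng = np.random.default_rng(2)
theta = 3.0  # true mean
ests = [exp_mean_from_median(rng.exponential(theta, size=101)) for _ in range(4000)]
```

The constant tends to $\ln 2$, matching the fact that the population median of the exponential is $\theta \ln 2$.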

A maybe less ambitious question is whether we can construct a consistent robust estimator. This we can do, but we have to be careful about what we call robust.

If your definition of robust is having a non-zero asymptotic breakdown point, then we can already prove that this is impossible. Suppose your estimator $T_n$ converges to $\mathbb{E}[X]$ and has a non-zero breakdown point: a fraction $\varepsilon > 0$ of the data can be arbitrarily bad and nonetheless $T_n$ will not become arbitrarily large. But this cannot hold, because in the limit a corrupted fraction of the data translates to: with probability $1 - \varepsilon$, $X$ is sampled from the target distribution $P$, and with probability $\varepsilon$, $X$ is arbitrary. This makes $\mathbb{E}[X]$ itself arbitrary (I can put this formally if you want), which contradicts the non-zero asymptotic breakdown point of $T_n$.

Finally, to conclude, we can take the non-asymptotic point of view: we stop caring about the asymptotic breakdown point, and what matters instead is either the non-asymptotic breakdown point (something like a breakdown point of $1/\sqrt{n}$) or efficiency on heavy-tailed data.

In this case, there are estimators that are robust and consistent for $\mathbb{E}[X]$. For instance, we can use Huber's estimator with a parameter that goes to infinity, or the median-of-means estimator with a number of blocks that tends to infinity. References for this line of thought are "Challenging the empirical mean and empirical variance: a deviation study" by Olivier Catoni, and "Sub-Gaussian mean estimators" by Devroye et al. (these references are from the theoretical community and may be hard going if you are not familiar with empirical processes and concentration inequalities).
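A sketch of the median-of-means estimator mentioned above (assuming NumPy; the block count, the Pareto test distribution, and the fixed shuffling seed are illustrative choices):

```python
import numpy as np

def median_of_means(x, n_blocks):
    """Split the data into blocks, average each block, take the median.

    Consistent for E[X] when n_blocks grows slowly with n, with
    sub-Gaussian deviations even for heavy-tailed F; the price is a
    breakdown point of only about n_blocks / (2 n).
    """
    x = np.asarray(x)
    shuffler = np.random.default_rng(0)  # fixed seed keeps the sketch deterministic
    x = shuffler.permutation(x)          # blocks must not depend on data order
    blocks = np.array_split(x, n_blocks)
    return np.median([b.mean() for b in blocks])

rng = np.random.default_rng(3)
# classical Pareto with shape 2.5 and minimum 1: true mean = 2.5 / 1.5
heavy = rng.pareto(2.5, size=100_000) + 1.0
est = median_of_means(heavy, n_blocks=17)

# a handful of gross outliers spoil at most a few of the 17 blocks,
# so the median over block means is barely affected
corrupt = heavy.copy()
corrupt[:5] = 1e12
est_corrupt = median_of_means(corrupt, n_blocks=17)
```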

Answered by TMat on December 21, 2020
