TransWikia.com

How to use the CLT in statistical inference?

Cross Validated Asked by user777 on February 2, 2021

I have trouble understanding how a sample from a population can be used to make inferences about the population parameters.

For example, see the following question:

The average weekly earnings for female social workers is $\$670$. Do men in the same positions have average weekly earnings that are higher than those for women? A random sample of $n = 40$ male social workers showed $\bar{x} = \$725$ and $s = \$102$. Test the appropriate hypothesis using $\alpha = 0.01$.

This is an example in the book. To solve it, we first identify the alternative hypothesis, which is $H_a: \mu > 670$, where $\mu$ is the mean of the male population. This implies that the null hypothesis is $H_0: \mu \leq 670$.

Now, the difficulty I'm facing is the following: the author computes the z-score of the test statistic, taking $\mu = 670$, as follows:

$$z = \frac{\bar{x} - 670}{s/\sqrt{n}}$$

Now, this is because of the CLT. But wait! The CLT states the following:

Given $\sigma$ and $\mu$, for a population with any probability distribution, when $n$ is large ($n > 30$) the sampling distribution converges to a normal distribution where the sample mean equals the population mean (i.e. $\bar{x}_* = \mu$), and the sample standard deviation equals the population standard deviation divided by the square root of $n$ (i.e. $s_* = \sigma/\sqrt{n}$).

However, in the question above, we know neither the population standard deviation nor the population mean.

So what the author does is the following: since $n > 30$, it is appropriate to estimate the population standard deviation by the sample standard deviation (i.e. $s = \sigma$) and the population mean by the sample mean (i.e. $\bar{x} = \mu$). Then we have the following:

$$z = \frac{\bar{x} - 670}{s/\sqrt{n}}$$

So, my question is: I don't see how $n > 30$ justifies estimating the population standard deviation by the sample standard deviation and the population mean by the sample mean, since this is not what the CLT says. Is there anything that I am missing?

Thanks in advance

One Answer

The sample standard deviation and sample mean are both consistent estimators of the corresponding population parameters. This means that for "large sample sizes" they should be pretty good approximations of the population parameters.
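To make the plug-in step concrete, here is a minimal sketch of the textbook calculation from the question, with $s$ substituted for the unknown $\sigma$ in the standard error. It uses only the Python standard library; the variable names are my own and not from the book.

```python
from math import sqrt
from statistics import NormalDist

# Figures taken from the textbook example in the question.
mu_0 = 670.0   # mean under the null hypothesis (women's average weekly earnings)
x_bar = 725.0  # sample mean for the male social workers
s = 102.0      # sample standard deviation, plugged in for the unknown sigma
n = 40         # sample size
alpha = 0.01   # significance level

# Standard error of the sample mean, estimated with s in place of sigma.
standard_error = s / sqrt(n)
z = (x_bar - mu_0) / standard_error

# Upper-tailed test: reject H0 when z exceeds the (1 - alpha) normal quantile.
std_normal = NormalDist()
critical_value = std_normal.inv_cdf(1 - alpha)
p_value = 1 - std_normal.cdf(z)
reject_h0 = z > critical_value

print(f"z = {z:.3f}, critical value = {critical_value:.3f}, p-value = {p_value:.5f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Here $z$ comes out at roughly 3.41, above the one-sided critical value of roughly 2.33, so the example rejects $H_0$ at $\alpha = 0.01$. Note that only $s$ is plugged in for $\sigma$; the value 670 is not an estimate of $\mu$ but the value of $\mu$ assumed under the null hypothesis.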

The whole $n > 30$ distinction is rather arbitrary; it's what some people consider a "large sample size". Of course, this rule of thumb is not appropriate in all situations.

Also, the Central Limit Theorem as stated in your question is incorrect. There are a few "central limit theorems" out there. The one I think you are trying to reference makes an asymptotic statement about the distribution of the sample mean of an i.i.d. sample. This is opposed to your "CLT", which makes a nonasymptotic statement about the distribution of your sample itself.
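For reference, the classical version says that for i.i.d. observations with mean $\mu$ and finite variance $\sigma^2$, the standardized sample mean $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ converges in distribution to a standard normal as $n \to \infty$. The simulation below is only a sketch of that idea: the Exponential(1) population, the number of replications, and the seed are arbitrary choices of mine, not anything from the book. It draws many samples of size 40 from a skewed population and checks how often the sample mean falls within 1.96 standard errors of $\mu$, once using the true $\sigma$ and once using the plug-in $s$.

```python
import random
from math import sqrt
from statistics import fmean, stdev

random.seed(0)  # fixed seed so the sketch is reproducible

n = 40            # sample size, as in the question
reps = 5000       # number of simulated samples (arbitrary choice)
mu = sigma = 1.0  # an Exponential(1) population has mean 1 and standard deviation 1

sample_means = []
within_known_sigma = 0  # count of |x_bar - mu| < 1.96 * sigma / sqrt(n)
within_plug_in_s = 0    # count of |x_bar - mu| < 1.96 * s / sqrt(n)

for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    x_bar = fmean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    sample_means.append(x_bar)
    within_known_sigma += abs(x_bar - mu) / (sigma / sqrt(n)) < 1.96
    within_plug_in_s += abs(x_bar - mu) / (s / sqrt(n)) < 1.96

mean_of_means = fmean(sample_means)
sd_of_means = stdev(sample_means)
frac_known = within_known_sigma / reps
frac_plug_in = within_plug_in_s / reps

print(f"mean of sample means: {mean_of_means:.3f} (population mean {mu})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {sigma / sqrt(n):.3f})")
print(f"fraction within 1.96 SE, true sigma: {frac_known:.3f}")
print(f"fraction within 1.96 SE, plug-in s:  {frac_plug_in:.3f}")
```

Under an exact normal distribution both fractions would be 0.95. Any gap between that and the simulated values, especially with the plug-in $s$, gives a feel for how rough the $n > 30$ rule of thumb can be for a skewed population.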

Answered by user303375 on February 2, 2021
