I want to conduct a hypothesis test to prove that there is no difference between the mean values of two groups.

Null hypothesis: μ_group1 - μ_group2 = 0

Alternative hypothesis: μ_group1 - μ_group2 ≠ 0

My first question: since I know everything about each group's population (standard deviation, mean, etc.), can I apply hypothesis testing to the whole population?

Second, do the population sizes (if the answer to the first question is "yes") or the sample sizes have to be the same? For example, if the population size is 300 for group 1 and 100 for group 2, would I need to sample the same number from each group before doing the hypothesis test?

Cross Validated Asked by Ambleu on January 1, 2021

1 Answer

Illustrating comment, using R:

```
set.seed(2020)
x1 = rnorm(500, 100, 15)   # sample 1: n = 500, mean 100, SD 15
x2 = rnorm(100, 105, 17)   # sample 2: n = 100, mean 105, SD 17
summary(x1); length(x1); sd(x1)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   53.32   89.29   98.93   99.18  109.45  148.02
# [1] 500        # size of sample 1
# [1] 15.96929   # sample SD of sample 1
summary(x2); length(x2); sd(x2)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   59.74   94.62  104.05  104.77  114.88  146.67
# [1] 100        # size of sample 2
# [1] 17.11946   # sample SD of sample 2
```

The two sample means $\bar X_1 = 99.18$ and $\bar X_2 = 104.77$ differ. The question is whether, in view of the variability of the data, this difference is large enough to be 'statistically significant' at the 5% level.

In the boxplots below, the boxes have different widths as a reminder that the sample sizes are quite different. The 'notches' in the sides of the boxes, which are approximate confidence intervals for the medians, do not overlap; this is a preliminary clue that the sample means may be significantly different.

```
boxplot(x1, x2, varwidth=T, col="skyblue2", pch=20, notch=T)
```

In a Welch t test (used because the population variances are unequal), the small P-value $0.003 < 0.05$ indicates a significant difference at the 5% level. This is not "proof" that the population means differ. However, we are unlikely to get such different sample means if the population means are the same.

```
t.test(x1, x2)
#         Welch Two Sample t-test
#
# data:  x1 and x2
# t = -3.0129, df = 135.64, p-value = 0.003089
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -9.257022 -1.920342
# sample estimates:
# mean of x mean of y
#  99.18129 104.76998
```
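To see where the `t` and `df` values in that output come from, here is a minimal sketch that recomputes the Welch statistic and the Welch-Satterthwaite degrees of freedom by hand. It regenerates the same simulated data so that it runs on its own; the names `v1`, `v2`, `se`, `t_stat`, `df_welch`, `p_two`, and `fit` are illustrative and not part of the original answer.

```r
# Regenerate the simulated samples used above
set.seed(2020)
x1 <- rnorm(500, 100, 15)
x2 <- rnorm(100, 105, 17)

n1 <- length(x1); n2 <- length(x2)
v1 <- var(x1) / n1   # estimated variance of the first sample mean
v2 <- var(x2) / n2   # estimated variance of the second sample mean

# Welch t statistic: difference in means over its estimated standard error
se     <- sqrt(v1 + v2)
t_stat <- (mean(x1) - mean(x2)) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided P-value from the t distribution
p_two <- 2 * pt(-abs(t_stat), df_welch)

# Compare with the built-in test
fit <- t.test(x1, x2)
c(t = t_stat, df = df_welch, p = p_two)
```

Each sample size enters only through its own term (`v1` or `v2`), which is why the test runs as-is on samples of 500 and 100 without any need to equalize them.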

**Addendum** per comment. Here is a one-sided test. If $\bar X_1 < \bar X_2,$
then the test of $H_0: \mu_1 = \mu_2$ against $H_a: \mu_1 < \mu_2$ will have
a P-value half the size of the two-sided test.

```
t.test(x1, x2, alternative = "less")
#         Welch Two Sample t-test
#
# data:  x1 and x2
# t = -3.0129, df = 135.64, p-value = 0.001544
# alternative hypothesis: true difference in means is less than 0
# 95 percent confidence interval:
#       -Inf -2.516599
# sample estimates:
# mean of x mean of y
#  99.18129 104.76998
```
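The relationship between the one-sided and two-sided P-values can be checked directly. The sketch below regenerates the same simulated data and extracts the P-value for each choice of `alternative`; the variable names `p_two`, `p_less`, and `p_greater` are illustrative and not part of the original answer.

```r
# Regenerate the simulated samples used above
set.seed(2020)
x1 <- rnorm(500, 100, 15)
x2 <- rnorm(100, 105, 17)

# P-values for the two-sided test and for each one-sided alternative
p_two     <- t.test(x1, x2)$p.value
p_less    <- t.test(x1, x2, alternative = "less")$p.value
p_greater <- t.test(x1, x2, alternative = "greater")$p.value

c(two_sided = p_two, less = p_less, greater = p_greater)
```

The one-sided alternative that points in the direction of the observed difference gives half the two-sided P-value; the opposite alternative gives one minus that value, so the direction should be chosen before looking at the data.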

Answered by BruceET on January 1, 2021

