# How to test paired observations

Cross Validated Asked by Doug Fir on October 23, 2020

I have a set of data that look like this:

       Spray.A Spray.B
    1       10      11
    2        7      17
    3       20      21
    4       14      11
    5       14      16
    6       12      14
    7       10      17
    8       23      17
    9       17      19
    10      20      21
    11      14       7
    12      13      13
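
For anyone who wants to reproduce this, the table can be entered in R roughly as follows. The object name `data` matches the `sapply(data, mean)` call further down; how the original data frame was built is not shown, so this is just one plausible way to do it.

```r
# One way to enter the table above as a data frame named `data`
# (the construction itself is an assumption; only the values come from the table).
data <- data.frame(
  Spray.A = c(10, 7, 20, 14, 14, 12, 10, 23, 17, 20, 14, 13),
  Spray.B = c(11, 17, 21, 11, 16, 14, 17, 17, 19, 21, 7, 13)
)

# One row per city, two measurements per row.
str(data)
```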


Spray A is the original and Spray B is the new one. We want to know whether Spray B is “better”, where a higher average number indicates better.

    sapply(data, mean)
     Spray.A  Spray.B
    14.50000 15.33333


So B appears better at first glance. But if I wanted to apply a hypothesis test at the 0.05 significance level, where $H_0$ is that there is no difference, how would I do that?

Each observation took place in a different city. Does that impact the choice of test? A paired t-test perhaps?

I have done a chi-squared test before, where I’d input the means only. But what would be the right hypothesis test here to determine whether the higher mean for Spray B is different enough to reject the null hypothesis?

Yes, the fact that measurements are paired, in the sense that there are two measures for each city over a set of cities, means that your data are not independent. The lack of independence violates the assumption of the independent samples $t$-test. A paired samples $t$-test is an option here.
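
To make the pairing concrete: a paired $t$-test is just a one-sample $t$-test on the per-city differences. Here is a minimal sketch of that calculation by hand, assuming the two columns are available as plain vectors named `Spray.A` and `Spray.B` (the variable names `d`, `t_stat`, and `p_one_sided` are mine):

```r
Spray.A <- c(10, 7, 20, 14, 14, 12, 10, 23, 17, 20, 14, 13)
Spray.B <- c(11, 17, 21, 11, 16, 14, 17, 17, 19, 21, 7, 13)

d <- Spray.B - Spray.A   # one difference per city; positive favours Spray B
n <- length(d)

# t statistic: mean difference divided by its standard error
t_stat <- mean(d) / (sd(d) / sqrt(n))

# One-sided p-value for the alternative "mean difference > 0"
p_one_sided <- pt(t_stat, df = n - 1, lower.tail = FALSE)

c(t = t_stat, df = n - 1, p = p_one_sided)
```

This should agree with the `t = 0.6059, df = 11` that `t.test(..., paired=TRUE)` reports below.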

However, you don't have much data, and the paired samples $t$-test assumes that the differences are normally distributed. Your differences don't look very normal in a qq-plot:

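    library(car)  # provides qqPlot()
    # Spray.A and Spray.B are assumed to be available as vectors, e.g. via attach(data)
    qqPlot(Spray.B - Spray.A)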
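
If the `car` package is not installed, a similar check can be sketched with base R. This is a substitute for `qqPlot()`, not the same plot: it draws a reference line but no confidence envelope. The vectors are re-entered here so the snippet stands alone.

```r
Spray.A <- c(10, 7, 20, 14, 14, 12, 10, 23, 17, 20, 14, 13)
Spray.B <- c(11, 17, 21, 11, 16, 14, 17, 17, 19, 21, 7, 13)

d <- Spray.B - Spray.A   # per-city differences

# Normal Q-Q plot of the differences; qqnorm() invisibly returns the plotted points
qq <- qqnorm(d, main = "Normal Q-Q plot of Spray.B - Spray.A")
qqline(d)                # reference line through the first and third quartiles
```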


Thus, you may prefer a nonparametric option instead. The nonparametric analog of the paired $t$-test is the Wilcoxon signed-rank test.
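
For intuition, the signed-rank statistic can be computed by hand: drop the zero differences, rank the absolute values of the rest (ties get average ranks), and add up the ranks that belong to positive differences. A minimal sketch, with variable names of my own choosing:

```r
Spray.A <- c(10, 7, 20, 14, 14, 12, 10, 23, 17, 20, 14, 13)
Spray.B <- c(11, 17, 21, 11, 16, 14, 17, 17, 19, 21, 7, 13)

d    <- Spray.B - Spray.A
d_nz <- d[d != 0]          # zero differences carry no sign, so they are dropped
r    <- rank(abs(d_nz))    # rank the magnitudes; tied values share an average rank

# V = sum of the ranks attached to positive differences (Spray B higher)
V <- sum(r[d_nz > 0])
V
# 41.5, the same V that wilcox.test() reports below
```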

Running these tests in R is straightforward:

    t.test(Spray.B, Spray.A, alternative="greater", paired=TRUE)
    #
    #         Paired t-test
    #
    # data:  Spray.B and Spray.A
    # t = 0.6059, df = 11, p-value = 0.2784
    # alternative hypothesis: true difference in means is greater than 0
    # 95 percent confidence interval:
    #  -1.636524       Inf
    # sample estimates:
    # mean of the differences
    #               0.8333333
    #
    wilcox.test(Spray.B, Spray.A, alternative="greater", paired=TRUE)
    #
    #         Wilcoxon signed rank test with continuity correction
    #
    # data:  Spray.B and Spray.A
    # V = 41.5, p-value = 0.2375
    # alternative hypothesis: true location shift is greater than 0
    #
    # Warning messages:
    # 1: In wilcox.test.default(Spray.B, Spray.A, alternative = "greater",  :
    #   cannot compute exact p-value with ties
    # 2: In wilcox.test.default(Spray.B, Spray.A, alternative = "greater",  :
    #   cannot compute exact p-value with zeroes


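The warning messages are nothing to worry about. As explained in the documentation, they state that an exact $p$-value could not be computed because of the ties and zero differences, so the reported $p$-value is based on the normal approximation. Either way, both $p$-values (0.2784 and 0.2375) are above 0.05, so neither test rejects the null hypothesis in favor of Spray B being better.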
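
If you are curious where that approximate $p$-value comes from, here is a sketch of the normal approximation with a tie-corrected variance and a continuity correction of 0.5. This reflects my understanding of what `wilcox.test()` does by default in this situation; treat it as an illustration rather than the reference implementation.

```r
Spray.A <- c(10, 7, 20, 14, 14, 12, 10, 23, 17, 20, 14, 13)
Spray.B <- c(11, 17, 21, 11, 16, 14, 17, 17, 19, 21, 7, 13)

d <- Spray.B - Spray.A
d <- d[d != 0]             # drop zero differences
n <- length(d)
r <- rank(abs(d))          # average ranks for tied magnitudes
V <- sum(r[d > 0])         # signed-rank statistic

# Mean and tie-corrected standard deviation of V under the null hypothesis
ties  <- table(r)          # sizes of the groups of tied ranks
mu    <- n * (n + 1) / 4
sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24 - sum(ties^3 - ties) / 48)

# Continuity-corrected z for the one-sided alternative "greater"
z <- (V - mu - 0.5) / sigma
p <- pnorm(z, lower.tail = FALSE)
p
```

The result should land on roughly 0.2375, matching the output above.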

Correct answer by gung - Reinstate Monica on October 23, 2020
