# Statistical line comparison

Cross Validated Asked by Mobeus Zoom on October 6, 2020

I have a dataset like the one in this question, i.e.,

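| interval | mean | Drug | lower | upper |
|---:|---:|:---:|---:|---:|
| 14 | 0.004 | a | 0.002 | 0.205 |
| 30 | 0.022 | a | 0.001 | 0.101 |
| 60 | 0.13 | a | 0.061 | 0.23 |
| 90 | 0.22 | a | 0.14 | 0.34 |
| 180 | 0.25 | a | 0.17 | 0.35 |
| 365 | 0.31 | a | 0.23 | 0.41 |
| 14 | 0.84 | b | 0.59 | 1.19 |
| 30 | 0.85 | b | 0.66 | 1.084 |
| 60 | 0.94 | b | 0.75 | 1.17 |
| 90 | 0.83 | b | 0.68 | 1.01 |
| 180 | 1.28 | b | 1.09 | 1.51 |
| 365 | 1.58 | b | 1.38 | 1.82 |
| 14 | 1.90 | c | 0.9 | 4.27 |
| 30 | 2.91 | c | 1.47 | 6.29 |
| 60 | 2.57 | c | 1.52 | 4.55 |
| 90 | 2.05 | c | 1.31 | 3.27 |
| 180 | 2.422 | c | 1.596 | 3.769 |
| 365 | 2.83 | c | 1.93 | 4.26 |
| 14 | 0.29 | d | 0.04 | 1.18 |
| 30 | 0.09 | d | 0.01 | 0.29 |
| 60 | 0.39 | d | 0.17 | 0.82 |
| 90 | 0.39 | d | 0.20 | 0.7 |
| 180 | 0.37 | d | 0.22 | 0.59 |
| 365 | 0.34 | d | 0.21 | 0.53 |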


You can see a good graphical representation in the top answer on the linked thread. Let's assume that upper = mean + 1 standard deviation and lower = mean − 1 standard deviation. The means and standard deviations were computed over a fixed number of trials (say, $n=20$) at each interval for each drug.
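Under that assumption, each standard deviation is half the width of the lower-upper interval, and the standard error of each mean is that SD divided by $\sqrt{20}$. A minimal R sketch using the first four rows for drug a (the names `tab` and `n` are illustrative, not from the question):

```r
# First four rows of the table above (drug a); the other rows follow the same pattern
tab = read.table(header = TRUE, text = "
interval mean  Drug lower upper
14       0.004 a    0.002 0.205
30       0.022 a    0.001 0.101
60       0.13  a    0.061 0.23
90       0.22  a    0.14  0.34
")
n = 20  # trials per drug-interval cell, as assumed in the question

# Assumption from the question: upper = mean + sd, lower = mean - sd
tab$sd = (tab$upper - tab$lower) / 2
tab$se = tab$sd / sqrt(n)
# If the assumption held exactly, this midpoint would equal the reported mean
tab$midpoint = (tab$upper + tab$lower) / 2
print(tab)
```

For these rows the midpoint does not match the reported mean, so the bounds are not symmetric around the mean and the ± 1 SD reading is only an approximation.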

My question is: how do I get p-values for the overall superiority of, say, drug c over drug a, or drug b over drug d? What is the correct statistical procedure here, and how can it be implemented?

Assuming that you have access to the values from each individual trial, the simplest model here is a two-way drug (a, b, c, or d) × interval (14, 30, 60, 90, 180, or 365) ANOVA.

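```r
# interval is treated as a categorical factor (6 levels), giving a two-way ANOVA
m = lm(score ~ factor(interval) * factor(drug), data = your_data)
anova(m)
```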


This will tell you whether a) there's a main effect of drug (some drugs have higher scores than others on average), b) there's a main effect of interval (some intervals have higher scores than others), and c) there's a drug × interval interaction (the difference between the drugs varies depending on the interval).
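To see what that table looks like, here is a minimal R sketch on simulated data. Everything in it is hypothetical: the scores are random numbers arranged in the layout from the question (4 drugs × 6 intervals × 20 trials), and the effect sizes in `drug_shift` are arbitrary. `score` and `your_data` are the same placeholder names used above.

```r
set.seed(1)
# Hypothetical data: 20 trials in every drug x interval cell
your_data = expand.grid(trial = 1:20,
                        interval = c(14, 30, 60, 90, 180, 365),
                        drug = c("a", "b", "c", "d"))
drug_shift = c(a = 0, b = 0.5, c = 1, d = 0.2)  # made-up drug effects
your_data$score = drug_shift[as.character(your_data$drug)] + rnorm(nrow(your_data))

# Two-way ANOVA with interaction, as in the answer
m = lm(score ~ factor(interval) * factor(drug), data = your_data)
tab = anova(m)
print(tab)

# p-value for the main effect of drug
p_drug = tab["factor(drug)", "Pr(>F)"]
```

The `Df` column reads 5, 3, 15 and 456 (480 observations minus 24 cell means); the drug main-effect p-value is in the `factor(drug)` row.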

If you do find a main effect of drug, you may want to explore various post hoc tests, for instance testing whether there's a significant difference between drugs a and b. The simplest way to do this is to repeat the analysis on a subset of the data.

```r
# Keep only drugs a and b, then refit the same two-way model on that subset
data_aVb = dplyr::filter(your_data, drug %in% c('a', 'b'))
m_aVb = lm(score ~ factor(interval) * drug, data = data_aVb)
anova(m_aVb)
```


You'll also want to read about correcting for multiple comparisons, but I won't go into that here.
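One common option is Holm's step-down correction, available in base R through `p.adjust()`. The sketch below repeats the subset analysis for all six pairs of drugs and adjusts the resulting drug p-values. It again uses hypothetical simulated data, and choosing Holm (rather than, e.g., Bonferroni or Tukey-style comparisons) is an assumption, not part of the answer.

```r
set.seed(1)
# Hypothetical data in the question's layout (4 drugs x 6 intervals x 20 trials)
your_data = expand.grid(trial = 1:20,
                        interval = c(14, 30, 60, 90, 180, 365),
                        drug = c("a", "b", "c", "d"))
drug_shift = c(a = 0, b = 0.5, c = 1, d = 0.2)  # made-up drug effects
your_data$score = drug_shift[as.character(your_data$drug)] + rnorm(nrow(your_data))

pairs = combn(c("a", "b", "c", "d"), 2)  # 6 pairs, one per column

# For each pair: subset, refit the two-way model, keep the drug p-value
raw_p = apply(pairs, 2, function(pr) {
  sub = droplevels(your_data[your_data$drug %in% pr, ])
  fit = lm(score ~ factor(interval) * drug, data = sub)
  anova(fit)["drug", "Pr(>F)"]
})
names(raw_p) = apply(pairs, 2, paste, collapse = " vs ")

# Holm adjustment across the 6 pairwise tests
adj_p = p.adjust(raw_p, method = "holm")
print(cbind(raw_p, adj_p))
```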

Update!

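Since your data is actually a proportion, you'll have to adjust this slightly. Standard ANOVA is a form of linear regression and assumes the data are on a linear scale. What you actually have is a proportion, indicating that $y$ out of 20 patients survived (or similar). You can deal with this by using logistic regression instead of linear regression. Assuming `survived` is the number of patients who survived (out of 20), the binomial response is given as the number of successes and the number of failures:

```r
# Response: cbind(successes, failures), i.e. survivors and non-survivors out of 20
m = glm(cbind(survived, 20 - survived) ~ factor(interval) * factor(drug),
        data = your_data, family = binomial)
# test = "Chisq" requests likelihood-ratio (deviance) p-values for each term
anova(m, test = "Chisq")
```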


Again, there are plenty of resources on this online.

Answered by Eoin is on the job market on October 6, 2020
