# Statistical line comparison

Cross Validated Asked by Mobeus Zoom on October 6, 2020

I have a dataset like the one in this question, i.e.,

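| interval | mean | Drug | lower | upper |
|---|---|---|---|---|
| 14 | 0.004 | a | 0.002 | 0.205 |
| 30 | 0.022 | a | 0.001 | 0.101 |
| 60 | 0.13 | a | 0.061 | 0.23 |
| 90 | 0.22 | a | 0.14 | 0.34 |
| 180 | 0.25 | a | 0.17 | 0.35 |
| 365 | 0.31 | a | 0.23 | 0.41 |
| 14 | 0.84 | b | 0.59 | 1.19 |
| 30 | 0.85 | b | 0.66 | 1.084 |
| 60 | 0.94 | b | 0.75 | 1.17 |
| 90 | 0.83 | b | 0.68 | 1.01 |
| 180 | 1.28 | b | 1.09 | 1.51 |
| 365 | 1.58 | b | 1.38 | 1.82 |
| 14 | 1.90 | c | 0.9 | 4.27 |
| 30 | 2.91 | c | 1.47 | 6.29 |
| 60 | 2.57 | c | 1.52 | 4.55 |
| 90 | 2.05 | c | 1.31 | 3.27 |
| 180 | 2.422 | c | 1.596 | 3.769 |
| 365 | 2.83 | c | 1.93 | 4.26 |
| 14 | 0.29 | d | 0.04 | 1.18 |
| 30 | 0.09 | d | 0.01 | 0.29 |
| 60 | 0.39 | d | 0.17 | 0.82 |
| 90 | 0.39 | d | 0.20 | 0.7 |
| 180 | 0.37 | d | 0.22 | 0.59 |
| 365 | 0.34 | d | 0.21 | 0.53 |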


You can see a good graphical representation in the top answer on the linked thread. Let's assume that upper = mean + 1 standard deviation and lower = mean - 1 standard deviation. The means and standard deviations were computed over a fixed number of trials (say, $n=20$) at each interval for each drug.
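As a rough check of that assumption, here is a minimal sketch in R using the drug b rows from the table. The column names `sd_implied` and `asymmetry` are made up for illustration. If upper and lower really were mean ± 1 SD, the SD could be recovered as half the width of the band.

```r
# Rows for drug b, copied from the table in the question
b <- data.frame(
  interval = c(14, 30, 60, 90, 180, 365),
  mean     = c(0.84, 0.85, 0.94, 0.83, 1.28, 1.58),
  lower    = c(0.59, 0.66, 0.75, 0.68, 1.09, 1.38),
  upper    = c(1.19, 1.084, 1.17, 1.01, 1.51, 1.82)
)

# If upper = mean + SD and lower = mean - SD, then SD = (upper - lower) / 2
b$sd_implied <- (b$upper - b$lower) / 2

# Difference between the upper and lower distances from the mean;
# a value of 0 would mean the band is perfectly symmetric
b$asymmetry <- (b$upper - b$mean) - (b$mean - b$lower)

print(b)
```

For these rows the upper distance is larger than the lower one every time, so reading the band as mean ± 1 SD is only an approximation.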

My question is: how do I get p-values for the overall superiority of, say, drug c over drug a, or drug b over drug d? What is the correct statistical procedure here, and how can it be implemented?

## One Answer

Assuming that you have access to the values from each individual trial, the simplest model here is a two-way drug (a, b, c, or d) × interval (14, 30, 60, 90, 180, or 365) ANOVA.

    # Treat both interval and drug as categorical factors for a two-way ANOVA
    m = lm(score ~ factor(interval) * factor(drug), data=your_data)
    anova(m)


This will tell you if a) there's a main effect of drug (indicating that some drugs are better than others), b) there's a main effect of interval (some intervals have higher scores than others), and c) there's a drug × interval interaction (the difference between the drugs varies depending on the interval).
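To see what that table looks like, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the column names `trial` and `score`, the drug-level means in `drug_mean`, and the noise level. None of it is the asker's real per-trial data.

```r
# Simulated stand-in for the per-trial data (NOT real measurements)
set.seed(1)
your_data <- expand.grid(
  trial    = 1:20,                           # 20 trials per cell, as in the question
  interval = c(14, 30, 60, 90, 180, 365),
  drug     = c("a", "b", "c", "d")
)

# Hypothetical drug-level means, chosen only so that the drugs differ
drug_mean <- c(a = 0.2, b = 1.0, c = 2.5, d = 0.3)
your_data$score <- rnorm(nrow(your_data),
                         mean = drug_mean[as.character(your_data$drug)],
                         sd   = 0.5)

# Two-way ANOVA with both predictors treated as factors
m   <- lm(score ~ factor(interval) * factor(drug), data = your_data)
tab <- anova(m)
print(tab)
```

The rows of `tab` correspond to the three effects listed above (interval, drug, and their interaction) plus the residuals, and the `Pr(>F)` column holds the p-values.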

If you do find a main effect of drug, you may want to run post hoc tests, for instance testing whether there's a significant difference between drugs a and b. The simplest way to do this is to repeat the analysis on a subset of the data.

    # Keep only drugs a and b (requires the dplyr package)
    data_aVb = dplyr::filter(your_data, drug %in% c('a', 'b'))
    m_aVb = lm(score ~ factor(interval) * drug, data=data_aVb)
    anova(m_aVb)


You'll also want to read about correcting for multiple comparisons, but I won't go into that here.
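As a starting point, here is a hedged sketch of one common correction, Holm's method via base R's `p.adjust`, applied to the two comparisons named in the question. The data are simulated, and `drug_mean`, `pairs`, `p_raw` and `p_holm` are names made up for illustration. Whether Holm is the right correction for your study is a separate decision.

```r
# Simulated per-trial data (NOT real measurements)
set.seed(2)
your_data <- expand.grid(trial    = 1:20,
                         interval = c(14, 30, 60, 90, 180, 365),
                         drug     = c("a", "b", "c", "d"),
                         stringsAsFactors = FALSE)
drug_mean <- c(a = 0.2, b = 1.0, c = 2.5, d = 0.3)   # hypothetical means
your_data$score <- rnorm(nrow(your_data),
                         mean = drug_mean[your_data$drug], sd = 0.5)

# The two comparisons named in the question
pairs <- list(c_vs_a = c("c", "a"), b_vs_d = c("b", "d"))

# P-value for the drug main effect within each two-drug subset
p_raw <- sapply(pairs, function(p) {
  sub <- your_data[your_data$drug %in% p, ]
  fit <- lm(score ~ factor(interval) * drug, data = sub)
  anova(fit)["drug", "Pr(>F)"]
})

# Holm adjustment across the family of comparisons
p_holm <- p.adjust(p_raw, method = "holm")
print(cbind(p_raw, p_holm))
```

The adjusted p-values are never smaller than the raw ones. The more comparisons you add to `pairs`, the stronger the adjustment becomes.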

Update!

Since your data are actually proportions, this needs a slight adjustment. Standard ANOVA is a form of linear regression and assumes the response is on a linear scale. What you actually have is a proportion, indicating that $y$ out of 20 patients survived (or similar). You can deal with this by using logistic regression instead of linear regression, as follows (assuming that survived is the number who survived out of 20):

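    # cbind() takes successes and failures, not successes and the total
    m = glm(cbind(survived, 20 - survived) ~ factor(interval) * factor(drug),
            data=your_data, family=binomial)
    # Request chi-squared tests so the table includes p-values
    anova(m, test="Chisq")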


Again, there are plenty of resources on this online.
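For a concrete picture, here is a minimal sketch of the logistic-regression version on simulated counts. The survival probabilities in `p_drug` are hypothetical, and the data have one row per drug and interval, each holding the number of survivors out of 20.

```r
# Simulated survival counts out of 20 (NOT real measurements)
set.seed(3)
your_data <- expand.grid(interval = c(14, 30, 60, 90, 180, 365),
                         drug     = c("a", "b", "c", "d"))
p_drug <- c(a = 0.3, b = 0.5, c = 0.7, d = 0.4)   # hypothetical probabilities
your_data$survived <- rbinom(nrow(your_data), size = 20,
                             prob = p_drug[as.character(your_data$drug)])

# Binomial response given as (successes, failures)
m <- glm(cbind(survived, 20 - survived) ~ factor(interval) * factor(drug),
         data = your_data, family = binomial)

# Sequential analysis of deviance with chi-squared p-values
tab <- anova(m, test = "Chisq")
print(tab)
```

With only one row per drug and interval, the interaction model is saturated: it reproduces the observed counts exactly, so the final residual deviance is essentially zero. Each term's p-value is in the `Pr(>Chi)` column.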

Answered by Eoin is on the job market on October 6, 2020
