Statistical line comparison

Question

I have a dataset like the one in this question, i.e.,
interval    mean  Drug    lower   upper
14  0.004   a   0.002   0.205
30  0.022   a   0.001   0.101
60  0.13    a   0.061   0.23
90  0.22    a   0.14    0.34
180 0.25    a   0.17    0.35
365 0.31    a   0.23    0.41
14  0.84    b   0.59    1.19
30  0.85    b   0.66    1.084
60  0.94    b   0.75    1.17
90  0.83    b   0.68    1.01
180 1.28    b   1.09    1.51
365 1.58    b   1.38    1.82
14  1.90    c   0.9     4.27
30  2.91    c   1.47    6.29
60  2.57    c   1.52    4.55
90  2.05    c   1.31    3.27
180 2.422   c   1.596   3.769
365 2.83    c   1.93    4.26
14  0.29    d   0.04    1.18
30  0.09    d   0.01    0.29
60  0.39    d   0.17    0.82
90  0.39    d   0.20    0.7
180 0.37    d   0.22    0.59
365 0.34    d   0.21    0.53

You can see a good graphical representation in the top answer on the linked thread. Let's assume upper = mean + 1 standard deviation and lower = mean - 1 standard deviation. Means and standard deviations were computed over a set number of trials (say, $n=20$) at each interval for each Drug.
My question is: how do I get p-values for the overall superiority of, say, drug C over drug A, or drug B over drug D? What is the correct statistical procedure here, and how can it be implemented?

Eoin is on the job market · Answer

Assuming that you have access to the values from each individual trial, the simplest model here is a two-way drug (a, b, c, or d) × interval (14, 30, 60, 90, 180, or 365) ANOVA.
# Treat both interval and drug as categorical factors
m = lm(score ~ factor(interval) * factor(drug), data=your_data)
anova(m)

This will tell you if a) there's a main effect of drug (indicating that some drugs are better than others), b) there's a main effect of interval (some intervals have higher scores than others), and c) there's a drug × interval interaction (the difference between the drugs varies depending on the interval).
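To see what that table looks like, here is a minimal sketch on simulated data. The per-trial scores are invented (the question only gives summaries), and the names sim and drug_shift, the effect sizes, and the noise level are all assumptions for illustration.

```r
# Simulate 20 per-trial scores for each drug x interval cell.
# Drug effects, time trend, and noise level are made up for illustration.
set.seed(1)
sim <- expand.grid(trial = 1:20,
                   interval = c(14, 30, 60, 90, 180, 365),
                   drug = c("a", "b", "c", "d"))
drug_shift <- c(a = 0.2, b = 1.0, c = 2.5, d = 0.3)  # hypothetical drug effects
sim$score <- unname(drug_shift[as.character(sim$drug)]) +
  0.001 * sim$interval + rnorm(nrow(sim), sd = 0.5)

# Two-way ANOVA with both predictors treated as categorical
m <- lm(score ~ factor(interval) * factor(drug), data = sim)
tab <- anova(m)
print(tab)  # rows: factor(interval), factor(drug), interaction, Residuals
```

The factor(drug) row is the test for a main effect of drug, and the last model row is the drug × interval interaction.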
If you do find a main effect of drug, you may want to explore various post hoc tests, for instance, testing whether there's a significant difference between drugs a and b. The simplest way to do this is to repeat the analysis on a subset of the data.
data_aVb = dplyr::filter(your_data, drug %in% c('a', 'b'))
m_aVb = lm(score ~ factor(interval) * drug, data=data_aVb)
anova(m_aVb)

You'll also want to read about correcting for multiple comparisons, but I won't go into that here.
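As one possible starting point, the sketch below runs the subset analysis for every pair of drugs and applies a Holm correction with p.adjust. The data are simulated, and the helper pair_p and the choice of Holm (rather than, say, Bonferroni or Tukey's HSD) are my assumptions, not the only reasonable option.

```r
# Simulated per-trial scores; effect sizes and noise are made up.
set.seed(2)
sim <- expand.grid(trial = 1:20,
                   interval = factor(c(14, 30, 60, 90, 180, 365)),
                   drug = c("a", "b", "c", "d"))
drug_shift <- c(a = 0.2, b = 1.0, c = 2.5, d = 0.3)  # hypothetical drug effects
sim$score <- unname(drug_shift[as.character(sim$drug)]) +
  rnorm(nrow(sim), sd = 0.5)

# Hypothetical helper: p-value for the main effect of drug
# in the model fitted to just the two drugs in `pair`.
pair_p <- function(pair) {
  sub <- droplevels(sim[sim$drug %in% pair, ])
  anova(lm(score ~ interval * drug, data = sub))["drug", "Pr(>F)"]
}

drug_pairs <- combn(levels(sim$drug), 2, simplify = FALSE)  # all 6 pairs
raw_p <- sapply(drug_pairs, pair_p)
names(raw_p) <- sapply(drug_pairs, paste, collapse = " vs ")

adj_p <- p.adjust(raw_p, method = "holm")  # Holm-adjusted p-values
print(cbind(raw = raw_p, holm = adj_p))
```

The adjusted p-values are never smaller than the raw ones, which is the price of running six comparisons instead of one.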

Update!

Since your data is actually a proportion, the approach above needs a small refinement. Standard ANOVA is a version of linear regression, and assumes the data is on a linear scale. What you actually have is a proportion, indicating that $y$ out of 20 patients survived (or similar). You can deal with this by using logistic regression instead of linear regression, as follows (assuming that survived is the number of patients who survived, out of 20, in each drug × interval cell):
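# A binomial glm expects cbind(successes, failures), not cbind(successes, total)
m = glm(cbind(survived, 20 - survived) ~ factor(interval) * factor(drug),
        data=your_data, family=binomial)
anova(m, test="Chisq")  # request likelihood-ratio p-values explicitly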

Again, there are plenty of resources on this online.
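For a concrete picture of the logistic version, here is a minimal sketch on simulated counts. The survival probabilities in p_true and the data frame counts are invented for illustration; with real data you would have one row per drug × interval cell and the observed number of survivors.

```r
# One row per drug x interval cell, with a simulated number of
# survivors out of 20. The probabilities are made up.
set.seed(3)
counts <- expand.grid(interval = factor(c(14, 30, 60, 90, 180, 365)),
                      drug = c("a", "b", "c", "d"))
p_true <- c(a = 0.3, b = 0.5, c = 0.7, d = 0.3)  # hypothetical survival rates
counts$survived <- rbinom(nrow(counts), size = 20,
                          prob = p_true[as.character(counts$drug)])

# The response is a two-column matrix: successes, then failures
m <- glm(cbind(survived, 20 - survived) ~ interval * drug,
         data = counts, family = binomial)
tab <- anova(m, test = "Chisq")  # sequential likelihood-ratio tests
print(tab)
```

The drug row of this analysis-of-deviance table plays the same role as the main effect of drug in the linear model.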
