# Output the linear predictor from a stratified cox model?

Bioinformatics Asked by Nienke on December 17, 2020

I have fitted a cox model on a pooled dataset of multiple studies, say study A, B and C. As these studies have different baseline hazard functions, I stratified for ‘study’ in the model.

Like this:

df\$SurvObj <- with(df, Surv(event_rd, event == 1))

fit <- coxph(SurvObj ~ cov1 + cov2 + cov3 + cov4 + strata(study),data=df)


Now, I want to assess whether I can use the beta coefficients from the above model to predict the event probability in a new study, study D.
The baseline hazard in study D is different from those in study A, B and C.

When I have

predict(fit,type="lp",newdata=studyD,reference="strata")


I get this error:

Error in model.frame.default(data = studyD, formula = ~hba1c + sbp +
: factor strata(study) has new levels study=4

Why does R require that I match the strata of study D with those in study A,B and/or C when I only want to output the linear predictor (type="lp")? As the linear predictor is independent from the baseline hazard function in this case.

I suppose it should be possible to extract the linear predictor for individuals in study D and then manually calculate the event probability using the baseline hazard of study D.

Does anybody know how to do this?
I could not find any question like this so maybe I am thinking the wrong way.

Unlike wikipedia, the linear predictor (LP) in coxph is made dependent from the baseline hazard function $$lambda$$. For example, if a model has two predictors x1, x2 stratified by sex, the linear predictor would be like: $$LP = log ( lambda _{sex}) + beta _{1}x _{1}+beta _{2}x _{2}$$

Where the $$log(lambda _{sex})$$ can be obtained by predict() with x1=x2=0.

We could calculate the LP without strata information (sex) by hand using the $$beta _{1} , beta _{2}$$ provided by coef(fit) and a customised $$lambda _{unkown , sex}$$.

Below is a simulation comparing the results from predict() and by hand coef(fit).

## Toy data
df <- list(time=c(4,3,1,1,2,2,3),
status=c(1,1,1,0,1,1,0),
x1=c(0,2,1,1,1,0,0),
x2=1:7,
sex=c(0,0,0,0,1,1,1))

## Fit a stratified model
fit <- coxph(Surv(time, status) ~ x1 + x2 + strata(sex), df)
coef(fit)
#         x1          x2
# 0.79451881 -0.01917633

## Baseline linear predictor
lambda_sex1 <- predict(fit, newdata=list(x1=0, x2=0, sex=0))
# -0.746578
lambda_sex2 <- predict(fit, newdata=list(x1=0, x2=0, sex=1))
# -0.1497816

## by predict function
predict(fit, newdata=list(x1=1, x2=2, sex=0))
# 0.009588167
predict(fit, newdata=list(x1=1, x2=2, sex=1))
# 0.6063845

## by hand
lambda_sex1 + coef(fit)[1]*1 + coef(fit)[2]*2
# 0.009588167
lambda_sex2 + coef(fit)[1]*1 + coef(fit)[2]*2
# 0.6063845


Finally, to calculate "a" linear predictor for an unknown sex:

# prevented by predict function
predict(fit, newdata=list(x1=1, x2=2, sex=2))
#Error in model.frame.default(data = list(x1 = 1, x2 = 2, sex = 2), formula = ~x1 +  :
# factor strata(sex) has new level sex=2

# by hand, assuming lambda for sex 2 is 0
0 + coef(fit)[1]*1 + coef(fit)[2]*2
# 0.7561661


Answered by Ryan SY Kwan on December 17, 2020

## Related Questions

### Generating 3D coordinates error

1  Asked on January 15, 2021 by shahbaaz

### Save output to a specific folder and/or with a specific prefix in Cancer Genomics Cloud

1  Asked on January 15, 2021

### Populations genetics and dynamics of bacteria on a Graph

1  Asked on January 13, 2021

### Convert a fasta file to Newick format

3  Asked on January 11, 2021

### BAM file filteing to remain best isoform

0  Asked on January 10, 2021 by user977828

### How is the GT field in a VCF file defined?

2  Asked on January 7, 2021

### Somatic mutations for normal WES samples

0  Asked on January 6, 2021 by lot_to_learn

### Get list of urls of GSM data set of a GSE set

1  Asked on January 6, 2021 by user432797

### Why is there a chloride ion in this 3D model?

1  Asked on January 4, 2021 by manuel-milla

### Biohackers Netflix – DNA to binary and video

1  Asked on January 3, 2021 by xamax

### if I know the number of sequencing circles can I give this information to DESeq2?

1  Asked on January 2, 2021 by marilu

### DNA sequence error annotation

0  Asked on December 30, 2020 by matthew-jones

### Program to make a haplotype network for a specific gene

1  Asked on December 30, 2020 by ryan-fahy

### Cobratoolbox unable to identify gurobi solver when passing initCobraToolbox

0  Asked on December 29, 2020

### How to interpret Mendelian randomization results?

1  Asked on December 29, 2020 by anamaria

### Why ORF13 and ORF14 of Bat Sars coronavirus Rp3 have no corrispondence in Sars2?

1  Asked on December 26, 2020

### samtools / bamUtil | Meaning of as Reference Name

1  Asked on December 25, 2020 by paul-endymion

### Help Adjusting Bedtools Output in R

1  Asked on December 23, 2020

### How to remove batch effect from TCGA and GTEx data

2  Asked on December 22, 2020 by kai-he