TransWikia.com

How to get the survival duration prediction for each individual in the data by using the Kaplan-Meier method?

Data Science Asked by Kristada673 on November 10, 2020

I am trying to learn how to use the Kaplan-Meier survival estimator in the lifelines package. The documentation says that the KaplanMeierFitter.fit function returns "a modified self, with new properties like 'survival_function_'." I checked the contents of survival_function_, and it seems to contain the average survival probability of all the players in the dataset at each time interval. For example, my dataset spans 66 months and has about 250,000 players (i.e., individuals whose death event we are trying to predict). 75% of them have died, and the remaining 25% are censored, i.e., their deaths have not been observed. The survival_function_ contains the following:

>>> kmf.survival_function_

        KM_estimate
timeline            
-1.0    1.000000
0.0     0.995473
1.0     0.779609
2.0     0.621312
3.0     0.508698
4.0     0.424205
5.0     0.366714
6.0     0.324090
7.0     0.289432
8.0     0.259339
9.0     0.234256
10.0    0.212542
11.0    0.192735
12.0    0.172880
13.0    0.157821
14.0    0.144604
15.0    0.132614
16.0    0.121743
17.0    0.112202
18.0    0.103710
19.0    0.095829
20.0    0.088811
21.0    0.082302
22.0    0.076773
23.0    0.071249
24.0    0.065752
25.0    0.060534
26.0    0.056082
27.0    0.051978
28.0    0.048073
...     ...
37.0    0.023696
38.0    0.020562
39.0    0.017846
40.0    0.015783
41.0    0.013817
42.0    0.012253
43.0    0.010645
44.0    0.009354
45.0    0.008186
46.0    0.007195
47.0    0.006274
48.0    0.005486
49.0    0.004656
50.0    0.003948
51.0    0.003391
52.0    0.002823
53.0    0.002352
54.0    0.002004
55.0    0.001655
56.0    0.001388
57.0    0.001114
58.0    0.000932
59.0    0.000707
60.0    0.000536
61.0    0.000343
62.0    0.000193
63.0    0.000080
64.0    0.000038
65.0    0.000016
66.0    0.000000

68 rows × 1 columns

It tells us the average survival probability of the entire population at each time period, taking into account both dead and censored players. It does not tell us the survival probability of each individual censored player, which is what I am interested in. How do I find that? Ideally, I would get the survival probability of each individual player for each of the 66 months. If that's not possible, I'm OK with having the survival probability of each individual player at a single fixed time in the future, say 3 months from now (or any other one of the 66 time periods).

In other words, instead of a single column of average survival probabilities (one per time period), can I get an n × t output matrix, where n is the number of censored players in the dataset and t is the number of time periods, whose entry (i, j) is the survival probability of player i at time period j?

If this is not possible with the KM method, please feel free to suggest other methods that can produce a survival estimate for each individual. Thank you.

One Answer

The Kaplan-Meier curve is a summary statistic, similar to an average, so it is an unconditional statistic. If you are interested in conditional survival, you can use the Kaplan-Meier curve for that too. Some notation first: let $T$ be the unknown time of death, and $S(t) = P(T > t)$ the unconditional survival curve.
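The survival_function_ table in the question is the Kaplan-Meier product-limit estimate $\hat S(t) = \prod_{t_i \le t} (1 - d_i / n_i)$, where $d_i$ is the number of observed deaths at time $t_i$ and $n_i$ is the number of individuals still at risk just before $t_i$. Below is a minimal from-scratch NumPy sketch of that estimator, meant only to show why it is a population-level summary (every individual feeds into one shared curve). The function name kaplan_meier is hypothetical, and this is not lifelines' implementation.

```python
import numpy as np


def kaplan_meier(durations, events):
    """Product-limit estimate of S(t) at each distinct observed duration.

    durations: time of death or censoring for each individual
    events: 1 if the death was observed, 0 if the individual is censored
    Returns (times, survival) as NumPy arrays, times sorted ascending.
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    times = np.unique(durations)  # sorted distinct durations
    survival = np.empty(len(times))
    s = 1.0
    for k, t in enumerate(times):
        at_risk = np.sum(durations >= t)            # still under observation just before t
        deaths = np.sum((durations == t) & events)  # observed deaths at exactly t
        s *= 1.0 - deaths / at_risk                 # multiply in this step's survival fraction
        survival[k] = s
    return times, survival
```

For example, kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]) gives S = 5/6, 2/3, 4/9, 4/9, 0 at times 1 through 5. The players censored at times 2 and 4 shrink the risk set for later steps but never lower the curve themselves.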

Given that a subject has lived past some time $s$, we'd like to know $P(T > t \mid T > s)$ for $t > s$. We can expand this conditional probability:

$$ \begin{aligned} P(T > t \mid T > s) &= \frac{P(T > t \text{ and } T > s)}{P(T > s)} \\ &= \frac{P(T > t)}{P(T > s)} \quad \text{since } t > s \\ &= \frac{S(t)}{S(s)} \end{aligned} $$

That is, for $t > s$ we rescale the entire survival curve by dividing it by $S(s)$.
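As a worked example with the KM_estimate values from the question's table (reading them as $S(t)$ at whole months), a player known to have survived past month 3 has an estimated probability of surviving past month 6 of

$$ P(T > 6 \mid T > 3) = \frac{S(6)}{S(3)} = \frac{0.324090}{0.508698} \approx 0.637, $$

noticeably higher than the unconditional $S(6) \approx 0.324$, because the early deaths before month 3 no longer count against this player.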

In practice, if you have a censored subject and you'd like to know their conditional survival after their censoring time, you just divide the survival function by its value at that censoring time.
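The same rescaling can be applied to every censored player at once, which yields a players-by-time matrix of conditional survival probabilities. This is a minimal NumPy sketch, assuming the fitted curve is available as two arrays sorted by time (for instance the index and values of kmf.survival_function_) and that $S$ is positive at every censoring time. The function name conditional_survival_matrix is hypothetical.

```python
import numpy as np


def conditional_survival_matrix(times, survival, censor_times):
    """Rows: censored subjects; columns: timeline points.

    Entry (i, j) estimates P(T > times[j] | T > censor_times[i])
    = S(times[j]) / S(censor_times[i]) for times after the censoring time,
    and 1.0 up to the censoring time (the subject is known to be alive then).
    Assumes times is sorted ascending, survival[k] = S(times[k]),
    every censor time is >= times[0], and S > 0 at every censor time.
    """
    times = np.asarray(times, dtype=float)
    survival = np.asarray(survival, dtype=float)
    censor_times = np.asarray(censor_times, dtype=float)
    # S is a step function: use its value at the last timeline point <= censor time
    idx = np.searchsorted(times, censor_times, side="right") - 1
    s_at_censor = survival[idx]
    ratio = survival[None, :] / s_at_censor[:, None]
    known_alive = times[None, :] <= censor_times[:, None]
    return np.where(known_alive, 1.0, ratio)
```

Because the Kaplan-Meier curve has no covariates, two players censored at the same month get identical rows; the rows differ only through the censoring time.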

In lifelines, the property KaplanMeierFitter.conditional_time_to_event_ gives the median remaining survival time given survival up to time $t$. Evaluated at each censored subject's censoring time, it can serve as a conditional prediction:
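For intuition about what that property holds, here is a hand-rolled sketch of the same quantity built from the rescaled curve $S(t)/S(s)$: the first timeline point at which the rescaled curve falls to 0.5 or below, minus $s$. The function name conditional_median_remaining is hypothetical and this is not lifelines' code, so edge-case conventions (ties at exactly one half, curves that never fall that far) may differ from the library.

```python
import numpy as np


def conditional_median_remaining(times, survival, s):
    """Median remaining lifetime given survival past time s.

    Returns t - s for the first timeline point t >= s with S(t) / S(s) <= 0.5,
    or inf if the rescaled curve never drops that far within the timeline.
    Assumes times is sorted ascending and s >= times[0].
    """
    times = np.asarray(times, dtype=float)
    survival = np.asarray(survival, dtype=float)
    s_val = survival[np.searchsorted(times, s, side="right") - 1]  # S(s), step function
    hit = (times >= s) & (survival <= 0.5 * s_val)
    if not hit.any():
        return np.inf
    return times[np.argmax(hit)] - s  # argmax returns the first True index
```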

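```python
from lifelines import KaplanMeierFitter

# t: NumPy array of durations; e: NumPy array of event flags (1 = death observed, 0 = censored)
kmf = KaplanMeierFitter().fit(t, e)
censored_times = t[~e.astype(bool)]  # durations of the censored subjects only
predicted_life_remaining = kmf.conditional_time_to_event_.asof(censored_times)
```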

Answered by Cam.Davidson.Pilon on November 10, 2020
