
How to compute confidence interval for leave-one-out cross-validated AUC that is also repeated many times?

Cross Validated. Asked by Max Lumberjack on December 21, 2021

I have a small dataset of 100 data points and trained a random forest classifier using nested leave-one-out cross-validation. The details go like this:

  • In each trial out of 10:

    • for each leave-one-out patient out of 100:

      • take the remaining 99 patients and run 10-fold cross-validation with a
        grid search to find the best hyperparameters. Take the best-performing
        parameter set (based on the 10-fold CV average) and evaluate on the
        left-out patient.

Because the inner 10-fold CV is random, the models trained in each trial are different (they have different hyperparameters), so if I run 10 trials I get 10 × 100 models and 10 × 100 predictions. I can compute 10 ROC curves over the full set of 100 patients and calculate an AUC confidence interval for each of the 10 curves using cvAUC.
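
In scikit-learn terms, one trial looks roughly like this (just a sketch, with a placeholder hyperparameter grid):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, LeaveOneOut, StratifiedKFold

# Placeholder grid; the real grid is larger
param_grid = {"max_depth": [2, 3, 5], "n_estimators": [50, 100]}

def one_trial(X, y):
    # One trial: outer leave-one-out, inner randomized 10-fold grid search
    preds = np.zeros(len(y))
    for train_idx, test_idx in LeaveOneOut().split(X):
        inner_cv = StratifiedKFold(n_splits=10, shuffle=True)  # random inner folds
        search = GridSearchCV(RandomForestClassifier(), param_grid,
                              cv=inner_cv, scoring="roc_auc")
        search.fit(X[train_idx], y[train_idx])                      # tune on the 99 patients
        preds[test_idx] = search.predict_proba(X[test_idx])[:, 1]   # score the left-out patient
    return roc_auc_score(y, preds)  # one ROC/AUC per trial over all 100 patients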

My question is: would it make sense to consider the 10 trials as simply additional leave-one-out validation splits? In other words, what is the statistical consequence if I simply pool the 10 × 100 predictions, treat them like leave-one-out cross-validation on 1000 patients, and derive a confidence interval from that? And is there a better way to do this?

Thanks!

2 Answers

I'm not sure of the sampling distribution of the AUC, especially when using something like a random forest. It might be more tractable to bootstrap the AUC and then use a bootstrap confidence interval.

Frank Harrell and Ewout Steyerberg have written about how to validate models using the bootstrap. From Ewout's book Clinical Prediction Models, the steps are:

  1. Construct a model in the original sample; determine the apparent performance on the data from the sample used to construct the model;
  2. Draw a bootstrap sample (Sample*) with replacement from the original sample;
  3. Construct a model (Model*) in Sample*, replaying every step that was done in the original sample, especially model specification steps such as selection of predictors. Determine the bootstrap performance as the apparent performance of Model* in Sample*;
  4. Apply Model* to the original sample without any modification to determine the test performance;
  5. Calculate the optimism as the difference between bootstrap performance and test performance;
  6. Repeat steps 1–4 many times, at least 200, to obtain a stable mean estimate of the optimism;
  7. Subtract the mean optimism estimate (step 6) from the apparent performance (step 1) to obtain the optimism-corrected performance estimate.

The result of this process is an optimism-corrected estimate of the AUC. We need to bootstrap this entire process in order to get a confidence interval for the optimism-corrected AUC. Frank has described this process on datamethods, and it is deceptively simple using rms (though one must be patient, since many, many models are being constructed).

If you're using Python, or some other language that isn't R, you have to roll your own bootstrap estimator. Lucky for you, I've done this in Python before. Here is an example:

import numpy as np
from sklearn.utils import resample
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

X, y = load_breast_cancer(return_X_y=True)

def bootstraps(X, y):
    # Infinite generator of bootstrap samples drawn with replacement
    while True:
        yield resample(X, y)
        
        
def estimate_optimism(model, X, y, nboot=100):
    # Returns the optimism-corrected AUC (steps 1-7 from Ewout's book)
    # Step 1: apparent performance of the model on the original sample
    model.fit(X, y)
    ypred = model.predict_proba(X)[:, 1]
    original_performance = roc_auc_score(y, ypred)

    bootstrap = bootstraps(X, y)
    auc_train = np.zeros(nboot)
    auc_test = np.zeros(nboot)

    # Step 6: repeat the bootstrap (steps 2-4) nboot times
    for i in range(nboot):
        # Step 2: draw a bootstrap sample
        Xstar, ystar = next(bootstrap)
        # Step 3: refit the model and record its apparent (bootstrap) performance
        model.fit(Xstar, ystar)
        ypred = model.predict_proba(Xstar)[:, 1]
        auc_train[i] = roc_auc_score(ystar, ypred)
        # Step 4: apply the bootstrap model to the original sample (test performance)
        ypred = model.predict_proba(X)[:, 1]
        auc_test[i] = roc_auc_score(y, ypred)

    # Step 5: optimism = bootstrap performance minus test performance (averaged)
    average_optimism = (auc_train - auc_test).mean()

    # Step 7: subtract the mean optimism from the apparent performance
    return original_performance - average_optimism
    

model = RandomForestClassifier(max_depth=3, n_estimators=10)

# Optimism corrected estimate
estimate_optimism(model, X, y)

# Bootstrap the optimism correction process
bootstrap = bootstraps(X, y)
nboot = 100
estimates = np.zeros(nboot)

for i in range(nboot):
    # Each of the nboot outer bootstrap samples runs estimate_optimism, which itself
    # fits nboot bootstrap models, so this is roughly nboot^2 model fits. It takes time.
    Xnew, ynew = next(bootstrap)
    estimates[i] = estimate_optimism(model, Xnew, ynew)

Were I to run this seriously, I would increase nboot in both estimate_optimism and the final loop to something larger. From there you would take quantiles of estimates in order to get a confidence interval for the optimism-corrected AUC.
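
For example, a 95% percentile interval (one common, though not the only, choice of bootstrap interval) from the estimates array above:

# 95% percentile bootstrap interval for the optimism-corrected AUC
lower, upper = np.quantile(estimates, [0.025, 0.975])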

If I remember correctly, using repeated K-fold cross-validation might be similar to bootstrapping (Frank says here that 100 repeats of 10-fold CV should be about as good as the bootstrap, though he doesn't offer links to experimental evidence for this). Were you to use repeated CV, the code becomes a lot easier in Python:

from sklearn.model_selection import cross_val_score, RepeatedKFold

model = RandomForestClassifier(max_depth=3, n_estimators=10)

# Bootstrap the repeated cross-validation estimate of the AUC
bootstrap = bootstraps(X, y)
nboot = 100
estimates = np.zeros(nboot)

for i in range(nboot):
    Xnew, ynew = next(bootstrap)
    estimates[i] = cross_val_score(model, Xnew, ynew,
                                   cv=RepeatedKFold(n_splits=10, n_repeats=100),
                                   scoring='roc_auc').mean()

Answered by Demetri Pananos on December 21, 2021

I'd remove the "in each trial" bit in the first line; that doesn't make sense to me. With nested cross-validation you have one nested for loop, not two.

Then, for each left-out observation, you do the inner CV on the 99 observations to get the parameters and fit the model on those 99. That gives one out-of-sample prediction per observation in nested LOOCV, so there is one ROC curve and one AUC for the 100 cross-validated probabilities.

You can get the CI of the AUC through an equation if required. https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/PASS/Confidence_Intervals_for_the_Area_Under_an_ROC_Curve.pdf
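
For example, one common closed-form choice is the Hanley and McNeil (1982) standard error with a normal approximation. A rough Python sketch, assuming y_true holds the 100 labels and y_score the 100 cross-validated probabilities (note the formula's independence assumptions hold only approximately for cross-validated predictions):

import numpy as np
from sklearn.metrics import roc_auc_score

def hanley_mcneil_ci(y_true, y_score):
    # Approximate 95% CI for the AUC via the Hanley & McNeil (1982) standard error
    y_true = np.asarray(y_true)
    auc = roc_auc_score(y_true, y_score)
    n1 = int((y_true == 1).sum())  # number of positive cases
    n2 = int((y_true == 0).sum())  # number of negative cases
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    se = np.sqrt((auc * (1 - auc)
                  + (n1 - 1) * (q1 - auc ** 2)
                  + (n2 - 1) * (q2 - auc ** 2)) / (n1 * n2))
    return auc - 1.96 * se, auc + 1.96 * se  # normal approximation, 95% level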

Answered by Christopher John on December 21, 2021
