# Calculate confidence interval over Relative Prediction Error

Cross Validated Asked by xeon123 on November 28, 2020

I am trying to understand the concept of the confidence interval, but I get confused with t-test, p-values, standard deviation, and quantiles. My problem is the following:

I created a machine learning model that predicts a dependent variable. For each prediction, I calculate the Relative Prediction Error, (prediction - true value) / true value.

1. I want to calculate the confidence interval so that I can say, for example, that 95% of the relative errors lie in the interval [-1, 1] (assuming the errors are normally distributed around 0). How can I do this?

2. Is it possible for the distribution of the Relative Prediction Errors to have positive or negative skewness? If so, will the interval containing 95% of the relative errors be symmetrical or asymmetrical (e.g., [-2, 1] or [-1, 2])?

It sounds like you are looking to produce an interval that captures some proportion of relative prediction errors, and NOT a confidence interval. For clarity, a confidence interval should be understood as a means to quantify uncertainty about the value of a parameter in a statistical model.

To provide an interval that captures $$P$$% of your relative prediction errors, you could simply use the ($$\frac{100 - P}{2}$$)th and ($$100 - \frac{100 - P}{2}$$)th sample percentiles of your relative prediction errors as estimates of the lower and upper boundary of the interval. For example, if you wanted to capture 90% of the errors, you would use the 5th and 95th percentiles.
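The percentile rule above can be sketched in a few lines of Python with NumPy. This is only an illustration: the function name `central_interval` and the simulated `y_true` / `y_pred` arrays are assumptions made for the example, not part of the question.

```python
import numpy as np

def central_interval(errors, coverage=90.0):
    """Equal-tailed interval holding roughly `coverage` percent of the errors."""
    tail = (100.0 - coverage) / 2.0  # e.g. 5 when coverage is 90
    lower, upper = np.percentile(errors, [tail, 100.0 - tail])
    return float(lower), float(upper)

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
y_true = rng.uniform(10.0, 20.0, size=1000)
y_pred = y_true * (1.0 + rng.normal(0.0, 0.1, size=1000))

# Relative prediction error: (prediction - true value) / true value.
rel_err = (y_pred - y_true) / y_true

lo, hi = central_interval(rel_err, coverage=90.0)
inside = np.mean((rel_err >= lo) & (rel_err <= hi))
print(f"90% interval: [{lo:.3f}, {hi:.3f}], fraction inside: {inside:.3f}")
```

With real data you would replace the simulated arrays with your own predictions and true values.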

I should note that the sample percentiles are estimates of the population percentiles, and so you could create confidence intervals around both bounds of your desired interval to further quantify your uncertainty. I should also note that my proposed method assumes that your relative prediction errors are independent and identically distributed.
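One common way to put a confidence interval around a sample percentile is the percentile bootstrap. The sketch below is a minimal version under the same i.i.d. assumption; the helper name `bootstrap_percentile_ci`, the number of resamples, and the simulated `rel_err` values are all assumptions chosen for illustration, and other methods (for example, order-statistic based intervals) could be used instead.

```python
import numpy as np

def bootstrap_percentile_ci(errors, q, n_boot=2000, level=95.0, seed=0):
    """Percentile-bootstrap confidence interval for the q-th percentile."""
    rng = np.random.default_rng(seed)
    errors = np.asarray(errors, dtype=float)
    n = errors.size
    # Resample with replacement: one row of indices per bootstrap replicate.
    idx = rng.integers(0, n, size=(n_boot, n))
    # The q-th percentile of every bootstrap replicate.
    stats = np.percentile(errors[idx], q, axis=1)
    tail = (100.0 - level) / 2.0
    ci_lo, ci_hi = np.percentile(stats, [tail, 100.0 - tail])
    return float(ci_lo), float(ci_hi)

# Simulated relative errors (an assumption for illustration only).
rng = np.random.default_rng(1)
rel_err = rng.normal(0.0, 0.1, size=500)

point = float(np.percentile(rel_err, 5))
ci_lo, ci_hi = bootstrap_percentile_ci(rel_err, q=5)
print(f"5th percentile: {point:.3f}, 95% CI: [{ci_lo:.3f}, {ci_hi:.3f}]")
```

Running the same function with `q=95` gives the corresponding interval for the upper bound.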

There are other methods of estimation besides taking the sample quantiles directly, e.g., fitting a model to your relative prediction errors and using the modeled distribution's percentiles. There are also other ways to construct an interval. For instance, my proposal centers around the median, whereas other methods might find the interval with the highest density (referred to as a highest density interval or HDI).
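For a unimodal distribution, a simple sample-based approximation of the highest density interval is the shortest window that contains the required share of the sorted errors. The sketch below assumes that approach; `shortest_interval` and the right-skewed simulated errors are hypothetical names and data used only for illustration.

```python
import numpy as np

def shortest_interval(errors, coverage=90.0):
    """Shortest interval containing at least `coverage` percent of the sample."""
    x = np.sort(np.asarray(errors, dtype=float))
    n = x.size
    k = int(np.ceil(coverage / 100.0 * n))  # points the interval must contain
    if k < 1 or k > n:
        raise ValueError("coverage must be in (0, 100]")
    # Width of every window of k consecutive sorted points.
    widths = x[k - 1:] - x[: n - k + 1]
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])

# Right-skewed simulated relative errors (an assumption for illustration only).
rng = np.random.default_rng(2)
rel_err = rng.lognormal(mean=0.0, sigma=0.5, size=2000) - 1.0

hdi_lo, hdi_hi = shortest_interval(rel_err, coverage=90.0)
eq_lo, eq_hi = np.percentile(rel_err, [5.0, 95.0])
print(f"shortest:     [{hdi_lo:.3f}, {hdi_hi:.3f}]")
print(f"equal-tailed: [{eq_lo:.3f}, {eq_hi:.3f}]")
```

On skewed data the two intervals generally differ, and neither needs to be symmetric around zero.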

With respect to your second question, there is no guarantee that the distribution of your relative prediction errors will be symmetric. Thus, you should be prepared to see asymmetrical intervals.

Answered by David Telson on November 28, 2020
