Cross Validated Asked by xeon123 on November 28, 2020
I am trying to understand the concept of the confidence interval, but I get confused with t-test, p-values, standard deviation, and quantiles. My problem is the following:
I created a machine learning model that predicts a dependent variable. For each prediction, I calculate the Relative Prediction Error, $\frac{\text{prediction} - \text{true value}}{\text{true value}}$.
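For reference, here is a minimal sketch of that calculation. The function name `relative_errors` is hypothetical. The main point is the parentheses: the difference is divided by the true value, and the ratio is undefined when a true value is 0.

```python
def relative_errors(predictions, true_values):
    """Return (prediction - true value) / true value for each pair."""
    if len(predictions) != len(true_values):
        raise ValueError("predictions and true_values must have the same length")
    errors = []
    for pred, true in zip(predictions, true_values):
        # The relative error is undefined when the true value is zero
        if true == 0:
            raise ZeroDivisionError("true value of 0 has no relative error")
        errors.append((pred - true) / true)
    return errors


print(relative_errors([110.0, 90.0, 50.0], [100.0, 100.0, 40.0]))
# prints [0.1, -0.1, 0.25]
```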
I want to calculate the confidence interval so that I could say, for example, that 95% of the relative errors fall within the interval [-1, 1] (let’s assume that the errors are normally distributed around 0). How can I do this?
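If the normality assumption really held, a rough sketch would be the plug-in interval mean ± z·sd, with z ≈ 1.96 for 95%. The function name `normal_interval` is hypothetical, and this is only a sketch under the stated assumption of i.i.d. normal errors: it ignores the uncertainty in the estimated mean and standard deviation, so it tends to be too narrow for small samples.

```python
from statistics import NormalDist, mean, stdev


def normal_interval(errors, coverage=0.95):
    """Plug-in interval mean +/- z * sd, assuming i.i.d. normal errors.

    Ignores the uncertainty in the estimated mean and standard deviation.
    """
    mu = mean(errors)
    sigma = stdev(errors)  # sample standard deviation (n - 1 denominator)
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95%
    return mu - z * sigma, mu + z * sigma


errors = [-0.10, 0.05, 0.00, 0.08, -0.04]
print(normal_interval(errors))
```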
Is it possible for the distribution of the Relative Prediction Errors to have positive or negative skewness? If so, will the interval containing 95% of the relative errors be symmetrical or asymmetrical (e.g., [-2, 1] or [-1, 2])?
It sounds like you are looking to produce an interval that captures some proportion of relative prediction errors, and NOT a confidence interval. For clarity, a confidence interval should be understood as a means to quantify uncertainty about the value of a parameter in a statistical model.
To provide an interval that captures $P$% of your relative prediction errors, you could simply use the ($\frac{100 - P}{2}$)th and ($100 - \frac{100 - P}{2}$)th sample percentiles of your relative prediction errors as estimates of the lower and upper boundary of the interval. For example, if you wanted to capture 90% of the errors, you would use the 5th and 95th percentiles.
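A minimal sketch of this percentile approach in plain Python follows. The names `sample_percentile` and `percentile_interval` are hypothetical; the percentile uses linear interpolation between order statistics at position $(n - 1) \cdot q / 100$, which is the same rule as NumPy's default `numpy.percentile` method. Other interpolation rules give slightly different values for small samples.

```python
import math


def sample_percentile(values, q):
    """q-th sample percentile (0 <= q <= 100), interpolating linearly between
    order statistics at position (n - 1) * q / 100."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    pos = (len(xs) - 1) * q / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    return xs[lower] + (xs[upper] - xs[lower]) * (pos - lower)


def percentile_interval(errors, p=90):
    """Interval meant to hold about p% of the errors: the (100 - p)/2 th
    and 100 - (100 - p)/2 th sample percentiles."""
    tail = (100 - p) / 2
    return sample_percentile(errors, tail), sample_percentile(errors, 100 - tail)


errors = [i / 100 for i in range(-50, 51)]  # 101 evenly spaced values in [-0.5, 0.5]
print(percentile_interval(errors, p=90))  # prints (-0.45, 0.45)
```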
I should note that the sample percentiles are estimates of the population percentiles, and so you could create confidence intervals around both bounds of your desired interval to further quantify your uncertainty. I should also note that my proposed method assumes that your relative prediction errors are independent and identically distributed.
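One common way to get such confidence intervals around the two bounds is the percentile bootstrap; this is a swapped-in technique, not something the answer prescribes. The sketch below assumes i.i.d. errors, as above, and the function name `bootstrap_percentile_ci` and its defaults (`n_boot=1000`, `level=95`) are hypothetical choices. Bootstrap intervals for extreme percentiles can be unreliable with small samples.

```python
import math
import random


def sample_percentile(values, q):
    # Linear interpolation between order statistics at position (n - 1) * q / 100
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    return xs[lower] + (xs[upper] - xs[lower]) * (pos - lower)


def bootstrap_percentile_ci(errors, q, n_boot=1000, level=95, seed=0):
    """Percentile-bootstrap confidence interval for the q-th population percentile.

    Assumes the errors are i.i.d.
    """
    rng = random.Random(seed)
    n = len(errors)
    # Recompute the q-th percentile on n_boot resamples drawn with replacement
    estimates = [sample_percentile(rng.choices(errors, k=n), q) for _ in range(n_boot)]
    tail = (100 - level) / 2
    return sample_percentile(estimates, tail), sample_percentile(estimates, 100 - tail)


rng = random.Random(1)
errors = [rng.gauss(0.0, 0.2) for _ in range(500)]  # stand-in for real relative errors
print("CI for 5th percentile:", bootstrap_percentile_ci(errors, 5))
print("CI for 95th percentile:", bootstrap_percentile_ci(errors, 95))
```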
There are other estimation methods besides taking the sample quantiles directly, e.g., fitting a model to your relative prediction errors and using the modeled distribution's percentiles. There are also other ways to construct the interval: for instance, my proposal centers the interval on the median, whereas other methods might find the interval with the highest density (referred to as a highest density interval, or HDI).
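As a rough sample-based illustration of the HDI idea, one can take the shortest interval that still contains the desired fraction of the sorted errors. The function name `shortest_interval` is hypothetical. This matches the HDI only for a unimodal distribution; for a multimodal one the HDI can be a union of several intervals, which this sketch does not attempt.

```python
import math


def shortest_interval(errors, p=0.9):
    """Shortest interval spanning at least a fraction p of the sample."""
    xs = sorted(errors)
    n = len(xs)
    k = min(n, max(1, math.ceil(p * n)))  # number of points the interval must contain
    best = (xs[0], xs[k - 1])
    # Slide a window of k consecutive order statistics and keep the narrowest
    for i in range(1, n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best


errors = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 5.0]
print(shortest_interval(errors, p=0.9))  # prints (0.0, 0.8)
```

Here the single outlying error at 5.0 is simply left out, whereas an equal-tailed percentile interval would be pulled toward it.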
With respect to your second question, there is no guarantee that the distribution of your relative prediction errors will be symmetric. Thus, you should be prepared to see asymmetrical intervals.
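To see how asymmetry can arise, here is a small simulation under a purely hypothetical error model: prediction = true value × exp(noise) with noise ~ N(0, 0.5). The relative error is then exp(noise) − 1, which is right-skewed and can never drop below −1, so the 5th/95th percentile interval ends up noticeably farther from 0 on the upper side. The percentiles come from Python's `statistics.quantiles`.

```python
import math
import random
from statistics import quantiles

rng = random.Random(42)
# Hypothetical setup: prediction = true value * exp(noise), noise ~ N(0, 0.5),
# so the relative error exp(noise) - 1 is right-skewed and bounded below by -1
errors = [math.exp(rng.gauss(0.0, 0.5)) - 1.0 for _ in range(10_000)]

# n=20 yields the 19 cut points at the 5th, 10th, ..., 95th percentiles
cuts = quantiles(errors, n=20, method="inclusive")
lower, upper = cuts[0], cuts[-1]
print(f"90% interval: [{lower:.2f}, {upper:.2f}]")
```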
Answered by David Telson on November 28, 2020