# Interpret credible intervals / HPD following posterior sampling

Cross Validated Asked by WalterB on December 8, 2020

I am unsure how to interpret credible interval results. How can a credible interval contain negative numbers when the collected data consist only of positive numbers? I would expect that, given data ranging from 1 to 20, the credible interval would tell me (with 95% certainty) that a value lies between x and y, where x and y are both between 1 and 20.

Should I add the general mean to the produced results to obtain what I am looking for?

In my reproducible example below, I generate the following variables:
X – dependent variable, a random permutation of the integers 1 to 20
Y – condition variable (A, B, or C)
Z – study participant ID (P1 to P4)

I am then looking at the Y-A, Y-B, and Y-C rows of the “Quantiles for each variable” table and find the following 95% credible interval for Y-A: [-5.749, 0.495], rather than an interval between 1 and 20. Am I simply looking at the wrong values? Thank you very much in advance for your help.

Reproducible example:

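    library(BayesFactor)

    # Simulated data; no seed is set, so the numbers differ from run to run
    Data <- data.frame(
      X = sample(1:20),                                           # outcome
      Y = sample(c("A", "B", "C"), 20, replace = TRUE),           # condition
      Z = sample(c("P1", "P2", "P3", "P4"), 20, replace = TRUE)   # participant
    )
    Data$Y <- as.factor(Data$Y)
    Data$Z <- as.factor(Data$Z)

    # Bayes factor ANOVA with participant (Z) treated as a random effect
    bayesfactor <- anovaBF(X ~ Y + Z, data = Data, whichRandom = "Z")
    bayesfactor

    # Draw 10,000 posterior samples and summarise them
    bayesfactor_posterior <- posterior(bayesfactor, iterations = 10000)
    summary(bayesfactor_posterior)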

My results:

    Iterations = 1:10000
    Thinning interval = 1
    Number of chains = 1
    Sample size per chain = 10000

    1. Empirical mean and standard deviation for each variable,
       plus standard error of the mean:

             Mean     SD Naive SE Time-series SE
    mu   10.63868  2.918  0.02918        0.02918
    Y-A  -2.40507  1.616  0.01616        0.02169
    Y-B  -0.47454  1.419  0.01419        0.01473
    Y-C   2.87961  1.916  0.01916        0.02677
    Z-P1  0.04134  3.245  0.03245        0.03199
    Z-P2 -1.56066  3.111  0.03111        0.03111
    Z-P3  2.96385  3.117  0.03117        0.03189
    Z-P4 -1.59330  3.500  0.03500        0.03573
    sig2 27.99127 10.851  0.10851        0.15682
    g_Y   1.20403  8.990  0.08990        0.08990
    g_Z   1.02143  1.652  0.01652        0.02174

    2. Quantiles for each variable:

             2.5%     25%      50%     75%   97.5%
    mu    4.83704  9.0158 10.66341 12.2694 16.4573
    Y-A  -5.74906 -3.4504 -2.32914 -1.2821  0.4952
    Y-B  -3.37089 -1.3441 -0.44655  0.4431  2.2909
    Y-C  -0.49649  1.5046  2.75880  4.1339  6.8901
    Z-P1 -6.34286 -1.8575  0.02995  1.9121  6.5346
    Z-P2 -7.77506 -3.3115 -1.52214  0.2191  4.3945
    Z-P3 -2.82042  1.1496  2.83577  4.7147  9.3355
    Z-P4 -9.05500 -3.5174 -1.46395  0.5629  4.9008
    sig2 14.07844 20.5537 25.78330 32.7843 54.7558
    g_Y   0.05299  0.1812  0.39724  0.9168  6.1523
    g_Z   0.14551  0.3438  0.59676  1.1107  4.3755
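One possible reading of the table above, offered as a sketch rather than a definitive answer: the Y-A, Y-B and Y-C rows look like offsets from the grand mean `mu` rather than values on the scale of X (their posterior means sum to roughly zero, and `mu` sits near the middle of 1 to 20), which would explain the negative bounds. Under that assumption, an interval for the mean of condition A comes from adding `mu` and `Y-A` draw by draw and then taking quantiles, not from adding the grand mean to the reported bounds. The draws below are simulated stand-ins with independent columns, and `condition_interval` is a hypothetical helper. With the real object, `as.matrix(bayesfactor_posterior)` is assumed to give a matrix with the same column names.

```r
# Simulated stand-in draws (NOT real BayesFactor output). The means and SDs
# are loosely based on the summary table; real posterior columns are
# generally correlated, which independent draws ignore.
set.seed(1)
n_draws <- 10000
draws <- cbind(
  rnorm(n_draws, mean = 10.6, sd = 2.9),   # stand-in for the "mu" column
  rnorm(n_draws, mean = -2.4, sd = 1.6)    # stand-in for the "Y-A" column
)
colnames(draws) <- c("mu", "Y-A")

# Hypothetical helper: equal-tailed interval for the mean of one condition,
# assuming the effect column is an offset from the grand mean.
condition_interval <- function(draws, effect, prob = 0.95) {
  cond <- draws[, "mu"] + draws[, effect]   # add per draw, not per bound
  tail <- (1 - prob) / 2
  quantile(cond, probs = c(tail, 0.5, 1 - tail))
}

ci_A <- condition_interval(draws, "Y-A")
print(ci_A)
```

Such an interval describes the condition mean, not where an individual observation is likely to fall, so it can be much narrower than the 1 to 20 range of the data.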

Unfortunately, I have not been able to find a satisfactory answer using the posterior generated by the BayesFactor package (most likely because I am doing something wrong). However, I did find an alternative approach that might prove useful for others who come across this question.

Using the 'bayesboot' library, I can get a posterior summary with a 95% HDI on the scale I wanted relatively easily. The code below demonstrates this approach for condition A. Simply refer to the 'hdi.low' and 'hdi.high' columns.

    library(bayesboot)

    # Bayesian bootstrap of the mean of X within condition A
    bayes_A <- bayesboot(Data[Data$Y == "A", ]$X, weighted.mean, use.weights = TRUE)
    summary(bayes_A)

    Summary of the posterior (with 95% Highest Density Intervals):
     statistic     mean       sd  hdi.low hdi.high
            V1 13.60795 1.875804 9.951562 17.23397
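The `hdi.low` and `hdi.high` columns are highest density intervals, whereas a quantile table computed from MCMC draws gives equal-tailed intervals; the two can differ noticeably for skewed posteriors. As a rough sketch of what an HDI is, the shortest window containing 95% of the sorted draws can be found in base R. Here `hdi_from_draws` is a hypothetical helper and the gamma draws are simulated stand-ins, not output from either package.

```r
# Hypothetical helper: shortest interval containing `prob` of the draws.
hdi_from_draws <- function(x, prob = 0.95) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(prob * n)              # number of draws the interval must cover
  starts <- seq_len(n - k + 1)        # every possible window of k sorted draws
  widths <- x[starts + k - 1] - x[starts]
  i <- which.min(widths)              # narrowest window wins
  c(lower = x[i], upper = x[i + k - 1])
}

# Simulated right-skewed draws (a stand-in, not real posterior output)
set.seed(1)
skewed <- rgamma(10000, shape = 2, rate = 0.1)

hdi <- hdi_from_draws(skewed)
eti <- quantile(skewed, probs = c(0.025, 0.975))   # equal-tailed interval

print(hdi)
print(eti)
```

For a right-skewed sample like this one, the HDI sits further to the left and is no wider than the equal-tailed interval. If the coda package is available, `coda::HPDinterval()` offers a comparable computation for `mcmc` objects; whether it applies directly to `bayesfactor_posterior` is an assumption worth checking.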

Answered by WalterB on December 8, 2020
