Interpret credible intervals / HPD following posterior sampling

Question

I am unsure how to interpret credible interval results. How can a credible interval contain negative numbers when the collected data consist only of positive numbers? Given data ranging from 1 to 20, I would expect the credible interval to tell me (with 95% certainty) that a value lies between x and y, where x and y are both between 1 and 20.

Should I add the grand mean (mu) to these results to obtain what I am looking for?

In my reproducible example below, I generate the following variables:
X - dependent variable, a random permutation of the integers 1 to 20
Y - condition variable (A, B or C)
Z - study participant ID (P1 to P4), treated as a random effect

I then look at the Y-A, Y-B and Y-C rows of the "Quantiles for each variable" table and find the following 95% credible interval for Y-A: [-5.749, 0.495], rather than an interval between 1 and 20. Am I simply looking at the wrong output? Thank you very much in advance for your help.
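For orientation, here is one way to read these rows (a reading of the output, not a quote from the BayesFactor documentation): anovaBF models X as a grand mean plus condition and participant effects, with the condition effects expressed as deviations from the grand mean. The posterior means of Y-A, Y-B and Y-C in the results below sum to zero (-2.40507 - 0.47454 + 2.87961 = 0), which is consistent with that coding:

$$
X_i = \mu + \alpha_{Y_i} + b_{Z_i} + \varepsilon_i, \qquad \alpha_A + \alpha_B + \alpha_C = 0, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
$$

$$
\text{mean of condition A (average participant)} = \mu + \alpha_A
$$

Here the mu row is the posterior of \(\mu\), the Y-A row is the posterior of \(\alpha_A\) (how far condition A sits from the grand mean, so negative values are expected), and sig2 is \(\sigma^2\). Posterior means add (10.63868 - 2.40507 = 8.23361), but quantiles do not, so a credible interval for \(\mu + \alpha_A\) has to be computed from the draws, one draw at a time, rather than by adding rows of the quantile table.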

Reproducible example:

library(BayesFactor)
Data <- data.frame(
  X = sample(1:20),
  Y = sample(c("A", "B", "C"), 20, replace = TRUE),
  Z = sample(c("P1", "P2", "P3", "P4"), 20, replace = TRUE)
)
Data$Y <- as.factor(Data$Y)
Data$Z <- as.factor(Data$Z)

bayesfactor <- anovaBF(X ~ Y + Z, data = Data, whichRandom = "Z")
bayesfactor
bayesfactor_posterior <- posterior(bayesfactor, iterations = 10000)
summary(bayesfactor_posterior)

My results:

Iterations = 1:10000
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

         Mean     SD Naive SE Time-series SE
mu   10.63868  2.918  0.02918        0.02918
Y-A  -2.40507  1.616  0.01616        0.02169
Y-B  -0.47454  1.419  0.01419        0.01473
Y-C   2.87961  1.916  0.01916        0.02677
Z-P1  0.04134  3.245  0.03245        0.03199
Z-P2 -1.56066  3.111  0.03111        0.03111
Z-P3  2.96385  3.117  0.03117        0.03189
Z-P4 -1.59330  3.500  0.03500        0.03573
sig2 27.99127 10.851  0.10851        0.15682
g_Y   1.20403  8.990  0.08990        0.08990
g_Z   1.02143  1.652  0.01652        0.02174

2. Quantiles for each variable:

         2.5%     25%      50%     75%   97.5%
mu    4.83704  9.0158 10.66341 12.2694 16.4573
Y-A  -5.74906 -3.4504 -2.32914 -1.2821  0.4952
Y-B  -3.37089 -1.3441 -0.44655  0.4431  2.2909
Y-C  -0.49649  1.5046  2.75880  4.1339  6.8901
Z-P1 -6.34286 -1.8575  0.02995  1.9121  6.5346
Z-P2 -7.77506 -3.3115 -1.52214  0.2191  4.3945
Z-P3 -2.82042  1.1496  2.83577  4.7147  9.3355
Z-P4 -9.05500 -3.5174 -1.46395  0.5629  4.9008
sig2 14.07844 20.5537 25.78330 32.7843 54.7558
g_Y   0.05299  0.1812  0.39724  0.9168  6.1523
g_Z   0.14551  0.3438  0.59676  1.1107  4.3755
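A sketch of that draw-by-draw computation in R, reusing the question's setup. Assumptions: the posterior columns are named "mu", "Y-A", "Y-B" and "Y-C" as in the summary above and can be indexed by name; `set.seed(1)` and the shuffled, balanced factors (so every condition and participant occurs) are additions, so the numbers will not match the output above. If you want a range for individual observations of X rather than for a condition mean, that would be a posterior predictive interval instead, which is wider because it also includes the residual variance sig2.

```r
library(BayesFactor)  # anovaBF(), posterior()
library(coda)         # as.mcmc(), HPDinterval()

# Question's setup; set.seed() and the balanced, shuffled factors are additions
# so the sketch is repeatable and every level of Y and Z occurs.
set.seed(1)
Data <- data.frame(
  X = sample(1:20),
  Y = factor(sample(rep(c("A", "B", "C"), length.out = 20))),
  Z = factor(sample(rep(c("P1", "P2", "P3", "P4"), length.out = 20)))
)

bf <- anovaBF(X ~ Y + Z, data = Data, whichRandom = "Z")
post <- posterior(bf, iterations = 10000)

# Per-draw condition means: grand mean plus each condition's effect
mu <- as.numeric(post[, "mu"])
cell_means <- sapply(c("A", "B", "C"),
                     function(k) mu + as.numeric(post[, paste0("Y-", k)]))

# 95% equal-tailed intervals (same convention as summary()'s 2.5% and 97.5% columns)
eq_tail <- apply(cell_means, 2, quantile, probs = c(0.025, 0.975))
# 95% highest posterior density intervals
hpd <- HPDinterval(as.mcmc(cell_means), prob = 0.95)

eq_tail
hpd
```

Both intervals describe the mean of X in each condition for an average participant. The equal-tailed interval follows the convention of the quantile table above, while `HPDinterval()` from coda returns the shortest interval, which is the same kind of interval as the HDI reported by bayesboot.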

WalterB · Answer

Unfortunately, I have not been able to find a satisfactory answer using the posterior generated by the BayesFactor package (though this is most likely because I am doing something wrong). However, I did find an alternative approach that might prove useful for others who come across this question.

Using the bayesboot package, I can get the posterior and its 95% highest density interval (HDI) relatively easily, as the code below demonstrates. The interval is given in the 'hdi.low' and 'hdi.high' columns.

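library(bayesboot)
# Bayesian bootstrap of the mean of X in condition A; with use.weights = TRUE,
# bayesboot passes the data and a weight vector to weighted.mean()
bayes_A <- bayesboot(Data$X[Data$Y == "A"], weighted.mean, use.weights = TRUE)
summary(bayes_A)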

Summary of the posterior (with 95% Highest Density Intervals):
 statistic     mean       sd  hdi.low hdi.high
        V1 13.60795 1.875804 9.951562 17.23397
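For readers wondering what bayesboot computes here: the Bayesian bootstrap draws flat Dirichlet weights over the observations and evaluates the weighted mean under each set of weights, and the HDI is the shortest interval holding 95% of those draws. Below is a base-R sketch of that idea, not bayesboot's own code; `bayes_boot_mean()`, `hdi_draws()` and the values in `x_A` are hypothetical. Note that, unlike the anovaBF model, this pools all observations in condition A and ignores which participant (Z) they came from.

```r
# Base-R sketch of the Bayesian bootstrap for a mean (not bayesboot's own code)
bayes_boot_mean <- function(x, R = 4000) {
  n <- length(x)
  # Flat Dirichlet(1, ..., 1) weights: normalise independent Exp(1) draws per row
  w <- matrix(rexp(R * n), nrow = R, ncol = n)
  w <- w / rowSums(w)
  # Each row of weights gives one posterior draw of the weighted mean
  as.vector(w %*% x)
}

# Empirical HDI: the shortest interval that contains a fraction `prob` of the draws
hdi_draws <- function(draws, prob = 0.95) {
  s <- sort(draws)
  n <- length(s)
  k <- ceiling(prob * n)                 # number of draws inside the interval
  widths <- s[k:n] - s[1:(n - k + 1)]    # width of every candidate interval
  i <- which.min(widths)
  c(lower = s[i], upper = s[i + k - 1])
}

set.seed(1)
x_A <- c(3, 7, 12, 15, 18, 20)  # hypothetical values of X in condition A
draws <- bayes_boot_mean(x_A)
mean(draws)
hdi_draws(draws)
```

Because every draw is a weighted average of the observed values, the resulting interval always lies within the range of the data, which is why this approach returns an interval on the 1 to 20 scale.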
