Cross Validated Asked by r_user on December 8, 2020

I’m attempting to complete a Bayesian logistic regression with the outcome of whether or not a crash occurred. I have various covariates in my model that are widely used to predict crash occurrence. As such, I’m using informed priors from prior publications that report the odds ratio and it’s 95% C.I for each covariate.

Here’s an example of a prior provided by the model I’m pulling from

crash at night (OR 13.1; 95% CI 5.0 to 31.5) : log-odds (1.12,.20) from $$ frac{log(31.5-5)}{3.92}$$

I wanted to apply the log-odds of these results and their standard error in my updated model as priors. My first thought was to apply the log-odds and it’s a standard error on a normal prior. I’m using logic from the sources 1 & 2 listed at the end of the post.

My question, if my assumptions about applying these log-odds and SE’s on a normal prior are correct, can I simply transform the SE of the log odds to variance and implement?

a normal prior:

β_{k} = (μ_{βk},σ^{2}_{βk})

requires a variance rather than an SE. According to citation 3 log-odds SE and be transformed into log-odds VAR:

$$SE[log(OR)] = sqrt{VAR[log(OR)]} => SE^2 = VAR[log(OR)]$$

therefore, if I square the standard error x then I should be able to apply this as my final prior:

β_{k} = (1.12,.04)

Is this assumption correct or am I way off? Is there a better way of implementing log-odd priors and their SE’s into a logistic regression model?

Thanks!

- AdamO (https://stats.stackexchange.com/users/8013/adamo), Prior for Bayesian multiple logistic regression, URL (version: 2016-03-16): https://stats.stackexchange.com/q/202046

"Basically, you have the flexibility to parametrize estimation however

you see fit, but using a model which is linear on the log odds scale

makes sense for many reasons. Furthermore, using a normal prior for

log odds ratios should give you very approximately normal posteriors."

- Sander Greenland, Bayesian perspectives for epidemiological research: I. Foundations and basic methods, International Journal of Epidemiology, Volume 35, Issue 3, June 2006, Pages 765–775, https://doi.org/10.1093/ije/dyi312

"To start, suppose we model these a priori ideas by placing 2:1 odds

on a relative risk (RR) between ½ and 2, and 95% probability on RR

between ¼ and 4. These bets would follow from a normal prior for the

log relative risk ln (RR) that satisfies…"

- StatsStudent (https://stats.stackexchange.com/users/7962/statsstudent), How do I calculate the standard deviation of the log-odds?, URL (version: 2020-04-19): https://stats.stackexchange.com/q/266116

Yes this is correct. Since you asked for a complete answer, I'll start by setting up notation and establishing preliminaries.

It sounds like your goal is to understand the relationship between driving at night and crashing a car. Let's denote the binary dependent variable of whether a car crash occurred as $y = {0,1}$, and the binary independent variable of driving at night as $x= {0,1}$. Furthermore, we'll denote the probability $P[y|x] = p(x)$. We'll estimate $p(x)$ using a logistic regression

As your sources note the motivation of logistic regression is a linear model for the log-odds:
$$
logleft[frac{p(x)}{1-p(x)}right] = alpha + beta x
$$

Since $x$ has only two levels, we can make this notation a little simpler by defining $beta_0 = alpha$ (the log-odds of a crash during the day) and $beta_1 = alpha + beta$ (the log-odds of a crash at night). It will also help if we define the *logit function*:
$$
text{logit}(z) = logleft[frac{z}{1-z}right]
$$
Which allows us to easily write:
$$
p(x) = begin{cases}
text{logit}^{-1}(beta_0) & x=0\
text{logit}^{-1}(beta_1) & x=1\
end{cases}
$$

In the Bayesian methodology this model would be fit to the datapoints $(x_1,y_1),...,(x_n,y_n)$ by looking at the *posterior distribution*:
$$
P[beta_0,beta_1|x_1,...,x_n,y_1,...,y_n] = prodlimits_{i=1}^n p(x_i)^{y_i} (1-p(x_i))^{1-y_i} P[beta_0,beta_1]
$$
Where $P[beta_0,beta_]$ is the *prior distribution* over the parameters, typically assumed to have prior independence in the parameters:
$$
P[beta_0,beta_1] = P[beta_0] P[beta_1]
$$

The paper has provided you the 95% quantiles, mean, and standard deviation of the posterior distribution of the value $text{logit}(p(1)) = beta_1$. Say the mean here is $m_1$ and the standard deviation is $s_1$. A standard result in Bayesian analysis is that, with sufficiently many datapoints, the posterior distribution is approximately normal (the Laplace approximation). Thus $m_1$ and $s_1$ are sufficient to characterize the posterior distribution (approximately), and it is the normal distribution $N(m_1,s_1)$. In general, variance is standard deviation squared, so an alternate parameterization of their posterior/your prior would be the normal distribution $N(m_1,s_1^2)$, which is what you have here:

$beta_k = (1.12,.04)$

Note that the variance of the prior equaling $.04 = .02^2$ is not unique to log-odds. For any distribution, variance equals standard deviation squared (this is just the definition of standard deviation). Your source 3 is actually providing a proof of the Laplace Approximation, ie. the fact that the previous posterior is approximately normal.

In general you will want to also perform a sensitivity analysis on your choice of prior. $N(1.12,.04)$ is very tight around a rather large value of $m_1$ (it implies that the probability of crashing at night is like ~75%). It would be smart to re-run your analysis with multiple priors with increasing variances, to see what happens to your results when you loosen up your prior confidence.

Correct answer by proof_by_accident on December 8, 2020

1 Asked on November 2, 2021 by data-man

0 Asked on November 2, 2021 by franziska

1 Asked on November 2, 2021 by s_haring

3 Asked on March 9, 2021 by pythonnoob

0 Asked on March 4, 2021 by bmurray

0 Asked on March 2, 2021 by pluviophile

1 Asked on March 2, 2021 by sleepy

chi squared test contingency tables ecology hypothesis testing statistical significance

0 Asked on March 1, 2021 by sedi

2 Asked on February 28, 2021 by peterbe

0 Asked on February 27, 2021 by user2991421

categorical data categorical encoding continuous data machine learning random forest

1 Asked on February 27, 2021 by mathslover

1 Asked on February 27, 2021 by misologie

1 Asked on February 25, 2021 by mcgurck

0 Asked on February 25, 2021 by la_haine

0 Asked on February 24, 2021 by zge

0 Asked on February 24, 2021 by diricksen

1 Asked on February 24, 2021 by zvisofer

Get help from others!

Recent Answers

- haakon.io on Why fry rice before boiling?
- Joshua Engel on Why fry rice before boiling?
- Peter Machado on Why fry rice before boiling?
- Jon Church on Why fry rice before boiling?
- Lex on Does Google Analytics track 404 page responses as valid page views?

Recent Questions

- How Do I Get The Ifruit App Off Of Gta 5 / Grand Theft Auto 5
- Iv’e designed a space elevator using a series of lasers. do you know anybody i could submit the designs too that could manufacture the concept and put it to use
- Need help finding a book. Female OP protagonist, magic
- Why is the WWF pending games (“Your turn”) area replaced w/ a column of “Bonus & Reward”gift boxes?
- Does Google Analytics track 404 page responses as valid page views?

© 2022 AnswerBun.com. All rights reserved. Sites we Love: PCI Database, MenuIva, UKBizDB, Menu Kuliner, Sharing RPP, SolveDir