
Bayesian statistics notation: "$P(\text{event}|x)$" vs "$P(\text{event}|\theta, x)$"

Asked on November 29, 2021

I am new to the subject, so I am having some notational problems. My issues are –

  1. What is the meaning of $P(\text{good bus tomorrow}|x)$ and $P(\text{good bus tomorrow}|\theta, x)$, and what is the difference between the two?

  2. Why is $P(\text{good bus tomorrow}|\theta, x) = \theta$? Is it because, actually, $P(\text{good bus tomorrow}|\theta, x) = p(\theta)$, but in this case the probability of $\theta$ is $\theta$ itself, i.e. $p(\theta) = \theta$?

Let me clarify what I am talking about.

The problem on page 22 of Introduction to Bayesian Statistics by Brendon J. Brewer reads as follows –

After moving to Auckland, I decided that I would take the bus to work
each day. However, I wasn’t very confident with the bus system in my
new city, so for the first week I just took the first bus that came
along and was heading in the right direction, towards the city. In the
first week, I caught 5 morning buses. Of these 5 buses, two of them
took me to the right place, while three of them took me far from work,
leaving me with an extra 20 minute walk. Given this information, I
would like to try to infer the proportion of the buses that are "good", that would take me right
to campus. Let us call this fraction $\theta$ and we will infer $\theta$ using the Bayesian framework.

Here, the observed fraction of good buses is $2/5$ (note that $\theta$ itself is the unknown proportion being inferred).


Recall that, if there are $N$ repetitions of a "random experiment" and the "success" probability is $\theta$ at each repetition, then the number of "successes" $x$ follows a $\text{Binomial}(N, \theta)$ distribution. To get the likelihoods, we need to think about the properties of our experiment. In particular, we should imagine that we knew the value of $\theta$ and were trying to predict what experimental outcome (data) would occur. Ultimately, we want to find the probability of our actual data set (2 out of the 5 buses were "good") for each of the possible $\theta$ values.
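As a concrete illustration (my own, not from the text), here is a minimal Python sketch of this likelihood step, assuming the 11-point grid $\theta \in \{0, 0.1, \ldots, 1\}$ that the excerpt uses later:

```python
# Minimal sketch of the likelihood step, assuming the 11-value grid
# theta in {0.0, 0.1, ..., 1.0} used later in the excerpt.
from math import comb

N, x = 5, 2                           # 5 buses observed, 2 were "good"
thetas = [i / 10 for i in range(11)]

# Binomial likelihood: P(x | theta) = C(N, x) * theta^x * (1 - theta)^(N - x)
likelihood = {t: comb(N, x) * t**x * (1 - t) ** (N - x) for t in thetas}

for t, lik in likelihood.items():
    print(f"theta = {t:.1f}: P(x|theta) = {lik:.4f}")
```

For $\theta = 0.4$, for instance, this gives $\binom{5}{2}(0.4)^2(0.6)^3 \approx 0.346$: the probability of observing exactly 2 good buses out of 5 if that $\theta$ were true.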

$P(\theta|x)$ is the posterior probability. It describes how certain or confident we are that hypothesis $\theta$ is true, given that we have observed data $x$.

Calculating posterior probabilities is the main goal of Bayesian statistics!

$P(\theta)$ is the prior probability, which describes how sure we were that $\theta$ was true, before we observed the data $x$.

$P(x|\theta)$ is the likelihood. If you were to assume that $\theta$ is true, this is the probability that you would have observed data $x$.

$P(x)$ is the marginal likelihood. This is the probability that you would have observed data $x$, whether $\theta$ is true or not.

So, $P(\theta|x) = \frac{P(\theta)\, P(x|\theta)}{P(x)}$.
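To make the formula concrete, here is a short Python sketch (again my own illustration, not from the text) that applies Bayes' rule on the discrete grid, assuming the uniform prior $p(\theta) = 1/11$ that the answer below mentions:

```python
# Sketch of Bayes' rule on a discrete grid of theta values,
# assuming a uniform prior p(theta) = 1/11 over the 11 grid points.
from math import comb

N, x = 5, 2
thetas = [i / 10 for i in range(11)]
prior = {t: 1 / 11 for t in thetas}
likelihood = {t: comb(N, x) * t**x * (1 - t) ** (N - x) for t in thetas}

# Marginal likelihood P(x): total probability of the data over all theta
marginal = sum(prior[t] * likelihood[t] for t in thetas)

# Posterior: P(theta|x) = P(theta) * P(x|theta) / P(x)
posterior = {t: prior[t] * likelihood[t] / marginal for t in thetas}

for t in thetas:
    print(f"theta = {t:.1f}: posterior = {posterior[t]:.4f}")
```

By construction the posterior values sum to $1$, since the marginal likelihood $P(x)$ is exactly the normalising constant.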

The following part is an excerpt from the same text –

In the Bayesian framework, our predictions are always in the form of
probabilities or (later) probability distributions. They are usually
calculated in three stages.

First, you pretend you actually know the true value of the parameters,
and calculate the probability based on that assumption.

Then, you do this for all possible values of the parameter $\theta$ (alternatively, you can calculate the probability as a function of $\theta$).

Finally, you combine all of these probabilities in a particular way to
get one final probability which tells you how confident you are of
your prediction.

Suppose we knew the true value of $\theta$ was $0.3$. Then, we would know the probability of catching the right bus tomorrow is $0.3$. If we knew the true value of $\theta$ was $0.4$, we would say the probability of catching the right bus tomorrow is $0.4$.

The problem is, we don’t know what the true value is. We only have the
posterior distribution. Luckily, the sum rule of probability (combined
with the product rule) can help us out.

We are interested in whether I will get the good bus tomorrow. There are $11$ different ways that can happen. Either $\theta=0$ and I get the good bus, or $\theta=0.1$ and I get the good bus, or $\theta=0.2$ and I get the good bus, and so on. These 11 ways are all mutually exclusive. That is, only one of them can be true (since $\theta$ is actually just a single number).

Mathematically, we can obtain the posterior probability of catching the good bus tomorrow using the sum rule:
$$P(\text{good bus tomorrow}|x) = \sum_{\theta} p(\theta|x) \times P(\text{good bus tomorrow}|\theta, x)$$
$$= \sum_{\theta} p(\theta|x) \times \theta$$

This says that the total probability for a good bus tomorrow (given
the data, i.e. using the posterior distribution and not the prior
distribution) is given by

  1. going through each possible $\theta$ value,

  2. working out the probability assuming the $\theta$ value you are considering is true, and multiplying it by the probability (given the data) that this $\theta$ value is actually true,

  3. and summing.

In this particular problem, because $P(\text{good bus tomorrow}|\theta, x) = \theta$, it just so happens that the probability for tomorrow is the expectation value of $\theta$ using the posterior distribution.

To three decimal places, the result for the probability tomorrow is
$0.429$. Interestingly, this is not equal to $2/5 = 0.4$.
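This number can be checked in a few lines of Python (a self-contained sketch under the same uniform-prior assumption as above):

```python
# Self-contained check of the predictive probability, assuming the
# uniform prior over theta in {0.0, 0.1, ..., 1.0}.
thetas = [i / 10 for i in range(11)]

# Posterior is proportional to the likelihood theta^2 * (1 - theta)^3;
# the binomial coefficient and the uniform prior cancel on normalising.
weights = [t**2 * (1 - t) ** 3 for t in thetas]
posterior = [w / sum(weights) for w in weights]

# Sum rule: P(good bus tomorrow | x) = sum over theta of theta * p(theta|x)
p_good_tomorrow = sum(t * p for t, p in zip(thetas, posterior))
print(round(p_good_tomorrow, 3))  # 0.429, not the raw fraction 2/5 = 0.4
```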

One Answer

$P(\text{good bus tomorrow}|x)$ is the probability of getting on a good bus tomorrow given the data $x$ you've already observed, which in this example is that 2 out of 5 buses were good.

$P(\text{good bus tomorrow}|\theta, x)$ is the probability of getting on a good bus tomorrow given that you already know exactly what $\theta$ and $x$ are. Here we treat $\theta$ and $x$ as fixed and known; once you know $\theta$, the data $x$ adds no further information, and the probability of catching a good bus tomorrow is simply $\theta$. Hence, $P(\text{good bus tomorrow}|\theta, x) = \theta$.
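One way to see this (a toy simulation of my own, not part of the original answer): once $\theta$ is fixed, tomorrow's bus is just a Bernoulli($\theta$) draw, and the past data $x$ plays no further role:

```python
# Toy simulation: with theta fixed, tomorrow's bus is a Bernoulli(theta)
# draw; the past data x adds nothing once theta is known.
import random

theta = 0.4               # pretend we know the true value
trials = 100_000
good = sum(random.random() < theta for _ in range(trials))
print(good / trials)      # ~ 0.4, i.e. P(good bus tomorrow | theta, x) = theta
```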

The probability of catching a good bus on any given day is $\theta$. While we don't actually know what $\theta$ is, we have probabilities for what we think $\theta$ is before ever coming to Auckland, which we denote by $p(\theta)$. In this example, $p(\theta) = 1/11$, i.e. each possible $\theta$ value is equally likely. This can be confusing, since both $\theta$ and $p(\theta)$ are probabilities; the difference is that $p(\theta)$ is a probability distribution over $\theta$.

What you observe is the data $x$, and you want to figure out $P(\text{good bus tomorrow}|x)$. That's where the math comes in:

$$\begin{align} P(\text{good bus tomorrow}|x) &= \sum_{\theta} p(\theta|x) \times P(\text{good bus tomorrow}|\theta, x) \\ &= \sum_{\theta} p(\theta|x) \times \theta \\ &= E[\theta \mid x] \end{align}$$

The probability of catching a good bus tomorrow is whatever you expect $\theta$ to be on average, given the data $x$ you've observed. This updated probability distribution over $\theta$ given the observed data $x$ is denoted by $p(\theta|x)$.

Answered by Sherwin Lott on November 29, 2021
