Stat Digest: Likelihood is not Probability
Clearly explained with examples
When I was beginning my journey in Data Science many years ago, I used to think that probability and likelihood were the same concept. They are not.
It is very important to understand the differences to clearly grasp many concepts in Statistics.
Probability measures the chance of an event occurring.
Likelihood measures how well a distribution/model fits a given sample of data.
Confused? Not to worry, you are not alone here. Let’s absorb the above concepts through an example.
Learning from an Example
Probability
Let’s say we have a bucket full of apples whose weights are normally distributed with a mean of 14 g and a standard deviation of 3 g.
The bell curve of this normal distribution is its probability density curve.
What is the probability of an apple having a weight between 15 g and 16 g?
It is the shaded area in the graph below.
That, my friend, is probability. We know the distribution parameters (mean and standard deviation) and we compute the chance of an event occurring.
In other words:
P(15 ≤ weight ≤ 16 | μ = 14, σ = 3) = shaded area = 0.12
For a continuous distribution like this one, the probability is always some area under the curve of the probability density function.
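To check the shaded area numerically, here is a minimal sketch using only Python's standard library. It assumes the bucket's stated parameters (mean 14, standard deviation 3); the variable names are just illustrative.

```python
from statistics import NormalDist

# Apple weights from the example: mean 14 g, standard deviation 3 g.
weights = NormalDist(mu=14, sigma=3)

# Probability = area under the density curve between 15 g and 16 g,
# computed as the difference of two cumulative probabilities.
prob_15_to_16 = weights.cdf(16) - weights.cdf(15)

print(round(prob_15_to_16, 2))  # prints 0.12
```

The parameters stay fixed here; only the event (the weight range) is what we ask about.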
Likelihood
Say we observe an apple that weighs 16.5 g. What is the likelihood that this apple was drawn from the above distribution with μ = 14 and σ = 3?
As shown below, the likelihood of this distribution given this observation is 0.094.
Likelihood is always the y-value corresponding to a given x-value (weight) of the probability density curve.
Now let’s check a second distribution with μ = 15 and σ = 3. What is its likelihood?
We see that the likelihood increases to 0.117.
This shows that the second distribution with a mean of 15 and sd of 3 is more likely than the first distribution with a mean of 14 and sd of 3 given the observed data of the apple weighing 16.5g.
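The two likelihood values can be reproduced by reading the height of each density curve at the observed weight. This is a small sketch with the standard library, comparing the two candidate distributions (means 14 and 15, both with sd 3); the variable names are my own.

```python
from statistics import NormalDist

observed_weight = 16.5  # the single observed apple, in grams

# Likelihood = height (y-value) of the density curve at the observed value.
likelihood_mean_14 = NormalDist(mu=14, sigma=3).pdf(observed_weight)
likelihood_mean_15 = NormalDist(mu=15, sigma=3).pdf(observed_weight)

print(round(likelihood_mean_14, 3))  # prints 0.094
print(round(likelihood_mean_15, 3))  # prints 0.117
```

Here the observation stays fixed and the distribution parameters are what we vary.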
So, what is the most likely distribution? (This is the idea behind Maximum Likelihood Estimation.)
The one with a mean of 16.5, if we keep σ fixed at 3!
In other words,
Likelihood of the distribution = L(μ = 15, σ = 3 | weight = 16.5) = 0.117
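To see the Maximum Likelihood idea in action, we can try many candidate means and keep the one with the highest likelihood. This is only a rough grid-search sketch, assuming σ is held fixed at 3 and an arbitrary grid of means from 10 to 20; real estimators usually solve for the maximum analytically or with an optimizer.

```python
from statistics import NormalDist

observed_weight = 16.5  # the single observed apple, in grams
sigma = 3               # assumption: standard deviation is held fixed

# Candidate means from 10.0 to 20.0 in steps of 0.1 (an arbitrary grid).
candidate_means = [m / 10 for m in range(100, 201)]

def likelihood(mu):
    """Density of the observed weight under a normal with this mean."""
    return NormalDist(mu=mu, sigma=sigma).pdf(observed_weight)

# The maximum likelihood estimate is the candidate with the tallest density.
best_mean = max(candidate_means, key=likelihood)

print(best_mean)  # prints 16.5
```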
Another Example
Ready for another example?
Let’s take the classic example of tossing a fair coin.
Each toss is called a Bernoulli trial, where you have two possible outcomes: success (e.g. getting a head) and failure (e.g. getting a tail).
P(success) + P(failure) = 1
For a fair coin, P(head) = 0.5. If the coin were not fair, P(head) would be different, say 0.7.
P(success) = P(head) = 0.7 in that case.
P(failure) = P(tail) = 1 − P(success) = 0.3
Now let’s say we toss the coin 10 times and count the number of heads we get. The number of heads could be from 0 to 10.
The number of heads in this experiment follows a Binomial distribution with n = 10 trials and P(success) = 0.5.
The following diagram shows the probability of getting different numbers of heads.
So, P(X = 2 heads | Binomial distribution with n = 10, p = 0.5) = 0.04
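The value above comes from the binomial probability formula, which can be sketched in a few lines. The helper name `binomial_pmf` is just an illustrative choice, not something from a particular library.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 2 heads in 10 tosses of a fair coin.
prob_two_heads = binomial_pmf(2, 10, 0.5)

print(round(prob_two_heads, 3))  # prints 0.044
```

Rounded to two decimals, this is the 0.04 quoted above.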
Now, let’s look at the likelihood of the same binomial distribution (n = 10, p = 0.5) when we observe 6 heads in 10 trials.
So, L(Binomial distribution with n = 10, p = 0.5 | observe 6 heads in 10 trials) = 0.2
For this observation, among Binomial distributions with n = 10, the likelihood is highest when p = 0.6. This is again what Maximum Likelihood Estimation does.
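We can check both numbers by fixing the observation (6 heads in 10 tosses) and varying p. This is a rough sketch that scans an arbitrary grid of p values from 0.01 to 0.99; for a discrete distribution like this one, the likelihood is simply the binomial probability of the observed count, viewed as a function of p.

```python
from math import comb

heads, tosses = 6, 10  # the fixed observation

def likelihood(p):
    """Binomial probability of the observed heads, as a function of p."""
    return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

# Candidate values of p from 0.01 to 0.99 (an arbitrary grid).
candidate_ps = [i / 100 for i in range(1, 100)]
best_p = max(candidate_ps, key=likelihood)

print(round(likelihood(0.5), 3))  # prints 0.205
print(best_p)                     # prints 0.6
```

The best p matches the observed fraction of heads, 6/10.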
Summary
- Probability measures the chance of some event happening given the model (or distribution). That is, P(X | θ), where X is the event and θ is the model or distribution.
- Likelihood measures how well our hypothesized model explains some observations. That is, L(θ | X), where θ is the hypothesized model/distribution and X is the observed event.
- For a continuous distribution, probability is measured by the area under the curve of the probability density function.
- Likelihood is measured by the y-value of the probability density curve at a given x-value (the observed data).
Can you think of another example?
I hope this post helped you clearly understand that likelihood is not probability.