Stat Digest: Likelihood is not Probability

AI/Data Science Digest
5 min read · Mar 12, 2023


Clearly explained with examples

Probability vs. Likelihood (Image by Author)

When I was beginning my journey in Data Science many years ago, I used to think probability and likelihood were the same concept. But they are not.

It is very important to understand the differences to clearly grasp many concepts in Statistics.

Probability is not equal to likelihood

Probability measures the chance of an event occurring.

Likelihood measures the goodness of fit of a distribution/model to a given sample data.

Confused? Not to worry, you are not alone here. Let’s absorb the above concepts through an example.

Learning from an Example

Probability

Let’s say we have a bucket full of apples whose weights are distributed normally with a standard deviation of 3 and a mean of 14.

This normal distribution shows the probability density curve.

The probability density function of the weights of the apples (Image by Author)

What is the probability of an apple having a weight between 15 and 16g?

It is the shaded area in the graph below.

The probability that the weight of the apple is between 15 and 16g (Image by Author)

That, my friend, is probability. We know the distribution parameters (mean and standard deviation) and we compute the chance of an event occurring.

So,

In Probability, we know the distribution, and we compute the chance of an event occurring based on that known distribution.

In other words:

P(15 ≤ weight ≤ 16 | μ = 14, σ = 3) = shaded area = 0.12

The probability is always some area under the curve of the probability density function.
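To sanity-check that shaded area in code, here is a minimal sketch using only the Python standard library; the `normal_cdf` helper is written here for illustration, not taken from any particular package:

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative probability P(X <= x) for a normal distribution, via erf."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma = 14, 3  # the apple-weight distribution from the example
# Area under the density curve between 15 and 16:
p = normal_cdf(16, mu, sigma) - normal_cdf(15, mu, sigma)
print(round(p, 3))  # ~0.117, i.e. roughly the 0.12 shaded in the figure
```

The subtraction of two CDF values is exactly "area under the curve between two points", which is why probabilities for continuous variables are always intervals, never single points.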

Likelihood

Say, we have the observation that the weight of the apple is 16.5g. What is the likelihood that this apple is drawn from the above distribution with μ = 14 and σ = 3?

As shown below, the likelihood of this distribution given the data is 0.094.

How to find the likelihood? Likelihood of having the distribution with mean 14 and sd 3 given the data (Image by Author)

Likelihood is always the y-value corresponding to a given x-value (weight) of the probability density curve.

Now let's say the distribution we check has μ = 15 and σ = 3. What is the likelihood?

We see that the likelihood increases to 0.117.

Likelihood of having the distribution with mean 15 and sd 3 given the data (Image by Author)

This shows that the second distribution with a mean of 15 and sd of 3 is more likely than the first distribution with a mean of 14 and sd of 3 given the observed data of the apple weighing 16.5g.
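Both heights come straight from the normal density function evaluated at the observed weight. A quick stdlib-only sketch (the `normal_pdf` helper is defined here for illustration):

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal density curve at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

weight = 16.5  # the observed apple weight
print(round(normal_pdf(weight, 14, 3), 3))  # 0.094 — first candidate distribution
print(round(normal_pdf(weight, 15, 3), 3))  # 0.117 — second candidate, more likely
```

Note that the data point stays fixed while the parameters change between the two calls; that is the defining move of a likelihood comparison.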

So, what is the most likely distribution? (This is the idea behind Maximum Likelihood Estimation.)

When the distribution has a mean of 16.5!

The most likely distribution given the data (Image by Author)

So,

Given the data (weight of the apple), we measure the likelihood of this data being sampled from a distribution with specific parameters

In other words,

Likelihood of the distribution = L(μ = 15, σ = 3 | weight = 16.5) = 0.117
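That maximum can be found by brute force: scan a grid of candidate means and keep the one whose density is highest at the observed weight. The grid and helper below are illustrative assumptions, not a standard API:

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal density curve at x — the likelihood of (mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

weight, sigma = 16.5, 3
# Candidate means from 10.0 to 22.0 in steps of 0.1:
candidates = [mu / 10 for mu in range(100, 221)]
best_mu = max(candidates, key=lambda mu: normal_pdf(weight, mu, sigma))
print(best_mu)  # 16.5 — the observation itself maximizes the likelihood
```

With a single observation, the maximum-likelihood mean is the observation itself, which is exactly what the figure shows.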

Another Example

Ready for another example?

Let's take the classic example of tossing a fair coin.

Probability of getting a head or a tail when tossing a fair coin

The above event is called a Bernoulli trial, where you have two possible outcomes: success (e.g. getting a head) and failure (e.g. getting a tail).

P(success) + P(failure) = 1

If it were not a fair coin, we would have observed a different probability P(head), say 0.7.

P(success) = P(head) = 0.7 in that case.

P(failure) = P(tail) = 1 - P(success) = 0.3

Now let’s say we toss the coin 10 times and count the number of heads we get. The number of heads could be from 0 to 10.

This experiment follows a Binomial distribution with 10 trials and P(success) = 0.5.

The following diagram shows the probability of getting different numbers of heads.

Binomial distribution for tossing a fair coin 10 times (Image by Author)

So, P(X = 2 heads | Binomial distribution with n = 10, p = 0.5) = 0.04
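That value comes from the binomial probability formula, which is easy to reproduce in plain Python; this is a minimal sketch of the textbook formula:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(binom_pmf(2, 10, 0.5), 3))  # ~0.044, rounded to 0.04 in the figure
```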

Now, let's look at the likelihood of the same binomial distribution (n = 10, p = 0.5) when we observe 6 heads in 10 trials.

Likelihood of having a Binomial distribution for a fair coin when we observe 6 heads in 10 trials (Image by Author)

So, L(Binomial distribution with n = 10, p = 0.5 | observe 6 heads in 10 trials) = 0.2

For this observation, with n = 10 fixed, the likelihood is highest when p = 0.6 (the observed proportion, 6/10). Finding that maximizing value is again what Maximum Likelihood Estimation does.
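The same grid-search idea works here: hold the data (6 heads in 10 tosses) fixed and scan candidate values of p. The grid of candidates is an illustrative choice:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

heads, n = 6, 10  # the fixed observation
# Likelihood of each candidate p given the data:
candidates = [i / 100 for i in range(1, 100)]
best_p = max(candidates, key=lambda p: binom_pmf(heads, n, p))
print(best_p)                             # 0.6 — matches heads/n
print(round(binom_pmf(heads, n, 0.5), 1))  # 0.2 — the fair-coin likelihood
```

In both examples the pattern is identical: probability evaluates the formula with parameters fixed and data varying; likelihood evaluates the same formula with data fixed and parameters varying.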

Summary

  • Probability measures the chance of some event happening given the model (or distribution). That is, P(X | θ) where X is the event and θ is the model or distribution.
  • Likelihood measures how well our hypothesized model fits some observations. That is, L(θ | X) where θ is the hypothesized model/distribution and X is the observed event.
  • Probability is measured by the area under the curve of the probability density function.
  • Likelihood is measured by the height (y-value) of the probability density curve at the observed x value.

Can you think of another example?

I hope this post helped you clearly understand that likelihood is not probability.

Please do like and share the article to reach a wider audience.

Follow me to see more of such articles providing you with clear intuition behind Data Science foundational concepts.
