Stat Digest: The idea behind the accuracy paradox
All you need to know about the accuracy paradox and how to avoid it
When you have highly imbalanced data, the accuracy paradox explains why accuracy is a poor measure of a model’s predictive performance.
The paradox shows that higher accuracy does not necessarily mean higher predictive performance. In other words, a model with lower accuracy than another can still have greater predictive power.
Let me explain using an example.
Bob is tasked with designing a spam filter for his company.
The goal is to distinguish benign emails (ham) from spam.
Bob collects the following data:
Number of hams: 900
Number of spams: 100
Bob implemented a state-of-the-art classifier and got 90% accuracy.
The management was satisfied with the accuracy level and deployed the solution in production.
Everyone was happy that the spam problem plaguing the workplace had been solved with this rollout.
However, over time, employees started complaining about more and more spam. With such good accuracy, why was the problem still not solved?
The management asked Alice to investigate the issue.
Alice checked the confusion matrix of the classifier Bob trained.
Alice was shocked to find out that every email was flagged as ham. In other words, the model had zero predictive power.
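To make this concrete, here is a minimal sketch of what Alice found, assuming the 900-ham / 100-spam counts above, an encoding of 0 = ham and 1 = spam, and a model that flags every email as ham:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [0] * 900 + [1] * 100   # 900 ham, 100 spam (0 = ham, 1 = spam)
y_pred = [0] * 1000              # Bob's model: every email is flagged as ham

print(confusion_matrix(y_true, y_pred, labels=[0, 1]))
# [[900   0]
#  [100   0]]   -> not a single spam is caught

print(accuracy_score(y_true, y_pred))   # 0.9 -> the "impressive" 90% accuracy
```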
Alice set out to design a new classifier. Hers achieved an accuracy of 88.5% (lower than Bob’s 90%).
The management asked Alice: how can your model be better than Bob’s when its accuracy is lower?
Alice confidently explained.
Alice showed that Bob’s model has zero precision and recall. Recall measures how many of the actual spam emails are identified as spam, whereas precision measures how many of the emails identified as spam are actually spam.
The following figure from Wikipedia illustrates the two concepts nicely:
This is a highly imbalanced classification problem with a 9–1 ratio of the two classes. When we have such an imbalance, accuracy is usually not a good measure of predictive performance.
Instead, we should check metrics such as precision and recall.
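In confusion-matrix terms, recall = TP / (TP + FN) and precision = TP / (TP + FP). Here is a small self-contained sketch (again assuming the 0 = ham, 1 = spam encoding) showing why both are zero for Bob’s model:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [0] * 900 + [1] * 100   # 0 = ham, 1 = spam
y_pred = [0] * 1000              # Bob's model: everything is predicted as ham

# recall    = TP / (TP + FN): what fraction of the actual spam did we catch?
# precision = TP / (TP + FP): what fraction of the flagged emails is really spam?
print(recall_score(y_true, y_pred))                      # 0.0 -> no spam is ever caught
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0 (nothing was flagged, so TP + FP = 0)
```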
Let’s find out the precision and recall of Alice’s model.
Alice’s model has a recall of 0.75 (75/(75+25)) and a precision of about 0.45 (75/(75+90)).
While not perfect, Alice’s model has more predictive power than Bob’s, even though its accuracy is lower.
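A quick sketch to verify those numbers, assuming the confusion matrix implied by the fractions above (75 spams caught, 25 missed, 90 hams wrongly flagged, 810 hams correctly passed):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Alice's (assumed) results: 0 = ham, 1 = spam
y_true = [0] * 900 + [1] * 100
y_pred = [1] * 90 + [0] * 810 + [1] * 75 + [0] * 25   # 90 false positives, 75 true positives

print(recall_score(y_true, y_pred))     # 0.75  -> 75 / (75 + 25)
print(precision_score(y_true, y_pred))  # ~0.45 -> 75 / (75 + 90)
print(accuracy_score(y_true, y_pred))   # 0.885 -> lower than Bob's 90%
```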
This, my friend, is the accuracy paradox: a model can have higher predictive power even though its accuracy is lower.
Takeaways:
- Don’t fall for the accuracy paradox
- When you are building a model on highly imbalanced data, use precision and recall (or the F1-score) as the metric of performance (see the short sketch below)
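For completeness, the F1-score is the harmonic mean of precision and recall, 2 · P · R / (P + R). A minimal sketch using Alice’s (assumed) predictions from above:

```python
from sklearn.metrics import f1_score

y_true = [0] * 900 + [1] * 100                         # 0 = ham, 1 = spam
y_pred = [1] * 90 + [0] * 810 + [1] * 75 + [0] * 25    # Alice's (assumed) predictions

print(f1_score(y_true, y_pred))   # ~0.57 = 2 * 0.4545 * 0.75 / (0.4545 + 0.75)
```

Bob’s model, which never flags any spam, would score an F1 of 0 on the same data.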