I found a really cool visual explanation of Simpson’s Paradox on Wikipedia.
Informally, Simpson’s Paradox states that, if you and I are competing, and I do better than you in category A, and I also do better than you in category B, my overall score for both categories combined could actually be worse than yours. The Wikipedia article gives a real-life example:
“In both 1995 and 1996, [David] Justice had a higher batting average […] than [Derek] Jeter; however, when the two years are combined, Jeter shows a higher batting average than Justice.”
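To convince myself, here is a quick sketch of the arithmetic in Python. The hit and at-bat counts are not in the quote above; they are the figures as I recall them from the Wikipedia article, so treat them as an assumption. The helper names are my own.

```python
from fractions import Fraction

# (hits, at_bats) per season. Assumed figures, recalled from the
# Wikipedia article rather than taken from the quote itself.
jeter = {"1995": (12, 48), "1996": (183, 582)}
justice = {"1995": (104, 411), "1996": (45, 140)}

def average(hits, at_bats):
    """Batting average as an exact fraction: hits divided by at-bats."""
    return Fraction(hits, at_bats)

def combined(seasons):
    """Average over all seasons: total hits divided by total at-bats."""
    total_hits = sum(h for h, _ in seasons.values())
    total_at_bats = sum(ab for _, ab in seasons.values())
    return average(total_hits, total_at_bats)

for year in ("1995", "1996"):
    print(year,
          "Jeter", round(float(average(*jeter[year])), 3),
          "Justice", round(float(average(*justice[year])), 3))

print("combined",
      "Jeter", round(float(combined(jeter)), 3),
      "Justice", round(float(combined(justice)), 3))
```

The thing to notice is the at-bat counts: Jeter's good year carries far more weight in his combined average than Justice's good year does in his.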
There’s also a famous legal case from the 1970s about Berkeley’s admission rates for women, where they were sued because the overall admission rate was lower for women than for men. Turns out that if you break it down by department, most departments actually had a higher admission rate for women.
This all sounds crazy until you stare at the picture above for a while. The slopes of the lines are the percentages. Each solid blue vector has a smaller slope than its corresponding solid red vector, but when you add the vectors of each color (the sums are shown as dashed lines), the blue sum has a bigger slope than the red sum.
What the picture really makes clear is that a ratio or a percentage is not a complete description of the situation. Knowing a percentage is equivalent to knowing the angle of a vector without knowing its magnitude. You can see from the picture that this isn’t a weird corner case; there are many choices for the second blue vector that would have the same result.
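Here is a minimal sketch of that last claim. The vectors are hypothetical numbers I made up, not read off the picture: each one is (attempts, successes), so its slope is the percentage. I hold the first blue vector and both red vectors fixed, then count how many candidate second blue vectors on a small grid lose to the second red vector on slope and still give blue the steeper sum.

```python
from fractions import Fraction

def slope(v):
    """Rise over run: successes divided by attempts."""
    attempts, successes = v
    return Fraction(successes, attempts)

def add(u, v):
    """Component-wise vector sum."""
    return (u[0] + v[0], u[1] + v[1])

# Hypothetical vectors (attempts, successes), chosen for illustration only.
blue1, red1 = (4, 1), (10, 3)   # blue1 is less steep than red1
red2 = (3, 2)
red_sum = add(red1, red2)

def reverses(blue2):
    """True if blue2 is less steep than red2, yet blue's sum is steeper."""
    return (slope(blue2) < slope(red2)
            and slope(add(blue1, blue2)) > slope(red_sum))

# Every integer vector with 1..20 attempts and 0..attempts successes.
candidates = [(x, y) for x in range(1, 21) for y in range(0, x + 1)]
winners = [v for v in candidates if reverses(v)]
print(len(winners), "of", len(candidates), "candidates give the reversal")
```

For example, (10, 6) works: its slope of 3/5 is below red2's 2/3, but the blue sum (14, 7) has slope 1/2 while the red sum (13, 5) only has 5/13.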
It’s been probably 10-12 years since I learned about Simpson’s Paradox in some undergrad stats class. Now I finally really understand it.