Understanding the “Bayesian Average”

IMDB rates movies using a score they call the true Bayesian estimate (bottom of the page). I’m pretty sure that’s a made-up term. A couple of other sites, like BoardGameGeek, use the same thing and call it a “Bayesian average”. I think that’s a made-up term, too, even though there’s a Wikipedia article on it.

Nonetheless, the formula is simple, and it has a nice interpretation. Here it is:

$$W = \frac{Rv + Cm}{v + m}$$

where $C$ is the mean vote across all movies, $v$ is the number of votes, $R$ is the mean rating for the movie, and $m$ is the “minimum number of votes required to be listed in the top 250 (currently 1300)”.

The nice interpretation is this: pretend that, in addition to the $v$ votes that users give a movie, you’re also throwing in $m$ votes of score $C$ each. In effect you’re pushing the scores towards the global average, by $m$ votes.
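To make the pseudo-vote reading concrete, here is a minimal Python sketch. The function names, the three votes, and the values of the global mean and pseudo-vote count are all made up for illustration; none of them come from IMDB. It computes the score two ways: with the formula, and by literally appending extra votes of the global mean and taking a plain average.

```python
def bayesian_average(ratings, prior_mean, prior_weight):
    """Weighted rating (R*v + C*m) / (v + m), with C=prior_mean and m=prior_weight."""
    v = len(ratings)
    total = sum(ratings)  # this is R * v, the sum of the real votes
    return (total + prior_mean * prior_weight) / (v + prior_weight)


def average_with_pseudo_votes(ratings, prior_mean, prior_weight):
    """Plain mean after appending `prior_weight` extra votes of score `prior_mean`."""
    padded = list(ratings) + [prior_mean] * prior_weight
    return sum(padded) / len(padded)


# Hypothetical data: three enthusiastic votes for an obscure movie.
ratings = [10, 10, 9]
global_mean = 6.9     # assumed mean vote across all movies
pseudo_votes = 5      # assumed number of pseudo-votes

formula_score = bayesian_average(ratings, global_mean, pseudo_votes)
padded_score = average_with_pseudo_votes(ratings, global_mean, pseudo_votes)
raw_mean = sum(ratings) / len(ratings)

print(raw_mean, formula_score, padded_score)
```

The two scores agree up to floating-point rounding, and both sit between the raw mean of the three votes and the global mean.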

Is this arbitrary? Actually, no. It’s the mean (which for a Gaussian is also the mode, i.e. the MAP estimate) of the posterior distribution you get when you have a Normal prior with mean $C$ and precision $m$, and a Normal conditional with variance 1.0.

In other words, you’re starting with a belief that, in the absence of votes, a movie/boardgame should be ranked as average, and you’re assuming that user votes are normally distributed around the “true” score with variance 1.0. Then you’re looking at the posterior distribution (i.e. the probability distribution that arises as a result of those assumptions), and you’re picking the most likely value from that, which in the case of Gaussians is the mean.
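One way to sanity-check this claim numerically is to write down the unnormalized log posterior under exactly those assumptions and search for its peak by brute force. The sketch below does that on a grid; the votes, the prior mean, the prior precision, and the grid resolution are illustrative choices, not values from IMDB.

```python
def log_posterior(theta, votes, prior_mean, prior_precision):
    """Unnormalized log posterior of the true score `theta`.

    Assumes a Normal prior on theta (given mean and precision) and
    votes drawn from a Normal centred on theta with variance 1.0.
    """
    log_prior = -0.5 * prior_precision * (theta - prior_mean) ** 2
    log_likelihood = -0.5 * sum((x - theta) ** 2 for x in votes)
    return log_prior + log_likelihood


# Hypothetical data and prior.
votes = [10, 10, 9]
prior_mean = 6.9
prior_precision = 5

# Candidate scores from 1.000 to 10.000 in steps of 0.001.
grid = [i / 1000 for i in range(1000, 10001)]
grid_mode = max(grid, key=lambda t: log_posterior(t, votes, prior_mean, prior_precision))

# The weighted-average formula, using the same numbers.
closed_form = (sum(votes) + prior_mean * prior_precision) / (len(votes) + prior_precision)

print(grid_mode, closed_form)
```

The grid search lands within one grid step of the value given by the weighted-average formula, which is what we would expect if that formula really is the posterior mode.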

Let’s see how that works.

To find the posterior distribution, we could work through the math, or we could just look at the Wikipedia article on conjugate priors. We’ll see that the posterior distribution for the mean of a Normal, when the prior is also a Normal, is a Normal with mean

$$\frac{\tau_0 \mu_0 + \tau \sum_{i=1}^{n} x_i}{\tau_0 + n\tau}$$

where $\mu_0$ and $\tau_0$ are the mean and precision of the prior, respectively, $\tau$ is the precision of the vote distribution, and $n$ is the number of votes. In the case of IMDB, we assumed above that $\tau = 1$, so we have

$$\frac{\tau_0 \mu_0 + \sum_{i=1}^{n} x_i}{\tau_0 + n}$$

Comparing the IMDB equation to this, we can see that $C$ above is $\mu_0$ here, $v$ above is $n$ here, $Rv$ above is $\sum_{i=1}^{n} x_i$ here, and $m$ above is the hyperparameter $\tau_0$. So we know that even though IMDB says $m$ is the “minimum number of votes required to be listed in the top 250 list”, that’s an arbitrary decision on their part: it can be any positive value and the formula still works. $\tau_0$ is the precision of the prior distribution; as it gets bigger, the prior distribution gets “sharper”, and thus has more of an effect on the posterior distribution.
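To see the effect of the prior precision, here is a small Python sketch of the posterior mean from the conjugate-prior result. The function name, the votes, the prior mean, and the list of precisions tried are all assumptions made for illustration.

```python
def posterior_mean(votes, prior_mean, prior_precision, vote_precision=1.0):
    """Posterior mean for a Normal prior and Normal votes of known precision."""
    n = len(votes)
    numerator = prior_precision * prior_mean + vote_precision * sum(votes)
    return numerator / (prior_precision + n * vote_precision)


# Hypothetical data: three votes, and an assumed global mean of 6.9.
votes = [10, 10, 9]
prior_mean = 6.9

# A precision of 0 is a flat (improper) prior, so the result is the raw mean.
precisions = [0, 5, 50, 1300]
scores = [posterior_mean(votes, prior_mean, p) for p in precisions]

for p, s in zip(precisions, scores):
    print(p, round(s, 3))
```

With a vote precision of 1, this is the same expression as the weighted-rating formula. As the prior precision grows, the score slides from the raw mean of the votes towards the prior mean, which is the sense in which a sharper prior has more effect on the posterior.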

Now the assumptions we made to get to this point are almost laughable. If nothing else, we know that Gaussians are unbounded and continuous, and user votes on IMDB are integers in the range of 1 to 10. The interesting take-away message here is that even though we made a lot of assumptions above that were laughably wrong, the end result is a reasonable formula with a nice, intuitive meaning.
