Importance Sampling

We learned about a neat trick when doing some Monte Carlo simulations last week. It’s not new by any means, but it’s cool, very useful, and I hadn’t heard of it before.

Consider applying Monte Carlo to the task of estimating the probability that an event occurs. Let $S$ be the state space and $R \subseteq S$ be the region where the event occurs. Assume you have a (“generative”) model that lets you calculate the probability of any state, $p(s)$, and that you also have an indicator function \[ \delta(s) = \begin{cases} 1 & \text{if } s \in R \\ 0 & \text{otherwise.} \end{cases} \]

Monte Carlo says: \[ \int_0^1 f(x)\,dx \approx \frac{1}{N}\sum_{i=1}^N f(x_i), \] where the $x_i$ are chosen uniformly at random. So to estimate the probability of $R$ we have \[ \sum_{s\in S} p(s)\delta(s) \approx \frac{|S|}{N}\sum_{i=1}^N p(s_i)\delta(s_i), \] where the $s_i$ are chosen uniformly at random from $S$ (the factor $|S|$ plays the same role as the length of the integration interval above). Computationally speaking, however, this is not ideal: typically $p(s)$ is extremely small, necessitating software workarounds such as rational-arithmetic packages to deal with the underflow.

Importance sampling to the rescue. If we have a function $g(x)$ “similar” to $f(x)$, importance sampling says we can rewrite the Monte Carlo equation as \[ \int_0^1 \frac{f(y)}{g(y)}\,g(y)\,dy \approx \frac{1}{N}\sum_{i=1}^N \frac{f(y_i)}{g(y_i)}, \] where the $y_i$ are drawn according to $g(y)$ rather than uniformly. In our case, choosing $g(s) = p(s)$ gives \[ \sum_{s\in S} p(s)\delta(s) \approx \frac{1}{N}\sum_{i=1}^N \delta(s_i), \] where the $s_i$ are now drawn according to $p(s)$. And now we’re accumulating integers rather than very small floating-point numbers. In our case this led to a speedup of many orders of magnitude.
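Here’s a minimal sketch of the two estimators in Python, on a made-up toy model (the state space, the geometric-ish distribution, and the event region are all illustrative choices, not from the problem above):

```python
import random

# Toy setup: S = {0, ..., 999} with an (unnormalized) geometric-ish
# model, and the "event" region R = {s : s < 5}. All illustrative.
S = range(1000)
weights = [0.99 ** s for s in S]
Z = sum(weights)
p = [w / Z for w in weights]  # p(s) for each state s

def delta(s):
    """Indicator of the event region R."""
    return 1 if s < 5 else 0

# Exact answer, for comparison.
true_prob = sum(p[s] for s in S if delta(s))

random.seed(0)
N = 100_000

# Naive Monte Carlo: sample states uniformly and accumulate
# p(s) * delta(s) -- tiny floats. (|S|/N replaces 1/N because we are
# estimating a sum over S, not an integral over [0, 1].)
uniform_samples = [random.randrange(len(S)) for _ in range(N)]
naive = len(S) / N * sum(p[s] * delta(s) for s in uniform_samples)

# Importance sampling with g = p: sample states from p itself, so each
# term collapses to delta(s) -- we just count integer hits.
p_samples = random.choices(list(S), weights=p, k=N)
importance = sum(delta(s) for s in p_samples) / N

print(f"true:       {true_prob:.4f}")
print(f"naive:      {naive:.4f}")
print(f"importance: {importance:.4f}")
```

Both estimators converge to the same probability; the importance-sampled one just counts how often a $p$-distributed sample lands in $R$, which sidesteps the underflow problem entirely.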

Neat eh?
