
Here's my attempt to explain MCMC in a few paragraphs.

Drop a droplet of ink in a glass of water. In time the ink will spread out, and after a few minutes it will be homogeneously distributed throughout the whole glass. Why is that? When it reaches the steady state, you can take any cubic millimeter of water, and in any given second there will be as many particles of ink leaving that cube as entering it. In other words, each cubic millimeter of water is in balance.
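The steady state in the analogy can be simulated directly. A minimal sketch (not from the original comment): particles lazy-random-walk on a ring of cells, all starting in one cell, and the occupation counts flatten out until every cell's inflow balances its outflow.

```python
import random

# Toy "ink in water": 5000 particles random-walk on a ring of 10 cells.
# Each step a particle moves left, right, or stays put (the "lazy" walk
# avoids parity artifacts). The steady state is uniform: each cell's
# inflow balances its outflow.
random.seed(0)
cells = 10
positions = [0] * 5000                     # all ink starts in cell 0
for _ in range(2000):                      # let it diffuse
    positions = [(p + random.choice((-1, 0, 1))) % cells for p in positions]

counts = [positions.count(c) for c in range(cells)]
print(counts)                              # roughly 500 particles per cell
```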

What's that got to do with Monte Carlo? Many problems can be reduced to calculating the expected value of a function of a multi-dimensional random variable. All you need to do is draw many samples of that variable, apply the function to each, and then take the average. Simple, right? Yes, fairly simple, but how do you draw samples from a multi-dimensional distribution?
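For a distribution you can already sample from, the recipe really is that short. A hypothetical sketch: estimate E[X² + Y²] for independent standard normals, where the true answer is Var(X) + Var(Y) = 2.

```python
import random

# Plain Monte Carlo: estimate E[f(X, Y)] for f(x, y) = x**2 + y**2,
# with X, Y independent standard normals (a distribution we CAN sample
# from directly, so no MCMC is needed yet).
random.seed(0)

def f(x, y):
    return x * x + y * y

n = 100_000
total = 0.0
for _ in range(n):
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    total += f(x, y)

estimate = total / n        # true expectation is Var(X) + Var(Y) = 2
print(estimate)
```

The hard part the comment goes on to address is what to do when no direct sampler like `random.gauss` exists for your distribution.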

You pretend you are letting a drop of ink diffuse. You need to make it so that in the steady state, the ink's concentration is the pdf of the distribution you want to draw from. You want to make use of the balance: in each tiny little cube, as many particles of ink enter as they leave.

Nobody knows how to enforce this balance. It's too hard: a particle can leave in many ways, and can enter from many places.

But when something is hard, sometimes it's simpler to solve a more difficult problem. In this case, you enforce a more stringent condition, called "detailed balance": you make sure that for any two little cubes, as many particles migrate from one to the other as migrate the opposite way.

So, let's say you take point A and point B. The pdf function at the two points has values P(A) and P(B). You have the freedom to choose the transition probabilities P(A->B) and P(B->A). How do you choose them? Well, you enforce the "detailed balance": how many "particles of ink" go from A to B? The number is proportional to the probability of a particle being at A to begin with, and then transitioning from A to B. So, it's P(A)P(A->B).

Great. The detailed balance condition is P(A)P(A->B) = P(B)P(B->A).

Is there a simple way to get these transition probabilities? Here's a simple example: let's say P(A) is twice as high as P(B). We need to make P(A->B) just half of P(B->A). We do it in two steps: first we make the proposal probabilities for A->B and B->A equal (for example, by drawing the proposed move from a symmetric distribution around the current point), and then we flip a coin: if it's heads we accept the move from A to B, and if it's tails we stay at A. This recipe (called Metropolis) works not just for P(B)/P(A) = 1/2, but for any ratio (of course you need to use a suitably biased coin).
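The 2:1 example above can be checked numerically. A toy sketch (my own, not from the comment): a two-state chain with target P(A) = 2/3, P(B) = 1/3, a symmetric proposal, and the biased-coin acceptance; the fraction of time spent in A should approach 2/3.

```python
import random

# Two-state Metropolis chain. Target: P(A) = 2/3, P(B) = 1/3, so the
# ratio P(B)/P(A) = 1/2, matching the example in the text. Note the
# weights are unnormalized -- only their ratio ever matters.
p = {"A": 2.0, "B": 1.0}

random.seed(0)
state = "A"
counts = {"A": 0, "B": 0}
for _ in range(200_000):
    proposal = "B" if state == "A" else "A"    # symmetric proposal
    ratio = p[proposal] / p[state]
    if ratio >= 1 or random.random() < ratio:  # the biased coin flip
        state = proposal                       # accept the move
    counts[state] += 1                         # else stay put

frac_a = counts["A"] / sum(counts.values())    # should approach 2/3
print(frac_a)
```

Detailed balance checks out here by hand, too: the chain moves A->B with probability 1/2 and B->A with probability 1, so P(A)·P(A->B) = (2/3)(1/2) = (1/3)(1) = P(B)·P(B->A).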

So, that's MCMC. You start from an arbitrary point, propose a move (say, by drawing from a uniform distribution around the current point), and compute the ratio of the pdf at the proposed point to the pdf at the current point. If the ratio is at least one, you accept the move; otherwise you flip a biased coin and accept with probability equal to that ratio, else you stay put. Rinse, repeat. After many steps you end up with a point that's randomly distributed with the target probability.
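The whole loop above fits in a few lines. A minimal sketch, assuming a 1-D target proportional to exp(-x²/2) (a standard normal without its normalizing constant) and a uniform proposal step:

```python
import math
import random

# Minimal Metropolis sampler for an unnormalized 1-D density.
# The target is proportional to exp(-x**2 / 2); the 1/sqrt(2*pi)
# constant is deliberately omitted -- the sampler never needs it.
def target(x):
    return math.exp(-x * x / 2)

random.seed(0)
x = 0.0                                         # arbitrary starting point
samples = []
for _ in range(50_000):
    proposal = x + random.uniform(-1, 1)        # symmetric proposal step
    ratio = target(proposal) / target(x)
    if ratio >= 1 or random.random() < ratio:   # biased coin
        x = proposal                            # accept the move
    samples.append(x)                           # rejections repeat x

burn = samples[5_000:]                          # discard warm-up steps
mean = sum(burn) / len(burn)                    # should be near 0
var = sum(s * s for s in burn) / len(burn)      # should be near 1
print(mean, var)
```

Note that a rejected move still contributes the current point to the sample again; dropping rejections would bias the chain.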

Oh, and there's a bonus. Since the acceptance ratio depends only on the ratio of the pdf at two points, you don't actually need to know the pdf itself; it's enough to know it up to a normalization constant. Which is exactly the situation when you do Bayesian estimation: that normalization constant is very difficult to calculate (it is itself a multi-dimensional integral). This lucky break is what allowed practical Bayesian estimation to exist. Without it, the whole field of Bayesian estimation would be just idle speculation.
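To make the bonus concrete, here's a hypothetical Bayesian example (my own, not from the comment): estimating a coin's bias theta after seeing 7 heads in 10 flips, with a uniform prior. The posterior is proportional to theta⁷(1-theta)³ — a Beta(8, 4), whose normalizing constant the sampler never touches.

```python
import random

# Metropolis on an unnormalized posterior: theta**7 * (1 - theta)**3
# (7 heads in 10 flips, uniform prior). The normalizing integral is
# never computed -- the sampler only ever looks at ratios.
def unnormalized_posterior(theta):
    if not 0 < theta < 1:
        return 0.0                               # outside the support
    return theta**7 * (1 - theta)**3

random.seed(0)
theta = 0.5                                      # arbitrary start
samples = []
for _ in range(100_000):
    proposal = theta + random.uniform(-0.1, 0.1)
    num = unnormalized_posterior(proposal)
    den = unnormalized_posterior(theta)
    if random.random() * den < num:              # accept w.p. min(1, num/den)
        theta = proposal
    samples.append(theta)

posterior_mean = sum(samples[10_000:]) / len(samples[10_000:])
print(posterior_mean)   # exact Beta(8, 4) mean is 8 / 12 = 0.666...
```

The same trick works unchanged in many dimensions, which is where computing the normalizing integral directly becomes truly hopeless.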


