MCMC (1) : Monte Carlo Method and Metropolis-Hastings Sampling
Monte Carlo method
The Monte Carlo method is useful in Bayesian data modeling because maximizing the posterior probability is often very difficult, and fitting a Gaussian to it can be just as hard.
The Monte Carlo method is valuable when we want to generate samples from a distribution, and also when we want to estimate expectation values of various functions under it.
What we deal with in this post is only a small part of the Monte Carlo method. I hope to get a chance to introduce more of it on this blog soon, but in the meantime you can see one of the repositories on my Github.
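For instance, once samples are in hand, an expectation value is estimated by a simple average. A minimal sketch, assuming a distribution we can already sample directly (a standard normal) and the function $f(x) = x^2$, whose true expectation is 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# draw N samples from a distribution we can sample directly
samples = rng.standard_normal(100_000)

# Monte Carlo estimate of E[f(x)] with f(x) = x**2
# (the true value is 1 for a standard normal)
estimate = np.mean(samples ** 2)
print(estimate)
```

The estimate converges to the true expectation at the usual $1/\sqrt{N}$ rate as the number of samples grows.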
In data science, we often face the problem of evaluating a probability distribution, $ P(\boldsymbol{x}) $, which is given only up to a constant:
$$ P(\boldsymbol{x}) = P^* (\boldsymbol{x}) / Z $$
One might even claim that most of the meaningful questions in data science come down to finding the normalizing constant, $Z$; evaluating the relative probability $P^* (\boldsymbol{x})$ up to that constant is usually not too difficult. Thus, we will assume we know $P^* (\boldsymbol{x})$ but not $Z$.
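To see why $Z$ is the hard part, note that computing it means summing (or integrating) $P^*$ over every state. A minimal sketch, with a hypothetical unnormalized density on the 21 integer states used later in this post; the brute-force sum is only feasible because the state space is tiny, and the number of states grows exponentially with dimension:

```python
import numpy as np

# a hypothetical unnormalized density P*(x) on the integers 0..20
def p_star(x):
    return np.exp(-0.5 * (x - 10.0) ** 2 / 9.0)

# the normalizing constant requires visiting every single state
Z = sum(p_star(x) for x in range(21))

# only after computing Z do we have actual probabilities
probs = [p_star(x) / Z for x in range(21)]
print(Z, sum(probs))
```

Sampling methods sidestep this: as we will see, Metropolis-Hastings only ever uses ratios of $P^*$, so $Z$ cancels out.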
To generate samples, we use a proposal density, $Q (\boldsymbol{x}, …)$.
Metropolis-Hastings sampling
In Metropolis-Hastings sampling, we use a proposal density $Q (\boldsymbol{x'}; \boldsymbol{x})$, the probability of proposing a transition from the current state $\boldsymbol{x}$ to a candidate next state $\boldsymbol{x'}$.
If the proposal density is symmetric under exchange of the two position vectors, it drops out of the procedure entirely; what matters is only the ratio of the values of $P^* (\boldsymbol{x})$.
If $P^* (\boldsymbol{x'}) / P^* (\boldsymbol{x})$ is greater than or equal to 1, accept $\boldsymbol{x'}$ into the sample: with $\boldsymbol{x} = \boldsymbol{x}^{(n)}$, we set $\boldsymbol{x}^{(n+1)} = \boldsymbol{x'}$.
If $P^* (\boldsymbol{x'}) / P^* (\boldsymbol{x})$ is smaller than 1, accept $\boldsymbol{x'}$ with probability $P^* (\boldsymbol{x'}) / P^* (\boldsymbol{x})$; if the proposal is rejected, the chain stays where it is, $\boldsymbol{x}^{(n+1)} = \boldsymbol{x}^{(n)}$.
Intuitively, if we keep repeating this procedure, the states we visit will be distributed as $P(\boldsymbol{x})$, even though we only ever evaluate $P^* (\boldsymbol{x})$.
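The accept/reject rule above can be sketched as a minimal Metropolis sampler. This is not yet the example from the book; it assumes a one-dimensional unnormalized target (a standard normal without its constant) and a symmetric Gaussian proposal:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_star(x):
    # unnormalized target: a standard normal without its 1/sqrt(2*pi)
    return np.exp(-0.5 * x ** 2)

def metropolis(n_samples, x0=0.0, step=1.0):
    x = x0
    samples = []
    for _ in range(n_samples):
        # symmetric proposal: x' drawn from N(x, step**2)
        x_new = x + step * rng.standard_normal()
        ratio = p_star(x_new) / p_star(x)
        # accept if ratio >= 1, else accept with probability ratio
        if rng.random() < ratio:
            x = x_new
        samples.append(x)
    return np.array(samples)

samples = metropolis(50_000)
print(samples.mean(), samples.std())
```

Note that only the ratio of $P^*$ values appears, so the normalizing constant $Z$ is never needed.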
Let us think of the example shown in figure 29.12 of the book “Information Theory, Inference, and Learning Algorithms” by David MacKay.1 I have already recommended the book in a previous post, and I will keep doing so: it is where I learned the theory of data science and machine learning, and the posts on this blog implement that theory. Whenever you feel I did not explain enough about the theory, you can open the free on-line book.
Now let us look at the Python scripts.
import matplotlib.pyplot as plt
import numpy as np
import random
%matplotlib inline
plt.style.use('ggplot')
The target distribution is
$$
P(x) =
\begin{cases}
\frac{1}{21} & \text{for integers } 0 \leq x < 21 \\
0 & \text{otherwise}
\end{cases}
$$
and the proposal distribution is
$$
$$
Q(x'; x) =
\begin{cases}
\frac{1}{2} & x' = x \pm 1 \\
0 & \text{otherwise}
\end{cases}
$$
niters = 400
x0 = 10
List = [x0]
for i in range(niters):
    # propose a move of +1 or -1 with equal probability (symmetric Q)
    proposal = x0 + 1 if np.random.random() >= 0.5 else x0 - 1
    # outside [0, 20] we have P*(x') = 0, so the proposal is rejected
    # and the chain stays at its current state
    if 0 <= proposal <= 20:
        x0 = proposal
    List.append(x0)
plt.plot(List, linewidth=0.3)
What is interesting in this example is that the Metropolis-Hastings acceptance step does not do much here. The proposal distribution is symmetric, and for any in-range proposal the ratio $P^* (\boldsymbol{x'}) / P^* (\boldsymbol{x})$ equals 1, so such a proposal is always accepted; only a proposal outside $[0, 20]$, where $P^* = 0$, is rejected. The proposal and acceptance step is handled inside the _for loop_.
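A quick numerical check of this point, assuming the same uniform target on the integers $0, \dots, 20$ and the $\pm 1$ proposal: every in-range proposal has ratio 1 and is accepted, so the only rejections come from boundary proposals where $P^* = 0$:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_star(x):
    # unnormalized uniform target on the integers 0..20
    return 1.0 if 0 <= x <= 20 else 0.0

x = 10
rejected = 0
for _ in range(10_000):
    proposal = x + 1 if rng.random() >= 0.5 else x - 1
    # Metropolis acceptance for a symmetric proposal
    if rng.random() < p_star(proposal) / p_star(x):
        x = proposal
    else:
        rejected += 1
        # a rejection can only happen at the two boundary states
        assert x in (0, 20)
print(rejected)
```

The rejection count is a small fraction of the total, since the chain spends only about $2/21$ of its time at the boundaries.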
Histogram of occupancy of the states
The following scripts plot histograms of the occupancy of the states from two kinds of samples: the former from Metropolis-Hastings and the latter from independent sampling. As the number of samples increases, both histograms should approach the flat shape of $P(x)$.
hist0, bin_edges0 = np.histogram(List, bins=20)
bin_centers0 = (bin_edges0[:-1] + bin_edges0[1:]) / 2.
plt.plot(bin_centers0, hist0, drawstyle='steps-mid', linestyle='-')
List2 = [np.random.randint(0, 21) for i in range(niters)]  # uniform over the 21 states 0..20
hist2, bin_edges2 = np.histogram(List2, bins=20)
bin_centers2 = (bin_edges2[:-1] + bin_edges2[1:]) / 2.
plt.plot(bin_centers2, hist2, drawstyle='steps-mid', linestyle='-')
plt.axis([0, 20, 0, 40])
def MHhistogram(niters, initial):
    x0 = initial
    List = [x0]
    for i in range(niters):
        # propose +1 or -1 with equal probability (symmetric Q)
        proposal = x0 + 1 if np.random.random() >= 0.5 else x0 - 1
        # reject proposals outside [0, 20], where P*(x') = 0
        if 0 <= proposal <= 20:
            x0 = proposal
        List.append(x0)
    hist0, bin_edges0 = np.histogram(List, bins=20)
    bin_centers0 = (bin_edges0[:-1] + bin_edges0[1:]) / 2.
    return bin_centers0, hist0
MHhistogram(400, 10)
(array([ 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5,
9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5,
18.5, 19.5]),
array([ 2, 4, 7, 9, 10, 10, 13, 24, 27, 24, 27, 25, 23, 22, 24, 24, 19,
20, 25, 62]))
centers, counts = MHhistogram(1400, 10)
plt.plot(centers, counts, drawstyle='steps-mid', linestyle='-')
f, (ax1, ax2, ax3) = plt.subplots(3, sharex=True)
centers, counts = MHhistogram(100, 10)
ax1.plot(centers, counts, drawstyle='steps-mid', linestyle='-')
ax1.set_title('Metropolis')
centers, counts = MHhistogram(400, 10)
ax2.plot(centers, counts, drawstyle='steps-mid', linestyle='-')
centers, counts = MHhistogram(1200, 10)
ax3.plot(centers, counts, drawstyle='steps-mid', linestyle='-')
plt.axis([0, 20, 0, 60])
def randomhistogram(niters):
    List2 = [np.random.randint(0, 21) for i in range(niters)]  # uniform over 0..20
    hist2, bin_edges2 = np.histogram(List2, bins=20)
    bin_centers2 = (bin_edges2[:-1] + bin_edges2[1:]) / 2.
    return bin_centers2, hist2
f, (ax1, ax2, ax3) = plt.subplots(3, sharex=True)
centers, counts = randomhistogram(100)
ax1.plot(centers, counts, drawstyle='steps-mid', linestyle='-')
ax1.set_title('independent sampling')
centers, counts = randomhistogram(400)
ax2.plot(centers, counts, drawstyle='steps-mid', linestyle='-')
centers, counts = randomhistogram(1200)
ax3.plot(centers, counts, drawstyle='steps-mid', linestyle='-')
plt.axis([0, 20, 0, 100])