MCMC (1) : Monte Carlo Method and Metropolis-Hastings Sampling
Monte Carlo method
The Monte Carlo method is useful in Bayesian data modeling because maximizing the posterior probability is often very difficult, and fitting a Gaussian to it can be just as hard.
The Monte Carlo method is valuable when we want to generate samples from a distribution, and also when we want to estimate expectation values of various functions under it.
What we deal with in this post is only a small part of the Monte Carlo method. I hope to get a chance to introduce more of it on this blog soon, but in the meantime you can see one of the repositories on my Github.
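For instance, once samples are in hand, an expectation value is estimated by a simple average. A minimal sketch, assuming a distribution we can already sample directly (a standard normal) and the function $f(x) = x^2$, whose true expectation is 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# draw N samples from a distribution we can sample directly
samples = rng.standard_normal(100_000)

# Monte Carlo estimate of E[f(x)] with f(x) = x**2
# (the true value is 1 for a standard normal)
estimate = np.mean(samples ** 2)
print(estimate)
```

The estimate converges to the true expectation at the usual $1/\sqrt{N}$ rate as the number of samples grows.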
In data science, we often face the problem of evaluating a probability distribution, $ P(\boldsymbol{x}) $, which is given only up to a constant:
$$ P(\boldsymbol{x}) = P^* (\boldsymbol{x}) / Z $$
One might even claim that most of the meaningful questions in data science come down to finding the normalizing constant, $Z$; evaluating the relative probability $P^* (\boldsymbol{x})$ up to that constant is usually not too difficult. Thus, we will assume we know $P^* (\boldsymbol{x})$ but not $Z$.
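To see why $Z$ is the hard part, note that computing it means summing (or integrating) $P^*$ over every state. A minimal sketch, with a hypothetical unnormalized density on the 21 integer states used later in this post; the brute-force sum is only feasible because the state space is tiny, and the number of states grows exponentially with dimension:

```python
import numpy as np

# a hypothetical unnormalized density P*(x) on the integers 0..20
def p_star(x):
    return np.exp(-0.5 * (x - 10.0) ** 2 / 9.0)

# the normalizing constant requires visiting every single state
Z = sum(p_star(x) for x in range(21))

# only after computing Z do we have actual probabilities
probs = [p_star(x) / Z for x in range(21)]
print(Z, sum(probs))
```

Sampling methods sidestep this: as we will see, Metropolis-Hastings only ever uses ratios of $P^*$, so $Z$ cancels out.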
To generate samples, we use a proposal density, $Q (\boldsymbol{x}, …)$.
Metropolis-Hastings sampling
In Metropolis-Hastings sampling, we use a proposal density $Q (\boldsymbol{x'}; \boldsymbol{x})$, the probability of proposing a transition from the current state $\boldsymbol{x}$ to a candidate next state $\boldsymbol{x'}$.
If the proposal density is symmetric under exchange of the two position vectors, it drops out of the procedure entirely; what matters is only the ratio of the values of $P^* (\boldsymbol{x})$.
If $P^* (\boldsymbol{x'}) / P^* (\boldsymbol{x})$ is greater than or equal to 1, accept $\boldsymbol{x'}$ into the sample: with $\boldsymbol{x} = \boldsymbol{x}^{(n)}$, we set $\boldsymbol{x}^{(n+1)} = \boldsymbol{x'}$.
If $P^* (\boldsymbol{x'}) / P^* (\boldsymbol{x})$ is smaller than 1, accept $\boldsymbol{x'}$ with probability $P^* (\boldsymbol{x'}) / P^* (\boldsymbol{x})$; if the proposal is rejected, the chain stays where it is, $\boldsymbol{x}^{(n+1)} = \boldsymbol{x}^{(n)}$.
Intuitively, if we keep repeating this procedure, the states we visit will be distributed as $P(\boldsymbol{x})$, even though we only ever evaluate $P^* (\boldsymbol{x})$.
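The accept/reject rule above can be sketched as a minimal Metropolis sampler. This is not yet the example from the book; it assumes a one-dimensional unnormalized target (a standard normal without its constant) and a symmetric Gaussian proposal:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_star(x):
    # unnormalized target: a standard normal without its 1/sqrt(2*pi)
    return np.exp(-0.5 * x ** 2)

def metropolis(n_samples, x0=0.0, step=1.0):
    x = x0
    samples = []
    for _ in range(n_samples):
        # symmetric proposal: x' drawn from N(x, step**2)
        x_new = x + step * rng.standard_normal()
        ratio = p_star(x_new) / p_star(x)
        # accept if ratio >= 1, else accept with probability ratio
        if rng.random() < ratio:
            x = x_new
        samples.append(x)
    return np.array(samples)

samples = metropolis(50_000)
print(samples.mean(), samples.std())
```

Note that only the ratio of $P^*$ values appears, so the normalizing constant $Z$ is never needed.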
Let us think of the example shown in figure 29.12 of the book “Information Theory, Inference, and Learning Algorithms” by David MacKay.1 I have already recommended the book in a previous post, and I will keep doing so: it is where I learned the theory of data science and machine learning, and the posts on this blog implement that theory. Whenever you feel I did not explain enough about the theory, you can open the free on-line book.
Now let us look at the Python scripts.
import matplotlib.pyplot as plt
import numpy as np
import random
%matplotlib inline
plt.style.use('ggplot')
The target distribution is
$$
P(x) =
\begin{cases}
\frac{1}{21} & \text{for integers } 0 \leq x < 21 \\
0 & \text{otherwise}
\end{cases}
$$
and the proposal distribution is
$$
$$
Q(x'; x) =
\begin{cases}
\frac{1}{2} & x' = x \pm 1 \\
0 & \text{otherwise}
\end{cases}
$$
niters = 400
x0 = 10
List = [x0]
for i in range(niters):
    # propose a move of +1 or -1 with equal probability (symmetric Q)
    proposal = x0 + 1 if np.random.random() >= 0.5 else x0 - 1
    # outside [0, 20] we have P*(x') = 0, so the proposal is rejected
    # and the chain stays at its current state
    if 0 <= proposal <= 20:
        x0 = proposal
    List.append(x0)
plt.plot(List, linewidth=0.3)
What is interesting in this example is that the Metropolis-Hastings acceptance step does not do much here. The proposal distribution is symmetric, and for any in-range proposal the ratio $P^* (\boldsymbol{x'}) / P^* (\boldsymbol{x})$ equals 1, so such a proposal is always accepted; only a proposal outside $[0, 20]$, where $P^* = 0$, is rejected. The proposal and acceptance step is handled inside the _for loop_.
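A quick numerical check of this point, assuming the same uniform target on the integers $0, \dots, 20$ and the $\pm 1$ proposal: every in-range proposal has ratio 1 and is accepted, so the only rejections come from boundary proposals where $P^* = 0$:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_star(x):
    # unnormalized uniform target on the integers 0..20
    return 1.0 if 0 <= x <= 20 else 0.0

x = 10
rejected = 0
for _ in range(10_000):
    proposal = x + 1 if rng.random() >= 0.5 else x - 1
    # Metropolis acceptance for a symmetric proposal
    if rng.random() < p_star(proposal) / p_star(x):
        x = proposal
    else:
        rejected += 1
        # a rejection can only happen at the two boundary states
        assert x in (0, 20)
print(rejected)
```

The rejection count is a small fraction of the total, since the chain spends only about $2/21$ of its time at the boundaries.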
Histogram of occupancy of the states
The following scripts plot histograms of the occupancy of the states from two kinds of samples: the former from Metropolis-Hastings and the latter from independent sampling. As the number of samples increases, both histograms should approach the flat shape of $P(x)$.
hist0, bin_edges0 = np.histogram(List, bins=20)
bin_centers0 = (bin_edges0[:-1] + bin_edges0[1:]) / 2.
plt.plot(bin_centers0, hist0, drawstyle='steps-mid', linestyle='-')
List2 = [np.random.randint(0, 21) for i in range(niters)]  # uniform over the 21 states 0..20
hist2, bin_edges2 = np.histogram(List2, bins=20)
bin_centers2 = (bin_edges2[:-1] + bin_edges2[1:]) / 2.
plt.plot(bin_centers2, hist2, drawstyle='steps-mid', linestyle='-')
plt.axis([0, 20, 0, 40])
def MHhistogram(niters, initial):
    x0 = initial
    List = [x0]
    for i in range(niters):
        # propose +1 or -1 with equal probability (symmetric Q)
        proposal = x0 + 1 if np.random.random() >= 0.5 else x0 - 1
        # reject proposals outside [0, 20], where P*(x') = 0
        if 0 <= proposal <= 20:
            x0 = proposal
        List.append(x0)
    hist0, bin_edges0 = np.histogram(List, bins=20)
    bin_centers0 = (bin_edges0[:-1] + bin_edges0[1:]) / 2.
    return bin_centers0, hist0
MHhistogram(400, 10)
(array([ 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5,
9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5,
18.5, 19.5]),
array([ 2, 4, 7, 9, 10, 10, 13, 24, 27, 24, 27, 25, 23, 22, 24, 24, 19,
20, 25, 62]))
centers, counts = MHhistogram(1400, 10)
plt.plot(centers, counts, drawstyle='steps-mid', linestyle='-')
f, (ax1, ax2, ax3) = plt.subplots(3, sharex=True)
centers, counts = MHhistogram(100, 10)
ax1.plot(centers, counts, drawstyle='steps-mid', linestyle='-')
ax1.set_title('Metropolis')
centers, counts = MHhistogram(400, 10)
ax2.plot(centers, counts, drawstyle='steps-mid', linestyle='-')
centers, counts = MHhistogram(1200, 10)
ax3.plot(centers, counts, drawstyle='steps-mid', linestyle='-')
plt.axis([0, 20, 0, 60])
def randomhistogram(niters):
    List2 = [np.random.randint(0, 21) for i in range(niters)]  # uniform over 0..20
    hist2, bin_edges2 = np.histogram(List2, bins=20)
    bin_centers2 = (bin_edges2[:-1] + bin_edges2[1:]) / 2.
    return bin_centers2, hist2
f, (ax1, ax2, ax3) = plt.subplots(3, sharex=True)
centers, counts = randomhistogram(100)
ax1.plot(centers, counts, drawstyle='steps-mid', linestyle='-')
ax1.set_title('independent sampling')
centers, counts = randomhistogram(400)
ax2.plot(centers, counts, drawstyle='steps-mid', linestyle='-')
centers, counts = randomhistogram(1200)
ax3.plot(centers, counts, drawstyle='steps-mid', linestyle='-')
plt.axis([0, 20, 0, 100])