1 Introduction

In these exercises, we will explore a powerful method of sampling from a distribution \(\pi\). The algorithm we will discuss is usually referred to as the Metropolis-Hastings algorithm; Metropolis was one of the authors of the groundbreaking paper, and Hastings considered extensions of this work and applications to statistics.

The key feature is that we again construct a Markov chain that has \(\pi\) as its stationary distribution, and that we need only know \(\pi\) up to a constant factor. The following simple exercise in Bayesian statistics may help to illustrate why this is important.

1.1 Exercise

Let \(X=(X_1, \ldots, X_n)\) be a random sample from the conditional distribution of \(X_1\) given \(\Theta = \theta\), where \(X_1|\theta \sim Bern(\theta)\) and \(\Theta \sim Unif(0,1)\). Find the posterior distribution of \(\Theta\) given \(X = x\).

1.1.1 Solution

Let \(x \in \{0,1\}^n\) and \(t= x_1 + \cdots +x_n\). Let \(\theta \in (0,1)\). Let \(r(\theta) = \mathbf{1}[\theta \in (0,1)]\) be the prior pdf for \(\Theta\) and \(f_X\) be the pdf for \(X\). We have that \[\begin{eqnarray*} s(\theta|x) &=& \frac{L(x|\theta)r(\theta)}{ f_X(x)} \\ &=& \frac{\theta^t (1-\theta)^{n-t}}{f_X(x)}. \end{eqnarray*}\] We have that \[f_X(x) = \int_0 ^1 L(x|\theta) d\theta = \int_0 ^1 \theta^t(1-\theta)^{n-t}d \theta,\] but we don't really need to compute this integral: since \(s(\theta|x) \propto \theta^t(1-\theta)^{n-t}\), we recognize the kernel of a beta distribution, so that \(s(\theta|x)\) is given by the pdf of a beta distribution in \(\theta\) with parameters \(\alpha = t+1\) and \(\beta = n-t+1\).
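For instance, with the hypothetical data \(x = (1, 0, 1, 0, 0)\), so that \(n = 5\) and \(t = 2\), the posterior is the Beta\((3,4)\) distribution, and we can plot it directly; a minimal sketch in R:

# Hypothetical data: n = 5 Bernoulli observations with t = 2 successes
x <- c(1, 0, 1, 0, 0)
n <- length(x)
t <- sum(x)

# The posterior is Beta(t + 1, n - t + 1); here Beta(3, 4)
theta <- seq(0, 1, by = 0.01)
plot(theta, dbeta(theta, t + 1, n - t + 1), type = "l",
     xlab = "theta", ylab = "posterior density")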

A problem arises when we do not recognize the distribution and the integral defining \(f_X(x)\) is difficult to compute, but we still want to sample from, simulate, or understand \(s(\theta|x)\).

2 The Metropolis-Hastings algorithm

Suppose that we want to sample from \(\pi(i) = \tilde{\pi}(i)/C\); in this context \(\pi\) is often called the target distribution, and we only know \(\tilde{\pi}\). In the Metropolis algorithm, we choose a symmetric transition matrix \(Q\), called the proposal distribution, and we typically write \(Q(j|i) = Q_{ij}\); it turns out that we have many choices for \(Q\), and how to choose the best \(Q\) with respect to convergence and robustness is an active area of research. Hastings generalized the Metropolis algorithm to allow for non-symmetric choices of \(Q\). We consider the Markov chain which advances one step in the following way: given the current state \(i\), propose a state \(j\) according to \(Q(j|i)\), and then move to state \(j\) with probability \[\alpha(j|i) = \min\Big( \frac{\tilde{\pi}(j)}{\tilde{\pi}(i)} \cdot \frac{Q(i|j)}{Q(j|i)},\, 1\Big);\] otherwise, remain at state \(i\).

If we do move to state \(j\), the proposal is sometimes referred to as being accepted; we actually have some freedom in the choice of \(\alpha\), and the choice above is sometimes referred to as the Metropolis choice.
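As a minimal sketch of a single step in R (with hypothetical arguments pi_tilde for \(\tilde{\pi}\), q_sample for drawing from \(Q(\cdot|i)\), and q_density(a, b) for evaluating \(Q(a|b)\); none of these names appear in the exercises below):

# One Metropolis-Hastings step from state i (a sketch, not the code used in the exercises)
mh_step <- function(i, pi_tilde, q_sample, q_density){
  j <- q_sample(i)                              # propose j ~ Q(. | i)
  ratio <- (pi_tilde(j) * q_density(i, j)) /    # tilde-pi(j) * Q(i | j)
           (pi_tilde(i) * q_density(j, i))      # tilde-pi(i) * Q(j | i)
  alpha <- min(ratio, 1)                        # the Metropolis choice of alpha
  if(runif(1) < alpha) j else i                 # accept with probability alpha, else stay at i
}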

In order to show that this method works, we show that \(\pi\) is indeed a stationary distribution for this Markov chain; to do this, it is enough to verify that \(\pi\) is reversible with respect to the transition matrix, which is given by \[ P(j|i) = Q(j|i)\alpha(j|i),\]

when \(i \not = j\). We need to check that \[ \pi_{i}P(j|i) = \pi_{j} P(i|j);\] notice that there is nothing to check if \(i = j\).

2.1 Exercise

Check it! Note that for \(a > 0\), if \(\min(a,1) = a\), then \(\min(1/a, 1) = 1\).
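A numerical sanity check can also be reassuring before doing the algebra. The following sketch uses a hypothetical three-state unnormalized target and an asymmetric proposal matrix (both made up for illustration), builds \(P\), and checks detailed balance.

pi_tilde <- c(1, 3, 2)                          # hypothetical unnormalized target on states 1, 2, 3
Q <- matrix(c(0.2, 0.5, 0.3,
              0.4, 0.2, 0.4,
              0.3, 0.3, 0.4), nrow = 3, byrow = TRUE)   # Q[i, j] stands for Q(j | i)

# Build P(j | i) = Q(j | i) * alpha(j | i) for i != j; the leftover mass stays at i
P <- matrix(0, 3, 3)
for(i in 1:3){
  for(j in 1:3){
    if(i != j){
      alpha <- min(pi_tilde[j] * Q[j, i] / (pi_tilde[i] * Q[i, j]), 1)
      P[i, j] <- Q[i, j] * alpha
    }
  }
  P[i, i] <- 1 - sum(P[i, -i])
}

# Detailed balance: pi_i * P(j | i) - pi_j * P(i | j) should be (numerically) zero for all i, j
pi_vec <- pi_tilde / sum(pi_tilde)
for(i in 1:3){
  for(j in 1:3){
    cat(i, j, pi_vec[i] * P[i, j] - pi_vec[j] * P[j, i], "\n")
  }
}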

2.2 Exercise

Technically, we have only verified Metropolis-Hastings for a finite state space, but we can also adapt the algorithm to the case of continuous distributions. Try the Metropolis algorithm on the first exercise, letting \(Q(j|i)\) be the pdf of a normal distribution centered at \(i\) with variance \(1\). Fix \(n=5\) and \(t=2\).

2.2.1 Solution

To check reversibility, for \(i \not = j\), we have

\[ \begin{eqnarray*} \pi_{i}P(j|i) &=& \pi_{i}Q(j|i)\alpha(j|i) \\ &=& \pi_{i}Q(j|i)\min\big( \tfrac{\tilde{\pi}(j)}{\tilde{\pi}(i)} \cdot \tfrac{Q(i|j)}{Q(j|i)},1\big). \end{eqnarray*} \]

If \(\min\big( \tfrac{\tilde{\pi}(j)}{\tilde{\pi}(i)} \cdot \tfrac{Q(i|j)}{Q(j|i)},1\big)=\tfrac{\tilde{\pi}(j)}{\tilde{\pi}(i)} \cdot \tfrac{Q(i|j)}{Q(j|i)}\), then

\[ \begin{eqnarray*} \pi_{i}P(j|i) &=& \pi_{i}Q(j|i) \tfrac{\tilde{\pi}(j)}{\tilde{\pi}(i)} \cdot \tfrac{Q(i|j)}{Q(j|i)} \\ &=& \pi_j Q(i|j), \end{eqnarray*} \]

and \(\min\big( \tfrac{\tilde{\pi}(i)}{\tilde{\pi}(j)} \cdot \tfrac{Q(j|i)}{Q(i|j)},1\big)=1\), so that

\[ \begin{eqnarray*} \pi_{i}P(j|i) &=& \pi_j Q(i|j) \\ &=& \pi_j Q(i|j)\min\big( \tfrac{\tilde{\pi}(i)}{\tilde{\pi}(j)} \cdot \tfrac{Q(j|i)}{Q(i|j)},1\big) \\ &=& \pi_j P(i|j), \end{eqnarray*} \] as desired. A similar argument holds if \(\min\big( \tfrac{\tilde{\pi}(j)}{\tilde{\pi}(i)} \cdot \tfrac{Q(i|j)}{Q(j|i)},1\big)=1\).

2.2.2 Solution R code

h <- function(x){
  # Unnormalized target: pi-tilde(theta) = theta^t (1 - theta)^(n - t), with n = 5 and t = 2
  z <- 0
  if(x > 0 && x < 1) { z <- (x^2) * ((1 - x)^3) }
  z
}

Yq <- function(x){
  # Proposal: a normal distribution centered at the current state with variance 1
  rnorm(1, x, 1)
}

a <- function(j, i){
  # Acceptance probability; the proposal is symmetric, so the Q terms cancel
  min(c(h(j) / h(i), 1))
}

metro <- function(n){
  # Run the chain for n steps, starting from 0.5, and return the whole path
  x <- 0.5
  for(k in 1:n){
    i <- x[length(x)]    # current state
    j <- Yq(i)           # proposed state
    p <- a(j, i)         # acceptance probability
    x <- c(x, i)         # by default, stay at i
    if(rbinom(1, 1, p) == 1){
      x[length(x)] <- j  # accept the proposal and move to j
    }
  }
  x
}

# Pool 2000 independent chains, each run for 2000 steps
z <- replicate(2000, metro(2000))

# Compare the sampled values with the Beta(3, 4) posterior density
hist(z, prob = TRUE, breaks = 50)
curve(dbeta(x, 3, 4), add = TRUE)

3 Endnotes