# 1 Introduction

In these exercises, we will explore an powerful method of sampling from a distribution $$\pi$$. The algorithm we will discuss is usually referred to as the Metropolis-Hastings algorithm; Metropolis was one of the authors on the ground breaking paper and Hastings considered extensions to this work and applications to statistics.

The key feature is that again we construct a Markov chain that has $$\pi$$ as its stationary distribution, and we need only know $$\pi$$ up to a constant factor. The following simple exercise in Bayesian statistics may help to illustrate why this is important.

## 1.1 Exercise

Let $$X=(X_1, \ldots, X_n)$$ be a random sample from the conditional distribution of $$X_1$$ given $$\Theta = \theta$$, where $$X_1|\theta \sim Bern(\theta)$$ and $$\Theta \sim Unif(0,1)$$. Find the posterior distribution.

### 1.1.1 Solution

Let $$x \in \{0,1\}^n$$ and $$t= x_1 + \cdots +x_n$$. Let $$\theta \in (0,1)$$. Let $$r(\theta) = \mathbf{1}[\theta \in (0,1)]$$ be the prior pdf for $$\Theta$$ and $$f_X$$ be the pdf for $$X$$. We have that $\begin{eqnarray*} s(\theta|x) &=& \frac{L(x|\theta)r(\theta)}{ f_X(x)} \\ &=& \frac{\theta^t (1-\theta)^{n-t}}{f_X(x)}. \end{eqnarray*}$ We have that $f_X(x) = \int_0 ^1 L(x|\theta) d\theta = \int_0 ^1 \theta^t(1-\theta)^{n-t}d \theta,$ where we recognize that we donâ€™t really need to compute this integral, since we recognize that this is the beta distribution, so that $$s(\theta|x)$$ is given by the pdf of beta distribution in $$\theta$$ with parameters $$\alpha = t+1$$ and $$\beta = n-t+1$$, where $$t= x_1 + \cdots + x_n$$.

A problem arises when we do not recognize the distribution, and the integral of $$f_X$$ is difficult to compute, but we still want to sample, simulate, or understand $$s(\theta|x)$$.

# 2 The Metropolis-Hastings algorithm

Suppose that we want to sample from $$\pi(i) = \tilde{\pi}(i)/C$$; in this context it is often called the target distribution, and we only know $$\tilde{\pi}$$ In the Metropolis algorithm, we choose a symmetric transition matrix $$Q$$, called the proposal distribution and we typically write $$Q(j|i) = Q_{ij}$$; it turns out than we will many choices for $$Q$$, and it is an active area of research on how to choose the best $$Q$$ with respect to convergence and robustness. Hastings generalized the Metropolis algorithm to allow from non-symmetric choices for $$Q$$. We consider the Markov chain which advances one step in the following way.

• If we are at a state $$i$$, so that $$X_n=i$$, then we generate a random variable $$Y=j$$ with distribution $$Q(\cdot|i)$$.

• Given that $$Y=j$$, we move to state $$j$$, so that $$X_{n+1} = j$$, with probability $\alpha(j|i) = \min\big( \tfrac{\tilde{\pi}(j)}{\tilde{\pi}(i)} \cdot \tfrac{Q(i|j)}{Q(j|i)},1\big);$

• Otherwise, we stay in state $$i$$, so that $$X_{n+1} = i$$.

If we do move to state $$j$$ the proposal is sometimes refered to as being accepted; we can actually have some freedom to choose $$\alpha$$, and the choice above is sometimes referred to as the Metropolis choice.

In order to show that this method works, we show that $$\pi$$ is indeed a stationary distribution for this Markov chain, and to do this, it is enough to verify that it is a reversible distribution with respect to the transition matrix, which is given by: $P(j|i) = Q(j|i)\alpha(j|i),$

when $$i \not = j$$. We need to check that $\pi_{i}P(j|i) = \pi_{j} P(i|j);$ notice that there is nothing to check if $$i = j$$.

## 2.1 Exercise

Check it! Note that for $$a >0,$$ we have $$\min(a,1) =a$$, then $$\min(1/a, 1) =1$$.

## 2.2 Exercise

Technically, we have only verified Metropolis-Hastings for a finite state space, but we also adapt the algorithm for the case of continuous distributions. Try the Metropolis algorithm, on the first exercise, so we are given $$\tilde{\pi}(\theta) = \theta^t (1-\theta)^{n-t}$$, using by letting $$Y$$ be normal with mean $$i$$, where we let $$Q(j|i)$$ be the pdf of a normal distribution centered at $$i$$ with variance $$1$$. Fix $$n=5$$, and $$t=2$$.

## 2.3 Exercise

How would you adapt the Metropolis-Hastings algorithm for the case of a bivariate continuous distribution? Try it in the case $$\tilde{\pi}(x,y)= e^{-x^2/2} e^{-y^2/2}$$â€”of course you know $$\pi$$, and a two dimensional uniform exploration.

### 2.3.1 Solution

To check reversibility, for $$i \not = j$$, we have

$\begin{eqnarray*} \pi_{i}P(j|i) &=& \pi_{i}Q(j|i)\alpha(j|i) \\ &=& \pi_{i}Q(j|i)\min\big( \tfrac{\tilde{\pi}(j)}{\tilde{\pi}(i)} \cdot \tfrac{Q(i|j)}{Q(j|i)},1\big). \end{eqnarray*}$

If $$\min\big( \tfrac{\tilde{\pi}(j)}{\tilde{\pi}(i)} \cdot \tfrac{Q(i|j)}{Q(j|i)},1\big)=\tfrac{\tilde{\pi}(j)}{\tilde{\pi}(i)} \cdot \tfrac{Q(i|j)}{Q(j|i)}$$, then

$\begin{eqnarray*} \pi_{i}P(j|i) &=& \pi_{i}Q(j|i) \tfrac{\tilde{\pi}(j)}{\tilde{\pi}(i)} \\ &=& \pi_j Q(j|i), \end{eqnarray*}$

and $$\min\big( \tfrac{\tilde{\pi}(i)}{\tilde{\pi}(j)} \cdot \tfrac{Q(j|i)}{Q(i|j)},1\big)=1$$, so that

$\begin{eqnarray*} \pi_{i}P(j|i) &=& \pi_j Q(j|i) \\ &=& \pi_j Q(j|i)\min\big( \tfrac{\tilde{\pi}(i)}{\tilde{\pi}(j)} \cdot \tfrac{Q(j|i)}{Q(i|j)},1\big) \\ &=& \pi_j P(j|i), \end{eqnarray*}$ as desired. A similar argument holds, if $$\min\big( \tfrac{\tilde{\pi}(j)}{\tilde{\pi}(i)} \cdot \tfrac{Q(i|j)}{Q(j|i)},1\big)=1$$.

### 2.3.2 Solution R code

h <- function(x){
z=0
if(x >0 && x <1 ) {z=(x^2) * ( (1-x)^3)}
z
}

Yq <- function(x){
y= rnorm(1,x,1)
y
}

a <- function(j,i){
m=min( c( h(j)/h(i), 1 ))
m
}

metro <- function(n){
x=0.5
for(k in 1:n){
i = x[length(x)]
j = Yq(i)
p = a(j,i)
x = c(x,i)
if ( rbinom(1,1,p)==1 ) {
x[length(x)] <- j
}
}
x
}

z = replicate(2000, metro(2000))

hist(z, prob=TRUE, breaks=50)
x= seq(0, 1, by =0.1)
curve(dbeta(x,3,4), add=TRUE)

### 2.3.3 Remarks

Actually, the above histogram, although does give the desired beta, is not what I intended to check. What I intended to do was to take the last entry of metro(2000) a large number of times, which would directly verify that distribution of the last entry of metro(2000) is almost the beta. However, I did get the right answer, why? We do the intended computation below.

z = replicate(2500, metro(2000)[length(metro(2000))])

hist(z, prob=TRUE, breaks=50)
x= seq(0, 1, by =0.1)
curve(dbeta(x,3,4), add=TRUE)

### 2.3.4 Solution (2d)

First, we look at how to do 3d histograms. (I havenâ€™t quite figured out how to easily get probability histograms yet.)

x = rnorm(5000)
y = rnorm(5000)
library(plot3D)
## Warning: package 'plot3D' was built under R version 4.0.5
x_c <- cut(x, 20)
y_c <- cut(y, 20)

z <- table(x_c, y_c)

hist3D(z=z, border="black")

h <- function(w){
x = w[1]
y = w[2]
z= exp(-x^2/2)*exp(-y^2/2)
z
}

Yq <- function(w){
x = w[1]
y = w[2]
yn =  runif(1,-0.3, 0.3) + y
xn = runif(1, -0.2, 0.2) +x
c(xn,yn)
}

a <- function(j,i){
m=min( c( h(j)/h(i), 1 ))
m
}

metro <- function(n){
x=0.5
y=0.5
for(k in 1:n){
i = c(x,y)
j = Yq(i)
p = a(j,i)
if (rbinom(1,1,p)==1) {
x = j[1]
y = j[2]
}
}
c(x,y)
}

w = replicate(5000, metro(500))

x = w[1,]
y = w[2,]

x_c <- cut(x, 20)
y_c <- cut(y, 20)

z <- table(x_c, y_c)

hist3D(z=z, border="black")