Sampling from the stationary distribution of a Markov chain

  • Sometimes in order to sample/simulate from a distribution \(\pi\) we cook up a Markov chain \(X\) so that \(\pi\) is the stationary distribution.

  • Then we run the Markov chain, and for large times \(n\), the law of \(X_n\) will be close to \(\pi\).

  • We saw this with the Gibbs sampler in Homework 3, Question 4.

  • Coupling from the past is an exact method of sampling from \(\pi\), introduced by Propp and Wilson.

All trains go to London?

  • Consider a transition matrix \(P\) on a state space \(S=\{1, \ldots, N\}\).

  • Consider the Markov chain as constructed in Making Markov chains converge, Section 2.

  • Thus we have a function \(\phi:S \times [0,1] \to S\) such that if \(U\) is uniformly distributed in \([0,1]\) then \[\mathbb{P}(\phi(i, U) =j) = p_{ij}\]
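
As an illustration, here is a minimal R sketch of one concrete choice of \(\phi\), the inverse-CDF construction (the function name phi is ours, not from the notes; it takes the transition matrix as an argument):

# Inverse-CDF construction: given a transition matrix P, a state i and a
# uniform u, return the first state j whose cumulative probability reaches u,
# so that P(phi(i, U, P) = j) = p_ij when U is uniform on [0, 1].
phi <- function(i, u, P){
  min(which(u <= cumsum(P[i, ])))
}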

continued

  • We simulate the Markov chain started at \(i\in S\), for \(m\) steps, by taking \(m\) independent uniforms \(U_0, \ldots, U_{m-1}\) to obtain:

\[ \begin{eqnarray*} G_m(i) &=& \Big( X_0=i,\ X_1=\phi(i,U_0),\ X_2= \phi(X_1, U_1), \\ && \ldots,\ X_m= \phi(X_{m-1}, U_{m-1}) \Big) \end{eqnarray*} \]

  • In fact, using the same randomization \(U_0, \ldots, U_{m-1}\) we can see what path each \(i \in S\) takes.

  • Thus, this gives a coupling of all these random paths. Let \(g_m(i) = X_m = G_m(i)[m]\) be the last value.

continued

  • Let \(T\) be the first time at which there exists \(K \in S\) such that \(g_T(i) = K\) for all \(i \in S\).

  • If we are so lucky that \(T < \infty\), then \(g_m\) will be a constant function for each \(m \geq T\).

  • With this coupling, from time \(T\) onwards the starting point of the chain is in some sense forgotten: regardless of the initial value, every copy takes the same value \(g_m(1)\) for \(m \geq T\). Note, however, that in general \(g_T \neq g_{T+1}\).

  • We take \(K\) to be the realized value for \(\pi\).

  • Does this work? The answer is no.

Code

Consider the transition matrix \(P\) given by

# entries are listed row by row; matrix() fills column-wise, so we transpose
# afterwards so that row i of P holds the transition probabilities out of state i
P <- matrix(c(1/4, 1/4, 1/2, 0,   0,
              1/4, 1/4, 0,   0,   1/2,
              1/4, 0,   0,   1/2, 1/4,
              0,   0,   0,   1/2, 1/2,
              1/5, 1/5, 1/5, 1/5, 1/5), nrow = 5)
P <- t(P)
P
##      [,1] [,2] [,3] [,4] [,5]
## [1,] 0.25 0.25  0.5  0.0 0.00
## [2,] 0.25 0.25  0.0  0.0 0.50
## [3,] 0.25 0.00  0.0  0.5 0.25
## [4,] 0.00 0.00  0.0  0.5 0.50
## [5,] 0.20 0.20  0.2  0.2 0.20

continued

We want to advance the chain one step, using the same randomness. We modify the step function that appeared in Making Markov chains converge, Section 2:

stepcoupled <- function(zinput){
  xoutput <- c(-1, -1, -1, -1, -1)
  sameu <- runif(1)                  # one uniform shared by all five copies
  for(i in 1:5){
    j <- 0
    cumq <- cumsum(P[zinput[i], ])   # cumulative transition probabilities out of state zinput[i]
    while(xoutput[i] == -1){         # inverse-CDF step: first j with sameu <= cumq[j]
      j <- j + 1
      if(sameu <= cumq[j]){xoutput[i] <- j}
    }
  }
  xoutput
}
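
For instance, one coupled step from all five states can be taken as follows (the seed is only for reproducibility and is not part of the notes):

set.seed(1)
stepcoupled(c(1, 2, 3, 4, 5))   # all five copies advance using the same uniform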

continued

Now we want to repeat this until the paths meet. First, we define a function that checks whether all the entries of a vector are equal.

check <- function(x){
  output <- 1
  a <- x[1]
  for (i in 2:length(x)){
    if (x[i] != a){output <- 0}   # any entry differing from the first fails the check
  }
  output
}
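
As an aside, an equivalent and more idiomatic one-liner (not the version used below, but it behaves identically) would be:

check <- function(x) as.integer(all(x == x[1]))   # 1 if all entries are equal, 0 otherwise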

continued

london <- function(){
  x <- c(1, 2, 3, 4, 5)
  while(check(x) == 0){   # keep stepping all five coupled copies until they meet
    x <- stepcoupled(x)
  }
  x
}
  • It is not hard to verify that the output \(K\) will not in general have the stationary distribution.
  • In fact, one can cook up Markov chains where \(K\) even omits values \(k\) with \(\pi(k) > 0\); a minimal two-state sketch is given below.
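
For instance, here is a minimal sketch of such a chain (this particular two-state example is our own illustration, not from the notes). On \(S = \{1, 2\}\) take \(p_{11} = p_{12} = 1/2\) and \(p_{21} = 1\), so \(\pi = (2/3, 1/3)\); with the shared-uniform coupling the two copies can only ever meet at state \(1\), so the value \(2\) is never returned even though \(\pi(2) > 0\).

P2 <- matrix(c(1/2, 1/2,
               1,   0), nrow = 2, byrow = TRUE)
london2 <- function(){
  x <- c(1, 2)
  while(x[1] != x[2]){
    u <- runif(1)                                      # one uniform shared by both copies
    x <- sapply(x, function(i) if(u <= P2[i, 1]) 1 else 2)
  }
  x[1]
}
mean(replicate(1000, london2()) == 2)                  # always 0, although pi(2) = 1/3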

continued

Q <- P
for(i in 1:10000){   # a high power of P: each row of Q approximates the stationary distribution
  Q <- Q %*% P
}
Q
##           [,1]      [,2]      [,3]      [,4]      [,5]
## [1,] 0.1704036 0.1345291 0.1434978 0.2600897 0.2914798
## [2,] 0.1704036 0.1345291 0.1434978 0.2600897 0.2914798
## [3,] 0.1704036 0.1345291 0.1434978 0.2600897 0.2914798
## [4,] 0.1704036 0.1345291 0.1434978 0.2600897 0.2914798
## [5,] 0.1704036 0.1345291 0.1434978 0.2600897 0.2914798
x = replicate(1000, london()[1])
mean(x==5)
## [1] 0.693
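
The empirical frequency \(0.693\) of coalescing at state \(5\) is far from the stationary probability \(\pi(5) \approx 0.29\) visible in the last column of \(Q\) above, confirming that this forward scheme does not sample from \(\pi\).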

What about going backwards? All roads lead to Rome

  • The general idea is that if we run a Markov chain from time \(-\infty\) to time \(0\), and it is aperiodic and irreducible, then by time \(0\) it will have reached the stationary distribution.

  • Consider now \[ \begin{eqnarray*} G_m(i) &=& \Big( X_{-m}=i,\ X_{-m+1}=\phi(i,U_{-m}), \\ && X_{-m+2}= \phi(X_{-m+1}, U_{-m+1}), \\ && \ldots,\ X_0= \phi(X_{-1}, U_{-1}) \Big) \end{eqnarray*} \]

  • Let \(g_m(i) = G_m(i)[m] = X_0\) be the last value. We want to find the smallest \(M\) (hopefully finite) such that \(g_M\) is a constant function.

continued

  • The subtle difference between the forward case and the backward case is that here \(g_{m} = g_M\) for all \(m \geq M\): once the paths have coalesced by time \(0\), starting the chain from even further in the past cannot change the value at time \(0\).

  • Take \(K = g_M(1) = X_0\) to be the realized value for the stationary distribution \(\pi\).

  • Again, in order to simulate this we must keep in mind that this is a coupling: we must re-use the same randomness each time we extend the window further into the past. This is a somewhat subtle point compared to other MCMC methods.

  • This is the price we pay for the magic of exact sampling.

Code

We will consider the same transition matrix \(P\), and we modify the previous stepcoupled function to allow greater control over the randomization: the uniform is now passed in as the sixth entry of the input vector.

stepcoupleu <- function(zinputu){
  sameu <- zinputu[6]                # the uniform is supplied as the sixth entry
  zinput <- zinputu[-6]              # the first five entries are the current states
  xoutput <- c(-1, -1, -1, -1, -1)
  for(i in 1:5){
    j <- 0
    cumq <- cumsum(P[zinput[i], ])
    while(xoutput[i] == -1){         # inverse-CDF step, as in stepcoupled
      j <- j + 1
      if(sameu <= cumq[j]){xoutput[i] <- j}
    }
  }
  xoutput
}
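
For example, passing the uniform explicitly as the sixth entry (the value 0.3 is an arbitrary choice of ours):

stepcoupleu(c(1, 2, 3, 4, 5, 0.3))   # with this P the five copies move to 2 2 4 4 2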

continued

Here the rome function replaces the london function. Notice that we need to keep track of, and re-use, the uniform random variables generated.

rome <- function(){
  x <- c(1, 2, 3, 4, 5)
  m <- 1
  uhis <- -99                             # placeholder; uhis[k] will hold the uniform for time -(k-1)
  while(check(x) == 0){
    x <- c(1, 2, 3, 4, 5)                 # restart from every state, one step further in the past
    m <- m + 1
    uhis <- c(uhis, runif(1))             # fresh uniform for the new, earliest time
    for (i in 0:(m-2)){
      x <- stepcoupleu(c(x, uhis[m-i]))   # re-use the old uniforms for the steps nearer time 0
    }
  }
  x
}
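
As an aside, the standard Propp and Wilson algorithm doubles the look-back window rather than extending it one step at a time. A minimal sketch of that variant (the name rome_doubling is ours), re-using stepcoupleu and check from above:

rome_doubling <- function(){
  uhis <- runif(1)              # uniform for time -1
  m <- 1
  repeat{
    x <- c(1, 2, 3, 4, 5)
    for(i in m:1){              # apply U_{-m}, ..., U_{-1}, deepest step in the past first
      x <- stepcoupleu(c(x, uhis[i]))
    }
    if(check(x) == 1) return(x)
    uhis <- c(uhis, runif(m))   # fresh uniforms for times -(m+1), ..., -2m
    m <- 2 * m
  }
}

The key point is unchanged: the uniforms nearest time \(0\) are re-used on every pass.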

continued

rome()
## [1] 5 5 5 5 5
italy = replicate(10000, rome()[1])

z=c(mean(italy==1),mean(italy==2),
    mean(italy==3),mean(italy==4),mean(italy==5))
z
## [1] 0.1651 0.1350 0.1403 0.2579 0.3017

Compare with the rows of the matrix \(Q\) computed earlier: the empirical frequencies closely match the stationary distribution!

Version: 15 October 2021