Sometimes in order to sample/simulate from a distribution \(\pi\), we cook up a Markov chain \(X\) so that \(\pi\) is the stationary distribution. There are many reasons why we may need/want to do this, since we may not have enough information about \(\pi\) to directly sample from it, but we may still be able to cook up a Markov chain where \(\pi\) is the stationary distribution. We run the Markov chain, and for large times \(n\), we know that the \(X_n\) will have law close to \(\pi\). We saw a version this procedure with the Gibbs sampler in Homework 3, Question 4
Coupling from the past is an exact method of sampling \(\pi\) introduced by Propp and Wilson. In some applications, theoretical and applied, exactness is important.
In order to motivate the coupling from the past, first we will discuss a naive variation of the Propp and Wilson algorithm that does not work. Consider a transition matrix \(P\) on a state space \(S=\{1, \ldots, N\}\). Consider the Markov chain as constructed in Making Markov chains converge, Section 2. Thus we have a function \(\phi:S \times [0,1] \to S\) such that if \(U\) is uniformly distributed in \([0,1]\) then \[\mathbb{P}(\phi(i, U) =j) = p_{ij}.\]
We simulate the Markov chain started at \(i\in S\), for \(m\) steps, by taking \(m\) independent uniforms \(U_0, \ldots, U_{m-1}\) to obtain:
\[ \begin{eqnarray*} && G_m(i)= \Big( X_0=i, X_1=\phi(i,U_0), X_2= \phi(X_1, U_1), \ldots X_m= \phi(X_{m-1}, U_{m-1}) \Big) \end{eqnarray*} \] In fact, using the same randomization \(U_0, \ldots, U_{m-1}\) we can see what path each \(i \in S\) takes. Thus, this gives a coupling of all these random paths. Let \(g_m(i) = X_m = G_m(i)[m]\) be the last value. Let \(T\) be the first time such that there exists \(K \in S\) such that \(g_T(1) = g_T(i)= K\) for all \(i \in S\). If we are so lucky that \(T < \infty\), then \(g_m\) will be a constant function for each \(m \geq T\).
With this coupling, at this time \(T\) and after, the starting point of the chain is in some sense forgotten, since regardless of the initial values, it ends up with the same values \(g_m(1)\) for \(m \geq T\). Although, in general \(g_T \not = g_{T+1}\).
We take \(K\) be the realized value for \(\pi\). Does this work? Answer no.
The following code will show that this procedure will in general fail. Consider the transition matrix \(P\) given by
P <- matrix(c(1/4, 1/4, 1/2, 0,0,
1/4, 1/4,0,0,1/2,
1/4,0,0, 1/2, 1/4,
0,0,0,1/2, 1/2, 1/5, 1/5, 1/5, 1/5, 1/5), nrow =5)
P <-t(P)
P
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.25 0.25 0.5 0.0 0.00
## [2,] 0.25 0.25 0.0 0.0 0.50
## [3,] 0.25 0.00 0.0 0.5 0.25
## [4,] 0.00 0.00 0.0 0.5 0.50
## [5,] 0.20 0.20 0.2 0.2 0.20
We want to advance the chain one step, using the same randomness. We modify the step function that appeared in Making Markov chains converage, Section 2:
stepcoupled <- function(zinput){
xoutput=c(-1, -1, -1, -1, -1)
sameu = runif(1)
for(i in 1:5){
j=0
cumq = cumsum(P[zinput[i],])
while(xoutput[i] ==-1){
j<-j+1
if(sameu <= cumq[j]){xoutput[i] <-j}
}
}
xoutput
}
Now we want to repeat this until the paths meet. First, we define a method to check whether all the entries of a vector are equal.
check <- function(x){
output= 1
a = x[1]
for (i in 2: length(x)){
if (x[i] != a){output <-0}
}
output
}
london <- function(){
x = c(1,2,3,4,5)
while(check(x)==0){
x <- stepcoupled(x)
}
x
}
It is not hard to verify that the output \(K\) will not in general have the stationary distribution. In fact, one can cook up Markov chains where \(K\) even omits values \(k\), where \(\pi(k) >0\).
Q <- P
for(i in 1:10000){
Q <- Q %*% P
}
Q
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.1704036 0.1345291 0.1434978 0.2600897 0.2914798
## [2,] 0.1704036 0.1345291 0.1434978 0.2600897 0.2914798
## [3,] 0.1704036 0.1345291 0.1434978 0.2600897 0.2914798
## [4,] 0.1704036 0.1345291 0.1434978 0.2600897 0.2914798
## [5,] 0.1704036 0.1345291 0.1434978 0.2600897 0.2914798
x = replicate(1000, london()[1])
mean(x==5)
## [1] 0.692
The general idea is if we run an Markov chain from time \(-\infty\) to time \(0\), then if it is aperiodic and irreducible, by time \(0\) it will have reached the stationary distribution. Consider now \[ \begin{eqnarray*} && G_m(i)= \Big( X_{-m}=i, X_{-m+1}=\phi(i,U_{-m}), X_{-m+2}= \phi(X_{-m+1}, U_{-m+1}), \ldots X_0= \phi(X_{-1}, U_{-1}) \Big) \end{eqnarray*} \]
Let \(g_m(i) = G_m(i)(m)= X_0\). We want to find the smallest \(M\) (hopefully finite) such that \(g_M\) is a constant function, for all starting values \(i\).
The subtle difference between the forward case and backward case is that \(g_{m} =g_M\) for all \(m \geq M\); this fact tells us that we need not go sample from \(-\infty\) to time \(0\) to achieve stationarity, we just need to go back to time \(-M\). Take \(K = g_M(1)=X_0\) to be realized value for the stationary distribution \(\pi\).
Again in order to simulate this, we must keep in mind this is a coupling, and we must use the same randomness each time. This is again a somewhat subtle matter, compared to other MCMC methods. This is the price we pay for the magic of exact sampling.
We will consider the same transition \(P\) and we modify the previous stepcoupled function to have greater control over the randomization.
stepcoupleu<- function(zinputu){
sameu = zinputu[6]
zinput = zinputu[-6]
xoutput=c(-1, -1, -1, -1, -1)
for(i in 1:5){
j=0
cumq = cumsum(P[zinput[i],])
while(xoutput[i] ==-1){
j<-j+1
if(sameu <= cumq[j]){xoutput[i] <-j}
}
}
xoutput
}
Here the rome function replaces the london function. Notice we need to keep track and re-use the uniform random variables generated.
rome <- function(){
x=c(1,2,3,4,5)
m=1
uhis=-99
while(check(x)==0){
x<-c(1,2,3,4,5)
m <- m+1
uhis <- c(uhis, runif(1))
for (i in 0:(m-2) ){
x <- stepcoupleu(c(x,uhis[m-i]))
}
}
x
}
rome()
## [1] 1 1 1 1 1
italy=( replicate(10000, rome()[1]) )
z=c(mean(italy==1),mean(italy==2),
mean(italy==3),mean(italy==4),mean(italy==5))
z
## [1] 0.1684 0.1306 0.1426 0.2596 0.2988
Compare with the previous forward result.