Making Markov chains converge

Constructing Markov chains

Theory: We have at our disposal an infinite sequence of independent random variables uniformly distributed on \([0,1]\).
R: We can simulate as many independent uniform random variables as we please.
Any random variable can be thought of as a function of uniform random variable.

Continued

Let \(X\) be the Markov chain on the state space \(S\) with transition matrix \(P\).
First we construct \(X_0 = f(U_{-1})\)
Next, let \(\phi:S \times [0,1] \to S\) be a function such that if \(U\) is uniformly distributed in \([0,1]\) and independent of \(X_0\), then \[\mathbb{P}(\phi(X_0, U) =j \ | \ X_0=i) = p_{ij}.\]
The function \(\phi\) is noting abstract at all, in R, it is simply the procedure you would define so that given \(i\), it outputs \(j\) with probability \(p_{ij}\); here the \(U\) represents the randomization that R uses.

continued

Then the Markov chain \(X\) can be constructed as

\(X_0 = f(U_{-1})\)
\(X_1 = \phi(X_0, U_0)\)
\(X_2 = \phi(X_1, U_1)\)
\(X_{n+1} = \phi(X_{n}, U_n)\)

Example code

P <- matrix(c(0, 1/2, 1/2, 0, 0, 1/2, 0,0,0,1/2, 1/2,0,
    0, 1/2, 0, 0,0,0,1/2, 1/2, 1/3, 1/3, 0, 0, 1/3), nrow =5)
P <-t(P)
P

##           [,1]      [,2] [,3] [,4]      [,5]
## [1,] 0.0000000 0.5000000  0.5  0.0 0.0000000
## [2,] 0.5000000 0.0000000  0.0  0.0 0.5000000
## [3,] 0.5000000 0.0000000  0.0  0.5 0.0000000
## [4,] 0.0000000 0.0000000  0.0  0.5 0.5000000
## [5,] 0.3333333 0.3333333  0.0  0.0 0.3333333

continued

initial <- function(){
u = runif(1)
x=-1
k=0
while(x==-1){
k<-k+1
  if( u <=0.2*k){x<-k}
}  
x
}

continued

step <- function(i){
  q = P[i,]
  x=-1
  u = runif(1)
  j=0
  cumq = cumsum(q)
  while(x==-1){
    j<-j+1
    if(u <= cumq[j]){x <-j}
  }
  x
}

continued

steps <- function(n){
  x = initial()
  for (i in 1:n){
    x <- c(x, step(x[i]))
  }
  x
}

steps(100)

##   [1] 1 3 1 3 4 5 5 5 1 3 1 3 4 4 5 2 5 2 5 1 3 1
##  [23] 2 5 1 2 5 1 3 4 4 4 5 5 2 5 2 5 5 2 5 1 3 1
##  [45] 3 1 2 5 5 2 1 3 4 5 1 2 1 2 5 1 2 1 3 4 4 5
##  [67] 5 1 2 1 2 1 2 1 3 1 2 5 2 1 3 1 3 4 5 5 5 5
##  [89] 1 2 5 5 2 1 3 4 5 1 3 4 4

Aperiodic and irreducible Markov chains

Let \(P\) be a transition matrix on the state space \(S\).
The period of a state \(i\in S\) is defined to be \[per(i) = \gcd\{n \geq 1: p_{ii}(n) >0\}.\]
We say that \(P\) is aperiodic if every state \(i\) is such that \(per(i) = 1\).

Lemma on aperiodic transition matrices

If \(P\) is aperiodic transition matrix on a finite state space \(S\), then there exists \(M>0\) such that \(p_{ii}(n) >0\) for all \(i\in S\) and all \(n \geq M\).

Number theory lemma

Let \(a_i\) be positive integers. Let \(d = \gcd\{a_1, a_2, \ldots\}\).

There exists a finite \(n\) such that \(d= \gcd\{a_1, \ldots, a_n\}\).
Recall that if \(\gcd\{a_1, \ldots, a_n\}=1\), then there exist integers \(x_1, \ldots, x_n\) (possibly negative) such that \(a_1x_1 + \cdots +a_nx_n = 1.\)
See Die Hard 3, Water riddle
In fact, there exists \(M\) such that for all \(m \geq M\), there exist nonnegative integers \(x_1, \ldots, x_n\) such that \(a_1x_1 + \cdots +a_nx_n = m.\)

Proof of Lemma about aperiodic transition matrices

Assume \(S = \{1, \ldots, N\}\).
Let \(A^i = \{n \geq 1: p_{ii}(n) >0\}\).
Observe that if \(a, b \in A^i\), then \(a+b \in A^i\). Why?
Thus by elementary number theory, there exists \(M^i\) such that \(p_{ii}(m) >0\) for all \(m \geq M^i\).
Choose \[M= \max\{M^1, \ldots, M^N\}.\]

Irreducibility and aperiodicity

We say that \(P\) is irreducible if for every \(i,j \in S\) there exists \(n\) such that \(p_{ij}(n)>0\).
Let \(P\) be a transition matrix on a finite state space \(S\).
If \(P\) is aperiodic and irreducible, then there exists \(N\) such that for all \(n \geq N\), we have \(p_{ij}(n) >0\) for all \(i,j \in S\).

Proof

Since \(P\) is aperiodic, we know that for \(M\) sufficiently large, we can be sure that for all \(i \in S\) we have \(p_{ii}(n) >0\) for all \(n \geq M\).
Observe that \[ p_{ij}(n+k) \geq p_{ii}(n)p_{ij}(k).\]
The irreducibility assumptions assures us that for every pair \((i,j)\) we can find \(k=k_{ij}\) such that \(p_{ij}(k) >0\).

continued

Thus \[p_{ij}(n) >0 \text{ for all } n \geq M + k_{ij};\] take \[K =\max \{ k_{ij}: i,j \in S\}\] then for all \(i,j \in S\), we have \[p_{ij}(n) >0\] for all \(n \geq M+K\).

Doeblin’s coupling

Let \(P\) be an aperiodic irreducible transition matrix on a finite state space \(S\).
Let \(X\) and \(Y\) be Markov chains with the same transition matrix \(P\), but with different initial distributions.
We are interested in long term behaviour of \(d_{TV}(X_n, Y_n).\)
By the coupling inequality, we know that \[d_{TV}(X_n, Y_n) \leq 2 \mathbb{P}(X_n' \not = 'Y_n)\] for any coupling for \(X_n\) and \(Y_n\).
Thus we should look for a coupling \((X', Y')\) of \(X\) and \(Y\), where \(X_n'\) is eventually equal to \(Y_n'\).

DC continued

Let \(X'\) and \(Z'\) be independent Markov chains with starting distributions of \(X\) and \(Y\) respectively.
Let \(T\) be the first time such that \(X'_T = Z'_T\).
For all \(n \leq T\), set \(Y'_n = Z'_n\).
For \(n > T\), set \(Y'_n = X'_n\).

Thus \[d_{TV}(X_n, Y_n) = d_{TV}(X'_n, Y'_n) \leq 2\mathbb{P}(T >n).\] Sometimes \((X_n', Y_n')\) is called Doeblin’s coupling or the classical coupling; the coupling is said to be successful at the random time \(T\).

DC Constructed

Another way to understand Doeblin’s coupling is to actually construct the Markov chains.

Let \(\phi:S \times [0,1] \to S\) be a function such that if \(U\) is uniformly distributed in \([0,1]\) and independent of \(X_0\), then \[\mathbb{P}(\phi(X_0, U) =j \ | \ X_0=i) = p_{ij}.\]
Let \(U_{-1}, U_0, U_1, \ldots, V_{-1}, V_0, V_1, \ldots\) be independent random variables that are uniformly distributed in \([0,1]\).

continued

Let \(X'_0 = f(U_{-1})\) have the same law as \(X_0\) and \(X_n' = \phi(X_{n-1}', U_{n-1})\).
Let \(Z_0' = g(V_{-1})\) have the same law as \(Y_0\) and \(Z_n' = \phi(Z_{n-1}', V_{n-1})\)
Thus \[ X'=\big(X_0', X_1', X_2', \dots\big)\] and \[Z'=\big(Z_0', Z_1', Z_2', \dots\big)\] have the same laws as \(X\) and \(Y\), respectively

continued

Notice that \[ \begin{eqnarray*} && \big(Z_0', \phi(Z_0', V_0), \phi(Z_1', V_1), \phi\big(\phi(Z_1', V_1), U_2\big)\big) \\ && \text{ has the same law as } (Y_0, Y_1, Y_2, Y_3). \end{eqnarray*} \]
Let \(T\) be the first time that \(X'_T = Z'_T\). We have that

\[ \begin{eqnarray*} && Y'= \\ && \Big(Z_0', \phi(Z_0',V_0), \phi(Z_1', V_1), \phi(Z_2', V_2), \\ && \dots, \phi(Z_T', V_T) = \phi(X_T', U_T), \phi(X'_{T+1}, U_{T+1}), \ldots \Big) \end{eqnarray*} \] has the same law as \(Y\).

Simulations

The following simulation considers the transition matrix considered earlier. We will consider the Doeblin coupling of two chains started at \(1\) and \(5\).

P <- matrix(c(0, 1/2, 1/2, 0, 0, 1/2, 0,0,0,1/2, 1/2,0,0, 1/2, 
              0, 0,0,0,1/2, 1/2, 1/3, 1/3, 0, 0, 1/3), nrow =5)
P <-t(P)

continued

step <- function(i){
  q = P[i,]
  x=-1
  u = runif(1)
  j=0
  cumq = cumsum(q)
  while(x==-1){
    j<-j+1
    if(u <= cumq[j]){x <-j}
  }
  x
}

continued

stepsind<- function(){
  x=1
  y=5
  while(x[length(x)] != y[length(y)] ){
    x<- c(x, step(x[length(x)])) 
    y<- c(y, step(y[length(y)]))
  }
 rbind(x,y)
}

continued

coupled <- function(z){
  tog = z[1,]
  tog <- tog[length(tog)]
  for (i in 1:5){
    tog <- c(tog, step(tog[length(tog)]))
  }
  x = c(z[1,], tog)
  y = c(z[2,], tog)
  rbind(x,y)
}    
coupled(stepsind())

##   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## x    1    2    5    2    2    5    1    3    1     3
## y    5    5    1    2    2    5    1    3    1     3

Bounds

We need to show that \(\mathbb{P}( T >n) \to 0\) as \(n \to \infty\).
Let \(M\) be so that \(P^M >0\).
Let \(\varepsilon = \min_{i,j \in S} p_{ij}(M)>0\). Since \(X'\) and \(Z'\) are independent, we have
\[ \begin{eqnarray*} \mathbb{P}(X'_M = Z'_M) &=& \sum_{j} \mathbb{P}(X_M' = j)\mathbb{P}(Z_M' = j) \\ &\geq& \varepsilon \sum_{j} \mathbb{P}(X_M' = j)= \varepsilon. \end{eqnarray*} \]
Let \(k\) be an integer. Then we have that
\[\mathbb{P}(T > kM) \leq (1- \varepsilon)^k.\]

Doeblin’s classical coupling

If \(X\) and \(Y\) are Markov chains with the same aperiodic irreducible transition matrix on a finite state space \(S\), then there exists \(M >0\) and \(\varepsilon >0\) such that \[d_{TV}(X_n,Y_n) \leq 2 (1- \varepsilon)^{ \lfloor n/M \rfloor}.\] Here \(M\) and \(\epsilon\) depends only on \(P\).

Max/min lemma

The following simple fact will allows us to harness Doeblin’s coupling further.

Let \(P\) be transition matrix on a finite state space \(S\). The sequence given by \[a_j(n) = \min_{i \in S} p_{ij}(n)\] is nondecreasing, and the sequence given by \[b_j(n) = \max_{i \in S} p_{ij}(n)\] is nonincreasing.

Proof

We give the proof for the nondecreasing case. Observe that \[ \begin{eqnarray*} a_j(n+1) &=& \min_{i \in S} \sum_k p_{ik}(1) p_{kj}(n) \\ &\geq& \min_{i \in S} \sum_k p_{ik}(1) a_j(n) \\ &=& a_j(n). \end{eqnarray*} \]

Stationary distribution

Let \(P\) be a transition matrix on a state space \(S\). Recall an initial distribution \(\lambda\) is stationary if \(\lambda P = \lambda\).

Let \(P\) be an aperiodic irreducible transition matrix on a finite state space \(S\). Then there exists a unique stationary distribution \(\lambda\) for \(P\); in particular:

\(\lambda_j = \lim_{n \to \infty} \min_{i \in S} p_{ij}(n)\)
If \(X\) is any Markov chain with transition matrix \(P\), then \(\mathbb{P}(X_n=j) \to \lambda_j\).
For all \(i,j \in S\), we have \(p_{ij}(n) \to \lambda_j\).

Proof (uniquness)

Let \(\lambda\) and \(\mu\) be stationary.
Then \(\mu P^n = \mu\) and \(\lambda P^n = \lambda\) for all \(n\).
By Doeblin’s theorem, we can conclude that if \(X\) and \(Y\) are Markov chains with transition matrix \(P\) and initial distributions \(\lambda\) and \(\mu\), respectively, we have \(X_n\) has the same law as \(X_0\) and \(Y_n\) has the same law as \(Y_0\), and
\[ d_{TV}(X_0, Y_0) = d_{TV}(X_n, Y_n) \to 0.\] Hence \(X_0\) has the same law as \(Y_0\) and \(\lambda = \mu\).

Proof (limits)

By the max/min lemma we may define \[\lambda_j =\lim_{n \to \infty} a_j(n).\]
We will show that

\[\mathbb{P}(X_n=j) \to \lambda_j.\]

Let \(\varepsilon >0\). By Doeblin’s theorem, there exists \(N_1>0\), so that for any Markov chains \(X\) and \(Y\) with transition matrix \(P\), we have
\[d_{TV}(X_n,Y_n)< \varepsilon/2\] for \(n \geq N_1\).

continued

Choose \(N_2\) so that \[|\lambda_j -a_j(n)| < \varepsilon/2\] for all \(n \geq N_2\).
Let \(m \geq \max\{N_1, N_2\}\).
Let \(i \in S\) be chosen so that \(p_{ij}(m)=a_j(m) = \min_{i \in S} p_{ij}(m)\).

continued

Let \(Y\) be a Markov chain with transition matrix \(P\) that is started at \(i\).
Then \[|\mathbb{P}(X_m=j) - \mathbb{P}(Y_m=j)| = |\mathbb{P}(X_m=j) - a_j(m)|,\] so that \[|\mathbb{P}(X_m=j) - \lambda_j| < \varepsilon.\]
The last claims follows by considering a Markov chain started at \(i\).

Proof (stationarity)

It remains to verify that \(\lambda\) is a stationary measure.
We have \[\sum_{j \in S} \lambda_j = \lim_{n \to \infty} \sum_{j \in S} \mathbb{P}(X_n=j) =1.\]
To verify stationarity, we note that \(p_{ji}(n) \to \lambda_i\).
Hence, \[ \begin{eqnarray*} (\lambda P)_j = \sum_{i \in S} \lambda_i p_{ij} &=& \lim_{n \to \infty} \sum_{i \in S} p_{ji}(n)p_{ij} \\ &=& \lim_{n \to \infty} p_{jj}(n+1) = \lambda_j \end{eqnarray*} \]

Baby ergodic theorem

Let \(P\) be an aperiodic irreducible transition matrix on a finite state space \(S\) so that it has a stationary distribution \(\pi\). Let \(X\) be a Markov chain with transition matrix \(P\). Fix a state \(s\in S\), and set

\[Y_n = \frac{1}{n} \sum_{k=0}^{n-1} \mathbf{1}[X_k = s].\] Then \(Y_n\) converges to \(\pi(s)\) in the mean-squared, and consequently in probability.

This version of the law of large numbers is slightly harder to prove than the weak one that we proved for the i.i.d. case.

Summary

Constructed Markov chains
Defined irreducibity and aperiodicity
Defined Doeblin’s coupling
Used Doeblin’s coupling to proof many basic facts for these Markov chains:
- Existence and uniqueness of the stationary distribution
- convergence in total variation

Version: 26 October 2020

Rmd Source