1 Abstract

We will discuss Doeblin’s classical successful coupling of irreducible and aperiodic Markov chains on a finite state space.

2 Constructing Markov chains

From the perspective of theory, we assume that there exists a random variable that is uniformly distributed on \([0,1]\), and thus we have at our disposal an infinite sequence of independent random variables \(U_{-1}, U_0, U_1, \ldots\), all uniformly distributed on \([0,1]\). From the perspective of R, this just means that we can simulate as many independent uniform random variables as we please. Also remember that any random variable can be realized as a function of a uniform random variable.

Let \(X\) be the Markov chain on the state space \(S\) with transition matrix \(P\) that we wish to construct or simulate. First we construct \(X_0 = f(U_{-1})\), where \(f\) is chosen so that \(X_0\) has the desired initial distribution. Next, let \(\phi:S \times [0,1] \to S\) be a function such that if \(U\) is uniformly distributed in \([0,1]\) and independent of \(X_0\), then \[\mathbb{P}(\phi(X_0, U) =j \ | \ X_0=i) = p_{ij}.\]

Thus \(\phi\) is nothing abstract at all: in R, it is simply the procedure you would define so that, given \(i\), it outputs \(j\) with probability \(p_{ij}\); here \(U\) represents the randomization that R uses.

Then the Markov chain \(X\) can be constructed recursively by setting \[X_n = \phi(X_{n-1}, U_{n-1}) \quad \text{for all } n \geq 1.\]
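
As a minimal sketch of this recipe (the function name phi and its explicit u argument are ours, for illustration), \(\phi\) can be implemented in R by partitioning \([0,1]\) according to the relevant row of the transition matrix:

# Sketch: phi(i, u, P) returns j exactly when u falls in the j-th subinterval
# of [0,1] with length P[i, j]; then P(phi(i, U, P) = j) = P[i, j].
phi <- function(i, u, P){
  cumq <- cumsum(P[i, ])
  min(which(u <= cumq))
}
# One step of the chain would then be: x <- phi(x, runif(1), P)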

We consider the following example of this theory in R. We will code everything in terms of uniform random variables, even though this is not necessary in R.

initial <- function(){
  # Simulate X_0 uniformly from {1, 2, 3, 4, 5}: the uniform u falls in
  # (0.2*(k-1), 0.2*k] with probability 0.2 for each k
  u <- runif(1)
  x <- -1
  k <- 0
  while(x == -1){
    k <- k + 1
    if(u <= 0.2*k){x <- k}
  }
  x
}

# Enter the transition matrix row by row; matrix() fills by column, so transpose
P <- matrix(c(0, 1/2, 1/2, 0, 0, 1/2, 0, 0, 0, 1/2, 1/2, 0, 0, 1/2, 0, 0, 0, 0, 1/2, 1/2, 1/3, 1/3, 0, 0, 1/3), nrow = 5)
P <- t(P)
P
##           [,1]      [,2] [,3] [,4]      [,5]
## [1,] 0.0000000 0.5000000  0.5  0.0 0.0000000
## [2,] 0.5000000 0.0000000  0.0  0.0 0.5000000
## [3,] 0.5000000 0.0000000  0.0  0.5 0.0000000
## [4,] 0.0000000 0.0000000  0.0  0.5 0.5000000
## [5,] 0.3333333 0.3333333  0.0  0.0 0.3333333
step <- function(i){
  # One step from state i: output j with probability P[i, j],
  # by locating a uniform u within the cumulative row sums
  q <- P[i, ]
  cumq <- cumsum(q)
  u <- runif(1)
  x <- -1
  j <- 0
  while(x == -1){
    j <- j + 1
    if(u <= cumq[j]){x <- j}
  }
  x
}

steps <- function(n){
  # Simulate n steps of the chain, started from initial()
  x <- initial()
  for (i in 1:n){
    x <- c(x, step(x[i]))
  }
  x
}

steps(100)
##   [1] 5 1 3 1 3 1 2 5 5 2 1 2 1 2 5 5 2 1 3 4 4 5
##  [23] 1 3 4 4 4 4 4 5 5 1 3 4 4 5 1 3 1 2 5 2 5 1
##  [45] 3 1 3 4 4 4 4 4 5 2 5 5 1 2 1 3 4 4 5 1 2 5
##  [67] 1 3 1 2 5 2 5 5 1 3 4 4 5 2 5 1 3 1 2 1 3 4
##  [89] 4 4 5 5 1 3 1 2 5 1 3 4 4

3 Aperiodic and irreducible Markov chains

Let \(P\) be a transition matrix on the state space \(S\). The period of a state \(i\in S\) is defined to be \(per(i) = \gcd\{n \geq 1: p_{ii}(n) >0\}\); for example, the greatest common divisor of \(24\) and \(56\) is \(8\). We say that \(P\) is aperiodic if every state \(i\) is such that \(per(i) = 1\).
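
As a quick sketch in R (ours, not from the notes; we only inspect return times up to a finite horizon, so this is a heuristic check rather than a proof), the period of a state can be read off from powers of \(P\):

# Sketch: gcd of the return times n <= n_max with p_ii(n) > 0
gcd2 <- function(a, b) if(b == 0) a else gcd2(b, a %% b)
period <- function(P, i, n_max = 50){
  Pn <- diag(nrow(P))            # P^0
  times <- c()
  for (n in 1:n_max){
    Pn <- Pn %*% P               # now Pn = P^n
    if(Pn[i, i] > 0){times <- c(times, n)}
  }
  Reduce(gcd2, times)
}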

Lemma 3.1 (Aperiodic transition matrices) If \(P\) is an aperiodic transition matrix on a finite state space \(S\), then there exists \(M>0\) such that \(p_{ii}(n) >0\) for all \(i\in S\) and all \(n \geq M\).

The proof requires a bit of elementary number theory.

Lemma 3.2 Let \(a_1, a_2, \ldots\) be positive integers. Let \(d = \gcd\{a_1, a_2, \ldots\}\).

  • There exists a finite \(n\) such that \(d= \gcd\{a_1, \ldots, a_n\}\).

  • Recall that if \(\gcd\{a_1, \ldots, a_n\}=1\), then there exist integers \(x_1, \ldots, x_n\) (possibly negative) such that \(a_1x_1 + \cdots +a_nx_n = 1.\)

  • See the water jug riddle in Die Hard 3.

  • In fact, if \(\gcd\{a_1, \ldots, a_n\}=1\), then there exists \(M\) such that for all \(m \geq M\), there exist nonnegative integers \(x_1, \ldots, x_n\) such that \(a_1x_1 + \cdots +a_nx_n = m\); see the sketch below.
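
For example, with \(a_1 = 3\) and \(a_2 = 5\) we have \(\gcd\{3,5\} = 1\), and every \(m \geq 8\) is a nonnegative integer combination of \(3\) and \(5\), while \(m = 7\) is not; a brute-force check in R (ours):

# Sketch: is m a nonnegative integer combination x1*3 + x2*5?
representable <- function(m, a = c(3, 5)){
  combos <- expand.grid(x1 = 0:(m %/% a[1]), x2 = 0:(m %/% a[2]))
  any(combos$x1 * a[1] + combos$x2 * a[2] == m)
}
sapply(7:12, representable)   # FALSE for m = 7, then TRUE from m = 8 onward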

Proof (of Lemma about aperiodic transition matrices). Assume \(S = \{1, \ldots, N\}\). Let \(A^i = \{n \geq 1: p_{ii}(n) >0\}\). Observe that if \(a, b \in A^i\), then \(a+b \in A^i\), since \(p_{ii}(a+b) \geq p_{ii}(a)p_{ii}(b) > 0\). Thus by the elementary number theory above, there exists \(M^i\) such that \(p_{ii}(m) >0\) for all \(m \geq M^i\). Choose \[M= \max\{M^1, \ldots, M^N\}.\]

We say that \(P\) is irreducible if for every \(i,j \in S\) there exists \(n\) such that \(p_{ij}(n)>0\).

Lemma 3.3 (Aperiodic and irreducible matrices) Let \(P\) be a transition matrix on a finite state space \(S\). If \(P\) is aperiodic and irreducible, then there exists \(N\) such that for all \(n \geq N\), we have \(p_{ij}(n) >0\) for all \(i,j \in S\).

Proof. Since \(P\) is aperiodic, we know that for \(M\) sufficiently large, for all \(i \in S\) we have \(p_{ii}(n) >0\) for all \(n \geq M\). Observe that \[ p_{ij}(n+k) \geq p_{ii}(n)p_{ij}(k).\] The irreducibility assumption assures us that for every pair \((i,j)\) we can find \(k=k_{ij}\) such that \(p_{ij}(k_{ij}) >0\). Thus \[p_{ij}(n) >0 \text{ for all } n \geq M + k_{ij};\] taking \[K =\max \{ k_{ij}: i,j \in S\},\] we have \(p_{ij}(n) >0\) for all \(n \geq M+K\).
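
For the example matrix from Section 2, we can find the smallest such \(N\) by brute force; a sketch (ours), which should report \(N = 4\) for this \(P\):

# Sketch: smallest N with every entry of P^N strictly positive
Pn <- P
N <- 1
while(any(Pn == 0)){
  Pn <- Pn %*% P
  N <- N + 1
}
N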

4 Doeblin’s coupling

4.1 The coupling

Let \(P\) be an aperiodic irreducible transition matrix on a finite state space \(S\). Let \(X\) and \(Y\) be Markov chains with the same transition matrix \(P\), but with different initial distributions. We are interested in the long-term behaviour of \(d_{TV}(X_n, Y_n)\). By the coupling inequality, we know that \[d_{TV}(X_n, Y_n) \leq 2 \mathbb{P}(X_n' \neq Y_n')\] for any coupling \((X_n', Y_n')\) of \(X_n\) and \(Y_n\).

Thus we should look for a coupling \((X', Y')\) of \(X\) and \(Y\) where \(X_n'\) is eventually equal to \(Y_n'\). Let \(X'\) and \(Z'\) be independent Markov chains with the starting distributions of \(X\) and \(Y\), respectively. Let \(T\) be the first time such that \(X'_T = Z'_T\). For all \(n \leq T\), set \(Y'_n = Z'_n\); for \(n > T\), set \(Y'_n = X'_n\). Thus \[d_{TV}(X_n, Y_n) = d_{TV}(X'_n, Y'_n) \leq 2\mathbb{P}(T >n).\] Sometimes \((X_n', Y_n')\) is called Doeblin’s coupling or the classical coupling; the coupling is said to be successful at the random time \(T\).

4.2 Construction

Another way to understand Doeblin’s coupling is to actually construct the Markov chains.
Recall from Section 2 the function \(\phi:S \times [0,1] \to S\) such that if \(U\) is uniformly distributed in \([0,1]\) and independent of \(X_0\), then \[\mathbb{P}(\phi(X_0, U) =j \ | \ X_0=i) = p_{ij}.\]
Let \(U_{-1}, U_0, U_1, \ldots, V_{-1}, V_0, V_1, \ldots\) be independent random variables that are uniformly distributed in \([0,1]\). Let \(X'_0 = f(U_{-1})\) have the same law as \(X_0\) and set \(X_n' = \phi(X_{n-1}', U_{n-1})\). Similarly, let \(Z_0'= g(V_{-1})\) have the same law as \(Y_0\) and set \(Z_n' = \phi(Z_{n-1}', V_{n-1})\). Thus \[ X'=\big(X_0', X_1', X_2', \dots\big)\] and \[Z'=\big(Z_0', Z_1', Z_2', \dots\big)\] have the same laws as \(X\) and \(Y\), respectively. Notice that \[\big(Z_0', \phi(Z_0', V_0), \phi(Z_1', V_1), \phi\big(\phi(Z_1', V_1), U_2\big)\big) \text{ has the same law as } (Y_0, Y_1, Y_2, Y_3);\] it does not matter whether we choose to use the \(V_i\)’s or the \(U_i\)’s. For example, fixing a state \(s \in S\), we could choose to use the \(U_i\)’s whenever the current state is \(s\), and use the \(V_i\)’s otherwise.

Let \(T\) be the first time that \(X'_T = Z'_T\). We have that \[Y'= \Big(Z_0', \phi(Z_0',V_0), \phi(Z_1', V_1), \ldots, \phi(Z'_{T-1}, V_{T-1}), \phi(X_T', U_T), \phi(X'_{T+1}, U_{T+1}), \ldots \Big)\] has the same law as \(Y\): we use the \(V_i\)’s up to time \(T\) and switch to the \(U_i\)’s afterwards, which is permissible since \(X_T' = Z_T'\). With this choice, \(Y'_n = X'_n\) for all \(n \geq T\).

The following simulation uses the transition matrix from earlier. We consider the Doeblin coupling of two chains started at \(1\) and \(5\).

# The transition matrix from Section 2
P <- matrix(c(0, 1/2, 1/2, 0, 0, 1/2, 0, 0, 0, 1/2, 1/2, 0, 0, 1/2, 0, 0, 0, 0, 1/2, 1/2, 1/3, 1/3, 0, 0, 1/3), nrow = 5)
P <- t(P)

step <- function(i){
  # One step from state i, as defined in Section 2
  q <- P[i, ]
  cumq <- cumsum(q)
  u <- runif(1)
  x <- -1
  j <- 0
  while(x == -1){
    j <- j + 1
    if(u <= cumq[j]){x <- j}
  }
  x
}

stepsind <- function(){
  # Run two independent chains, started at 1 and 5, until they first meet
  x <- 1
  y <- 5
  while(x[length(x)] != y[length(y)]){
    x <- c(x, step(x[length(x)]))
    y <- c(y, step(y[length(y)]))
  }
  rbind(x, y)
}

    
coupled <- function(z){
  # After the chains meet, run them together for five more steps.
  # Note: the meeting state appears twice in each row of the output,
  # once at the end of z and once at the start of tog.
  tog <- z[1, ncol(z)]   # the meeting state
  for (i in 1:5){
    tog <- c(tog, step(tog[length(tog)]))
  }
  x <- c(z[1, ], tog)
  y <- c(z[2, ], tog)
  rbind(x, y)
}
coupled(stepsind())
##   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## x    1    3    1    3    1    2    1    3    4     4     5     1     3     4
## y    5    1    3    1    2    1    3    4    5     1     2     5     5     2
##   [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27]
## x     5     1     2     5     1     3     1     1     3     1     2     5     1
## y     1     2     5     2     5     5     1     1     3     1     2     5     1

4.3 Bounds

We need to show that \(\mathbb{P}( T >n) \to 0\) as \(n \to \infty\). Let \(M\) be such that every entry of \(P^M\) is positive; such an \(M\) exists by Lemma 3.3. Let \(\varepsilon = \min_{i,j \in S} p_{ij}(M)>0\). Since \(X'\) and \(Z'\) are independent, we have \[\mathbb{P}(X'_M = Z'_M) = \sum_{j} \mathbb{P}(X_M' = j)\mathbb{P}(Z_M' = j) \geq \varepsilon \sum_{j} \mathbb{P}(X_M' = j)= \varepsilon,\] where we used that \(\mathbb{P}(Z_M' = j) \geq \min_{i} p_{ij}(M) \geq \varepsilon\).

Let \(k\) be a positive integer. Regardless of the states of the two chains at the start of each block of \(M\) steps, they fail to meet by the end of that block with probability at most \(1-\varepsilon\); applying the Markov property to each of the \(k\) blocks gives
\[\mathbb{P}(T > kM) \leq (1- \varepsilon)^k.\]
Theorem 4.1 (Doeblin’s classical coupling) If \(X\) and \(Y\) are Markov chains with the same aperiodic irreducible transition matrix \(P\) on a finite state space \(S\), then there exist \(M >0\) and \(\varepsilon >0\) such that \[d_{TV}(X_n,Y_n) \leq 2 (1- \varepsilon)^{ \lfloor n/M \rfloor}.\] Here \(M\) and \(\varepsilon\) depend only on \(P\).
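
As a sanity check, here is a small Monte Carlo sketch (ours) using stepsind from the simulation above, comparing the empirical tail of the coupling time \(T\) with the geometric bound; we take \(M = 4\), relying on the brute-force check from Section 3 that every entry of \(P^4\) is positive for the example matrix.

# Sketch: empirical tail of the coupling time T versus the bound (1 - eps)^k
M <- 4
eps <- min(P %*% P %*% P %*% P)              # eps = min_{i,j} p_ij(M)
Ts <- replicate(1000, ncol(stepsind()) - 1)  # 1000 samples of T
mean(Ts > 2*M)                               # Monte Carlo estimate of P(T > 2M)
(1 - eps)^2                                  # the bound with k = 2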

5 Stationarity

The following simple fact will allow us to harness Doeblin’s coupling further.

Lemma 5.1 (max/min) Let \(P\) be a transition matrix on a finite state space \(S\). The sequence given by \[a_j(n) = \min_{i \in S} p_{ij}(n)\] is nondecreasing in \(n\), and the sequence given by \[b_j(n) = \max_{i \in S} p_{ij}(n)\] is nonincreasing in \(n\).
Proof. We give the proof for the nondecreasing case. Observe that \[\begin{eqnarray*} a_j(n+1) &=& \min_{i \in S} \sum_k p_{ik}(1) p_{kj}(n) \\ &\geq& \min_{i \in S} \sum_k p_{ik}(1) a_j(n) \\ &=& a_j(n), \end{eqnarray*}\] where the last equality holds since each row of \(P\) sums to one.
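
A quick numerical illustration (ours) of the lemma for the example matrix: the column-wise minima and maxima of \(P^n\) squeeze together as \(n\) grows.

# Sketch: column-wise min and max of P^n for a moderately large power
Pn <- P
for (n in 1:20){Pn <- Pn %*% P}   # Pn = P^21
rbind(min = apply(Pn, 2, min), max = apply(Pn, 2, max))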

Let \(P\) be a transition matrix on a state space \(S\). Recall that an initial distribution \(\lambda\), regarded as a row vector, is stationary if \(\lambda P = \lambda\).

Theorem 5.1 (Stationary distributions for finite state Markov chains) Let \(P\) be an aperiodic irreducible transition matrix on a finite state space \(S\). Then there exists a unique stationary distribution \(\lambda\) for \(P\); in particular:

  • \(\lambda_j = \lim_{n \to \infty} \min_{i \in S} p_{ij}(n)\)

  • If \(X\) is any Markov chain with transition matrix \(P\), then \(\mathbb{P}(X_n=j) \to \lambda_j\).

  • For all \(i,j \in S\), we have \(p_{ij}(n) \to \lambda_j\).

Proof. If both \(\lambda\) and \(\mu\) are stationary, then \(\mu P^n = \mu\) and \(\lambda P^n = \lambda\) for all \(n\). If \(X\) and \(Y\) are Markov chains with transition matrix \(P\) and initial distributions \(\lambda\) and \(\mu\), respectively, then \(X_n\) has the same law as \(X_0\) and \(Y_n\) has the same law as \(Y_0\), and by Doeblin’s theorem
\[ d_{TV}(X_0, Y_0) = d_{TV}(X_n, Y_n) \to 0.\] Hence \(X_0\) has the same law as \(Y_0\) and \(\lambda = \mu\).

By the max/min lemma we may define \[\lambda_j =\lim_{n \to \infty} a_j(n).\]

We will show that

\[\mathbb{P}(X_n=j) \to \lambda_j.\]

Let \(\varepsilon >0\). By Doeblin’s theorem, there exists \(N_1>0\), so that for any Markov chains \(X\) and \(Y\) with transition matrix \(P\), we have
\[d_{TV}(X_n,Y_n)< \varepsilon/2\] for \(n \geq N_1\). Choose \(N_2\) so that \[|\lambda_j -a_j(n)| < \varepsilon/2\] for all \(n \geq N_2\).
Let \(m \geq \max\{N_1, N_2\}\). Choose \(i \in S\) attaining the minimum, so that \(p_{ij}(m)=a_j(m) = \min_{i \in S} p_{ij}(m)\). Let \(Y\) be a Markov chain started at \(i\). Then \[|\mathbb{P}(X_m=j) - a_j(m)| = |\mathbb{P}(X_m=j) - \mathbb{P}(Y_m=j)| \leq d_{TV}(X_m, Y_m) < \varepsilon/2,\] so that by the triangle inequality \[|\mathbb{P}(X_m=j) - \lambda_j| \leq |\mathbb{P}(X_m=j) - a_j(m)| + |a_j(m) - \lambda_j| < \varepsilon.\]
Thus \(\mathbb{P}(X_n = j) \to \lambda_j\).

The last claim follows by considering a Markov chain started at \(i\).

It remains to verify that \(\lambda\) is a stationary distribution. Since \(S\) is finite, we may interchange the limit and the sum, giving \[\sum_{j \in S} \lambda_j = \lim_{n \to \infty} \sum_{j \in S} \mathbb{P}(X_n=j) =1.\]

To verify stationarity, we note that \(p_{ji}(n) \to \lambda_i\). Hence, \[(\lambda P)_j = \sum_{i \in S} \lambda_i p_{ij} = \lim_{n \to \infty} \sum_{i \in S} p_{ji}(n)p_{ij} = \lim_{n \to \infty} p_{jj}(n+1) = \lambda_j.\]
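
Numerically, for the example matrix from Section 2, \(\lambda\) can be approximated by a high power of \(P\), or extracted as the normalized left eigenvector of \(P\) with eigenvalue \(1\); a sketch (ours, not from the notes):

# Sketch: two ways to approximate the stationary distribution lambda
Pn <- P
for (n in 1:100){Pn <- Pn %*% P}
Pn[1, ]                      # each row of a high power of P approximates lambda

e <- eigen(t(P))             # left eigenvectors of P
lam <- Re(e$vectors[, 1])    # the eigenvalue of largest modulus is 1
lam / sum(lam)               # normalize to a probability vector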

6 Baby ergodic theorem

Theorem 6.1 Let \(P\) be an aperiodic irreducible transition matrix on a finite state space \(S\), so that it has a stationary distribution \(\pi\) by Theorem 5.1. Let \(X\) be a Markov chain with transition matrix \(P\). Fix a state \(s\in S\), and set

\[Y_n = \frac{1}{n} \sum_{k=0}^{n-1} \mathbf{1}[X_k = s].\] Then \(Y_n\) converges to \(\pi(s)\) in mean square, and consequently in probability.

This version of the law of large numbers is slightly harder to prove than the weak law that we proved in the i.i.d. case.

Proof. We will consider the special case where \(X\) is started at the stationary distribution.

We will make use of the following fact, which follows from Doeblin’s theorem and summing a geometric series: \[\Big|\sum_{\ell = 1} ^{n-1} \sum_{k= 0} ^{\ell-1} \big( p_{ss}(\ell - k) - \pi(s) \big)\Big| \leq Cn,\] for some fixed constant \(C >0\); indeed, \(|p_{ss}(m) - \pi(s)| \leq 2(1-\varepsilon)^{\lfloor m/M \rfloor}\) is summable in \(m\), so each inner sum is bounded by a constant.

Now some algebra gives that \[\begin{eqnarray*} (Y_n - \pi(s) )^2 &=& \frac{1}{n^2} \Big( \sum_{k=0} ^{n-1} \mathbf{1}[X_k=s] \Big)^2 -\frac{2 \pi(s)}{n} \sum_{k=0}^{n-1} \mathbf{1}[X_k=s] \ + \ \pi(s)^2, \end{eqnarray*}\] and \[ \Big( \sum_{k=0} ^{n-1} \mathbf{1}[X_k=s] \Big)^2 = \sum_{k=0} ^{n-1} \mathbf{1}[X_k=s] + 2\sum_{\ell =1} ^{n-1} \sum_{k=0} ^{\ell -1} \mathbf{1}[X_k=s]\mathbf{1}[X_{\ell}=s].\] Since \(X\) is stationary, \(\mathbb{P}(X_k=s) = \pi(s)\) for all \(k\), so \[ \mathbb{E} \Big( \frac{2 \pi(s)}{n} \sum_{k=0}^{n-1} \mathbf{1}[X_k=s] \Big) = \frac{2 \pi(s)}{n} \sum_{k=0}^{n-1} \pi(s) = 2\pi(s)^2\] and \[\begin{eqnarray*} \mathbb{E} \Big( \sum_{k=0} ^{n-1} \mathbf{1}[X_k=s] \Big)^2 &=& \sum_{k=0} ^{n-1} \pi(s) + 2\sum_{\ell =1} ^{n-1} \sum_{k=0} ^{\ell -1} \mathbb{P}(X_{\ell} =s \ | \ X_{k}=s) \mathbb{P}(X_k =s) \\ &=& n\pi(s) + 2\pi(s)\sum_{\ell =1} ^{n-1} \sum_{k=0} ^{\ell -1} p_{ss}(\ell-k). \end{eqnarray*}\] Also note that the double sum contains \(n(n-1)/2\) terms, so \[\begin{eqnarray*} n^2\mathbb{E} (Y_n - \pi(s) )^2 &=& n\pi(s) +\big(n\pi(s)^2 - n\pi(s)^2\big) - n^2\pi(s)^2 + 2\pi(s)\sum_{\ell =1} ^{n-1} \sum_{k=0} ^{\ell -1} p_{ss}(\ell-k) \\ &=& n\big(\pi(s) - \pi(s)^2\big) \ + \ 2\pi(s)\sum_{\ell =1} ^{n-1} \sum_{k=0} ^{\ell -1} \big( p_{ss}(\ell-k) - \pi(s) \big). \end{eqnarray*}\] Hence \(\mathbb{E}(Y_n - \pi(s))^2 = O(1/n)\), and the result follows from our first assertion.
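
To see the theorem in action, here is a short simulation sketch (ours) using the function steps from Section 2: the fraction of time a long trajectory spends in a fixed state should be close to its stationary probability.

# Sketch: occupation frequency of state s = 4 along one long run
x <- steps(10000)
mean(x == 4)   # should be close to the stationary probability of state 4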

7 Summary

8 Version: 26 October 2020