We will discuss Doeblin’s classical successful coupling of irreducible and aperiodic Markov chains on a finite state space.

From the perspective of theory, we already assume that there exists a random variable that is uniformly distributed on \([0,1]\), and thus we have at our disposal an infinite sequence of independent random variables \(U_{-1}, U_0, U_1, \ldots\), all uniformly distributed on \([0,1]\). From the perspective of R, this just means that we can simulate as many independent uniform random variables as we please. Also remember that any random variable can be thought of as a function of a uniform random variable.
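For instance (a minimal illustration of my own, not part of the original notes), the inverse transform method realizes this last remark in R: pushing a single uniform through a quantile function produces a sample from any prescribed distribution.

```
# any random variable can be written as a function of a uniform:
# if F is a cdf with generalized inverse F^(-1), then F^(-1)(U) has cdf F
u <- runif(1)                          # one uniform on [0,1]
x <- qexp(u, rate = 2)                 # an exponential(2) sample, via the inverse cdf
y <- qbinom(u, size = 10, prob = 0.3)  # the same idea works for discrete laws
```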

Let \(X\) be the Markov chain on the state space \(S\) with transition matrix \(P\) that we wish to construct or simulate. First, we construct \(X_0 = f(U_{-1})\) for a suitable function \(f\). Next, let \(\phi:S \times [0,1] \to S\) be a function such that if \(U\) is uniformly distributed in \([0,1]\) and independent of \(X_0\), then \[\mathbb{P}(\phi(X_0, U) =j \ | \ X_0=i) = p_{ij}.\]

Thus \(\phi\) is nothing abstract at all: in R, it is simply the procedure you would define so that, given \(i\), it outputs \(j\) with probability \(p_{ij}\); here \(U\) represents the randomization that R uses.

Then the Markov chain \(X\) can be constructed as

- \(X_0 = f(U_{-1})\)
- \(X_1 = \phi(X_0, U_0)\)
- \(X_2 = \phi(X_1, U_1)\)
- in general, \(X_{n+1} = \phi(X_{n}, U_n)\)

We consider the following example of this theory in R. We will code everything in terms of uniform random variables, even though this is not necessary in R.

```
# sample X_0 uniformly from {1,...,5}: output k when u lies in ((k-1)/5, k/5]
initial <- function(){
  u <- runif(1)
  x <- -1
  k <- 0
  while (x == -1){
    k <- k + 1
    if (u <= 0.2 * k){ x <- k }
  }
  x
}
# transition matrix: the vector lists the rows, which matrix() reads as
# columns, so we transpose to get P[i,j] = p_ij
P <- matrix(c(0, 1/2, 1/2, 0, 0,
              1/2, 0, 0, 0, 1/2,
              1/2, 0, 0, 1/2, 0,
              0, 0, 0, 1/2, 1/2,
              1/3, 1/3, 0, 0, 1/3), nrow = 5)
P <- t(P)
P
```

```
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.0000000 0.5000000 0.5 0.0 0.0000000
## [2,] 0.5000000 0.0000000 0.0 0.0 0.5000000
## [3,] 0.5000000 0.0000000 0.0 0.5 0.0000000
## [4,] 0.0000000 0.0000000 0.0 0.5 0.5000000
## [5,] 0.3333333 0.3333333 0.0 0.0 0.3333333
```

```
# one step of the chain from state i: output j with probability P[i,j]
step <- function(i){
  q <- P[i, ]
  x <- -1
  u <- runif(1)
  j <- 0
  cumq <- cumsum(q)
  while (x == -1){
    j <- j + 1
    if (u <= cumq[j]){ x <- j }
  }
  x
}
# simulate n steps of the chain, started from initial()
steps <- function(n){
  x <- initial()
  for (i in 1:n){
    x <- c(x, step(x[i]))
  }
  x
}
steps(100)
```

```
## [1] 5 1 3 1 3 1 2 5 5 2 1 2 1 2 5 5 2 1 3 4 4 5
## [23] 1 3 4 4 4 4 4 5 5 1 3 4 4 5 1 3 1 2 5 2 5 1
## [45] 3 1 3 4 4 4 4 4 5 2 5 5 1 2 1 3 4 4 5 1 2 5
## [67] 1 3 1 2 5 2 5 5 1 3 4 4 5 2 5 1 3 1 2 1 3 4
## [89] 4 4 5 5 1 3 1 2 5 1 3 4 4
```

Let \(P\) be a transition matrix on the state space \(S\). The **period** of a state \(i\in S\) is defined to be \(per(i) = \gcd\{n \geq 1: p_{ii}(n) >0\}\); recall, for example, that the greatest common divisor of \(24\) and \(56\) is \(8\). We say that \(P\) is **aperiodic** if \(per(i) = 1\) for every state \(i \in S\).
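As a quick numerical illustration (the helper functions `gcd` and `period` below are my own, not part of the original notes), we can compute the period of each state of the example chain by scanning which powers of \(P\) put positive mass on the diagonal:

```
P <- matrix(c(0, 1/2, 1/2, 0, 0,
              1/2, 0, 0, 0, 1/2,
              1/2, 0, 0, 1/2, 0,
              0, 0, 0, 1/2, 1/2,
              1/3, 1/3, 0, 0, 1/3), nrow = 5, byrow = TRUE)

# Euclid's algorithm
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

# gcd of the return times n <= nmax with p_ii(n) > 0
period <- function(P, i, nmax = 20){
  Pn <- P
  returns <- c()
  for (n in 1:nmax){
    if (Pn[i, i] > 0) returns <- c(returns, n)
    Pn <- Pn %*% P
  }
  Reduce(gcd, returns)
}
sapply(1:5, function(i) period(P, i))  # every state has period 1
```

Scanning only up to `nmax` can overestimate the period in general, since every return time is a multiple of \(per(i)\); but a gcd of \(1\) over any finite set of return times already certifies aperiodicity, as it does here.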

The proof requires a bit of elementary number theory.

**Lemma 3.2** Let \(a_1, a_2, \ldots\) be positive integers. Let \(d = \gcd\{a_1, a_2, \ldots\}\).

There exists a finite \(n\) such that \(d= \gcd\{a_1, \ldots, a_n\}\).

Recall that if \(\gcd\{a_1, \ldots, a_n\}=1\), then there exist integers \(x_1, \ldots, x_n\) (possibly negative) such that \(a_1x_1 + \cdots +a_nx_n = 1.\)

In fact, if \(\gcd\{a_1, \ldots, a_n\}=1\), then there exists \(M\) such that for all \(m \geq M\), there exist nonnegative integers \(x_1, \ldots, x_n\) such that \(a_1x_1 + \cdots +a_nx_n = m.\)
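To see the last claim in action (an illustration of my own, with `representable` a hypothetical helper), take \(a_1 = 3\) and \(a_2 = 5\), so that \(\gcd\{3,5\} = 1\); a brute-force search shows that every \(m \geq 8\) is a nonnegative combination \(3x_1 + 5x_2\), while \(m = 1, 2, 4, 7\) are not:

```
# can m be written as 3*x1 + 5*x2 with nonnegative integers x1, x2?
representable <- function(m){
  any(outer(0:(m %/% 3), 0:(m %/% 5),
            function(x1, x2) 3 * x1 + 5 * x2) == m)
}
which(!sapply(0:20, representable)) - 1  # the m in 0..20 that fail: 1 2 4 7
```

So here one may take \(M = 8\).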

We say that \(P\) is **irreducible** if for every \(i,j \in S\) there exists \(n\) such that \(p_{ij}(n)>0\). If \(P\) is irreducible and aperiodic on a finite state space, then there exists \(N\) such that \(p_{ij}(n) > 0\) for all \(i,j \in S\) and all \(n \geq N\).

*Proof. * Since \(P\) is aperiodic, Lemma 3.2 gives that for \(M\) sufficiently large we have \(p_{ii}(n) >0\) for all \(i \in S\) and all \(n \geq M\); here we use that \(S\) is finite to choose one \(M\) that works for every state. Observe that
\[ p_{ij}(n+k) \geq p_{ii}(n)p_{ij}(k).\]
The irreducibility assumptions assures us that for every pair \((i,j)\) we can find \(k=k_{ij}\) such that \(p_{ij}(k) >0\).
Thus
\[p_{ij}(n) >0 \text{ for all } n \geq M + k_{ij};\]
take \[K =\max \{ k_{ij}: i,j \in S\}\]
then \(p_{ij}(n) >0\) for all \(n \geq M+K\).

Let \(P\) be an aperiodic irreducible transition matrix on a finite state space \(S\). Let \(X\) and \(Y\) be Markov chains with the same transition matrix \(P\), but with different initial distributions. We are interested in the long-term behaviour of \(d_{TV}(X_n, Y_n)\). By the coupling inequality, we know that \[d_{TV}(X_n, Y_n) \leq 2\, \mathbb{P}(X_n' \neq Y_n')\] for any coupling \((X_n', Y_n')\) of \(X_n\) and \(Y_n\).

Thus we should look for a coupling \((X', Y')\) of \(X\) and \(Y\), where \(X_n'\) is eventually equal to \(Y_n'\). Let \(X'\) and \(Z'\) be independent Markov chains with starting distributions of \(X\) and \(Y\) respectively. Let \(T\) be the first time such that \(X'_T = Z'_T\). For all \(n \leq T\), set \(Y'_n = Z'_n\). For \(n > T\), set \(Y'_n = X'_n\). Thus
\[d_{TV}(X_n, Y_n) = d_{TV}(X'_n, Y'_n) \leq 2\,\mathbb{P}(T >n).\] Sometimes \((X_n', Y_n')\) is called **Doeblin’s coupling** or the **classical coupling**; the coupling is **successful** at the random time \(T\), after which the two chains move together.

Another way to understand Doeblin’s coupling is to actually construct the Markov chains.

Let \(\phi:S \times [0,1] \to S\) be a function such that if \(U\) is uniformly distributed in \([0,1]\) and independent of \(X_0\), then \[\mathbb{P}(\phi(X_0, U) =j \ | \ X_0=i) = p_{ij}.\]

Let \(U_{-1}, U_0, U_1, \ldots, V_{-1}, V_0, V_1, \ldots\) be independent random variables that are uniformly distributed in \([0,1]\). Let \(X'_0 = f(U_{-1})\) have the same law as \(X_0\) and \(X_n' = \phi(X_{n-1}', U_{n-1})\). Similarly, let \(Z_0'= g(V_{-1})\) have the same law as \(Y_0\) and \(Z_n' = \phi(Z_{n-1}', V_{n-1})\). Thus
\[ X'=\big(X_0', X_1', X_2', \dots\big)\]
and
\[Z'=\big(Z_0', Z_1', Z_2', \dots\big)\]
have the same laws as \(X\) and \(Y\), respectively.
Notice that
\[\big(Z_0', \phi(Z_0', V_0), \phi(Z_1', V_1), \phi\big(\phi(Z_1', V_1), U_2\big)\big) \text{ has the same law as } (Y_0, Y_1, Y_2, Y_3);\]
it does not matter whether we choose to use the \(V_i\)’s or the \(U_i\)’s. Let \(s \in S\). We could, say, choose to use the \(U_i\)’s whenever we see that \(Y_i'=s\), and use the \(V_i\)’s otherwise.

Let \(T\) be the first time that \(X'_T = Z'_T\). Since the two chains agree at time \(T\), we may switch from the \(V_i\)’s to the \(U_i\)’s there: \[Y'= \Big(Z_0', \phi(Z_0',V_0), \phi(Z_1', V_1), \ldots, \phi(Z'_{T-1}, V_{T-1}), \phi(Z_T', U_T) = \phi(X_T', U_T), \phi(X'_{T+1}, U_{T+1}), \ldots \Big)\] has the same law as \(Y\), and it coincides with \(X'\) from time \(T\) onwards.

The following simulation uses the transition matrix considered earlier. We consider the Doeblin coupling of two chains started at \(1\) and \(5\).

```
P <- matrix(c(0, 1/2, 1/2, 0, 0,
              1/2, 0, 0, 0, 1/2,
              1/2, 0, 0, 1/2, 0,
              0, 0, 0, 1/2, 1/2,
              1/3, 1/3, 0, 0, 1/3), nrow = 5)
P <- t(P)
# one step of the chain from state i: output j with probability P[i,j]
step <- function(i){
  q <- P[i, ]
  x <- -1
  u <- runif(1)
  j <- 0
  cumq <- cumsum(q)
  while (x == -1){
    j <- j + 1
    if (u <= cumq[j]){ x <- j }
  }
  x
}
# run two independent chains, started at 1 and 5, until they first meet
stepsind <- function(){
  x <- 1
  y <- 5
  while (x[length(x)] != y[length(y)]){
    x <- c(x, step(x[length(x)]))
    y <- c(y, step(y[length(y)]))
  }
  rbind(x, y)
}
# after the meeting time, run both chains together for 5 more steps;
# note that the common state at the meeting time is printed twice,
# once ending each independent path and once starting the common continuation
coupled <- function(z){
  tog <- z[1, ]
  tog <- tog[length(tog)]
  for (i in 1:5){
    tog <- c(tog, step(tog[length(tog)]))
  }
  x <- c(z[1, ], tog)
  y <- c(z[2, ], tog)
  rbind(x, y)
}
coupled(stepsind())
```

```
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## x 1 3 1 3 1 2 1 3 4 4 5 1 3 4
## y 5 1 3 1 2 1 3 4 5 1 2 5 5 2
## [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27]
## x 5 1 2 5 1 3 1 1 3 1 2 5 1
## y 1 2 5 2 5 5 1 1 3 1 2 5 1
```

We need to show that \(\mathbb{P}( T >n) \to 0\) as \(n \to \infty\). Let \(M\) be such that every entry of \(P^M\) is positive. Let \(\varepsilon = \min_{i,j \in S} p_{ij}(M)>0\). Since \(X'\) and \(Z'\) are independent, we have \[\mathbb{P}(X'_M = Z'_M) = \sum_{j} \mathbb{P}(X_M' = j)\mathbb{P}(Z_M' = j) \geq \varepsilon \sum_{j} \mathbb{P}(X_M' = j)= \varepsilon.\]

Let \(k\) be a positive integer. Applying this bound on each of the \(k\) time blocks of length \(M\) and using the Markov property, we have \[\mathbb{P}(T > kM) \leq (1- \varepsilon)^k.\]
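We can check this geometric decay empirically (a simulation sketch of my own; `meeting_time` is a hypothetical helper re-implementing the earlier `step` construction): simulate many independent copies of the meeting time \(T\) for the example chain, started at \(1\) and \(5\), and inspect the tail.

```
P <- matrix(c(0, 1/2, 1/2, 0, 0,
              1/2, 0, 0, 0, 1/2,
              1/2, 0, 0, 1/2, 0,
              0, 0, 0, 1/2, 1/2,
              1/3, 1/3, 0, 0, 1/3), nrow = 5, byrow = TRUE)

# one step from state i: the smallest j with u <= cumsum(P[i,])[j]
step <- function(i){
  cumq <- cumsum(P[i, ])
  cumq[5] <- 1  # guard against floating-point round-off
  min(which(runif(1) <= cumq))
}

# meeting time of two independent chains started at 1 and 5
meeting_time <- function(){
  x <- 1; y <- 5; t <- 0
  while (x != y){
    x <- step(x); y <- step(y); t <- t + 1
  }
  t
}

Tmeet <- replicate(2000, meeting_time())
c(mean(Tmeet > 5), mean(Tmeet > 10), mean(Tmeet > 15))  # tail decays quickly
```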

The following simple fact will allow us to harness Doeblin’s coupling further.

Let \(P\) be a transition matrix on a state space \(S\). Recall an initial distribution \(\lambda\) is stationary if \(\lambda P = \lambda\).

**Theorem 5.1 (Stationary distributions for finite state Markov chains)** Let \(P\) be an aperiodic irreducible transition matrix on a finite state space \(S\). Then there exists a unique stationary distribution \(\lambda\) for \(P\); in particular:

- \(\lambda_j = \lim_{n \to \infty} \min_{i \in S} p_{ij}(n)\);
- if \(X\) is any Markov chain with transition matrix \(P\), then \(\mathbb{P}(X_n=j) \to \lambda_j\);
- for all \(i,j \in S\), we have \(p_{ij}(n) \to \lambda_j\).
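Before the proof, the theorem can be checked numerically for the example chain (a sketch of my own): iterating the transition matrix, the rows of \(P^n\) converge to a common vector \(\lambda\), and this \(\lambda\) satisfies \(\lambda P = \lambda\).

```
P <- matrix(c(0, 1/2, 1/2, 0, 0,
              1/2, 0, 0, 0, 1/2,
              1/2, 0, 0, 1/2, 0,
              0, 0, 0, 1/2, 1/2,
              1/3, 1/3, 0, 0, 1/3), nrow = 5, byrow = TRUE)

Pn <- diag(5)
for (k in 1:500) Pn <- Pn %*% P             # Pn is now P^500
lambda <- Pn[1, ]                           # every row of Pn is (approximately) lambda
max(abs(as.vector(lambda %*% P) - lambda))  # approximately zero: lambda P = lambda
```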

*Proof. * If both \(\lambda\) and \(\mu\) are stationary, then \(\lambda P^n = \lambda\) and \(\mu P^n = \mu\) for all \(n\). So if \(X\) and \(Y\) are Markov chains with transition matrix \(P\) and initial distributions \(\lambda\) and \(\mu\), respectively, then \(X_n\) has the same law as \(X_0\) and \(Y_n\) has the same law as \(Y_0\), and by Doeblin’s theorem

\[ d_{TV}(X_0, Y_0) = d_{TV}(X_n, Y_n) \to 0.\] Hence \(X_0\) has the same law as \(Y_0\) and \(\lambda = \mu\).

By the max/min lemma, writing \(a_j(n) = \min_{i \in S} p_{ij}(n)\), we may define \[\lambda_j =\lim_{n \to \infty} a_j(n).\]

We will show that

\[\mathbb{P}(X_n=j) \to \lambda_j.\]

Let \(\varepsilon >0\). By Doeblin’s theorem, there exists \(N_1>0\), so that for *any* Markov chains \(X\) and \(Y\) with transition matrix \(P\), we have

\[d_{TV}(X_n,Y_n)< \varepsilon/2\]
for \(n \geq N_1\). Choose \(N_2\) so that
\[|\lambda_j -a_j(n)| < \varepsilon/2\]
for all \(n \geq N_2\).

Let \(m \geq \max\{N_1, N_2\}\). Let \(i \in S\) be chosen so that \(p_{ij}(m)=a_j(m) = \min_{i \in S} p_{ij}(m)\). Let \(Y\) be a Markov chain started at \(i\). Then
\[|\mathbb{P}(X_m=j) - \mathbb{P}(Y_m=j)| = |\mathbb{P}(X_m=j) - a_j(m)| < \varepsilon/2,\] so that, by the triangle inequality,
\[|\mathbb{P}(X_m=j) - \lambda_j| \leq |\mathbb{P}(X_m=j) - a_j(m)| + |a_j(m) - \lambda_j| < \varepsilon.\]

Thus \(\mathbb{P}(X_n = j) \to \lambda_j\).

The last claim follows by considering a Markov chain started at \(i\).

It remains to verify that \(\lambda\) is a stationary distribution. Since \(S\) is finite, \[\sum_{j \in S} \lambda_j = \lim_{n \to \infty} \sum_{j \in S} \mathbb{P}(X_n=j) =1.\]

To verify stationarity, we note that \(p_{ji}(n) \to \lambda_i\). Hence, \[(\lambda P)_j = \sum_{i \in S} \lambda_i p_{ij} = \lim_{n \to \infty} \sum_{i \in S} p_{ji}(n)p_{ij} = \lim_{n \to \infty} p_{jj}(n+1) = \lambda_j.\]

**Theorem 6.1** Let \(P\) be an aperiodic irreducible transition matrix on a finite state space \(S\), so that it has a stationary distribution \(\pi\). Let \(X\) be a Markov chain with transition matrix \(P\). Fix a state \(s\in S\), and set \[Y_n = \frac{1}{n} \sum_{k=0} ^{n-1} \mathbf{1}[X_k=s].\] Then \(\mathbb{E}(Y_n - \pi(s))^2 \to 0\); in particular, \(Y_n \to \pi(s)\) in probability.

This version of the law of large numbers is slightly harder to prove than the weak one that we proved for the i.i.d. case.

*Proof. * We will consider the special case where \(X\) is started at the stationary distribution.

We will make use of the following fact, which follows from Doeblin’s theorem and summing a geometric series: since \(p_{ss}(m) - \pi(s) \to 0\) geometrically fast, we have \[\sum_{\ell = 1} ^{n-1} \sum_{k= 0} ^{\ell-1} \big( p_{ss}(\ell - k) - \pi(s) \big) \leq Cn\] for some fixed constant \(C >0\).

Now some algebra gives that \[\begin{eqnarray*} (Y_n - \pi(s) )^2 &=& \frac{1}{n^2} \Big( \sum_{k=0} ^{n-1} \mathbf{1}[X_k=s] \Big)^2 -\frac{2 \pi(s)}{n} \sum_{k=0}^{n-1} \mathbf{1}[X_k=s] \ + \ \pi(s)^2, \end{eqnarray*}\] and \[ \Big( \sum_{k=0} ^{n-1} \mathbf{1}[X_k=s] \Big)^2 = \sum_{k=0} ^{n-1} \mathbf{1}[X_k=s] + 2\sum_{\ell =1} ^{n-1} \sum_{k=0} ^{\ell -1} \mathbf{1}[X_k=s]\mathbf{1}[X_{\ell}=s].\] Since \(X\) is stationary, \[ \mathbb{E} \Big( \frac{2 \pi(s)}{n} \sum_{k=0}^{n-1} \mathbf{1}[X_k=s] \Big) = \frac{2 \pi(s)}{n} \sum_{k=0}^{n-1} \pi(s) = 2\pi(s)^2\] and \[\begin{eqnarray*} \mathbb{E} \Big( \sum_{k=0} ^{n-1} \mathbf{1}[X_k=s] \Big)^2 &=& \sum_{k=0} ^{n-1} \pi(s) + 2\sum_{\ell =1} ^{n-1} \sum_{k=0} ^{\ell -1} \mathbb{P}(X_{\ell} =s \ | \ X_{k}=s) \mathbb{P}(X_k =s) \\ &=& n\pi(s) + 2\pi(s)\sum_{\ell =1} ^{n-1} \sum_{k=0} ^{\ell -1} p_{ss}(\ell-k). \end{eqnarray*}\] Also note that the double sum contains \(n(n-1)/2\) terms, so \[\begin{eqnarray*} n^2\mathbb{E} (Y_n - \pi(s) )^2 &=& n\pi(s) +\big(n\pi(s)^2 - n\pi(s)^2\big) - n^2\pi(s)^2 + 2\pi(s)\sum_{\ell =1} ^{n-1} \sum_{k=0} ^{\ell -1} p_{ss}(\ell-k) \\ &=& n\big(\pi(s) - \pi(s)^2\big) \ + \ 2\pi(s)\sum_{\ell =1} ^{n-1} \sum_{k=0} ^{\ell -1} \big( p_{ss}(\ell-k) - \pi(s) \big). \end{eqnarray*}\] Hence the result follows from our first assertion.

To summarize, we have:

- Constructed Markov chains

- Defined irreducibility and aperiodicity
- Defined Doeblin’s coupling
- Used Doeblin’s coupling to prove many basic facts for these Markov chains:
  - Existence and uniqueness of the stationary distribution
  - Convergence in total variation
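As a final sanity check of the ergodic averages in Theorem 6.1 (a simulation of my own, reusing the earlier `step` construction), the fraction of time a long trajectory spends in each state should approximate the stationary distribution:

```
P <- matrix(c(0, 1/2, 1/2, 0, 0,
              1/2, 0, 0, 0, 1/2,
              1/2, 0, 0, 1/2, 0,
              0, 0, 0, 1/2, 1/2,
              1/3, 1/3, 0, 0, 1/3), nrow = 5, byrow = TRUE)

step <- function(i){
  cumq <- cumsum(P[i, ])
  cumq[5] <- 1  # guard against floating-point round-off
  min(which(runif(1) <= cumq))
}

# simulate a long trajectory started at state 1
n <- 20000
x <- numeric(n)
x[1] <- 1
for (k in 2:n) x[k] <- step(x[k - 1])
empirical <- as.vector(table(factor(x, levels = 1:5))) / n

# stationary distribution via matrix powers
Pn <- diag(5)
for (k in 1:500) Pn <- Pn %*% P
rbind(empirical = empirical, stationary = Pn[1, ])  # the rows nearly agree
```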