We will discuss Doeblin’s classical successful coupling of irreducible and aperiodic Markov chains on a finite state space.
From the perspective of theory, we assume that there exists a random variable that is uniformly distributed on \([0,1]\), and thus we have at our disposal an infinite sequence of independent random variables \(U_{-1}, U_0, U_1, \ldots\), all uniformly distributed on \([0,1]\). From the perspective of R, this just means that we can simulate as many independent uniform random variables as we please. Also recall that any random variable can be realized as a function of a uniform random variable.
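For example, inverse-transform sampling realizes a random variable with a given law as a function of a uniform. The following snippet (our own illustration, not part of the construction below) uses the built-in quantile function qexp.

# if U is uniform on [0,1], then qexp(U) has the Exponential(1) law
u <- runif(5)   # five independent uniforms on [0,1]
qexp(u)         # five independent Exponential(1) samples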
Let \(X\) be the Markov chain on the state space \(S\) with transition matrix \(P\) that we wish to construct or simulate. First we construct \(X_0 = f(U_{-1})\), where \(f\) is chosen so that \(X_0\) has the desired initial distribution. Next, let \(\phi:S \times [0,1] \to S\) be a function such that if \(U\) is uniformly distributed in \([0,1]\) and independent of \(X_0\), then \[\mathbb{P}(\phi(X_0, U) =j \ | \ X_0=i) = p_{ij}.\]
Thus \(\phi\) is nothing abstract at all: in R, it is simply the procedure you would define so that, given \(i\), it outputs \(j\) with probability \(p_{ij}\); here \(U\) represents the randomization that R uses.
Then the Markov chain \(X\) can be constructed as \[X_n = \phi(X_{n-1}, U_{n-1}), \quad n \geq 1.\]
Sometimes such a construction is called a random function representation of a Markov chain, see also Propp and Wilson (1996).
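In R, one version of \(\phi\) might look as follows. This is a sketch of our own (the function step defined later performs the same inversion, with the uniform drawn internally); it works for any transition matrix P.

phi <- function(i, u, P){
  # invert the cumulative row sums: return the smallest j with
  # u <= P[i,1] + ... + P[i,j], which happens with probability P[i,j]
  cumq <- cumsum(P[i, ])
  min(which(u <= cumq))
}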
We consider the following example of this theory in R. We will code everything in terms of uniform random variables, even though this is not necessary in R.
initial <- function(){
  # sample X_0 uniformly from {1, ..., 5} by inverting the uniform u
  u <- runif(1)
  x <- -1
  k <- 0
  while(x == -1){
    k <- k + 1
    if(u <= 0.2*k){x <- k}
  }
  x
}
P <- matrix(c(0,   1/2, 1/2, 0,   0,
              1/2, 0,   0,   0,   1/2,
              1/2, 0,   0,   1/2, 0,
              0,   0,   0,   1/2, 1/2,
              1/3, 1/3, 0,   0,   1/3),
            nrow = 5, byrow = TRUE)  # each line above is one row of P
P
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.0000000 0.5000000 0.5 0.0 0.0000000
## [2,] 0.5000000 0.0000000 0.0 0.0 0.5000000
## [3,] 0.5000000 0.0000000 0.0 0.5 0.0000000
## [4,] 0.0000000 0.0000000 0.0 0.5 0.5000000
## [5,] 0.3333333 0.3333333 0.0 0.0 0.3333333
step <- function(i){
  # one step from state i: return j with probability P[i, j]
  q <- P[i, ]
  cumq <- cumsum(q)
  u <- runif(1)
  x <- -1
  j <- 0
  while(x == -1){
    j <- j + 1
    if(u <= cumq[j]){x <- j}
  }
  x
}
steps <- function(n){
  # run the chain for n steps from a random initial state
  x <- initial()
  for (i in 1:n){
    x <- c(x, step(x[i]))
  }
  x
}
steps(100)
## [1] 5 1 2 5 2 5 1 2 5 2 1 2 5 1 2 5 1 2 5 1 3 4
## [23] 5 1 3 4 4 4 4 4 5 2 5 5 5 5 2 1 3 4 5 2 5 1
## [45] 3 1 3 4 4 4 5 5 2 5 1 2 5 2 1 3 4 5 5 1 3 4
## [67] 5 5 2 1 3 1 3 1 2 5 1 2 1 2 5 5 5 2 1 3 4 5
## [89] 1 2 5 1 2 1 3 4 4 4 4 4 4
Let \(P\) be a transition matrix on the state space \(S\). The period of a state \(i\in S\) is defined to be \(per(i) = \gcd\{n \geq 1: p_{ii}(n) >0\}\); for example, the greatest common divisor of \(24\) and \(56\) is \(8\). We say that \(P\) is aperiodic if \(per(i) = 1\) for every state \(i\).
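As a quick numerical illustration (the helpers gcd and period are our own, not part of the text), we can estimate \(per(i)\) for the matrix \(P\) above by taking the gcd of the return times up to some cutoff; truncating at nmax is only a heuristic, adequate here because small return times already force the gcd down to \(1\).

gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)
period <- function(P, i, nmax = 20){
  # collect the n <= nmax with p_ii(n) > 0 and take their gcd
  Pn <- P
  returns <- c()
  for (n in 1:nmax){
    if (Pn[i, i] > 0) returns <- c(returns, n)
    Pn <- Pn %*% P
  }
  Reduce(gcd, returns)
}
period(P, 1)  # returns 1: state 1 is aperiodic in the example above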
Lemma 3.1 (Aperiodic transition matrices) If \(P\) is an aperiodic transition matrix on a finite state space \(S\), then there exists \(M>0\) such that \(p_{ii}(n) >0\) for all \(i\in S\) and all \(n \geq M\).
The proof requires a bit of elementary number theory.
Lemma 3.2 Let \(a_1, a_2, \ldots\) be positive integers and let \(d = \gcd\{a_1, a_2, \ldots\}\).
There exists a finite \(n\) such that \(d= \gcd\{a_1, \ldots, a_n\}\).
Recall that if \(\gcd\{a_1, \ldots, a_n\}=1\), then there exist integers \(x_1, \ldots, x_n\) (possibly negative) such that \(a_1x_1 + \cdots +a_nx_n = 1.\)
In fact, there exists \(M\) such that for all \(m \geq M\), there exist nonnegative integers \(x_1, \ldots, x_n\) such that \(a_1x_1 + \cdots +a_nx_n = m.\) For example, with \(a_1 = 3\) and \(a_2 = 5\), every \(m \geq 8\) can be so represented: \(8 = 3+5\), \(9 = 3 \cdot 3\), \(10 = 2 \cdot 5\), and adding copies of \(3\) covers all larger \(m\).
Proof (of Lemma 3.1). Assume \(S = \{1, \ldots, N\}\). Let \(A^i = \{n \geq 1: p_{ii}(n) >0\}\). Observe that if \(a, b \in A^i\), then \(a+b \in A^i\), since \(p_{ii}(a+b) \geq p_{ii}(a)p_{ii}(b)\). Thus by Lemma 3.2, there exists \(M^i\) such that \(p_{ii}(m) >0\) for all \(m \geq M^i\). Choose \[M= \max\{M^1, \ldots, M^N\}.\]
We say that \(P\) is irreducible if for every \(i,j \in S\) there exists \(n\) such that \(p_{ij}(n)>0\).
Lemma 3.3 (Aperiodic and irreducible matrices) Let \(P\) be a transition matrix on a finite state space \(S\). If \(P\) is aperiodic and irreducible, then there exists \(N\) such that for all \(n \geq N\), we have \(p_{ij}(n) >0\) for all \(i,j \in S\).
Proof. Since \(P\) is aperiodic, Lemma 3.1 gives an \(M\) such that for all \(i \in S\) we have \(p_{ii}(n) >0\) for all \(n \geq M\). Observe that \[ p_{ij}(n+k) \geq p_{ii}(n)p_{ij}(k).\] The irreducibility assumption assures us that for every pair \((i,j)\) we can find \(k=k_{ij}\) such that \(p_{ij}(k) >0\). Thus \[p_{ij}(n) >0 \text{ for all } n \geq M + k_{ij};\] taking \[K =\max \{ k_{ij}: i,j \in S\},\] we obtain \(p_{ij}(n) >0\) for all \(i,j \in S\) and all \(n \geq M+K\).
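A quick numerical check of Lemma 3.3 for the example matrix \(P\) (our own snippet) finds the first power of \(P\) with all entries strictly positive:

# multiply until every entry of P^n is positive, as Lemma 3.3 guarantees
Pn <- P
n <- 1
while (min(Pn) == 0){
  Pn <- Pn %*% P
  n <- n + 1
}
n  # the first n with P^n > 0 entrywise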
Let \(P\) be an aperiodic irreducible transition matrix on a finite state space \(S\). Let \(X\) and \(Y\) be Markov chains with the same transition matrix \(P\), but with different initial distributions. We are interested in the long-term behaviour of \(d_{TV}(X_n, Y_n)\). By the coupling inequality, we know that \[d_{TV}(X_n, Y_n) \leq 2 \mathbb{P}(X_n' \neq Y_n')\] for any coupling \((X_n', Y_n')\) of \(X_n\) and \(Y_n\).
Thus we should look for a coupling \((X', Y')\) of \(X\) and \(Y\) in which \(X_n'\) is eventually equal to \(Y_n'\). Let \(X'\) and \(Z'\) be independent Markov chains with the starting distributions of \(X\) and \(Y\), respectively. Let \(T\) be the first time such that \(X'_T = Z'_T\). For all \(n \leq T\), set \(Y'_n = Z'_n\); for \(n > T\), set \(Y'_n = X'_n\). Thus \[d_{TV}(X_n, Y_n) = d_{TV}(X'_n, Y'_n) \leq 2\mathbb{P}(T >n).\] Sometimes \((X_n', Y_n')\) is called Doeblin's coupling or the classical coupling; the coupling is said to be successful at the random time \(T\).
Another way to understand Doeblin’s coupling is to actually construct the Markov chains.
Let \(\phi:S \times [0,1] \to S\) be a function such that if \(U\) is uniformly distributed in \([0,1]\) and independent of \(X_0\), then \[\mathbb{P}(\phi(X_0, U) =j \ | \ X_0=i) = p_{ij}.\]
Let \(U_{-1}, U_0, U_1, \ldots, V_{-1}, V_0, V_1, \ldots\) be independent random variables that are uniformly distributed in \([0,1]\). Let \(X'_0 = f(U_{-1})\) have the same law as \(X_0\) and \(X_n' = \phi(X_{n-1}', U_{n-1})\). Similarly, let \(Z_0'= g(V_{-1})\) have the same law as \(Y_0\) and \(Z_n' = \phi(Z_{n-1}', V_{n-1})\). Thus
\[ X'=\big(X_0', X_1', X_2', \dots\big)\]
and
\[Z'=\big(Z_0', Z_1', Z_2', \dots\big)\]
have the same laws as \(X\) and \(Y\), respectively.
Notice that
\[\big(Z_0', \phi(Z_0', V_0), \phi(Z_1', V_1), \phi\big(\phi(Z_1', V_1), U_2\big)\big) \text{ has the same law as } (Y_0, Y_1, Y_2, Y_3);\]
it does not matter whether we choose to use the \(V_i\)'s or the \(U_i\)'s. Let \(s \in S\). We could, say, choose to use the \(U_i\)'s whenever we see that \(Y_i'=s\), and use the \(V_i\)'s otherwise.
Let \(T\) be the first time that \(X'_T = Z'_T\). Since \(Z'_T = X'_T\), switching from the \(V_i\)'s to the \(U_i\)'s at time \(T\) gives that \[Y'= \Big(Z_0', \phi(Z_0',V_0), \phi(Z_1', V_1), \dots, \phi(Z_{T-1}', V_{T-1}), \phi(X_T', U_T), \phi(X'_{T+1}, U_{T+1}), \ldots \Big)\] has the same law as \(Y\).
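This construction can be coded directly. The following sketch (our own; the name doeblin is ours, and it reuses the phi defined earlier) drives \(Z'\) with the \(V_i\)'s until the chains meet, and then simply copies \(X'\):

doeblin <- function(P, x0, z0, nmax = 200){
  # explicit uniforms driving the two chains
  u <- runif(nmax)
  v <- runif(nmax)
  x <- x0
  y <- z0
  met <- (x0 == z0)
  for (n in 1:nmax){
    x <- c(x, phi(x[n], u[n], P))
    if (met){
      y <- c(y, x[n+1])          # after time T, Y' simply follows X'
    } else {
      y <- c(y, phi(y[n], v[n], P))
      met <- (x[n+1] == y[n+1])  # check whether the chains have just met
    }
  }
  rbind(x = x, y = y)
}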
The following simulation uses the transition matrix defined earlier. We consider the Doeblin coupling of two chains started at \(1\) and \(5\).
P <- matrix(c(0,   1/2, 1/2, 0,   0,
              1/2, 0,   0,   0,   1/2,
              1/2, 0,   0,   1/2, 0,
              0,   0,   0,   1/2, 1/2,
              1/3, 1/3, 0,   0,   1/3),
            nrow = 5, byrow = TRUE)
step <- function(i){
  # one step from state i: return j with probability P[i, j]
  q <- P[i, ]
  cumq <- cumsum(q)
  u <- runif(1)
  x <- -1
  j <- 0
  while(x == -1){
    j <- j + 1
    if(u <= cumq[j]){x <- j}
  }
  x
}
stepsind <- function(){
  # run two independent chains from states 1 and 5 until they first meet
  x <- 1
  y <- 5
  while(x[length(x)] != y[length(y)]){
    x <- c(x, step(x[length(x)]))
    y <- c(y, step(y[length(y)]))
  }
  rbind(x, y)
}
coupled <- function(z){
  # after the chains have met, run the coupled chain for 5 more steps;
  # the meeting state (the last column of z) appears twice in the output,
  # once from z and once as the start of tog
  tog <- z[1, ncol(z)]
  for (i in 1:5){
    tog <- c(tog, step(tog[length(tog)]))
  }
  x <- c(z[1, ], tog)
  y <- c(z[2, ], tog)
  rbind(x, y)
}
coupled(stepsind())
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## x 1 2 1 2 1 2 5 2 1 2 5 5 1 3
## y 5 1 3 4 4 4 4 4 4 4 5 5 1 3
## [,15] [,16] [,17]
## x 4 4 4
## y 4 4 4
We need to show that \(\mathbb{P}( T >n) \to 0\) as \(n \to \infty\). By Lemma 3.3, let \(M\) be such that \(P^M >0\) entrywise. Let \(\varepsilon = \min_{i,j \in S} p_{ij}(M)>0\). Since \(X'\) and \(Z'\) are independent, we have \[\mathbb{P}(X'_M = Z'_M) = \sum_{j} \mathbb{P}(X_M' = j)\mathbb{P}(Z_M' = j) \geq \varepsilon \sum_{j} \mathbb{P}(X_M' = j)= \varepsilon.\]
Let \(k\) be a positive integer. Iterating this bound over successive blocks of length \(M\), using the Markov property, we obtain
\[\mathbb{P}(T > kM) \leq (1- \varepsilon)^k.\]
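We can check this geometric tail decay by simulation (our own snippet). Note that stepsind returns the two chains run until they first meet, so the coupling time is the number of columns minus one:

# estimate P(T > n) from 1000 independent runs of the coupled chains
Tsamples <- replicate(1000, ncol(stepsind()) - 1)
mean(Tsamples > 10)   # estimated P(T > 10); small, consistent with the bound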
Theorem 4.1 (Doeblin's classical coupling) If \(X\) and \(Y\) are Markov chains with the same aperiodic irreducible transition matrix \(P\) on a finite state space \(S\), then there exist \(M >0\) and \(\varepsilon >0\) such that \[d_{TV}(X_n,Y_n) \leq 2 (1- \varepsilon)^{ \lfloor n/M \rfloor}.\] Here \(M\) and \(\varepsilon\) depend only on \(P\).
The following simple fact will allow us to harness Doeblin's coupling further.
Lemma 5.1 (max/min) Let \(P\) be a transition matrix on a finite state space \(S\). The sequence given by \[a_j(n) = \min_{i \in S} p_{ij}(n)\] is nondecreasing, and the sequence given by \[b_j(n) = \max_{i \in S} p_{ij}(n)\] is nonincreasing.
Proof. We give the proof for the nondecreasing case. Observe that \[\begin{eqnarray*} a_j(n+1) &=& \min_{i \in S} \sum_k p_{ik}(1) p_{kj}(n) \\ &\geq& \min_{i \in S} \sum_k p_{ik}(1) a_j(n) \\ &=& a_j(n). \end{eqnarray*}\]
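A quick numeric check of the lemma for \(j = 1\) in the running example (our own snippet):

# the column minima of P^n over n = 1, ..., 10 form a nondecreasing sequence
Pn <- P
mins <- c()
for (n in 1:10){
  mins <- c(mins, min(Pn[, 1]))
  Pn <- Pn %*% P
}
mins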
Let \(P\) be a transition matrix on a state space \(S\). Recall an initial distribution \(\lambda\) is stationary if \(\lambda P = \lambda\).
Theorem 5.1 (Stationary distributions for finite state Markov chains) Let \(P\) be an aperiodic irreducible transition matrix on a finite state space \(S\). Then there exists a unique stationary distribution \(\lambda\) for \(P\); in particular:
\(\lambda_j = \lim_{n \to \infty} \min_{i \in S} p_{ij}(n)\);
if \(X\) is any Markov chain with transition matrix \(P\), then \(\mathbb{P}(X_n=j) \to \lambda_j\);
for all \(i,j \in S\), we have \(p_{ij}(n) \to \lambda_j\).
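Numerically, for the running example, all the rows of \(P^n\) converge to the stationary distribution (our own check):

# all rows of P^100 agree (approximately) with the stationary distribution
Pn <- diag(5)
for (n in 1:100) Pn <- Pn %*% P
round(Pn, 4)
round(Pn[1, ] %*% P, 4)  # applying P again leaves the distribution unchanged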
Proof. If both \(\lambda\) and \(\mu\) are stationary, then \(\mu P^n = \mu\) and \(\lambda P^n = \lambda\) for all \(n\). By Doeblin's theorem, if \(X\) and \(Y\) are Markov chains with transition matrix \(P\) and initial distributions \(\lambda\) and \(\mu\), respectively, then \(X_n\) has the same law as \(X_0\), \(Y_n\) has the same law as \(Y_0\), and
\[ d_{TV}(X_0, Y_0) = d_{TV}(X_n, Y_n) \to 0.\] Hence \(X_0\) has the same law as \(Y_0\) and \(\lambda = \mu\).
By the max/min lemma we may define \[\lambda_j =\lim_{n \to \infty} a_j(n).\]
We will show that
\[\mathbb{P}(X_n=j) \to \lambda_j.\]
Let \(\varepsilon >0\). By Doeblin’s theorem, there exists \(N_1>0\), so that for any Markov chains \(X\) and \(Y\) with transition matrix \(P\), we have
\[d_{TV}(X_n,Y_n)< \varepsilon/2\]
for \(n \geq N_1\). Choose \(N_2\) so that
\[|\lambda_j -a_j(n)| < \varepsilon/2\]
for all \(n \geq N_2\).
Let \(m \geq \max\{N_1, N_2\}\). Choose \(i \in S\) so that \(p_{ij}(m)=a_j(m) = \min_{i' \in S} p_{i'j}(m)\). Let \(Y\) be a Markov chain started at \(i\). Then
\[|\mathbb{P}(X_m=j) - \mathbb{P}(Y_m=j)| = |\mathbb{P}(X_m=j) - a_j(m)|,\] and since \(|\mathbb{P}(X_m=j) - \mathbb{P}(Y_m=j)| \leq d_{TV}(X_m, Y_m) < \varepsilon/2\) and \(|a_j(m) - \lambda_j| < \varepsilon/2\), the triangle inequality gives
\[|\mathbb{P}(X_m=j) - \lambda_j| < \varepsilon.\]
Thus \(\mathbb{P}(X_n = j) \to \lambda_j\).
The last claim follows by considering a Markov chain started at \(i\).
It remains to verify that \(\lambda\) is a stationary measure. We have \[\sum_{j \in S} \lambda_j = \lim_{n \to \infty} \sum_{j \in S} \mathbb{P}(X_n=j) =1.\]
To verify stationarity, we note that \(p_{ji}(n) \to \lambda_i\). Hence, \[(\lambda P)_j = \sum_{i \in S} \lambda_i p_{ij} = \lim_{n \to \infty} \sum_{i \in S} p_{ji}(n)p_{ij} = \lim_{n \to \infty} p_{jj}(n+1) = \lambda_j.\]
Theorem 6.1 Let \(P\) be an aperiodic irreducible transition matrix on a finite state space \(S\), with stationary distribution \(\pi\). Let \(X\) be a Markov chain with transition matrix \(P\). Fix a state \(s\in S\), and set
\[Y_n = \frac{1}{n} \sum_{k=0}^{n-1} \mathbf{1}[X_k = s].\] Then \(Y_n\) converges to \(\pi(s)\) in mean square, and consequently in probability.
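This ergodic behaviour is easy to see in the running example (our own check): the long-run fraction of time spent at a state approaches its stationary probability.

# fraction of the first 10000 steps spent at state 4
x <- steps(10000)
mean(x == 4)   # should be close to pi(4)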
This version of the law of large numbers is slightly harder to prove than the weak law that we proved in the i.i.d. case.
Proof. We will consider the special case where \(X\) is started at the stationary distribution.
We will make use of the following fact, which follows from Doeblin's theorem (which gives \(|p_{ss}(m) - \pi(s)| \leq 2(1-\varepsilon)^{\lfloor m/M \rfloor}\)) and summing a geometric series: \[\sum_{\ell = 1} ^{n-1} \sum_{k= 0} ^{\ell-1} \big( p_{ss}(\ell - k) - \pi(s) \big) \leq Cn,\] for some fixed constant \(C >0\).
Now some algebra gives that \[\begin{eqnarray*} (Y_n - \pi(s) )^2 &=& \frac{1}{n^2} \Big( \sum_{k=0} ^{n-1} \mathbf{1}[X_k=s] \Big)^2 -\frac{2 \pi(s)}{n} \sum_{k=0}^{n-1} \mathbf{1}[X_k=s] \ + \ \pi(s)^2, \end{eqnarray*}\] and \[ \Big( \sum_{k=0} ^{n-1} \mathbf{1}[X_k=s] \Big)^2 = \sum_{k=0} ^{n-1} \mathbf{1}[X_k=s] + 2\sum_{\ell =1} ^{n-1} \sum_{k=0} ^{\ell -1} \mathbf{1}[X_k=s]\mathbf{1}[X_{\ell}=s].\] Thus \[ \mathbb{E} \Big( \frac{2 \pi(s)}{n} \sum_{k=0}^{n-1} \mathbf{1}[X_k=s] \Big) = \frac{2 \pi(s)}{n} \sum_{k=0}^{n-1} \pi(s) = 2\pi(s)^2\] and \[\begin{eqnarray*} \mathbb{E} \Big( \sum_{k=0} ^{n-1} \mathbf{1}[X_k=s] \Big)^2 &=& \sum_{k=0} ^{n-1} \pi(s) + 2\sum_{\ell =1} ^{n-1} \sum_{k=0} ^{\ell -1} \mathbb{P}(X_{\ell} =s \ | \ X_{k}=s) \mathbb{P}(X_k =s) \\ &=& n\pi(s) + 2\pi(s)\sum_{\ell =1} ^{n-1} \sum_{k=0} ^{\ell -1} p_{ss}(\ell-k). \end{eqnarray*}\] Also note that the double sum contains \(n(n-1)/2\) terms, so \[\begin{eqnarray*} n^2\mathbb{E} (Y_n - \pi(s) )^2 &=& n\pi(s) +\big(n\pi(s)^2 - n\pi(s)^2\big) - n^2\pi(s)^2 + 2\pi(s)\sum_{\ell =1} ^{n-1} \sum_{k=0} ^{\ell -1} p_{ss}(\ell-k) \\ &=& n\big(\pi(s) - \pi(s)^2\big) \ + \ 2\pi(s)\sum_{\ell =1} ^{n-1} \sum_{k=0} ^{\ell -1} \big( p_{ss}(\ell-k) - \pi(s) \big). \end{eqnarray*}\] Dividing by \(n^2\), the result follows from our first assertion.