We will review some basics of the limiting behaviour of Markov chains, focusing on the case where there are a finite number of states, and where the transition matrix is irreducible and aperiodic. We also approach the Doeblin coupling from a computational perspective and offer a probabilistic analysis of the convergence to the stationary distribution.
Markov chains are simplest extensions of i.i.d. sequences of random variables, where we allow the next random variable to depend on the values of the previous one in a predetermined way, according to a transition matrix. A quick google search of applications of Markov chains, gave for me, the first link an article on modelling migraine attacks. Here, a simple toy model is introduced to model the occurrences of migraines. Specifically, the probability of getting a migraine is say \(p\), then once someone has a migraine, whether it last another day is probability \(p_1\), and then whether this continues for the \(n\)th day occurs with probability \(p_n\). Such a model accounts more accurately for the occurrence of migraine clusters.
Let \(X=(X_0, X_1, \ldots)\) be random variables taking values in a finite or countable set \(S\); which we call the state space. Sometimes we will take \(S = \mathbb{Z}\) or \(S=\mathbb{N}\) or \(S= \{1, \ldots, N\}\), or \(S=\{s_1, \ldots, s_N\}\). We say that \(X\) is a (homogeneous) Markov chain if for all \(n \geq 0\), we have \[\begin{eqnarray*} \mathbb{P}(X_{n+1} =j \ | \ X_n=i, X_{n-1}=a_{n-1}, \ldots, X_0 = a_{0}) &=& \mathbb{P}(X_{n+1} = j \ | \ X_n = i) \\ &=& \mathbb{P} (X_1 = j \ | \ X_0 =i), \end{eqnarray*}\] for all \(i,j, a_{n-1}, \ldots, a_{0} \in S\); which we sometimes refer to as the Markov property. We call the matrix \(P\) given by entries \(P_{ij} = p_{ij} = \mathbb{P} (X_1 = j \ | \ X_0 =i)\) the transition matrix for the chain \(X\). Conversely, we also call \(P\) a transition matrix on \(S\) if for all \(i \in S\), we have \(\sum_{j \in S} p_{ij} =1\). The law of \(X_0\) is sometimes called the initial distribution. Notice that if the law of \(X_0\) is expressed as a row vector \(\lambda = [\lambda_1, \lambda_2, \ldots]\), then \(\lambda P\) gives the law of \(X_1\) expressed as a row vector, which we also call a probability vector. If \(\mathbb{P} (X_0 = i) = 1\), then we say that the chain \(X\) is started at the state \(i\).
If \(P\) is a transition matrix, one may think of \(P^n\) as the matrix with entries \(p_{ij}(n)\) which gives the number of weighted paths of length \(n\) from \(i\) to \(j\). To help us understand this, we consider the following exercise.
Lemma 1.1 Let \(P\) be the transition matrix for a Markox chain \(X\). Consider the matrix \(P^n\), where we denote its entries via \(P^n_{ij} = p_{ij}(n).\)
Exercise 1.2 Let \(P\) be the transition matrix on three states \(\{1,2,3\}\) given by \[\begin{equation*} P = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/4 & 1/4 & 1/2 \\ 1/8 & 1/4 & 5/8 \end{pmatrix} \end{equation*}\]
Suppose that \(X\) is a Markov chain, and suppose \(X_0\) has law \((1/3, 1/3, 1/3)\).
P <- matrix(c(1/3, 1/3, 1/3, 1/4, 1/4,1/2, 1/8, 1/4, 5/8), nrow =3)
P <- t(P)
lambda = c(1/3, 1/3, 1/3)
Q = P %*% P %*% P %*% P %*% P
x=lambda %*% Q
x
## [,1] [,2] [,3]
## [1,] 0.2001395 0.2667132 0.5331473
x[3]
## [1] 0.5331473
M = Q
for (i in 1:100){
M <-M %*% Q
}
M
## [,1] [,2] [,3]
## [1,] 0.2 0.2666667 0.5333333
## [2,] 0.2 0.2666667 0.5333333
## [3,] 0.2 0.2666667 0.5333333
Given a transition matrix \(P\), we say that a probability vector \(\lambda\) is a stationary distribution if it is a left eigenvector of \(P\) with eigenvalue \(1\), so that
\[ \lambda P = \lambda.\]
Hence if a Markov chain is started at stationarity, then the distribution of the next step \(X_1\) remains the same, and also all future steps. Depending on the transition matrix, there may by more than one stationary distribution, or there may not be a stationary distribution at all.
Exercise 1.3 Again, let \(P\) be the transition matrix on three states \(\{1,2,3\}\) given by \[\begin{equation*} P = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/4 & 1/4 & 1/2 \\ 1/8 & 1/4 & 5/8 \end{pmatrix} \end{equation*}\] Find any stationary distributions for \(P\). What do you see?
P <- matrix(c(1/3, 1/3, 1/3, 1/4, 1/4,1/2, 1/8, 1/4, 5/8), nrow =3)
P <- t(P)
lambda = c(1/3, 1/3, 1/3)
tP = t(P)
E=eigen(tP)
E
## eigen() decomposition
## $values
## [1] 1.00000000 0.25000000 -0.04166667
##
## $vectors
## [,1] [,2] [,3]
## [1,] -0.3179994 -0.5883484 0.4082483
## [2,] -0.4239992 -0.1961161 -0.8164966
## [3,] -0.8479983 0.7844645 0.4082483
EE = E$vectors[,1]
EE
## [1] -0.3179994 -0.4239992 -0.8479983
F= EE /sum(EE)
F
## [1] 0.2000000 0.2666667 0.5333333
One obstruction to the existence of a stationary distribution is periodicity; think of a Markov chain that can only enter a state \(i\) at even times. Another obstruction occurs if the Markov chain has countably many states and it drifts away with positive probability. A Markov chain can also have multiple stationary distributions if it has absorbing states, that is states from which it cannot leave once it enters, or if it has a subset of states from which it enters it cannot leave.
The following exercise gives us an easy class of examples to experiment with.
We reviewed the basics of the Markov property and transition matrices. We saw how to compute stationary distributions as a problem of linear algebra. We also introduced the class of examples given by random walk on a finite graph.