We will review some basics of the limiting behaviour of Markov chains, focusing on the case where there are a finite number of states, and where the transition matrix is irreducible and aperiodic. We also approach the Doeblin coupling from a computational perspective and offer a probabilistic analysis of the convergence to the stationary distribution.

Markov chains are simplest extensions of i.i.d. sequences of random variables, where we allow the next random variable to depend on the values of the previous one in a predetermined way, according to a transition matrix. A quick google search of applications of Markov chains, gave for me, the first link an article on modelling migraine attacks. Here, a simple toy model is introduced to model the occurrences of migraines. Specifically, the probability of getting a migraine is say \(p\), then once someone has a migraine, whether it last another day is probability \(p_1\), and then whether this continues for the \(n\)th day occurs with probability \(p_n\). Such a model accounts more accurately for the occurrence of migraine clusters.

Let \(X=(X_0, X_1, \ldots)\) be random variables taking values in a finite or countable set \(S\); which we call the **state space**. Sometimes we will take \(S = \mathbb{Z}\) or \(S=\mathbb{N}\) or \(S= \{1, \ldots, N\}\), or \(S=\{s_1, \ldots, s_N\}\). We say that \(X\) is a (homogeneous) **Markov chain** if for all \(n \geq 0\), we have
\[\begin{eqnarray*}
\mathbb{P}(X_{n+1} =j \ | \ X_n=i, X_{n-1}=a_{n-1}, \ldots, X_0 = a_{0}) &=& \mathbb{P}(X_{n+1} = j \ | \ X_n = i) \\
&=& \mathbb{P} (X_1 = j \ | \ X_0 =i),
\end{eqnarray*}\]
for all \(i,j, a_{n-1}, \ldots, a_{0} \in S\); which we sometimes refer to as the **Markov property**.
We call the matrix \(P\) given by entries \(P_{ij} = p_{ij} = \mathbb{P} (X_1 = j \ | \ X_0 =i)\) the **transition matrix** for the chain \(X\). Conversely, we also call \(P\) a **transition matrix** on \(S\) if for all \(i \in S\), we have \(\sum_{j \in S} p_{ij} =1\). The law of \(X_0\) is sometimes called the **initial** distribution. Notice that if the law of \(X_0\) is expressed as a row vector \(\lambda = [\lambda_1, \lambda_2, \ldots]\), then \(\lambda P\) gives the law of \(X_1\) expressed as a row vector, which we also call a **probability vector**. If \(\mathbb{P} (X_0 = i) = 1\), then we say that the chain \(X\) is **started** at the state \(i\).

If \(P\) is a transition matrix, one may think of \(P^n\) as the matrix with entries \(p_{ij}(n)\) which gives the number of *weighted* paths of length \(n\) from \(i\) to \(j\). To help us understand this, we consider the following exercise.

for each \(i,j \in V\). Recall that a path of length \(n\) from \(i\) to \(j\) is given by a sequence of vertices \(i=v_1, v_2, \ldots, v_{n-1},v_{n}=j\) such that \((v_i, v_{i+1}) \in E\). Show that the number for paths of length \(n\) from \(i\) to \(j\) is given by \(A^n_{ij}\) that is the \((i,j)\) entry in the matrix \(A^n\).

**Lemma 1.1 **Let \(P\) be the transition matrix for a Markox chain \(X\). Consider the matrix \(P^n\), where we denote its entries via \(P^n_{ij} = p_{ij}(n).\)

- We have that \(p_{ij}(n) = \mathbb{P} (X_n=j \ | \ X_0=i)\).
- If \(\lambda\) is the initial distribution of \(X\), then \(\lambda P^n\) gives the law of \(X_n\).

Then \[\begin{eqnarray*} \mathbb{P} (X_{n} =j | X_0=i) &=& \sum_k \mathbb{P} (X_{n} =j, X_{n-1} = k | X_0=i) \\ &=& \sum_k \mathbb{P} (X_{n} =j | X_{n-1} = k, X_0=i) \mathbb{P} (X_{n-1} = k| X_0=i) \end{eqnarray*}\] It follows from the Markov property that \[ \mathbb{P} (X_{n} =j | X_{n-1} = k, X_0=i) = \mathbb{P} (X_{n} =j | X_{n-1} = k) = p_{kj}\] and the inductive hypothesis gives that \[ \mathbb{P} (X_{n-1} = k| X_0=i) = p_{ik}(n-1).\] Thus from the definition of matrix multiplication, we have \[ \mathbb{P} (X_{n} =j | X_0=i)= p_{ij}(n) = \sum_k p_{ik} p_{kj}(n-1)\] as desired. Finally if \(X_0\) has law \(\lambda\) given as a row vector, then

\[\begin{eqnarray*} \mathbb{P} (X_n = j) &=& \sum_k \mathbb{P} (X_n = j | X_0 = k) \mathbb{P} (X_0 = k) \\ &=& \sum_k \lambda_k p_{kj}(n) \\ &=& (\lambda P^n)_j \end{eqnarray*}\]

**Exercise 1.2 **Let \(P\) be the transition matrix on three states \(\{1,2,3\}\) given by
\[\begin{equation*}
P =
\begin{pmatrix}
1/3 & 1/3 & 1/3 \\
1/4 & 1/4 & 1/2 \\
1/8 & 1/4 & 5/8
\end{pmatrix}
\end{equation*}\]

Suppose that \(X\) is a Markov chain, and suppose \(X_0\) has law \((1/3, 1/3, 1/3)\).

- Find \(\mathbb{P} (X_5 = 3)\)
- Compute \(P^n\) for large \(n\)
- If you change the initial distribution for \(X_0\), for large \(n\), what effect does this have on the distribution \(X_n\)

```
P <- matrix(c(1/3, 1/3, 1/3, 1/4, 1/4,1/2, 1/8, 1/4, 5/8), nrow =3)
P <- t(P)
lambda = c(1/3, 1/3, 1/3)
Q = P %*% P %*% P %*% P %*% P
x=lambda %*% Q
x
```

```
## [,1] [,2] [,3]
## [1,] 0.2001395 0.2667132 0.5331473
```

` x[3]`

`## [1] 0.5331473`

```
M = Q
for (i in 1:100){
M <-M %*% Q
}
M
```

```
## [,1] [,2] [,3]
## [1,] 0.2 0.2666667 0.5333333
## [2,] 0.2 0.2666667 0.5333333
## [3,] 0.2 0.2666667 0.5333333
```

Given a transition matrix \(P\), we say that a probability vector \(\lambda\) is a **stationary** distribution if it is a left eigenvector of \(P\) with eigenvalue \(1\), so that

\[ \lambda P = \lambda.\]
Hence if a Markov chain is started at stationarity, then the distribution of the next step \(X_1\) remains the same, and also all future steps. Depending on the transition matrix, there may by more than one stationary distribution, or there may not be a stationary distribution at all.

**Exercise 1.3 ** Again, let \(P\) be the transition matrix on three states \(\{1,2,3\}\) given by
\[\begin{equation*}
P =
\begin{pmatrix}
1/3 & 1/3 & 1/3 \\
1/4 & 1/4 & 1/2 \\
1/8 & 1/4 & 5/8
\end{pmatrix}
\end{equation*}\]
Find any stationary distributions for \(P\). What do you see?

```
P <- matrix(c(1/3, 1/3, 1/3, 1/4, 1/4,1/2, 1/8, 1/4, 5/8), nrow =3)
P <- t(P)
lambda = c(1/3, 1/3, 1/3)
tP = t(P)
E=eigen(tP)
E
```

```
## eigen() decomposition
## $values
## [1] 1.00000000 0.25000000 -0.04166667
##
## $vectors
## [,1] [,2] [,3]
## [1,] -0.3179994 -0.5883484 0.4082483
## [2,] -0.4239992 -0.1961161 -0.8164966
## [3,] -0.8479983 0.7844645 0.4082483
```

```
EE = E$vectors[,1]
EE
```

`## [1] -0.3179994 -0.4239992 -0.8479983`

```
F= EE /sum(EE)
F
```

`## [1] 0.2000000 0.2666667 0.5333333`

One obstruction to the existence of a stationary distribution is periodicity; think of a Markov chain that can only enter a state \(i\) at even times. Another obstruction occurs if the Markov chain has countably many states and it drifts away with positive probability. A Markov chain can also have multiple stationary distributions if it has absorbing states, that is states from which it cannot leave once it enters, or if it has a subset of states from which it enters it cannot leave.

The following exercise gives us an easy class of examples to experiment with.

We reviewed the basics of the Markov property and transition matrices. We saw how to compute stationary distributions as a problem of linear algebra. We also introduced the class of examples given by random walk on a finite graph.

- Version: 27 October 2021
- Rmd Source