- We will review some basics of Markov chains.
- The focus will be on the case where there are finitely many states.
- We will plot a course towards a probabilistic analysis of the convergence to the stationary distribution via Doeblin’s coupling.
Let \(X=(X_0, X_1, \ldots)\) be a sequence of random variables taking values in a finite or countable set \(S\), which we call the state space.
Sometimes we will take \(S = \mathbb{Z}\) or \(S=\mathbb{N}\) or \(S= \{1, \ldots, N\}\), or \(S=\{s_1, \ldots, s_N\}\).
We say that \(X\) is a (homogeneous) Markov chain if for all \(n \geq 0\), we have \[ \begin{eqnarray*} && \mathbb{P}(X_{n+1} =j \ | \ X_n=i, X_{n-1}=a_{n-1}, \ldots, X_0 = a_{0}) \\ &=& \mathbb{P}(X_{n+1} = j \ | \ X_n = i) \\ &=& \mathbb{P} (X_1 = j \ | \ X_0 =i), \end{eqnarray*} \] for all \(i,j, a_{n-1}, \ldots, a_{0} \in S\). We sometimes refer to this as the Markov property.
We call the matrix \(P\) given by entries \[P_{ij} = p_{ij} = \mathbb{P} (X_1 = j \ | \ X_0 =i)\] the transition matrix for the chain \(X\).
More generally, we call any matrix \(P\) with nonnegative entries a transition matrix on \(S\) if for all \(i \in S\), we have \(\sum_{j \in S} p_{ij} =1\); that is, each row sums to one.
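In R, this condition is easy to check for a candidate matrix; the two-state matrix below is a hypothetical example of our own.

P <- matrix(c(0.5, 0.5,
              0.3, 0.7), nrow = 2, byrow = TRUE)
all(P >= 0) && all(abs(rowSums(P) - 1) < 1e-12)   # TRUE: nonnegative entries, each row sums to one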
The law of \(X_0\) is sometimes called the initial distribution.
Notice that if the law of \(X_0\) is expressed as a row vector \(\lambda = [\lambda_1, \lambda_2, \ldots]\), then \(\lambda P\) gives the law of \(X_1\) expressed as a row vector, which we also call a probability vector.
If \(\mathbb{P} (X_0 = i) = 1\), then we say the chain \(X\) is started at the state \(i\).
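These definitions translate directly into simulation. The following is a minimal R sketch (the helper simulate_chain is our own and not part of the notes): it draws \(X_0\) from \(\lambda\) and then repeatedly samples the next state from the row of \(P\) indexed by the current state.

simulate_chain <- function(P, lambda, n) {
  states <- seq_len(nrow(P))               # label the states 1, ..., N
  x <- sample(states, 1, prob = lambda)    # draw X_0 from the initial distribution
  for (k in seq_len(n)) {
    x <- sample(states, 1, prob = P[x, ])  # the next state depends only on the current one
  }
  x
}

For example, tabulating many draws of simulate_chain(P, lambda, 1) gives an empirical approximation of the probability vector \(\lambda P\).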
If \(P\) is a transition matrix, one may think of \(P^n\) as the matrix with entries \(p_{ij}(n)\), which give the weighted number of paths of length \(n\) from \(i\) to \(j\). To help us understand this, we consider the following exercise.
Let \(G=(V,E)\) be a finite directed graph with vertices \(V\) and edges \(E\), where we allow the possibility of multiple edges and loops. Let \(A\) be the adjacency matrix for \(G\), so that \[A_{ij}=\text{`the number of edges from $i$ to $j$'},\]
for each \(i,j \in V\). Recall that a path of length \(n\) from \(i\) to \(j\) is given by a sequence of vertices \(i=v_0, v_1, \ldots, v_{n-1},v_{n}=j\) such that \((v_k, v_{k+1}) \in E\) for each \(0 \leq k \leq n-1\). Show that the number of paths of length \(n\) from \(i\) to \(j\) is given by \(A^n_{ij}\), that is, the \((i,j)\) entry in the matrix \(A^n\).
Write \(\rho_{ij}(n)\) for the number of paths of length \(n\) from \(i\) to \(j\), so that \(\rho_{ij}(1) = A_{ij}\). Every path of length \(n+1\) from \(i\) to \(j\) is a path of length \(n\) from \(i\) to some vertex \(k\), followed by an edge from \(k\) to \(j\). Hence, by induction, \[ \begin{eqnarray*} \rho_{ij}(n+1) &=& \sum_{k \in V} \rho_{ik}(n)\rho_{kj}(1) \\ &=&\sum_{k \in V} A^n_{ik}A_{kj} = A^{n+1}_{ij}. \end{eqnarray*} \]
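As a quick numerical illustration (the small directed graph below is our own toy example), we can check the claim in R by taking powers of an adjacency matrix:

A <- matrix(c(0, 1, 1,    # edges 1 -> 2 and 1 -> 3
              0, 0, 1,    # edge  2 -> 3
              1, 0, 0),   # edge  3 -> 1
            nrow = 3, byrow = TRUE)
(A %*% A)[1, 1]   # paths of length 2 from 1 to 1: only 1 -> 3 -> 1, so this is 1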
Let \(P\) be the transition matrix for a Markov chain \(X\). Consider the matrix \(P^n\), where we denote its entries via \(P^n_{ij} = p_{ij}(n)\). We show by induction on \(n\) that \(\mathbb{P}(X_n = j \ | \ X_0 = i) = p_{ij}(n)\).
Then \[ \begin{eqnarray*} && \mathbb{P} (X_{n} =j | X_0=i) \\ &=& \sum_k \mathbb{P} (X_{n} =j, X_{n-1} = k | X_0=i) \\ &=& \sum_k \mathbb{P} (X_{n} =j | X_{n-1} = k, X_0=i) \mathbb{P} (X_{n-1} = k| X_0=i) \end{eqnarray*} \] It follows from the Markov property that \[ \mathbb{P} (X_{n} =j | X_{n-1} = k, X_0=i) = \mathbb{P} (X_{n} =j | X_{n-1} = k) = p_{kj}\] and the inductive hypothesis gives that \[ \mathbb{P} (X_{n-1} = k| X_0=i) = p_{ik}(n-1).\]
Thus from the definition of matrix multiplication, we have \[ \mathbb{P} (X_{n} =j | X_0=i)= \sum_k p_{ik}(n-1)\, p_{kj} = (P^{n-1}P)_{ij} = p_{ij}(n),\] as desired.
Finally, if \(X_0\) has law \(\lambda\) given as a row vector, then
\[
\begin{eqnarray*}
&& \mathbb{P} (X_n = j) \\
&=& \sum_k \mathbb{P} (X_n = j | X_0 = k) \mathbb{P} (X_0 = k) \\
&=& \sum_k \lambda_k p_{kj}(n) \\
&=& (\lambda P^n)_j.
\end{eqnarray*}
\]
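Before turning to an example, here is a quick numerical check of the identity \(p_{ij}(n) = \sum_k p_{ik}(n-1)\, p_{kj}\) in the case \(n = 2\), using a hypothetical two-state transition matrix of our own:

P <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE)
(P %*% P)[1, 2]        # p_{12}(2)
sum(P[1, ] * P[, 2])   # sum over k of p_{1k} p_{k2}; the same number, 0.17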
Let \(P\) be the transition matrix on three states \(\{1,2,3\}\) given by \[ P = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/4 & 1/4 & 1/2 \\ 1/8 & 1/4 & 5/8 \end{pmatrix} \] Suppose that \(X\) is a Markov chain with transition matrix \(P\), where \(X_0\) has law \(\lambda = (1/3, 1/3, 1/3)\). Compute the law of \(X_5\); in particular, find \(\mathbb{P}(X_5 = 3)\).
The computations are easy to do in R. From the form of \(P^n\) for large \(n\), it will be clear that the initial distribution is forgotten.
P <- matrix(c(1/3, 1/3, 1/3,
              1/4, 1/4, 1/2,
              1/8, 1/4, 5/8), nrow = 3, byrow = TRUE)
lambda <- c(1/3, 1/3, 1/3)
Q <- P %*% P %*% P %*% P %*% P   # Q = P^5
x <- lambda %*% Q                # the law of X_5 as a row vector
x
##           [,1]      [,2]      [,3]
## [1,] 0.2001395 0.2667132 0.5331473
x[3]
## [1] 0.5331473
M <- Q
for (i in 1:100) {
  M <- M %*% Q   # after the loop, M = Q^101 = P^505, a very high power of P
}
M
##      [,1]      [,2]      [,3]
## [1,]  0.2 0.2666667 0.5333333
## [2,]  0.2 0.2666667 0.5333333
## [3,]  0.2 0.2666667 0.5333333
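As an extra sanity check (our own addition), we can estimate the law of \(X_5\) by Monte Carlo with the simulate_chain sketch from earlier; the empirical frequencies should be close to the vector x computed above.

set.seed(1)
samples <- replicate(10000, simulate_chain(P, lambda, 5))
table(samples) / length(samples)   # empirical law of X_5, roughly (0.200, 0.267, 0.533)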
Again, let \(P\) be the transition matrix on three states \(\{1,2,3\}\) given by \[ P = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/4 & 1/4 & 1/2 \\ 1/8 & 1/4 & 5/8 \end{pmatrix} \] Find all stationary distributions for \(P\); recall that a stationary distribution is a probability vector \(\pi\), written as a row vector, satisfying \(\pi P = \pi\). What do you see?
Again we use R. Since a stationary distribution is a left eigenvector of \(P\) for the eigenvalue \(1\), or equivalently a right eigenvector of \(P^T\), we compute the eigenvectors of the transpose. We also need to do some normalizing, as linear algebra does not know we are doing statistics. We will see the same repeated vector that appeared in \(P^n\) from the previous part.
P <- matrix(c(1/3, 1/3, 1/3,
              1/4, 1/4, 1/2,
              1/8, 1/4, 5/8), nrow = 3, byrow = TRUE)
tP <- t(P)   # left eigenvectors of P are right eigenvectors of t(P)
E <- eigen(tP)
E
## eigen() decomposition
## $values
## [1]  1.00000000  0.25000000 -0.04166667
## 
## $vectors
##            [,1]       [,2]       [,3]
## [1,] -0.3179994 -0.5883484  0.4082483
## [2,] -0.4239992 -0.1961161 -0.8164966
## [3,] -0.8479983  0.7844645  0.4082483
EE <- E$vectors[, 1]   # the eigenvector for the eigenvalue 1
EE
## [1] -0.3179994 -0.4239992 -0.8479983
F <- EE / sum(EE)   # normalize so that the entries sum to one
F
## [1] 0.2000000 0.2666667 0.5333333
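We can confirm directly that the normalized eigenvector is stationary: multiplying on the right by \(P\) returns the same vector, up to rounding.

F %*% P   # returns (0.2, 0.2666667, 0.5333333) again, so F P = F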
Let \(G= (V, E)\) be a simple undirected graph: no loops, no multiple edges. Consider the random walk \((S_n)\) on \(V\) with transition matrix given by \[\mathbb{P} (S_{n+1} = j \ | \ S_n = i) = \frac{1}{\deg(i)}\mathbf{1}[(i,j) \in E] = p_{ij}\] for \(i, j \in V\). Here recall that \(\deg(i) = \sum_{j \in V} \mathbf{1}[(i,j) \in E].\) Show that if \(|E| < \infty\), then \(\pi_i = \deg(i) / (2|E|)\) defines a probability measure on \(V\), and that \(\pi = \pi P\).
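While the exercise asks for a proof, a numerical sanity check in R may be reassuring; here we take the path graph on three vertices \(1 - 2 - 3\), a toy example of our own.

Adj <- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 1, 0), nrow = 3, byrow = TRUE)   # adjacency matrix of the path 1 - 2 - 3
deg <- rowSums(Adj)              # the degrees (1, 2, 1)
Pw  <- diag(1 / deg) %*% Adj     # transition matrix of the random walk
piv <- deg / sum(deg)            # sum(deg) = 2|E|, so piv is (0.25, 0.5, 0.25)
piv %*% Pw                       # returns (0.25, 0.5, 0.25) = piv, so piv = piv Pw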