1 Gambler’s ruin

Recall the Gambler’s ruin problem from the last session. Suppose someone plays the following gambling game. They bet \(1\) pound on a fair flip of a coin, winning \(1\) pound if it comes up heads and losing \(1\) pound if it comes up tails. They start with \(1\) pound, and stop playing once they have reached \(5\) pounds or have no money left. Express this problem as a Markov chain.
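
As a non-authoritative sketch, here is one natural way to encode this chain in R, assuming the state space \(\{0,1,\ldots,5\}\) (the gambler's current fortune), with \(0\) and \(5\) absorbing.

# Sketch: gambler's ruin transition matrix on states 0,...,5
# (row/column i corresponds to a fortune of i-1 pounds; fortunes 0 and 5 are absorbing)
G <- matrix(0, nrow = 6, ncol = 6)
G[1, 1] <- 1                 # fortune 0: the gambler has stopped
G[6, 6] <- 1                 # fortune 5: the gambler has stopped
for (i in 2:5){
  G[i, i - 1] <- 1/2         # lose 1 pound on tails
  G[i, i + 1] <- 1/2         # win 1 pound on heads
}
G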

2 Anything is possible

Consider the transition matrix \(P\) on the states \(\{1,2,3,4,5\}\) given by:

P <- matrix(c(0, 1/2, 1/2, 0, 0,
              1/2, 0, 0, 0, 1/2,
              1/2, 0, 0, 1/2, 0,
              0, 0, 0, 1/2, 1/2,
              1/3, 1/3, 0, 0, 1/3), nrow = 5)
P <- t(P)
P
##           [,1]      [,2] [,3] [,4]      [,5]
## [1,] 0.0000000 0.5000000  0.5  0.0 0.0000000
## [2,] 0.5000000 0.0000000  0.0  0.0 0.5000000
## [3,] 0.5000000 0.0000000  0.0  0.5 0.0000000
## [4,] 0.0000000 0.0000000  0.0  0.5 0.5000000
## [5,] 0.3333333 0.3333333  0.0  0.0 0.3333333
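
One way to explore which multi-step transitions are possible is to look at powers of \(P\). The following sketch (the number of powers examined is an arbitrary choice) prints the smallest entry of \(P^n\) for the first few values of \(n\); once this is positive, every state can reach every other state in exactly \(n\) steps.

# Sketch: track the smallest entry of P^n for n = 1, ..., 6
Pn <- P
for (n in 1:6){
  cat("n =", n, " smallest entry of P^n:", min(Pn), "\n")
  Pn <- Pn %*% P
}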

3 Convex combinations

Recall that \(\lambda\) is a probability measure on a countable set \(\Omega\) if \(\lambda(\omega) \geq 0\) for every \(\omega \in \Omega\) and \(\sum_{\omega \in \Omega} \lambda(\omega) = 1\). Let \(\lambda\) and \(\mu\) be probability measures on \(\Omega\).

Show that \[\rho = (1-t)\lambda + t \mu\] is a probability measure on \(\Omega\) for all \(t \in [0,1]\).
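
A minimal numerical check, with two arbitrary (assumed) probability vectors on a three-point space and an arbitrary value of \(t\):

# Sketch: a convex combination of two pmfs is nonnegative and sums to 1
lambda <- c(0.1, 0.6, 0.3)
mu <- c(0.5, 0.25, 0.25)
t <- 0.4
rho <- (1 - t) * lambda + t * mu
rho
all(rho >= 0)
sum(rho)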

4 Periodicity

Consider the transition matrix \(P\) on the states \(\{1,2,3,4\}\) given by:

P <- matrix(c(0, 0, 1, 0,
              0, 0, 0, 1,
              1/2, 1/2, 0, 0,
              1, 0, 0, 0), nrow = 4)
P <- t(P)
P
##      [,1] [,2] [,3] [,4]
## [1,]  0.0  0.0    1    0
## [2,]  0.0  0.0    0    1
## [3,]  0.5  0.5    0    0
## [4,]  1.0  0.0    0    0

4.1 Solutions

  • We see that the chain alternates between the blocks of states \(\{1,2\}\) and \(\{3,4\}\), so it is periodic with period \(2\); for example, if we start at state \(1\) at time \(0\), we can only return to it at even times.
Q <- P %*% P
P %*% P %*% P
##      [,1] [,2] [,3] [,4]
## [1,] 0.00 0.00  0.5  0.5
## [2,] 0.00 0.00  1.0  0.0
## [3,] 0.75 0.25  0.0  0.0
## [4,] 0.50 0.50  0.0  0.0
P %*% P %*% P %*% P
##      [,1] [,2] [,3] [,4]
## [1,] 0.75 0.25 0.00 0.00
## [2,] 0.50 0.50 0.00 0.00
## [3,] 0.00 0.00 0.75 0.25
## [4,] 0.00 0.00 0.50 0.50
P %*% P %*% P %*% P %*% P
##       [,1]  [,2] [,3] [,4]
## [1,] 0.000 0.000 0.75 0.25
## [2,] 0.000 0.000 0.50 0.50
## [3,] 0.625 0.375 0.00 0.00
## [4,] 0.750 0.250 0.00 0.00
  • Despite the periodicity, we still have a stationary distribution, which we can find as the normalized left eigenvector of \(P\) with eigenvalue \(1\):
tP <- t(P)
E <- eigen(tP)
E
## eigen() decomposition
## $values
## [1]  1+0.0000000i -1+0.0000000i  0+0.7071068i  0-0.7071068i
## 
## $vectors
##              [,1]          [,2]                  [,3]                  [,4]
## [1,] 0.6324555+0i  0.6324555+0i  0.0000000-0.4082483i  0.0000000+0.4082483i
## [2,] 0.3162278+0i  0.3162278+0i  0.0000000+0.4082483i  0.0000000-0.4082483i
## [3,] 0.6324555+0i -0.6324555+0i -0.5773503+0.0000000i -0.5773503+0.0000000i
## [4,] 0.3162278+0i -0.3162278+0i  0.5773503+0.0000000i  0.5773503+0.0000000i
EE <- E$vectors[, 1]
EE
## [1] 0.6324555+0i 0.3162278+0i 0.6324555+0i 0.3162278+0i
F <- EE / sum(EE)
F
## [1] 0.3333333+0i 0.1666667+0i 0.3333333+0i 0.1666667+0i
F %*% P 
##              [,1]         [,2]         [,3]         [,4]
## [1,] 0.3333333+0i 0.1666667+0i 0.3333333+0i 0.1666667+0i
  • The key difference is that not all of the convergence theorems can hold: if we start the Markov chain at state \(1\), we cannot expect that \((1,0,0,0) P^n \to \pi\) (in total variation), since when \(n\) is odd, \((1,0,0,0) P^n\) gives zero mass to state \(1\), which is nowhere near \(1/3\). We will see later in the module that a version of the law of large numbers still holds. Consider the following simulation, which demonstrates that the long-run proportion of time the chain spends in state \(1\) is roughly \(1/3\).
step <- function(i){
  # Simulate one step of the chain from state i: draw a uniform random number
  # and invert the cumulative distribution of row i of P.
  q <- P[i, ]
  x <- -1
  u <- runif(1)
  j <- 0
  cumq <- cumsum(q)
  while (x == -1){
    j <- j + 1
    if (u <= cumq[j]){ x <- j }
  }
  x
}

steps <- function(n){
  # Simulate n steps of the chain started from state 1, returning the whole trajectory.
  x <- 1
  for (i in 1:n){
    x <- c(x, step(x[i]))
  }
  x
}

z <- steps(10000)
mean(z == 1)  # proportion of time the simulated chain spends in state 1
## [1] 0.3321668
  • When we consider \(Q = P^2\), life is in some sense easier: the blocks no longer alternate, and we are left with two irreducible components, which can be analyzed individually. Each component has a stationary distribution, and as in the previous exercise on convex combinations, any convex combination of these is also a stationary distribution. In fact, we see that the stationary distribution found above for the periodic Markov chain is a convex combination of these stationary distributions.
Q
##      [,1] [,2] [,3] [,4]
## [1,]  0.5  0.5  0.0  0.0
## [2,]  1.0  0.0  0.0  0.0
## [3,]  0.0  0.0  0.5  0.5
## [4,]  0.0  0.0  1.0  0.0
Q %*% Q
##      [,1] [,2] [,3] [,4]
## [1,] 0.75 0.25 0.00 0.00
## [2,] 0.50 0.50 0.00 0.00
## [3,] 0.00 0.00 0.75 0.25
## [4,] 0.00 0.00 0.50 0.50
Q %*% Q %*% Q
##       [,1]  [,2]  [,3]  [,4]
## [1,] 0.625 0.375 0.000 0.000
## [2,] 0.750 0.250 0.000 0.000
## [3,] 0.000 0.000 0.625 0.375
## [4,] 0.000 0.000 0.750 0.250
tQ <- t(Q)
E <- eigen(tQ)
E
## eigen() decomposition
## $values
## [1]  1.0  1.0 -0.5 -0.5
## 
## $vectors
##           [,1]      [,2]       [,3]       [,4]
## [1,] 0.8944272 0.0000000 -0.7071068  0.0000000
## [2,] 0.4472136 0.0000000  0.7071068  0.0000000
## [3,] 0.0000000 0.8944272  0.0000000 -0.7071068
## [4,] 0.0000000 0.4472136  0.0000000  0.7071068
EE <- E$vectors[, 1]
EE/sum(EE)
## [1] 0.6666667 0.3333333 0.0000000 0.0000000
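
As a quick check of the last claim, using the eigendecomposition of \(t(Q)\) computed above, the second unit-eigenvalue eigenvector gives the stationary distribution of the other component, and the equal-weight convex combination of the two recovers the stationary distribution \((1/3, 1/6, 1/3, 1/6)\) of the periodic chain found earlier.

pi1 <- E$vectors[, 1] / sum(E$vectors[, 1])  # stationary distribution on the component {1, 2}
pi2 <- E$vectors[, 2] / sum(E$vectors[, 2])  # stationary distribution on the component {3, 4}
0.5 * pi1 + 0.5 * pi2                        # agrees with the stationary distribution of P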

5 Maximal coupling

Let \(X\) and \(Y\) be discrete random variables taking values in the space \(S\), with probability mass functions \(p\) and \(q\). Show that there exists a coupling \((X', Y')\) of \(X\) and \(Y\) such that \[d_{TV}(X, Y) = d_{TV}(X', Y') = 2 \mathbb{P}(X' \not = Y'),\] so that equality is obtained in the usual coupling inequality.

Hint: consider a joint distribution for \(X'\) and \(Y'\) in which you maximize the probability that \(X'=Y'\) and spread the remaining probability as evenly as possible.

5.1 Solutions

Consider the joint distribution (a possibly infinite matrix) \(r\) defined as follows. We set \[r_{ii}= t_i:= \min(p_i,q_i).\] Let \[ \theta = \sum_{i\in S} t_i.\] If \(\theta = 1\), then we set all off-diagonal entries of \(r\) to zero. Otherwise, for \(i \not =j\), set \[r_{ij} = \frac{(p_i- t_i) ( q_j - t_j)}{1-\theta}.\]

We easily verify that \(r_{ij} \geq 0\) and that \(r\) is a joint distribution with marginals \(p\) and \(q\). For example, we show that the first marginal is \(p\), the distribution of \(X\).

\[ \begin{eqnarray*} \sum_{j \in S} r_{ij} &=& t_i + \sum_{j \in S \setminus \{i\} } r_{ij} \\ &=& t_i + \frac{(p_i- t_i)}{1-\theta}\sum_{j \in S \setminus \{i\} } (q_j - t_j) \\ &=& t_i + \frac{(p_i- t_i)}{1-\theta}(1-q_i - \theta + t_i ). \end{eqnarray*} \]

If \(p_i = t_i\), then we obtain that

\[\sum_{j \in S} r_{ij} = t_i = p_i= \mathbb{P}(X=i);\] otherwise, \(q_i = t_i\), so that \(1-q_i - \theta + t_i = 1-\theta\), and we still have

\[\sum_{j \in S} r_{ij} = t_i + p_i - t_i = p_i = \mathbb{P}(X=i).\] Recall that \[2\min(a,b) = a+b - |a-b|.\] Also note that if \(X'\) and \(Y'\) have joint distribution \(r\), then \(\mathbb{P}(X' = Y') = \theta\). Thus

\[ \begin{eqnarray*} d_{TV}(X',Y') &=& \sum_{i \in S}|p_i-q_i| \\ &=& \sum_{i\in S} [p_i+q_i - 2 \min(p_i, q_i)] \\ &=& 2 - 2\theta \\ &=& 2\mathbb{P}(X' \not = Y'). \end{eqnarray*} \]
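
A minimal numerical check of this construction; the pmfs \(p\) and \(q\) below are arbitrary choices, not part of the exercise.

# Sketch: build the maximal-coupling joint matrix r for two pmfs p and q on {1, 2, 3}
p <- c(0.5, 0.3, 0.2)
q <- c(0.2, 0.3, 0.5)
tmin <- pmin(p, q)
theta <- sum(tmin)
r <- outer(p - tmin, q - tmin) / (1 - theta)  # off-diagonal entries (assumes theta < 1)
diag(r) <- tmin                               # diagonal entries r_ii = min(p_i, q_i)
rowSums(r)                                    # recovers p
colSums(r)                                    # recovers q
c(2 * (1 - theta), sum(abs(p - q)))           # 2 P(X' != Y') equals d_TV(X, Y)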

6 A triangle inequality

Let \(X_1, \ldots, X_n\) be independent integer-valued random variables. Also let \(Y_1, \ldots, Y_n\) be independent integer-valued random variables. Set \(S=X_1 + \cdots + X_n\) and \(W = Y_1 + \cdots + Y_n\). Show that

\[d_{TV}(S, W) \leq \sum_{i=1}^n d_{TV}(X_i, Y_i).\]

Hint: Use a maximal coupling, along with independence.

6.1 Solutions

Let \(Z_i'=(X_i', Y_i')\) be a maximal coupling of \(X_i\) and \(Y_i\), which exists by the previous exercise, so that \[ d_{TV}(X_i,Y_i) = 2 \mathbb{P}(X_i' \not = Y_i').\] Furthermore, we take \(Z_1', \ldots, Z_n'\) to be independent; this ensures that \(X_1', \ldots, X'_n\) are independent and that \(Y_1', \ldots, Y_n'\) are independent. Since \(X_1, \ldots, X_n\) are independent and \(Y_1, \ldots, Y_n\) are independent, it follows that if \[ S' = X_1' + \cdots + X_n'\] and

\[ W' = Y_1' + \cdots + Y_n',\]

then \(S'\) has the same distribution as \(S\) and \(W'\) has the same distribution as \(W\); in other words, \((S',W')\) is a coupling of \(S\) and \(W\). Hence starting with the coupling inequality, we have

\[ \begin{eqnarray*} d_{TV}(S, W) &=& d_{TV}(S', W') \\ &\leq& 2 \mathbb{P}(S' \not = W') \\ &\leq & 2 \mathbb{P}\Big( \bigcup_{i=1}^n \{ X_i' \not = Y_i'\} \Big) \\ &\leq& \sum_{i=1} ^n 2\mathbb{P}(X_i' \not = Y_i') \\ &=& \sum_{i=1} ^n d_{TV}(X_i, Y_i), \end{eqnarray*} \] as desired; here the second inequality holds because \(S' \not = W'\) implies \(X_i' \not = Y_i'\) for some \(i\), and the third is the union bound.
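
As a small numerical illustration (the Bernoulli parameters below are assumed, not part of the exercise), one can compute the distributions of \(S\) and \(W\) exactly by convolution and verify the inequality.

# Sketch: X_i ~ Bernoulli(a[i]) and Y_i ~ Bernoulli(b[i]), all independent
dtv <- function(p, q) sum(abs(p - q))   # total variation, in the convention used above
pmf_of_sum <- function(probs){
  # pmf of a sum of independent Bernoulli variables, by repeated convolution
  f <- 1                                # point mass at 0 (the empty sum)
  for (a in probs) f <- c(f * (1 - a), 0) + c(0, f * a)
  f
}
a <- c(0.2, 0.5, 0.7)
b <- c(0.3, 0.4, 0.9)
dtv(pmf_of_sum(a), pmf_of_sum(b))       # d_TV(S, W)
sum(2 * abs(a - b))                     # sum of d_TV(X_i, Y_i), an upper bound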

7 Endnotes