1 Gambler’s ruin

Recall the Gambler’s ruin problem from the last session. Suppose someone plays the following gambling game. They bet \(1\) pound on a fair flip of a coin, winning \(1\) pound if it comes up heads and losing \(1\) pound if it comes up tails. They start with \(1\) pound, and stop playing once they have reached \(5\) pounds or have no money left. Express this problem as a Markov chain.
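
As a non-authoritative sketch, here is one natural way to encode this chain in R, assuming the state space \(\{0,1,\ldots,5\}\) (the gambler's current fortune), with \(0\) and \(5\) absorbing.

# Sketch: gambler's ruin transition matrix on states 0,...,5
# (row/column i corresponds to a fortune of i-1 pounds; fortunes 0 and 5 are absorbing)
G <- matrix(0, nrow = 6, ncol = 6)
G[1, 1] <- 1                 # fortune 0: the gambler has stopped
G[6, 6] <- 1                 # fortune 5: the gambler has stopped
for (i in 2:5){
  G[i, i - 1] <- 1/2         # lose 1 pound on tails
  G[i, i + 1] <- 1/2         # win 1 pound on heads
}
G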

2 Anything is possible

Consider the transition matrix \(P\) on the states \(\{1,2,3,4,5\}\) given by:

P <- matrix(c(0, 1/2, 1/2, 0, 0,
              1/2, 0, 0, 0, 1/2,
              1/2, 0, 0, 1/2, 0,
              0, 0, 0, 1/2, 1/2,
              1/3, 1/3, 0, 0, 1/3), nrow = 5)
P <- t(P)
P
##           [,1]      [,2] [,3] [,4]      [,5]
## [1,] 0.0000000 0.5000000  0.5  0.0 0.0000000
## [2,] 0.5000000 0.0000000  0.0  0.0 0.5000000
## [3,] 0.5000000 0.0000000  0.0  0.5 0.0000000
## [4,] 0.0000000 0.0000000  0.0  0.5 0.5000000
## [5,] 0.3333333 0.3333333  0.0  0.0 0.3333333
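
One way to explore which multi-step transitions are possible is to look at powers of \(P\). The following sketch (the number of powers examined is an arbitrary choice) prints the smallest entry of \(P^n\) for the first few values of \(n\); once this is positive, every state can reach every other state in exactly \(n\) steps.

# Sketch: track the smallest entry of P^n for n = 1, ..., 6
Pn <- P
for (n in 1:6){
  cat("n =", n, " smallest entry of P^n:", min(Pn), "\n")
  Pn <- Pn %*% P
}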

3 Convex combinations

Recall that \(\lambda\) is a probability measure on a countable set \(\Omega\) if \(\lambda(\omega) \geq 0\) for every \(\omega \in \Omega\) and \(\sum_{\omega \in \Omega} \lambda(\omega) = 1\). Let \(\lambda\) and \(\mu\) be probability measures on \(\Omega\).

Show that \[\rho = (1-t)\lambda + t \mu\] is a probability measure on \(\Omega\) for all \(t \in [0,1]\).
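
A minimal numerical check, with two arbitrary (assumed) probability vectors on a three-point space and an arbitrary value of \(t\):

# Sketch: a convex combination of two pmfs is nonnegative and sums to 1
lambda <- c(0.1, 0.6, 0.3)
mu <- c(0.5, 0.25, 0.25)
t <- 0.4
rho <- (1 - t) * lambda + t * mu
rho
all(rho >= 0)
sum(rho)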

4 Periodicity

Consider the transition matrix \(P\) on the states \(\{1,2,3,4\}\) given by:

P <- matrix(c(0, 0, 1, 0,
              0, 0, 0, 1,
              1/2, 1/2, 0, 0,
              1, 0, 0, 0), nrow = 4)
P <- t(P)
P
##      [,1] [,2] [,3] [,4]
## [1,]  0.0  0.0    1    0
## [2,]  0.0  0.0    0    1
## [3,]  0.5  0.5    0    0
## [4,]  1.0  0.0    0    0

4.1 Solutions

  • We see that the chain alternates between the blocks of states \(\{1,2\}\) and \(\{3,4\}\), so it is periodic with period \(2\); for example, if we start at state \(1\) at time \(0\), we can only return to it at even times.
Q <- P %*% P
P %*% P %*% P
##      [,1] [,2] [,3] [,4]
## [1,] 0.00 0.00  0.5  0.5
## [2,] 0.00 0.00  1.0  0.0
## [3,] 0.75 0.25  0.0  0.0
## [4,] 0.50 0.50  0.0  0.0
P %*% P %*% P %*% P
##      [,1] [,2] [,3] [,4]
## [1,] 0.75 0.25 0.00 0.00
## [2,] 0.50 0.50 0.00 0.00
## [3,] 0.00 0.00 0.75 0.25
## [4,] 0.00 0.00 0.50 0.50
P %*% P %*% P %*% P %*% P
##       [,1]  [,2] [,3] [,4]
## [1,] 0.000 0.000 0.75 0.25
## [2,] 0.000 0.000 0.50 0.50
## [3,] 0.625 0.375 0.00 0.00
## [4,] 0.750 0.250 0.00 0.00
  • Despite the periodicity, we still have a stationary distribution, which we can find as the normalized left eigenvector of \(P\) with eigenvalue \(1\):
tP <- t(P)
E <- eigen(tP)
E
## eigen() decomposition
## $values
## [1]  1+0.0000000i -1+0.0000000i  0+0.7071068i  0-0.7071068i
## 
## $vectors
##              [,1]          [,2]                  [,3]                  [,4]
## [1,] 0.6324555+0i  0.6324555+0i  0.0000000-0.4082483i  0.0000000+0.4082483i
## [2,] 0.3162278+0i  0.3162278+0i  0.0000000+0.4082483i  0.0000000-0.4082483i
## [3,] 0.6324555+0i -0.6324555+0i -0.5773503+0.0000000i -0.5773503+0.0000000i
## [4,] 0.3162278+0i -0.3162278+0i  0.5773503+0.0000000i  0.5773503+0.0000000i
EE <- E$vectors[, 1]
EE
## [1] 0.6324555+0i 0.3162278+0i 0.6324555+0i 0.3162278+0i
F <- EE / sum(EE)
F
## [1] 0.3333333+0i 0.1666667+0i 0.3333333+0i 0.1666667+0i
F %*% P 
##              [,1]         [,2]         [,3]         [,4]
## [1,] 0.3333333+0i 0.1666667+0i 0.3333333+0i 0.1666667+0i
  • The key difference is that not all of the convergence theorems can hold: if we start the Markov chain at state \(1\), we cannot expect that \((1,0,0,0) P^n \to \pi\) (in total variation), since when \(n\) is odd, \((1,0,0,0) P^n\) gives zero mass to state \(1\), which is nowhere near \(1/3\). We will see later in the module that a version of the law of large numbers still holds. Consider the following simulation, which demonstrates that the long-run proportion of time the chain spends in state \(1\) is roughly \(1/3\).
step <- function(i){
  # Simulate one step of the chain from state i: draw a uniform random number
  # and invert the cumulative distribution of row i of P.
  q <- P[i, ]
  x <- -1
  u <- runif(1)
  j <- 0
  cumq <- cumsum(q)
  while (x == -1){
    j <- j + 1
    if (u <= cumq[j]){ x <- j }
  }
  x
}

steps <- function(n){
  # Simulate n steps of the chain started from state 1, returning the whole trajectory.
  x <- 1
  for (i in 1:n){
    x <- c(x, step(x[i]))
  }
  x
}

z <- steps(10000)
mean(z == 1)  # proportion of time the simulated chain spends in state 1
## [1] 0.3321668
  • When we consider \(Q = P^2\), life is in some sense easier: the blocks no longer alternate, and we are left with two irreducible components, which can be analyzed individually. Each component has a stationary distribution, and as in the previous exercise on convex combinations, any convex combination of these is also a stationary distribution. In fact, we see that the stationary distribution found above for the periodic Markov chain is a convex combination of these stationary distributions.
Q
##      [,1] [,2] [,3] [,4]
## [1,]  0.5  0.5  0.0  0.0
## [2,]  1.0  0.0  0.0  0.0
## [3,]  0.0  0.0  0.5  0.5
## [4,]  0.0  0.0  1.0  0.0
Q %*% Q
##      [,1] [,2] [,3] [,4]
## [1,] 0.75 0.25 0.00 0.00
## [2,] 0.50 0.50 0.00 0.00
## [3,] 0.00 0.00 0.75 0.25
## [4,] 0.00 0.00 0.50 0.50
Q %*% Q %*% Q
##       [,1]  [,2]  [,3]  [,4]
## [1,] 0.625 0.375 0.000 0.000
## [2,] 0.750 0.250 0.000 0.000
## [3,] 0.000 0.000 0.625 0.375
## [4,] 0.000 0.000 0.750 0.250
tQ <- t(Q)
E <- eigen(tQ)
E
## eigen() decomposition
## $values
## [1]  1.0  1.0 -0.5 -0.5
## 
## $vectors
##           [,1]      [,2]       [,3]       [,4]
## [1,] 0.8944272 0.0000000 -0.7071068  0.0000000
## [2,] 0.4472136 0.0000000  0.7071068  0.0000000
## [3,] 0.0000000 0.8944272  0.0000000 -0.7071068
## [4,] 0.0000000 0.4472136  0.0000000  0.7071068
EE <- E$vectors[, 1]
EE/sum(EE)
## [1] 0.6666667 0.3333333 0.0000000 0.0000000
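
As a quick check of the last claim, using the eigendecomposition of \(t(Q)\) computed above, the second unit-eigenvalue eigenvector gives the stationary distribution of the other component, and the equal-weight convex combination of the two recovers the stationary distribution \((1/3, 1/6, 1/3, 1/6)\) of the periodic chain found earlier.

pi1 <- E$vectors[, 1] / sum(E$vectors[, 1])  # stationary distribution on the component {1, 2}
pi2 <- E$vectors[, 2] / sum(E$vectors[, 2])  # stationary distribution on the component {3, 4}
0.5 * pi1 + 0.5 * pi2                        # agrees with the stationary distribution of P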

5 Maximal coupling

Let \(X\) and \(Y\) be discrete random variables taking values in the space \(S\), with probability mass functions \(p\) and \(q\). Show that there exists a coupling \((X', Y')\) of \(X\) and \(Y\) such that \[d_{TV}(X, Y) = d_{TV}(X', Y') = 2 \mathbb{P}(X' \not = Y'),\] so that equality is obtained in the usual coupling inequality.

Hint: consider a joint distribution for \(X'\) and \(Y'\) in which you maximize the probability that \(X'=Y'\) and spread the remaining probability as evenly as possible.

5.1 Solutions

Consider the joint distribution (a possibly infinite matrix) \(r\) defined as follows. We set \[r_{ii}= t_i:= \min(p_i,q_i).\] Let \[ \theta = \sum_{i\in S} t_i.\] If \(\theta = 1\), then we set all off-diagonal entries of \(r\) to zero. Otherwise, for \(i \not =j\), set \[r_{ij} = \frac{(p_i- t_i) ( q_j - t_j)}{1-\theta}.\]

We easily verify that \(r_{ij} \geq 0\) and that \(r\) is a joint distribution with marginals \(p\) and \(q\). For example, we show that the first marginal is \(p\), the distribution of \(X\).

\[ \begin{eqnarray*} \sum_{j \in S} r_{ij} &=& t_i + \sum_{j \in S \setminus \{i\} } r_{ij} \\ &=& t_i + \frac{(p_i- t_i)}{1-\theta}\sum_{j \in S \setminus \{i\} } (q_j - t_j) \\ &=& t_i + \frac{(p_i- t_i)}{1-\theta}(1-q_i - \theta + t_i ). \end{eqnarray*} \]

If \(p_i = t_i\), then we obtain that

\[\sum_{j \in S} r_{ij} = t_i = p_i= \mathbb{P}(X=i);\] otherwise, \(q_i = t_i\), so that \(1-q_i - \theta + t_i = 1-\theta\), and we still have

\[\sum_{j \in S} r_{ij} = t_i + p_i - t_i = p_i = \mathbb{P}(X=i).\] Recall that \[2\min(a,b) = a+b - |a-b|.\] Also note that if \(X'\) and \(Y'\) have joint distribution \(r\), then \(\mathbb{P}(X' = Y') = \theta\). Thus

\[ \begin{eqnarray*} d_{TV}(X',Y') &=& \sum_{i \in S}|p_i-q_i| \\ &=& \sum_{i\in S} [p_i+q_i - 2 \min(p_i, q_i)] \\ &=& 2 - 2\theta \\ &=& 2\mathbb{P}(X' \not = Y'). \end{eqnarray*} \]
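
A minimal numerical check of this construction; the pmfs \(p\) and \(q\) below are arbitrary choices, not part of the exercise.

# Sketch: build the maximal-coupling joint matrix r for two pmfs p and q on {1, 2, 3}
p <- c(0.5, 0.3, 0.2)
q <- c(0.2, 0.3, 0.5)
tmin <- pmin(p, q)
theta <- sum(tmin)
r <- outer(p - tmin, q - tmin) / (1 - theta)  # off-diagonal entries (assumes theta < 1)
diag(r) <- tmin                               # diagonal entries r_ii = min(p_i, q_i)
rowSums(r)                                    # recovers p
colSums(r)                                    # recovers q
c(2 * (1 - theta), sum(abs(p - q)))           # 2 P(X' != Y') equals d_TV(X, Y)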

6 A triangle inequality

Let \(X_1, \ldots, X_n\) be independent integer-valued random variables. Also let \(Y_1, \ldots, Y_n\) be independent integer-valued random variables. Set \(S=X_1 + \cdots + X_n\) and \(W = Y_1 + \cdots + Y_n\). Show that

\[d_{TV}(S, W) \leq \sum_{i=1}^n d_{TV}(X_i, Y_i).\]

Hint: Use a maximal coupling, along with independence.

6.1 Solutions

Let \(Z_i'=(X_i', Y_i')\) be a maximal coupling of \(X_i\) and \(Y_i\), which exists by the previous exercise, so that \[ d_{TV}(X_i,Y_i) = 2 \mathbb{P}(X_i' \not = Y_i').\] Furthermore, we take \(Z_1', \ldots, Z_n'\) to be independent; this ensures that \(X_1', \ldots, X'_n\) are independent and that \(Y_1', \ldots, Y_n'\) are independent. Since \(X_1, \ldots, X_n\) are independent and \(Y_1, \ldots, Y_n\) are independent, it follows that if \[ S' = X_1' + \cdots + X_n'\] and

\[ W' = Y_1' + \cdots + Y_n',\]

then \(S'\) has the same distribution as \(S\) and \(W'\) has the same distribution as \(W\); in other words, \((S',W')\) is a coupling of \(S\) and \(W\). Hence starting with the coupling inequality, we have

\[ \begin{eqnarray*} d_{TV}(S, W) &=& d_{TV}(S', W') \\ &\leq& 2 \mathbb{P}(S' \not = W') \\ &\leq & 2 \mathbb{P}\Big( \bigcup_{i=1}^n \{ X_i' \not = Y_i'\} \Big) \\ &\leq& \sum_{i=1} ^n 2\mathbb{P}(X_i' \not = Y_i') \\ &=& \sum_{i=1} ^n d_{TV}(X_i, Y_i), \end{eqnarray*} \] as desired; here the second inequality holds because \(S' \not = W'\) implies \(X_i' \not = Y_i'\) for some \(i\), and the third is the union bound.
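
As a small numerical illustration (the Bernoulli parameters below are assumed, not part of the exercise), one can compute the distributions of \(S\) and \(W\) exactly by convolution and verify the inequality.

# Sketch: X_i ~ Bernoulli(a[i]) and Y_i ~ Bernoulli(b[i]), all independent
dtv <- function(p, q) sum(abs(p - q))   # total variation, in the convention used above
pmf_of_sum <- function(probs){
  # pmf of a sum of independent Bernoulli variables, by repeated convolution
  f <- 1                                # point mass at 0 (the empty sum)
  for (a in probs) f <- c(f * (1 - a), 0) + c(0, f * a)
  f
}
a <- c(0.2, 0.5, 0.7)
b <- c(0.3, 0.4, 0.9)
dtv(pmf_of_sum(a), pmf_of_sum(b))       # d_TV(S, W)
sum(2 * abs(a - b))                     # sum of d_TV(X_i, Y_i), an upper bound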

7 Endnotes