A coupling of two random variables \(X\) and \(Y\) is a pair of random variables \((X', Y')\) with a joint distribution whose marginal distributions are those of \(X\) and \(Y\). Given two random variables \(X\) and \(Y\), we cannot perform even simple operations on them, such as forming \(X+Y\), unless we know their joint distribution, assuming they even have one! If they do have a joint distribution, then they live on the same probability space.
Often one can specify a coupling of \(X\) and \(Y\) by finding a function \(F(X, U)\), where \(U\) is uniformly distributed on \([0,1]\) and independent of \(X\), such that \(F(X, U)\) has the same law as \(Y\). That is, we can envision \(Y\) as a function of \(X\) and an additional randomization \(U\).
Independent coupling: this option is always available, but it is often not the most interesting or useful.
Quantile coupling/Inverse transform: If \(F_X\) and \(F_Y\) are cdfs, and \(U\) is uniformly distributed on \([0,1]\), then \(F_X^{-1}(U)\) and \(F_Y^{-1}(U)\) are random variables with these respective cdfs. (Here \(F^{-1}\) denotes the generalized inverse \(F^{-1}(u) = \inf\{x : F(x) \geq u\}\).)
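As a quick sketch of the quantile coupling in R, we can couple two exponential distributions by feeding the same uniform \(U\) into both quantile functions; the rates \(1\) and \(2\) below are arbitrary choices for illustration.

```r
# Quantile coupling of Exp(1) and Exp(2) using a single shared uniform U
set.seed(1)
u <- runif(10000)
x <- qexp(u, rate = 1)  # x has the Exp(1) distribution
y <- qexp(u, rate = 2)  # y has the Exp(2) distribution
all(x >= y)             # the coupling is monotone: here x = 2*y pointwise
c(mean(x), mean(y))     # close to the true means 1 and 1/2
```

Note that each of `x` and `y` alone has the correct marginal law, but jointly they are perfectly ordered, which an independent coupling would not give.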
Thinning: If \(X\) is Bernoulli \(p\) and \(Z\) is an independent Bernoulli \(r\), then \(XZ\) is Bernoulli \(pr\). Thus \((X, XZ)\) is a coupling of Bernoulli random variables with parameters \(p\) and \(pr\), and \(X \geq XZ\).
Suppose June plays a gambling game with her mum using a \(p=2/3\) coin, betting on heads: she wins \(1\) pound if the coin comes up heads and loses \(1\) pound otherwise. She starts with \(0\) pounds, and we allow the possibility of her fortune going negative. June stops playing as soon as she reaches \(5\) pounds. Suppose Tessa plays the same game with her dad using a \(p=1/2\) coin.
Since the coins are independent, we can’t say for sure that June will be better off than Tessa. Suppose we want to compare the expected length of time \(W\) it takes for each game to end; it seems obvious that the mean time for June should be less than Tessa’s. In fact, the mean time for Tessa is infinite.
We can envision a coupling of these two games, where June is always better off.
Let \(X_i\) be the amount that June wins or loses on the \(i\)th flip, and let \(Y_i\) be the corresponding amount for Tessa.
We can arrange a coupling \((X_i', Y_i')\) of \(X_i\) and \(Y_i\) so that \(X_i' \geq Y_i'\); that is, Tessa wins only when June wins. Use thinning: since \(\tfrac{1}{2} = \tfrac{2}{3} \cdot \tfrac{3}{4}\), we can take Tessa’s win indicator to be June’s win indicator multiplied by an independent Bernoulli \(3/4\).
Under this coupling, writing \(W_J\) and \(W_T\) for the lengths of June’s and Tessa’s games, we do have \(W'_J \leq W'_T\), so we can compute \[\mathbb{E} W_J = \mathbb{E} W'_J \leq \mathbb{E} W'_T = \mathbb{E} W_T,\] without knowing much else about these times.
Here, we provide some code to illustrate the two coupled games.
We first code and test the coupled coin-flips.
coupled <- function() {
  j <- rbinom(1, 1, 2/3)      # June's win indicator: Bernoulli(2/3)
  t <- rbinom(1, 1, 3/4) * j  # thinning: Bernoulli(3/4) times j is Bernoulli(1/2)
  c(j, t)
}
S <- replicate(10000, coupled())
mean(S[1,])
## [1] 0.6596
mean(S[2,])
## [1] 0.4907
Then, we run simulations to see how long the game takes on average for June and for Tessa.
exit <- function(p) {
  x <- 0  # current fortune
  n <- 0  # number of flips so far
  while (x < 5) {
    x <- x + (2 * rbinom(1, 1, p) - 1)
    n <- n + 1
  }
  n
}
mean(replicate(1000, exit(2/3)))
## [1] 15.232
mean(replicate(25, exit(1/2)))
## [1] 313.08
We could have run similar simulations without coupling. The point is that we established this result analytically using coupling; the code here is really just to illustrate the coupling. In the last bit of code, we note that in each simulation, under the coupling, Tessa never finishes before June.
exitcoupled <- function() {
  june <- 0
  tessa <- 0
  njune <- 0
  ntessa <- 0
  # play the coupled flips until June reaches 5 pounds
  while (june < 5) {
    v <- coupled()
    june <- june + 2 * v[1] - 1
    tessa <- tessa + 2 * v[2] - 1
    njune <- njune + 1
    ntessa <- ntessa + 1
  }
  # Tessa continues with her own fair coin until she reaches 5 pounds
  while (tessa < 5) {
    tessa <- tessa + 2 * rbinom(1, 1, 0.5) - 1
    ntessa <- ntessa + 1
  }
  c(njune, ntessa)
}
replicate(25, exitcoupled())
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 9 5 9 9 11 7 9 23
## [2,] 27 9 13 39 49139 13 67 35
## [,9] [,10] [,11] [,12] [,13] [,14] [,15]
## [1,] 5 11 7 5 7 15 11
## [2,] 43 25 27 187 29 15 37
## [,16] [,17] [,18] [,19] [,20] [,21] [,22]
## [1,] 11 13 5 23 11 31 13
## [2,] 19 21 9 105 53 147 183
## [,23] [,24] [,25]
## [1,] 9 11 11
## [2,] 95 17 47
Let \(X\) and \(Y\) be real-valued random variables. The total variation distance between \(X\) and \(Y\) is given by:
\[d_{TV}(X,Y) = 2 \cdot \sup_{A \text{ is an event}} | \mathbb{P}(X \in A) - \mathbb{P}(Y \in A)|.\] Note that the definition of \(d_{TV}(X,Y)\) depends only on the laws of \(X\) and \(Y\), individually; in particular, if \((X',Y')\) is a coupling of \(X\) and \(Y\), then \[d_{TV}(X',Y') = d_{TV}(X,Y).\]
Lemma 6.1 Let \(X\) and \(Y\) be integer-valued random variables. We have that \[d_{TV}(X,Y) = \sum_{z \in \mathbb{Z}} | \mathbb{P}(X=z) - \mathbb{P}(Y=z)|.\]
The proof follows from the following simple fact: If \[D = \{z \in \mathbb{Z}: \mathbb{P}(X =z) \geq \mathbb{P}(Y=z)\},\] then
\[\begin{eqnarray*} \sum_{z \in \mathbb{Z}} | \mathbb{P}(X=z) - \mathbb{P}(Y=z)| &=& \sum_{z \in D} (\mathbb{P}(X=z) - \mathbb{P}(Y=z))+\sum_{z \in D^c} (\mathbb{P}(Y=z) - \mathbb{P}(X=z)). \end{eqnarray*}\]
Proof. Observe that for any \(A \subset \mathbb{Z}\), we have \[|\mathbb{P}(X \in A) - \mathbb{P}(Y \in A)| = |\mathbb{P}(X \in A^c) - \mathbb{P}(Y\in A^c)|.\] In particular, by the simple fact above, \[\sum_{z \in \mathbb{Z}} | \mathbb{P}(X=z) - \mathbb{P}(Y=z)| = 2\big(\mathbb{P}(X \in D) - \mathbb{P}(Y \in D)\big),\] so taking \(A = D\) in the definition, we see that \[d_{TV}(X,Y) \geq \sum_{z \in \mathbb{Z}} | \mathbb{P}(X=z) - \mathbb{P}(Y=z)|.\]
For the other direction, note that
\[\begin{eqnarray*} |\mathbb{P}(X \in A) - \mathbb{P}(Y \in A)| &=& \big|\sum_{z \in A} \mathbb{P}(X =z) - \sum_{z \in A} \mathbb{P}(Y=z) \big| \\ &\leq& \sum_{z \in A} |\mathbb{P}(X=z) - \mathbb{P}(Y=z)|. \end{eqnarray*}\]
Thus it follows that
\[\begin{eqnarray*} 2| \mathbb{P}(X \in A) - \mathbb{P}(Y \in A)| &=& |\mathbb{P}(X \in A) - \mathbb{P}(Y \in A)| + |\mathbb{P}(X \in A^c) - \mathbb{P}(Y \in A^c)| \\ &\leq& \sum_{z \in \mathbb{Z}} |\mathbb{P}(X=z) - \mathbb{P}(Y=z)|. \end{eqnarray*}\]
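Lemma 6.1 makes the total variation distance easy to compute numerically for integer-valued random variables. A minimal sketch, computing the distance between two Poisson laws (the rates \(1\) and \(2\) are arbitrary choices) both via the pmf sum and via the optimal set \(D\) from the proof:

```r
# d_TV between Poi(1) and Poi(2), computed two ways
z <- 0:100  # both pmfs are negligible beyond this range
dtv <- sum(abs(dpois(z, 1) - dpois(z, 2)))  # Lemma 6.1
# the supremum in the definition is attained at D = {z : P(X=z) >= P(Y=z)}
D <- z[dpois(z, 1) >= dpois(z, 2)]
dtv2 <- 2 * abs(sum(dpois(D, 1)) - sum(dpois(D, 2)))
c(dtv, dtv2)  # both approximately 0.6595
```

Here \(D = \{0, 1\}\), since \(e^{-1}/z! \geq 2^z e^{-2}/z!\) exactly when \(2^z \leq e\).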
Exercise 6.1 Let \(X \sim Bern(p)\), \(Y \sim Bern(q)\), and \(W \sim Poi(p)\). Compute \(d_{TV}(X,Y)\) and \(d_{TV}(X, W).\)
Lemma 6.2 If \(X\) and \(Y\) are jointly distributed random variables, then
\[d_{TV}(X,Y) \leq 2\mathbb{P}(X \not = Y).\]
Proof. Let \(A\) be an event. Note that
\[\begin{eqnarray*} |\mathbb{P}(X \in A) - \mathbb{P}(Y \in A)| &=& \Big| \mathbb{P}(X \in A, X = Y) + \mathbb{P}(X \in A, X\not = Y) - \mathbb{P}(Y \in A, X\not = Y)- \mathbb{P}(Y \in A, X = Y) \Big| \\ &=& |\mathbb{P}(X \in A, X \not = Y) - \mathbb{P}(Y \in A, X\not = Y) | \\ &\leq& \mathbb{P}(X \not = Y). \end{eqnarray*}\]
For example, let \(S_n = X_1 + \cdots + X_n\), where the \(X_i\) are i.i.d. Bernoulli \(p\) random variables. Since \(S_n\) and \(S_{n+1}\) are jointly distributed, Lemma 6.2 gives
\[\begin{eqnarray*} d_{TV}(S_n, S_{n+1}) &\leq& 2\mathbb{P}(S_n \not = S_{n+1}) \\ &=& 2\mathbb{P}(X_{n+1} = 1) \\ &=& 2p. \end{eqnarray*}\]
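Assuming, as the notation suggests, that \(S_n\) is a sum of \(n\) i.i.d. Bernoulli \(p\) variables, so that \(S_n \sim \mathrm{Bin}(n, p)\), Lemma 6.1 lets us compute the exact distance and check it against the bound \(2p\); the values \(n = 10\) and \(p = 0.3\) below are arbitrary.

```r
# Exact d_TV(S_n, S_{n+1}) via Lemma 6.1, with S_n ~ Bin(n, p),
# compared against the coupling bound 2p
n <- 10
p <- 0.3
z <- 0:(n + 1)  # the full common support of S_n and S_{n+1}
dtv <- sum(abs(dbinom(z, n, p) - dbinom(z, n + 1, p)))
c(dtv, 2 * p)   # the exact distance respects the bound 2p
```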
We saw some basics of how coupling can be used as a tool in probability theory and, in particular, how it can be used to bound the total variation distance between two random variables.
For the relations between the total variation distance and other related distances on distributions and random variables, see Gibbs and Su (2002).
For more information on coupling see Lindvall (1992).