A branching process with offspring distribution \(p\) on \(\mathbb{N}\) is a Markov chain \(Z = (Z_0, Z_1, \ldots)\) with state space \(\mathbb{N}\), where \(Z_0 = 1\) and the transitions are given by \[\mathbb{P}(Z_{n+1} = t \mid Z_n = s) = \mathbb{P}(X_1 + \cdots + X_s = t),\] where the \(X_i\) are iid random variables with distribution \(p\). We will always assume that \(p_i \neq 1\) for every \(i\) and that \(p_0 + p_1 < 1\), which implies that the generating function \(\hat{p}\) is strictly convex. We are concerned with the extinction probability \(q \in [0,1]\). Notice that \(\{ Z_n = 0 \} \subseteq \{ Z_{n+1} = 0 \}\), so that by continuity of measure we have \[\begin{aligned} \mathbb{P}\Big(\lim_{n \to \infty} Z_n = 0\Big) &= \mathbb{P}\Big(\bigcup_{n=1}^{\infty} \{ Z_n = 0 \}\Big) \\ &= \lim_{n \to \infty} \mathbb{P}(Z_n = 0) \\ &= q. \end{aligned}\]
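To make the definition concrete, here is a minimal simulation sketch of the chain; the offspring distribution \(p_0 = 1/4\), \(p_2 = 3/4\) and the function names are hypothetical choices for illustration.

```python
import random

def next_generation(z, offspring):
    # One transition of the chain: Z_{n+1} is a sum of z iid offspring counts.
    return sum(offspring() for _ in range(z))

def simulate(n_generations, offspring):
    # Return the trajectory Z_0, Z_1, ..., Z_n, starting from Z_0 = 1.
    z, trajectory = 1, [1]
    for _ in range(n_generations):
        z = next_generation(z, offspring)
        trajectory.append(z)
    return trajectory

# Hypothetical offspring distribution: p_0 = 1/4, p_2 = 3/4.
offspring = lambda: 2 if random.random() < 0.75 else 0
print(simulate(10, offspring))  # grows roughly geometrically, or hits 0 and stays there
```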
Let \(g_X(s) := \mathbb{E}(s^X)\) denote the probability generating function of a random variable \(X\). Notice that \(g_{Z_0}(s) = s\) and \(f(s) := g_{Z_1}(s) = \hat{p}(s)\). Set \(f_n(s) := g_{Z_n}(s)\). By the random sums lemma, we have \(f_{n+1}(s) = f_n(f(s))\), so that, more generally, \(f_n\) is the \(n\)-fold composition of \(f\) with itself.
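To see the composition rule in action, one can iterate \(f\) numerically. A sketch, using the same hypothetical distribution \(p_0 = 1/4\), \(p_2 = 3/4\), whose generating function is \(f(s) = \tfrac{1}{4} + \tfrac{3}{4}s^2\):

```python
def f(s):
    # Generating function of the hypothetical offspring distribution
    # p_0 = 1/4, p_2 = 3/4: f(s) = 1/4 + (3/4) s^2.
    return 0.25 + 0.75 * s**2

def f_n(s, n):
    # n-fold composition f(f(...f(s)...)): the generating function of Z_n.
    for _ in range(n):
        s = f(s)
    return s

# f_n(0) = P(Z_n = 0) increases towards the extinction probability, here q = 1/3.
print([round(f_n(0.0, n), 4) for n in (1, 2, 5, 10, 50)])
```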
We set \(\mu := \mathbb{E}(Z_1)\) and \(\sigma^2 := \mathrm{var}(Z_1) \neq 0\). Induction gives \[f_n'(1) = \mu^n = \mathbb{E}(Z_n).\] This can also be obtained via conditional expectation and induction, since
\[\mathbb{E}(Z_{n+1} | Z_n) = \mu Z_n.\]
We also have, for \(\mu \not=1\):
\[\mathrm{var}(Z_n) = \frac{\sigma^2 \mu^n(\mu^n-1)}{\mu^2 - \mu}\]
and for \(\mu =1\):
\[\mathrm{var}(Z_n) = n \sigma^2.\]
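These moment formulas are easy to sanity-check by Monte Carlo. A sketch under the same hypothetical distribution \(p_0 = 1/4\), \(p_2 = 3/4\), so that \(\mu = 3/2\), \(\sigma^2 = 3/4\), and the variance formula reduces to \(\mu^n(\mu^n - 1)\):

```python
import random, statistics

def sample_Z(n):
    # Draw one value of Z_n for p_0 = 1/4, p_2 = 3/4, starting from Z_0 = 1.
    z = 1
    for _ in range(n):
        z = sum(2 if random.random() < 0.75 else 0 for _ in range(z))
    return z

mu, n, trials = 1.5, 5, 20_000
samples = [sample_Z(n) for _ in range(trials)]
print(statistics.mean(samples), mu**n)                    # both close to 7.594
print(statistics.variance(samples), mu**n * (mu**n - 1))  # both close to 50.07
```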
Theorem.
Let \(p\) be the offspring distribution for a branching process \(Z\). Let \(\mu = \mathbb{E}(Z_1)\) and let \(q \in [0,1]\) be the extinction probability. Then \(q\) is the smallest solution of \(f(s) = s\) in \([0,1]\); in particular, \(q = 1\) if \(\mu \leq 1\), and \(q < 1\) if \(\mu > 1\).
Proof.
Notice that \(\mathbb{P}(Z_n = 0) = f_n(0) < 1\) for all \(n\), and that \(f_n(0)\) is non-decreasing in \(n\) and converges to \(q\). Since \(f\) is continuous and \(f_{n+1}(0) = f(f_n(0))\), we have \[q = \lim_{n \to \infty} f_{n+1}(0) = f\Big( \lim_{n \to \infty} f_n(0) \Big) = f(q),\] so \(q\) is a fixed point of \(f\). Moreover, if \(r \in [0,1]\) satisfies \(f(r) = r\), then, since \(f\) is non-decreasing on \([0,1]\), induction gives \(f_n(0) \leq r\) for all \(n\), whence \(q \leq r\): the extinction probability is the smallest fixed point.
If \(\mu \leq 1\), then, since \(f'\) is strictly increasing with \(f'(1) = \mu \leq 1\), we have \(f'(s) < 1\) for \(s \in (0,1)\), and the mean value theorem gives \(f(s) > s\) for all \(s \in [0,1)\); hence the only solution of \(f(s) = s\) in \([0,1]\) is \(s = 1\), and we must have \(q = 1\).
If \(\mu > 1\), then a similar argument gives \(f(s) < s\) for \(s \in (1 - \epsilon, 1)\), for some small \(\epsilon > 0\). Strict convexity then gives that \(f(s) = s\) has a unique solution in \([0,1)\). Moreover, \(q < 1\): if we had \(q = 1\), then, since \(f_n(0) < 1\) and \(f_n(0) \to 1\), the sequence would eventually enter \((1 - \epsilon, 1)\), where \(f_{n+1}(0) = f(f_n(0)) < f_n(0)\), contradicting the fact that \(f_n(0)\) is non-decreasing. Hence \(q\) is the unique solution in \([0,1)\).
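The theorem can be illustrated numerically by iterating \(f_n(0)\) for a one-parameter family straddling \(\mu = 1\). A sketch with the hypothetical family \(p_0 = 1 - a\), \(p_2 = a\), for which \(\mu = 2a\) and the fixed points of \(f(s) = (1-a) + as^2\) are \(s = 1\) and \(s = (1-a)/a\):

```python
def extinction_prob(f, iterations=100_000):
    # Iterate s -> f(s) from s = 0; f_n(0) increases to the extinction probability q.
    s = 0.0
    for _ in range(iterations):
        s = f(s)
    return s

for a in (0.4, 0.5, 0.75):  # subcritical, critical, supercritical
    q = extinction_prob(lambda s: (1 - a) + a * s * s)
    print(f"mu = {2 * a}: q ~ {q:.4f}")
# mu = 0.8 and mu = 1.0 give q = 1 (the critical case converges slowly);
# mu = 1.5 gives q = (1 - a)/a = 1/3.
```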
Suppose now that \(\mu > 1\), and consider \(W_n = Z_n/\mu^n\). Notice that with this normalization we have \[\mathbb{E}(W_{n+1} \mid W_n) = W_n.\] Some calculations give
\[\mathbb{E}(W_n^2) = 1 + \frac{\sigma^2}{\mu^2 - \mu} (1 - \mu^{-n})\] and
\[\mathbb{E}\big[(W_{n+k} - W_n)^2\big] = \frac{\sigma^2 \mu^{-n}}{\mu^2 - \mu} (1 - \mu^{-k}).\] Thus the sequence \(W_1, W_2, \ldots\) is Cauchy in \(L^2\) and, by completeness, converges in \(L^2\) to a random variable \(W\) with mean \(1\). Furthermore, taking \(k \to \infty\) gives \(\mathbb{E}\big[(W - W_n)^2\big] = \frac{\sigma^2 \mu^{-n}}{\mu^2 - \mu}\), which is summable in \(n\), so Markov's inequality and the Borel-Cantelli lemma give almost-sure convergence as well.
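A simulation makes the convergence visible: along a surviving trajectory, \(W_n = Z_n/\mu^n\) settles down quickly. A sketch, once more with the hypothetical distribution \(p_0 = 1/4\), \(p_2 = 3/4\), so \(\mu = 3/2\):

```python
import random

def W_trajectory(n, mu):
    # Return W_0, ..., W_n, where W_k = Z_k / mu^k and Z_0 = 1,
    # for the offspring distribution p_0 = 1/4, p_2 = 3/4.
    z, out = 1, [1.0]
    for k in range(1, n + 1):
        z = sum(2 if random.random() < 0.75 else 0 for _ in range(z))
        out.append(z / mu**k)
    return out

random.seed(3)
print([round(w, 3) for w in W_trajectory(25, 1.5)[::5]])
# W_0, W_5, W_10, ...: the values flatten out towards the limit W
# (or are 0 from some point on, on the extinction event).
```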