This document contains a running list of recommended homework exercises.

Using the law of large numbers, carefully justify why a probability histogram of a large sample from a pdf \(f\) ends up *looking* like \(f\).

Let \(T \geq 1\) be a positive integer-valued random variable. Let \(X_1, X_2, \ldots\) be iid random variables with finite expectation. Let \(S_n = X_1 + \cdots + X_n\). Show that it may *not* be true that \[\mathbb{E}(S_T) = \mathbb{E}(T) \cdot \mathbb{E}(X_1).\]

Show by example that the pairwise independence of random variables is a weaker property than the (full) independence of random variables.

Compute using the law of large numbers and simulations the following integral: \[\int_0 ^ 5 \sin(x) e^x dx;\] hint: identify a random variable that has this integral for its expectation.
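One possible reading of the hint is sketched below: if \(U\) is uniform on \([0,5]\), then \(5\,\mathbb{E}[\sin(U) e^{U}]\) equals the integral, so a sample mean approximates it. The function names, sample size, and seed are illustrative assumptions; the closed form comes from the antiderivative \(e^x(\sin x - \cos x)/2\) and is included only as a check.

```python
import math
import random


def integrand(x):
    """The function sin(x) * exp(x) from the exercise."""
    return math.sin(x) * math.exp(x)


def monte_carlo_integral(n_samples, seed=0):
    """Estimate the integral over [0, 5] as 5 * (sample mean of integrand(U))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        u = rng.uniform(0.0, 5.0)  # U ~ Uniform(0, 5)
        total += integrand(u)
    return 5.0 * total / n_samples


def exact_integral():
    """Closed form via the antiderivative exp(x) * (sin(x) - cos(x)) / 2."""
    def antiderivative(x):
        return math.exp(x) * (math.sin(x) - math.cos(x)) / 2.0
    return antiderivative(5.0) - antiderivative(0.0)


if __name__ == "__main__":
    print("Monte Carlo estimate:", monte_carlo_integral(400_000))
    print("Closed-form value:   ", exact_integral())
```

The integrand takes large values near \(x = 5\), so the estimate fluctuates noticeably; a large sample size is needed before the two numbers agree to even one decimal place.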

Let \(X\) be a continuous random variable, with cdf \(F\). Show that \(F(X)\) is uniformly distributed on \([0,1]\); for simplicity, you may assume that \(F\) is invertible.
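A quick simulation can serve as a sanity check on this claim before proving it. The sketch below assumes, purely for illustration, that \(X\) is exponential with rate one, so that \(F(x) = 1 - e^{-x}\); the helper names and the choice of ten bins are hypothetical.

```python
import math
import random


def exponential_cdf(x):
    """CDF of the rate-one exponential distribution (an illustrative choice of F)."""
    return 1.0 - math.exp(-x)


def transformed_sample(n_samples, seed=0):
    """Draw X from the exponential distribution and return the values F(X)."""
    rng = random.Random(seed)
    return [exponential_cdf(rng.expovariate(1.0)) for _ in range(n_samples)]


def bin_frequencies(values, n_bins=10):
    """Fraction of values falling in each of n_bins equal subintervals of [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        index = min(int(v * n_bins), n_bins - 1)  # keep v == 1.0 in the last bin
        counts[index] += 1
    return [c / len(values) for c in counts]


if __name__ == "__main__":
    # If F(X) is uniform, every frequency should be close to 0.1.
    print(bin_frequencies(transformed_sample(100_000)))
```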

Assuming that Python or R only has uniform random variables available, code (in two ways) a geometric random variable with support on the positive integers. Simulate your geometric random variable, and demonstrate by simulations that you have correctly coded it.
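Two possible constructions are sketched below, assuming a success probability \(0 < p < 1\): counting Bernoulli trials built from uniforms, and inverting the cdf \(1 - (1-p)^k\). The function names are hypothetical, and comparing the empirical mass function with \(p(1-p)^{k-1}\) is only one of several reasonable ways to check the code.

```python
import math
import random


def geometric_by_trials(p, rng):
    """Count Bernoulli(p) trials, built from uniforms, up to the first success."""
    count = 1
    while rng.random() >= p:  # a failure, which has probability 1 - p
        count += 1
    return count


def geometric_by_inversion(p, rng):
    """Inverse transform: the smallest integer k with (1 - p)**k <= u."""
    u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
    k = math.ceil(math.log(u) / math.log(1.0 - p))
    return max(1, k)  # guards the probability-zero case u == 1


def empirical_pmf(sampler, p, n_samples, seed):
    """Relative frequencies of the values produced by the given sampler."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_samples):
        k = sampler(p, rng)
        counts[k] = counts.get(k, 0) + 1
    return {k: c / n_samples for k, c in counts.items()}


if __name__ == "__main__":
    p = 0.3
    trials = empirical_pmf(geometric_by_trials, p, 100_000, seed=1)
    inversion = empirical_pmf(geometric_by_inversion, p, 100_000, seed=2)
    for k in range(1, 6):
        exact = p * (1.0 - p) ** (k - 1)
        print(k, round(exact, 4), trials.get(k, 0.0), inversion.get(k, 0.0))
```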

Demonstrate, by simulations, that the sum of two independent Poisson random variables is again a Poisson random variable; in your simulations you may restrict to the case of means \(2.5\) and \(3.7\).
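A minimal sketch of such a simulation follows. It assumes the Poisson samples are generated from uniforms by multiplying them until the product drops below \(e^{-\lambda}\) (often attributed to Knuth); this is a swapped-in sampler chosen for self-containment, and the function names are hypothetical. The empirical mass function of the sum is compared with the Poisson mass function of mean \(2.5 + 3.7 = 6.2\).

```python
import math
import random


def poisson_from_uniforms(mean, rng):
    """Sample a Poisson variable by multiplying uniforms until below exp(-mean)."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def poisson_pmf(k, mean):
    """Poisson mass function exp(-mean) * mean**k / k!."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def empirical_sum_pmf(mean_1, mean_2, n_samples, seed=0):
    """Relative frequencies of the sum of two independent Poisson samples."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_samples):
        total = poisson_from_uniforms(mean_1, rng) + poisson_from_uniforms(mean_2, rng)
        counts[total] = counts.get(total, 0) + 1
    return {k: c / n_samples for k, c in counts.items()}


if __name__ == "__main__":
    pmf = empirical_sum_pmf(2.5, 3.7, 100_000)
    for k in range(13):
        print(k, round(poisson_pmf(k, 6.2), 4), pmf.get(k, 0.0))
```

Agreement between the two columns is evidence for the claim, not a proof of it.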

How would you code a uniform point in a disc without throwing away randomness, as we do in rejection sampling?

How would you code a uniform point on the surface of a sphere?
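One possible approach to both the disc and the sphere is sketched below, assuming the unit disc and the unit sphere in \(\mathbb{R}^3\). For the disc, the radius is taken to be the square root of a uniform, since the area inside radius \(r\) grows like \(r^2\). For the sphere, the sketch uses the fact that the height coordinate of a uniform point is itself uniform on \([-1,1]\). The function names are hypothetical.

```python
import math
import random


def uniform_in_disc(rng):
    """Uniform point in the unit disc using exactly two uniforms, no rejection."""
    r = math.sqrt(rng.random())  # P(radius <= r) = r**2
    theta = 2.0 * math.pi * rng.random()
    return (r * math.cos(theta), r * math.sin(theta))


def uniform_on_sphere(rng):
    """Uniform point on the unit sphere: uniform height, uniform angle."""
    z = 2.0 * rng.random() - 1.0  # height is uniform on [-1, 1]
    theta = 2.0 * math.pi * rng.random()
    s = math.sqrt(1.0 - z * z)  # radius of the horizontal circle at height z
    return (s * math.cos(theta), s * math.sin(theta), z)


if __name__ == "__main__":
    rng = random.Random(0)
    disc = [uniform_in_disc(rng) for _ in range(100_000)]
    # A disc of radius 1/2 has a quarter of the area of the unit disc.
    print(sum(1 for x, y in disc if x * x + y * y <= 0.25) / len(disc))
    sphere = [uniform_on_sphere(rng) for _ in range(100_000)]
    # By symmetry, the first coordinate should also be uniform on [-1, 1].
    print(sum(1 for x, y, z in sphere if x > 0.5) / len(sphere))
```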

By simulation reproduce Figure 12

Show that it is indeed enough to consider the Weierstrass theorem on the unit interval.

What about a higher dimensional version of Weierstrass?

Do Question 2

If Markov chains have one-step memory, what happens if we study three-step memory processes? Is the theory the same? Why or why not?

Suppose \(X\) and \(Y\) are jointly distributed random variables. Find a coupling such that \(Y' = \phi(X', U)\), where \(U\) is independent of \(X'\) and uniformly distributed on the unit interval.

Do Question 2

You may recall the central limit theorem for iid random variables; explore what happens with Markov chains. In particular, see Question 3

Suppose you are given only the data (the realization) of a Markov chain, say for one hundred steps; how would you go about generating another hundred steps of this Markov chain?
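One natural approach, sketched below, is to estimate the transition probabilities by the observed frequencies of consecutive pairs and then continue the chain from its last observed state. This assumes the chain is time-homogeneous and that one hundred steps are enough to see the relevant transitions; the function names are hypothetical, and raising an error when a state has no observed outgoing transition is a design choice of this sketch.

```python
import random
from collections import Counter, defaultdict


def estimate_transitions(path):
    """Empirical transition probabilities from consecutive pairs in the path."""
    counts = defaultdict(Counter)
    for current, following in zip(path, path[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


def continue_chain(path, n_steps, rng):
    """Generate n_steps further states using the estimated transition rows."""
    transitions = estimate_transitions(path)
    state = path[-1]
    new_steps = []
    for _ in range(n_steps):
        row = transitions.get(state)
        if row is None:
            # The data never shows a step out of this state.
            raise ValueError(f"no observed transition out of state {state!r}")
        state = rng.choices(list(row), weights=list(row.values()))[0]
        new_steps.append(state)
    return new_steps


if __name__ == "__main__":
    observed = [0, 0, 1, 2, 1, 0, 2, 2, 1, 0, 0, 1]
    print(estimate_transitions(observed))
    print(continue_chain(observed, 10, random.Random(0)))
```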

So we have all this theory for Markov chains, where you only have to look back one step; what happens if you allow two-step memory?

Suppose I prove a law of large numbers type result for Markov chains started at stationarity. How would I extend this proof to any starting distribution?

Find an example where the sum of two *dependent* Poisson random variables is again a Poisson random variable. Hint: perturb the joint mass function of two independent Poisson random variables.

Explore what happens if, instead of starting with a Poisson number of iid uniforms, you use other discrete distributions.

Consider iid perturbations of the lattice; that is, for each \(n \in \mathbb{Z}^d\), we perturb it to obtain the perturbed lattice \[\Pi := \{n + X_n: n \in \mathbb{Z}^d\},\] where \((X_n)_{n \in \mathbb{Z}^d}\) are iid \(\mathbb{R}^d\)-valued random variables. Thus \(\Pi\) is a point process, a random scattering of points in \(\mathbb{R}^d\). Show that mass is conserved, so that the expected number of points in the set \([0,1)^d\) is (still) one. (Optional)

From the modelling assumptions of Poisson processes, show that if the unit interval contains exactly one point, then that point is uniformly distributed.

- Extend your proof to the case of two or more points.

How would you simulate a uniform on the surface of an ellipse?

More Poisson questions here

Using the law of large numbers, carefully justify why a probability histogram of a large sample from a pdf \(f\) ends up *looking* like \(f\).

Examine Exercise 2 by hand and by simulation.

Examine Exercise Pen and Paper by hand and by simulation.

Let \(X \geq 0\) be a continuous random variable with finite first moment. Prove that \[\mathbb{E} X = \int_0 ^{\infty} \mathbb{P}(X >t) dt.\] Hint: use a double integral.

Let \(X\) and \(Y\) be nonnegative independent continuous random variables. Prove that for \(t >0\), we have \[\mathbb{P}(XY > t) = \int_0 ^{\infty} \mathbb{P}(X >\tfrac{t}{y}) f_Y(y) dy,\] where \(f_Y\) is the probability density function for \(Y\).

Using the previous results prove that \[\mathbb{E}( X Y) = (\mathbb{E} X )(\mathbb{E} Y),\] assuming all the expectations are finite.

Recall that \(X\) has the gamma distribution with parameters \(\alpha, \beta >0\) if it has probability density function given by \[ f(x; \alpha, \beta)= \frac{ \mathbf{1}_{(0, \infty)}(x)}{\beta^{\alpha} \Gamma(\alpha)} x^{\alpha-1} e^{-x /\beta}. \] Let \(g: [0, \infty) \to \mathbb{R}\) be a bounded continuous function and \(n \in \mathbb{Z}^{+}\). Prove that

\[ \lim_{n \to \infty} \frac{1}{\beta^n \Gamma(n)} \int_ 0^{\infty} g(x/n) x^{n-1} e^{-x /\beta} dx =g(\beta).\]

Let \(Z\) be a real-valued random variable. Recall that if \(m\) is the unique point such that \(\mathbb{P}(Z \leq m) = \tfrac{1}{2}\), then it is the *median* of \(Z\). We say that \(Z\) is *symmetric* about \(m\) if for all \(c \geq 0\), we have \(\mathbb{P}(Z -m \geq c) = \mathbb{P}(Z -m \leq -c)\). Let \(X = (X_1, \ldots, X_{2n+1})\) be a random sample from a symmetric distribution with unique median zero and order statistics given by \(Y_1 \leq Y_2 \leq \cdots \leq Y_{n+1} \leq \cdots \leq Y_{2n+1}\). The *sample median* is given by \(M(X) = Y_{n+1}\).

Show that \(-Y_{n+1}\) has the same distribution as \(Y_{n+1}\). Hint: you can do this without fancy order statistics knowledge.

Assuming the expectations exist, show that \(\mathbb{E} Y_{n+1} =0\).

Let \(\epsilon >0\). Let \(B \sim Bin(2n+1, p)\), where \(p = \mathbb{P}(X_1 > \epsilon)\). Show that \(\mathbb{P}( Y_{n+1} > \epsilon) = \mathbb{P}(B \geq n+1)\).

Show that \(\mathbb{P}(B \geq n+1) \to 0\) as \(n \to \infty\). Deduce that \(M(X) \to 0\) in probability.

Let \(X= (X_1, \ldots,X_{2n+1})\) be a random sample from the normal distribution with unknown mean \(\mu \in \mathbb{R}\) and known unit variance. Use the previous exercises to show that the sample median is an unbiased and consistent estimator for \(\mu\). Recall that a sequence of estimators is *consistent* if it converges in probability to the parameter being estimated.

Prove that there exists a deterministic set of positive integers \(S\) such that for *every* positive integer \(a\), we have \[\frac{ \big| S \cap \{ a, 2a, \ldots, na\} \big| }{n} \to \frac{1}{2}.\] Hint: choose a random subset, and show that there is an event of probability one for which it will satisfy the above requirement. Your final answer should be a deterministic set.

Let \(Z\) be a random variable with the standard normal distribution.

Show that for \(t >0\), we have \[\mathbb{P}(Z >t)\leq \frac{1}{t\sqrt{2\pi}} e^{-\tfrac{t^2}{2}}.\]

Show that for \(t >0\), we have \[\mathbb{P}(Z >t) \geq \frac{1}{\sqrt{2\pi}} \big(\frac{1}{t} - \frac{1}{t^3}\big) e^{-\tfrac{t^2}{2}}.\] Hint: take the derivative of the lower bound and see what you get.
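Before attempting the proofs, the two tail bounds can be checked numerically. The sketch below computes \(\mathbb{P}(Z > t)\) through the complementary error function, using \(\mathbb{P}(Z > t) = \tfrac{1}{2}\operatorname{erfc}(t/\sqrt{2})\); the function names and the chosen values of \(t\) are illustrative.

```python
import math


def normal_tail(t):
    """P(Z > t) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def upper_bound(t):
    """The bound exp(-t**2 / 2) / (t * sqrt(2 * pi))."""
    return math.exp(-t * t / 2.0) / (t * math.sqrt(2.0 * math.pi))


def lower_bound(t):
    """The bound (1/t - 1/t**3) * exp(-t**2 / 2) / sqrt(2 * pi)."""
    return (1.0 / t - 1.0 / t ** 3) * math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0, 3.0, 5.0):
        print(t, lower_bound(t), normal_tail(t), upper_bound(t))
```

For small \(t\) the lower bound is negative and so carries no information; the printed values suggest that both bounds become tight as \(t\) grows.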

Let \((X_i)_{i=3} ^{\infty}\) be an i.i.d. sequence of standard normal variables. Prove that almost surely \[\limsup_{n \to \infty} \frac{X_n}{\sqrt{2 \log n}} =1.\]

Let \(X\) and \(Y\) be real-valued random variables. We say that \(X\) **stochastically dominates** \(Y\) if for all \(z \in \mathbb{R}\), we have \(\mathbb{P}(X \leq z) \leq \mathbb{P}(Y \leq z)\). Let us say that a coupling \((X', Y')\) of \(X\) and \(Y\) is **monotone** if \(\mathbb{P}(X' \geq Y') =1\).

- Use the quantile coupling to show that for real-valued random variables \(X\) and \(Y\) we have that \(X\) stochastically dominates \(Y\) if and only if there exists a monotone coupling of \(X\) and \(Y\).

Let \(0 < q < p <1\). Let \(X \sim Bin(p)\) and \(Y \sim Bin(q)\). Find a coupling \((X', Y')\) of \(X\) and \(Y\) so that \(X' \geq Y'\).

Let \(f:[0,1] \to \mathbb{R}\) be a continuous function. Let \(n\geq 1\). The Bernstein polynomial for \(f\) is defined by \[p(x) = \sum_{k=0}^n f(k/n){n \choose k} x^k(1-x)^{n-k}.\] Show that if \(f\) is an increasing function, then \(p\) is an increasing function.
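The claim about the Bernstein polynomial can be explored numerically before proving it. The sketch below evaluates \(p\) directly from the definition and checks monotonicity on a finite grid, which is evidence rather than proof; the function names and grid size are hypothetical. Note that \(p(x) = \mathbb{E} f(B/n)\) for \(B \sim Bin(n, x)\), which is what connects this exercise to couplings.

```python
import math


def bernstein(f, n, x):
    """Evaluate the degree-n Bernstein polynomial of f at the point x."""
    return sum(
        f(k / n) * math.comb(n, k) * x ** k * (1.0 - x) ** (n - k)
        for k in range(n + 1)
    )


def is_nondecreasing_on_grid(f, n, grid_size=200):
    """Check that the Bernstein polynomial of f does not decrease along a grid."""
    values = [bernstein(f, n, i / grid_size) for i in range(grid_size + 1)]
    return all(b >= a - 1e-12 for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    print(is_nondecreasing_on_grid(math.sqrt, 25))
    # For f(x) = x**2 the polynomial is x**2 + x * (1 - x) / n.
    print(bernstein(lambda x: x * x, 10, 0.3))
```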

Let \(X\) and \(Y\) be discrete random variables taking values on the space \(S\), with probability mass functions \(p\) and \(q\). Show that there exists a (**maximal**) coupling \((X', Y')\) of \(X\) and \(Y\) such that equality is achieved in the coupling inequality: \[d_{TV}(X, Y) = d_{TV}(X', Y') = 2 \mathbb{P}(X' \not = Y').\]

Let \(X_1, \ldots, X_n\) be independent integer-valued random variables. Also let \(Y_1, \ldots, Y_n\) be independent integer-valued random variables. Set \(S=X_1 + \cdots + X_n\) and \(W = Y_1 + \cdots + Y_n\). Show that

\[d_{TV}(S, W) \leq \sum_{i=1}^n d_{TV}(X_i, Y_i).\]

- We proved that for an irreducible aperiodic Markov chain \(X\) on a finite state space \(S\), *that is started at stationarity*, we have that if \[V_n = \sum_{k=0} ^{n-1} \mathbf{1}[X_k=s],\] then \(V_n/n \to \pi(s)\) in the mean-squared, where \(\pi\) is the stationary distribution. Show that the assumption that the chain is started at stationarity can be removed.

Prove that for an irreducible Markov chain on a finite state space \(S\), we have that for each \(s \in S\), the return time \[T = \inf \{n\geq 1: X_n=s\}\] has finite expectation, regardless of the starting distribution of the chain.

A **measure-preserving system** is a probability space \((\Omega, \mathcal{F}, \mu)\) endowed with a self-map \(T: \Omega \to \Omega\), where \(\mu \circ T^{-1} = \mu\). Verify that a Markov chain started at a stationary distribution corresponds to a measure-preserving system. Hint: consider shifting the coordinates of your Markov chain.

We say that a measure-preserving system is **ergodic** if every invariant set has measure zero or one; that is, if \(\mu(A \triangle T^{-1}(A)) = 0\), then \(\mu(A) \in \{0,1\}\). Find an example of a stationary Markov chain with non-trivial invariant sets.

Show that the **strong mixing** condition given by

\[\mu(A \cap T^{-n}B) \to \mu(A) \mu(B) \text{ for all } A, B \in \mathcal{F}\]

implies ergodicity. Note that, by the usual measure theory arguments, verifying the above condition for a large enough class of \(A,B\) is equivalent to verifying the strong mixing condition.

Check that aperiodic irreducible finite state Markov chains, started at the stationary distribution, are strongly mixing. This allows us to obtain limit theorems from the general statements in ergodic theory. Hint: first consider the case corresponding to the events \(A = \{ X_1=s \}\) and \(B =\{ X_3=t\}\).

The von Neumann ergodic theorem gives that for any stationary ergodic process \(X = (X_k)_{k=0} ^{\infty}\) endowed with the left-shift, where \((T X)_i = X_{i+1}\), we have the following convergence in the mean-squared

\[ \frac{1}{n} \sum_{k=0} ^{n-1} f( T^k X) \to \mathbb{E} f(X)\]

for all \(f\) such that \(\mathbb{E} [f(X)]^2 < \infty\). Consider the function \(f\) where

\[f(x) = \mathbf{1}[x_{2} = c, x_1 = b, x_0 =a].\] What happens for a Markov chain? Run simulations and check that everything is consistent, for a particular Markov chain.
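A minimal simulation along these lines is sketched below for a hypothetical two-state chain; the transition matrix `P`, its stationary distribution `PI`, and the function names are all illustrative assumptions. For a stationary chain, \(\mathbb{E} f(X) = \pi(a) P(a,b) P(b,c)\), so the long-run fraction of times the pattern \(a, b, c\) appears should approach that product.

```python
import random

# Hypothetical two-state chain, chosen only for illustration.
P = [[0.9, 0.1], [0.4, 0.6]]  # transition matrix
PI = [0.8, 0.2]  # stationary distribution: PI P = PI


def simulate_chain(n_steps, rng):
    """Simulate the chain for n_steps, started from the stationary distribution."""
    state = 0 if rng.random() < PI[0] else 1
    path = [state]
    for _ in range(n_steps - 1):
        state = 0 if rng.random() < P[state][0] else 1
        path.append(state)
    return path


def triple_frequency(path, a, b, c):
    """Fraction of times k with (X_k, X_{k+1}, X_{k+2}) equal to (a, b, c)."""
    n = len(path) - 2
    hits = sum(
        1 for k in range(n)
        if path[k] == a and path[k + 1] == b and path[k + 2] == c
    )
    return hits / n


if __name__ == "__main__":
    path = simulate_chain(300_000, random.Random(1))
    a, b, c = 0, 1, 1
    print("empirical frequency:", triple_frequency(path, a, b, c))
    print("pi(a) P(a,b) P(b,c):", PI[a] * P[a][b] * P[b][c])
```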