Introduction

Made available UCL Week 16
Due UCL Week 24; 9 February 2022; 11:50 PM, London time.
You are to work in groups of 2-4; there should be no overlap with your ICA2 groups.
This ICA consists of five questions, worth 65 points;
another 10 points will be given based on presentation, so that the total available points is 75.
This ICA is worth 30 percent of your grade for this module.
Many questions will require you to write and run R/Python code.
The ICA must be completed in R Markdown/Juptyer Notebook and typeset using Markdown with Latex code, just like the way our module content is generated. You can choose to use either R or Python.
Please hand in the html file and the Rmd/ipynb source file.
As usual, part of the grading will depend on the clarity and presentation of your solutions.
You are to do this assignment by yourselves, without any help from others.
You are allowed to use any materials and code that was presented so far.
Do not search the internet for answers to the ICA.
Do not use any fancy code or packages imported from elsewhere.

Plagiarism and collusion

Please familarize yourself with the following excerpt on plagiarism and collusion from the student handbook

By ticking the submission declaration box in Moodle you are agreeing to the following declaration:

Declaration: I am aware of the UCL Statistical Science Department’s regulations on plagiarism for assessed coursework. I have read the guidelines in the student handbook and understand what constitutes plagiarism. I hereby affirm that the work I am submitting for this in-course assessment is entirely my own.

Anonymous Marking

Please do not write your name anywhere on the submission. Please include only your student number as the proxy identifier.

Questions

1.) [10 points] Apery’s constant

[2 points] Prove (using calculus) that

\[\zeta(3) = \sum_{n=1} ^{\infty} \frac{1}{n^3} < \infty.\]

[6 points] Consider the probability distribution \(\mu\), where \[\mu(n) = \frac{1}{\zeta(3)n^3}.\] Use the Metropolis algorithm to sample from \(\mu\), so that you do not need to know the value of \(\zeta(3)\).
[2 points] Now that you can sample from \(\mu\), use simulations to estimate the value of \(\zeta(3)\).

2.) [15 points] Bayesian statistics and MCMC

We say that a positive continuous random variable \(X\) has the inverse gamma distribution with parameters \(\alpha >0\) and \(\beta >0\) if it has pdf given by \[(y; \alpha, \beta) \mapsto \frac{\beta^{\alpha}}{\Gamma(\alpha)} y^{-\alpha -1} e^{\tfrac{-\beta}{y}} \mathbf{1}[y >0],\] where \(\Gamma\) is the usual Gamma function.

We say that a positive continuous random variable \(W\) has the Scaled-Weibull distribution with shape parameter \(k\) and scale parameter \(\theta >0\) if it has pdf given by \[(w_1; k,\theta) \mapsto \mathbf{1}[w_1 >0]\frac{k w_1^{k-1}}{\theta} \exp[ - \tfrac{w_1^{k}}{\theta } ] .\]

[2 points] Let \({W} = (W_1, \ldots, W_n)\) be a random sample from the Scaled-Weibull distribution with known shape parameter \(k\) and unknown scale parameter \(\theta >0\). Show that \(t({W}) := \sum_{i=1} ^n W_i ^k\) is a sufficient statistic for \(\theta\).
[3 points] Fix \(k >0\). Let \({X} = (X_1, \ldots, X_n)\) be a random sample where the conditional distribution of \(X_1\) given \(\Theta = \theta\) has the Scaled-Weibull distribution with shape parameter \(k\) and scale parameter \(\theta\), and \(\Theta\) has the inverse gamma distribution with parameters \(\alpha\) and \(\beta\). Given sample data \(x=(x_1, x_2, \ldots, x_n)\). Compute the posterior distribution \(s(\theta|t(x))\) up to constant factors.
[3 points] Identify the distribution of \(s(\theta|t(x))\).
[4 points] Now pretend you could not identify it, and could not deduce exact constant factors. For the simple case, where \(\alpha =2\), \(\beta=3\), \(n=3\), and \(x_1=2, x_2=4, x_3=6\), sample from \(s(\theta|t(x))\) using the Metropolis algorithm; also take \(k=1\).
[3 points] Plot independent samples in a probability histogram and compare with the true result.

3.) [15 points] A Poisson process process on a perimeter of a semi-circle

Let \(\Gamma\) be a homogeneous Poisson point process of intensity \(2\) on the upper half of the circle given by \(x^2+y^2 =1\). Here, \(\Gamma\) is not the Gamma function. Consider the point process \(\Upsilon\) given by the projection of \(\Gamma\) onto the \(x\)-axis; that is, if \(\Gamma\) had \(n\) points and they are given by \((x_1, y_1), \ldots, (x_n, y_n)\), then the points of \(\Upsilon\) are just the \(x\)-coordinates \(x_1, \ldots, x_n\).

[5 points] Write code to simulate \(\Gamma\) and \(\Upsilon\). Graphically display a sample realization of these point processes.
[5 points] Demonstrate using simulations that \(\Upsilon\) is not a homogeneous Poisson point process on \([-1,1]\).
[5 points] Show analytically that \(\Upsilon\) cannot be a homogeneous Poisson point process on \([-1,1]\).

4.) [15 points] The transition rate matrix

You are given the the sample data from an irreducible continuous-time Markov chain. The sample data includes the jump times \((0,j_1, \ldots, j_n)\) and states \((s_0, s_1, \ldots, s_n)\); here at time \(j_i\) the Markov chain jumps into state \(s_i\) and stays there until the next jump which occurs at time \(j_{i+1}\).

[8 points] When \(n\) is large, give a method for estimating the transition rate matrix, also referred to as the \(Q\) matrix. Explain why your estimate is reasonable.
[7 points] Import the data from the file Q.txt and use this data and your method above to estimate the \(Q\) matrix.

5.) [10 points] Queues

Suppose you have Poisson arrivals, with intensity \(6\). You are given the following two options. Option 1: we treat it like a \(M(6)/M(8)/1\) system- the items are served by exponentially at rate \(8\). Option 2: each item is painted red or blue independently with probability \(\tfrac{1}{2}\); the coloured items report to different queues, with the red items are served exponentially at rate \(4\), and the blue items served exponentially at rate \(4\).

[5 points] Run simulations to identify the stationary distributions of the items in each of the two options. Which option, on average, has more items in it?
[5 points] Which option is better, from the items/customers perspective? Explain, analytically.

Stat0009 ICA3 (2021)