Plagiarism and collusion

Please familarize yourself with the following excerpt on plagiarism and collusion from the student handbook

By ticking the submission declaration box in Moodle you are agreeing to the following declaration:

Declaration: I am aware of the UCL Statistical Science Department’s regulations on plagiarism for assessed coursework. I have read the guidelines in the student handbook and understand what constitutes plagiarism. I hereby affirm that the work I am submitting for this in-course assessment is entirely my own.

Anonymous Marking

Please do not write your name anywhere on the submission. Please include only your student number as the proxy identifier.


1.) [10 points] Apery’s constant

  • [2 points] Prove (using calculus) that

\[\zeta(3) = \sum_{n=1} ^{\infty} \frac{1}{n^3} < \infty.\]

  • [6 points] Consider the probability distribution \(\mu\), where \[\mu(n) = \frac{1}{\zeta(3)n^3}.\] Use the Metropolis algorithm to sample from \(\mu\), so that you do not need to know the value of \(\zeta(3)\).

  • [2 points] Now that you can sample from \(\mu\), use simulations to estimate the value of \(\zeta(3)\).


  • The easiest way is to probably invoke the integral test.

  • To run the Metropolis algorithm, we modify the code from a previous in-class session. We use

\[Q(j|i) = Q_{ij} = \tfrac{1}{2} \text{ if and only if } |i-j|=1.\]

This allows to sample from \(\mu\), and thus estimate \(\mu(1)\), and also \(\zeta(3)\).

h <- function(x){
  if(x >0 ) {z = 1/(x^3)   }

Yq <- function(x){
   y= x+ 2*rbinom(1,1,1/2) -1 

a <- function(j,i){
m=min( c( h(j)/h(i), 1 ))  

metro <- function(n){
  for(k in 1:n){
    i = x[length(x)]
    j = Yq(i)
    p = a(j,i)
    x = c(x,i)
      if ( rbinom(1,1,p)==1 ) {
      x[length(x)] <- j

x= replicate(2000, metro(200)[length(metro(200))])
## [1] 1.181335

2.) [15 points] Bayesian statistics and MCMC

We say that a positive continuous random variable \(X\) has the inverse gamma distribution with parameters \(\alpha >0\) and \(\beta >0\) if it has pdf given by \[(y; \alpha, \beta) \mapsto \frac{\beta^{\alpha}}{\Gamma(\alpha)} y^{-\alpha -1} e^{\tfrac{-\beta}{y}} \mathbf{1}[y >0],\] where \(\Gamma\) is the usual Gamma function.

We say that a positive continuous random variable \(W\) has the Scaled-Weibull distribution with shape parameter \(k\) and scale parameter \(\theta >0\) if it has pdf given by \[(w_1; k,\theta) \mapsto \mathbf{1}[w_1 >0]\frac{k w_1^{k-1}}{\theta} \exp[ - \tfrac{w_1^{k}}{\theta } ] .\]

  • [2 points] Let \({W} = (W_1, \ldots, W_n)\) be a random sample from the Scaled-Weibull distribution with known shape parameter \(k\) and unknown scale parameter \(\theta >0\). Show that \(t({W}) := \sum_{i=1} ^n W_i ^k\) is a sufficient statistic for \(\theta\).

  • [3 points] Fix \(k >0\). Let \({X} = (X_1, \ldots, X_n)\) be a random sample where the conditional distribution of \(X_1\) given \(\Theta = \theta\) has the Scaled-Weibull distribution with shape parameter \(k\) and scale parameter \(\theta\), and \(\Theta\) has the inverse gamma distribution with parameters \(\alpha\) and \(\beta\). Given sample data \(x=(x_1, x_2, \ldots, x_n)\). Compute the posterior distribution \(s(\theta|t(x))\) up to constant factors.

  • [3 points] Identify the distribution of \(s(\theta|t(x))\).

  • [4 points] Now pretend you could not identify it, and could not deduce exact constant factors. For the simple case, where \(\alpha =2\), \(\beta=3\), \(n=3\), and \(x_1=2, x_2=4, x_3=6\), sample from \(s(\theta|t(x))\) using the Metropolis algorithm.

  • [3 points] Plot independent samples in a probability histogram and compare with the true result.


  • We have that

\[ \begin{eqnarray*} L({w}; \theta ) &=& \prod_{i=1}^n \frac{k w_i^{k-1}}{\theta} \exp[ - \tfrac{w_i^{k}}{\theta}] \\ &=& \big( \prod_{i=1} ^n kw_i^{k-1} \big) \cdot \big( \frac{1}{\theta^{n}}\exp( -t / \theta) \big), \end{eqnarray*} \]

from which the result follows from the Neyman factorization theorem, where we set \[H({w}) :=\big( \prod_{i=1} ^n kw_i^{k-1} \big)\] and

\[ g(t; \theta) := \frac{1} {\theta^{n}}\exp( -t / \theta).\]

  • Let \(r(\theta)\) be the prior distribution and \(s(\theta | {x})\) denote the posterior. From the previous part, given \(\Theta= \theta\), we know that \(t({X})\) is a sufficient statistic for \(\theta\), thus it suffices to compute \(s(\theta | t)\). We have, using the notation from the previous part that

\[ \begin{eqnarray*} s(\theta | t) &\propto& g(t; \theta) r(\theta)\\ &\propto& \frac{1}{\theta^{n}}\exp( -t / \theta) \cdot \frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta^{-\alpha -1} e^{\tfrac{-\beta}{\theta}} \\ &\propto& \theta^{ -n - \alpha -1 } \exp(-(t+ \beta)/\theta ). \end{eqnarray*} \]

  • We recognize that \(s(\theta|t)\) is the inverse gamma distribution with hyperparameters \(\alpha':= n + \alpha\) and \(\beta':= \beta + t\).

  • To run the Metropolis algorithm, we modify the code from a previous in-class session

h <- function(x){
  if(x >0 ) {z= (x^{-3-2-1}) *exp(-(12+3)/x)   }

Yq <- function(x){
   y= rnorm(1,x,1)

a <- function(j,i){
m=min( c( h(j)/h(i), 1 ))  

metro <- function(n){
  for(k in 1:n){
    i = x[length(x)]
    j = Yq(i)
    p = a(j,i)
    x = c(x,i)
      if ( rbinom(1,1,p)==1 ) {
      x[length(x)] <- j


z = replicate(300, metro(5000)[length(metro(5000))])

hist(z, prob=TRUE, breaks=35)
x= seq(0.1, 20, by =0.2)
curve(dinvgamma(x,shape=5,rate=15), add=TRUE)

3.) [15 points] A Poisson process process on a perimeter of a semi-circle

Let \(\Gamma\) be a homogeneous Poisson point process of intensity \(2\) on the upper half of the circle given by \(x^2+y^2 =1\). Here, \(\Gamma\) is not the Gamma function. Consider the point process \(\Upsilon\) given by the projection of \(\Gamma\) onto the \(x\)-axis; that is, if \(\Gamma\) had \(n\) points and they are given by \((x_1, y_1), \ldots, (x_n, y_n)\), then the points of \(\Upsilon\) are just the \(x\)-coordinates \(x_1, \ldots, x_n\).

  • [5 points] Write code to simulate \(\Gamma\) and \(\Upsilon\). Graphically display a sample realization of these point processes.

  • [5 points] Demonstrate using simulations that \(\Upsilon\) is not a homogeneous Poisson point process on \([-1,1]\).

  • [5 points] Show analytically that \(\Upsilon\) cannot be a homogeneous Poisson point process on \([-1,1]\).


  • We note that the problem is easier in polar coordinates. In order to simulate a uniform point on the semi-circle, it suffices to generate a uniform angle \(\Theta\) on \([0, \pi]\), and then consider the point \((r=1, \Theta)\).
arc <- function(){
  N = rpois(1,2*pi)
  points = NULL
  if (N >0){
    angles = pi*runif(N)
    points = cbind(cos(angles), sin(angles))

x =a[,1]
y =a[,2]
n = length(y)
zeros = rep(0, n)
x = c(x, x)
y = c(y, zeros)

aa = cbind(x,y)

##                 x            y
##  [1,]  0.68082051 7.324503e-01
##  [2,] -0.14791807 9.889996e-01
##  [3,]  0.37798300 9.258125e-01
##  [4,] -0.20829296 9.780665e-01
##  [5,] -0.09655389 9.953278e-01
##  [6,]  0.56355025 8.260818e-01
##  [7,] -0.93003491 3.674712e-01
##  [8,] -1.00000000 8.059216e-05
##  [9,]  0.83526151 5.498529e-01
## [10,]  0.15742677 9.875307e-01
## [11,]  0.68082051 0.000000e+00
## [12,] -0.14791807 0.000000e+00
## [13,]  0.37798300 0.000000e+00
## [14,] -0.20829296 0.000000e+00
## [15,] -0.09655389 0.000000e+00
## [16,]  0.56355025 0.000000e+00
## [17,] -0.93003491 0.000000e+00
## [18,] -1.00000000 0.000000e+00
## [19,]  0.83526151 0.000000e+00
## [20,]  0.15742677 0.000000e+00
curve(sqrt(1-x^2), from=-1, to=1, add=TRUE)

  • We will demonstrate that \(\Upsilon\) likes to have points near the boundaries rather than the center.
counts <-function(a,b){
  x = arc()[,1]
  s=sum(x > a & x <b)

t = replicate(100, counts(-0.1, 0.1))

## [1] 1.31
## [1] 0.35
  • Note that there is a one-to-one correspondence between the points of \(\Gamma\) and \(\Upsilon\). We observe that the number of points of \(\Upsilon\) in \((1/\sqrt{2},1)\) is the number of points of \(\Gamma\) in the angle range \((0, \pi/2)\), which is on average, a \(1/4=0.25\) of the points of \(\Gamma\) or \(\Upsilon\); the interval \((1/\sqrt{2},1)\) has length no greater than \(0.3\), which if \(\Upsilon\) is homogeneous, should only account for no more than \(0.3/2=0.15\) of its points on average.

4.) [15 points] The transition rate matrix

You are given the the sample data from an irreducible continuous-time Markov chain. The sample data includes the jump times \((0,j_1, \ldots, j_n)\) and states \((s_0, s_1, \ldots, s_n)\); here at time \(j_i\) the Markov chain jumps into state \(s_i\) and stays there until the next jump which occurs at time \(j_{i+1}\).

  • [8 points] When \(n\) is large, give a method for estimating the transition rate matrix, also referred to as the \(Q\) matrix. Explain why your estimate is reasonable.

  • [7 points] Import the data from the file Q.txt and use this data and your method above to estimate the \(Q\) matrix.


  • Let \(J\) be the jump chain. We can count the transitions

\[n_{ij} := \sum_{k=0} ^{n-1} \mathbf{1}[J_k=i, J_{k+1}=j],\] and also let

\[n_i = \sum_{k=0} ^{n-1} \mathbf{1}[J_k=i].\]

Then by a version of the law of large numbers \(n_{ij}/n_i \to m_{ij}\), so that we can recover the transition matrix corresponding to the jump chain.

  • We can also estimate the holding times at each state in the following way: for each state \(i\), let \(H^i_k\) the time the Markov chain stays at state \(i\), on its \(k\)-th visit; these are i.i.d. exponential random variables, and thus by the law of large numbers their mean goes to the corresponding mean holding time \(h_i\)

Thus with the relation \(m_{ij}(h_i)^{-1} = q_{ij}\) we can estimate the \(Q\) matrix; furthermore, we can also force the condition \(-\hat{q}_{ii} := \sum_{j\not i} \hat{q}_{ij}\).

  • The following code accomplishes the method outlined above.
imported =read.table("Q.txt", sep=",")

states <- imported[,1]
times <- imported[,2]

m <- function(i,j){
  mij =NULL
for(k in 1:(length(states)-1)){
  if (states[k]==i & states[k+1] ==j ){
    mij = c(mij, k)
mijp = length(mij)/sum(states==i)

tm <-function(i){
  mi = NULL
for(k in 1:length(states)){
  if (states[k]==i){
    mi = c(mi, k)}
for( k in 1: (length(mi)-1) ){
timei = c(timei, times[mi[k]+1] - times[mi[k]]  )

hi1 = 1/mean(tm(1))
hi2 = 1/mean(tm(2))
hi3 = 1/mean(tm(3))

Q <- matrix(c(-hi1,m(1,2)*hi1,m(1,3)*hi1, m(2,1)*hi2,-hi2,m(2,3)*hi2, m(3,1)*hi3, m(3,2)*hi3, -hi3), nrow =3)

Q = t(Q)
##           [,1]      [,2]      [,3]
## [1,] -6.930188  3.952479  2.977709
## [2,]  1.971437 -2.958729  0.986977
## [3,]  1.938296  7.023883 -8.962179

5.) [10 points] Queues

Suppose you have Poisson arrivals, with intensity \(6\). You are given the following two options. Option 1: we treat it like a \(M(6)/M(8)/1\) system- the items are served by exponentially at rate \(8\). Option 2: each item is painted red or blue independently with probability \(\tfrac{1}{2}\); the coloured items report to different queues, with the red items are served exponentially at rate \(4\), and the blue items served exponentially at rate \(4\).

  • [5 points] Run simulations to identify the stationary distributions of the items in each of the two options. Which option, on average, has more items in it?

  • [5 points] Which option is better, from the items/customers perspective? Explain, analytically.


  • In Option 2, by the colouring theorem, the coloured arrival processes are independent Poisson processes each with rate \(3\), thus we are looking at two independent \(M(3)/M(4)/1\) queues, compared to a single \(M(6)/M(8)/1\) in Option 1. Suppose both options have been operating for a long time, so that they are both at stationarity, so there are a geometric number of items in each (coloured) queue. The total time an item spends in a \(M(\lambda)/M(\mu)/1\) queue is exponential with parameter \(\mu - \lambda\); we can also apply Little’s law (\(L = \lambda W)\), since at stationarity, we know \(L = \tfrac{\rho}{1-\rho}\), and obtain that total time has mean \(\tfrac{1}{\mu - \lambda}\). In Option 1, an item spends on average \(\tfrac{1}{2}\) units of time. In Option 2, regardless if you are coloured blue or red, you end up spending an average of \(1\) unit time. Notice that if we applied Little’s law to Option 2, \(L\) is the sum of two independent geometric random times, each with mean \(\tfrac{3/4}{1-3/4} = 3\), thus there are on average \(6\) blue or red items in the store, rather than the \(3\) of Option 1. Option 1 is better.