Introduction

Plagiarism and collusion

Please familiarize yourself with the following excerpt on plagiarism and collusion from the student handbook.

By ticking the submission declaration box in Moodle you are agreeing to the following declaration:


Declaration: I am aware of the UCL Statistical Science Department’s regulations on plagiarism for assessed coursework. I have read the guidelines in the student handbook and understand what constitutes plagiarism. I hereby affirm that the work I am submitting for this in-course assessment is entirely my own.

Anonymous Marking

Please do not write your name anywhere on the submission. Please include only your student number as the proxy identifier.

Questions

1.) Parameter estimation [10 points]

Let \(X_1, X_2, \ldots\) be iid exponential random variables with unknown rate \(\lambda>0\). Suppose we only get to keep track of whether \(X_i\) is in the unit interval or not; that is, we observe the random variables \(Y_i = \mathbf{1}[ 0 \leq X_i \leq 1]\). Find a reasonable estimator for \(\lambda\) and provide justification that your estimator is good.

2.) The area of a random triangle inside a disc [10 points]

Demonstrate by simulations that if three points are sampled independently and uniformly at random inside a disc of radius \(1\), then the expected area of the corresponding triangle is \(35/(48\pi)\).

3.) Markov chains [10 points]

  • Suppose \(X\) is an irreducible and aperiodic finite-state Markov chain on a state space \(S\) containing the states \(a\), \(b\), and \(c\). Consider the associated averages given by \[T_n := \frac{1}{n+1} \sum_{i=0}^{n} \mathbf{1}[X_{i+2} = c,\, X_{i+1} = b,\, X_{i} = a].\] Guess the limit of \(T_n\) as \(n\to \infty\). Provide evidence (not necessarily a proof) for your guess. [5 points]

  • Demonstrate your guess, via simulations, using the Markov chain with transition matrix \[ P = \begin{pmatrix} 2/6 & 3/6 & 1/6 \\ 1/4 & 1/4 & 1/2 \\ 1/8 & 1/4 & 5/8 \end{pmatrix}. \] [5 points]

4.) Point processes [10 points]

Consider the point process on the unit interval \([0,1]\) that is sampled by placing \(N=10\) points independently and uniformly on the interval. Let \(X\) and \(Y\) be the numbers of points in the two disjoint halves of the interval.

  • Are \(X\) and \(Y\) independent? Explain. [3 points]
  • What is \(\mathbb{E} X\)? [2 points]
  • What is the probability mass function for \(X\)? [3 points]
  • What is the distribution of the location of the point closest to the right endpoint of the interval? [2 points]

Endnotes

Solutions

1.)

We have that \(p := \mathbb{E}Y_i = \mathbb{P}(0 \leq X_i \leq 1) = 1 - e^{-\lambda}\); thus \(\lambda = -\log(1-p)\). Note that the \(Y_i\) are an iid sequence of Bernoulli random variables with parameter \(p\), and the law of large numbers gives that their sample average \(\bar{Y}_n\) converges to \(p\). Hence the plug-in estimator \(\hat{\lambda}_n = -\log(1-\bar{Y}_n)\) is consistent, by continuity of \(p \mapsto -\log(1-p)\).
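
A minimal simulation sketch of this plug-in estimator (the true rate \(\lambda = 2\) and the sample size below are illustrative assumptions, not part of the question):

lambda <- 2                    # illustrative true rate (assumption)
n <- 10000                     # illustrative sample size (assumption)
x <- rexp(n, rate = lambda)    # latent exponential draws
y <- as.numeric(x <= 1)        # observed indicators Y_i = 1[0 <= X_i <= 1]
p.hat <- mean(y)               # sample average estimates p = 1 - exp(-lambda)
lambda.hat <- -log(1 - p.hat)  # plug-in estimator of lambda
lambda.hat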

2.)

We modify the code from the random triangles worksheet and our code on sampling on a disc.

To sample uniformly on the disc we use rejection sampling from the enclosing square \([-1,1]^2\):

point <- function(){
  z <- 2                                 # placeholder of length 1
  while(length(z) == 1){
    x <- 2*runif(1) - 1                  # candidate x-coordinate in [-1, 1]
    y <- 2*runif(1) - 1                  # candidate y-coordinate in [-1, 1]
    if (x^2 + y^2 < 1) { z <- c(x, y) }  # accept if the point lies inside the disc
  }
  z
}

We compute the area of the sampled triangle using the determinant formula:

area <- function(){
  p1 <- point()
  p2 <- point()
  p3 <- point()
  # columns of M are (x_i, y_i, 1); the triangle area is |det(M)|/2
  M <- matrix(c(c(p1, 1), c(p2, 1), c(p3, 1)), nrow = 3)
  0.5*abs(det(M))
}

Then of course we replicate:

x = replicate(5000, area())
abs(mean(x) - (35 /(48 * pi)) )
## [1] 0.004330415
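
As a rough gauge of how close this should be (an added check, not part of the original), the Monte Carlo standard error of the sample mean can be estimated directly:

sd(x)/sqrt(length(x))   # approximate standard error of mean(x)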

3.)

From our previous experience and the convergence theorems, we may, without loss of generality, consider the chain started from stationarity. Let \(\pi\) be the stationary distribution. Taking the expectation we obtain \[\mathbb{E}T_n = \mathbb{P}(X_2 = c, X_1 = b, X_0 = a) = \pi_a p_{ab} p_{bc}.\] Thus our experience with the law of large numbers for Markov chains suggests this is the limit.

We simulate the given chain in R, using the code from the course notes, compute this average along a sample path, and compare. We start the chain at state \(1\), and take \(a=1\), \(b=2\), and \(c=3\), so the predicted limit is \(\pi_1 p_{12} p_{23} = \pi_1 \cdot \tfrac{3}{6} \cdot \tfrac{1}{2}\).

P <- matrix(c(2/6, 3/6, 1/6, 1/4, 1/4, 1/2, 1/8, 1/4, 5/8), nrow = 3)
P <- t(P)   # entries were typed row by row, so transpose
P
##           [,1] [,2]      [,3]
## [1,] 0.3333333 0.50 0.1666667
## [2,] 0.2500000 0.25 0.5000000
## [3,] 0.1250000 0.25 0.6250000
tP <- t(P)
E <- eigen(tP)         # eigenvectors of t(P) are left eigenvectors of P
EE <- E$vectors[, 1]   # eigenvector for the leading eigenvalue 1
F <- EE / sum(EE)      # normalize to a probability vector
F  # stationary distribution
## [1] 0.2054795 0.3013699 0.4931507
F[1]
## [1] 0.2054795
step <- function(i){
  q <- P[i, ]        # transition probabilities out of state i
  x <- -1
  u <- runif(1)
  j <- 0
  cumq <- cumsum(q)
  while(x == -1){    # inverse-CDF sampling of the next state
    j <- j + 1
    if(u <= cumq[j]){ x <- j }
  }
  x
}

steps <- function(n){
  x <- 1                    # start the chain at state 1
  for (i in 1:n){
    x <- c(x, step(x[i]))   # append the next state
  }
  x
}

mc <- steps(10000) # sample path with a lot of steps

L = length(mc)
L
## [1] 10001
onetwothree <- 0
for(i in 1:(L-2)){   # count occurrences of the pattern (1, 2, 3)
  if (mc[i] == 1 && mc[i+1] == 2 && mc[i+2] == 3){
    onetwothree <- onetwothree + 1
  }
}

Tn <- onetwothree/L
Tn
## [1] 0.05339466
abs( Tn - ( F[1] * (3/6) * (1/2) ) )
## [1] 0.002024798
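
The same count can be done with a vectorized one-liner (an added cross-check, not in the original), which reproduces \(T_n\) above:

sum(mc[1:(L-2)] == 1 & mc[2:(L-1)] == 2 & mc[3:L] == 3) / L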

4.)
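
A simulation sketch for this part (an addition, not the original author's solution): since each of the \(N=10\) points falls in the left half independently with probability \(1/2\), one expects \(X \sim \mathrm{Binomial}(10, 1/2)\), \(\mathbb{E}X = 5\), and \(Y = 10 - X\) (so \(X\) and \(Y\) are not independent), while the point closest to the right endpoint sits at the maximum of the ten uniforms, with distribution function \(x^{10}\) on \([0,1]\). The code below checks these claims empirically.

sims <- replicate(10000, {
  pts <- runif(10)                 # N = 10 uniform points on [0, 1]
  c(X = sum(pts <= 1/2),           # points in the left half
    Y = sum(pts > 1/2),            # points in the right half
    M = max(pts))                  # location of the point closest to 1
})
mean(sims["X", ])                      # should be close to 5
all(sims["X", ] + sims["Y", ] == 10)   # X + Y = 10, so X and Y are dependent
mean(sims["M", ] <= 0.9)               # should be close to 0.9^10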

End of Solutions