Introduction

Made available UCL Week 10: 4 November 2022
Due UCL Week 12: 18 November 2022, 13:00 London Time.
This ICA consists of four questions, worth 40 points;
another 5 points will be given based purely on presentation, so that the total available points is 45.
This ICA is worth 30 percent of your grade for this module.
Many questions will require you to write and run R/Python code.
The ICA must be completed in R Markdown/Juptyer Notebook and typeset using Markdown with Latex code, just like the way our module content is generated. You can choose to use either R or Python.
Please hand in the html file and the Rmd/ipynb source file.
As usual, part of the grading will depend on the clarity and presentation of your solutions.
You are to do this assignment by yourselves, without any help from others.
You are allowed to use any materials and code that was presented so far.
Do not search the internet for answers to the ICA.
Do not use any fancy code or packages imported from elsewhere.
- For example, you should not be using a Markov chain package to generate your Markov chains. It is not necessary to use anything other than the basic packages that were already introduced; if you have any doubt, please contact me.

Plagiarism and collusion

Please familarize yourself with the following excerpt on plagiarism and collusion from the student handbook

By ticking the submission declaration box in Moodle you are agreeing to the following declaration:

Declaration: I am aware of the UCL Statistical Science Department’s regulations on plagiarism for assessed coursework. I have read the guidelines in the student handbook and understand what constitutes plagiarism. I hereby affirm that the work I am submitting for this in-course assessment is entirely my own.

Anonymous Marking

Please do not write your name anywhere on the submission. Please include only your student number as the proxy identifier.

Questions

1.) Parameter estimation [10 points]

Let \(X_1, X_2, \ldots\) be iid exponential random variables with unknown rate \(\lambda>0\). Suppose we only get to keep track of whether \(X_i\) is in the unit interval or not; that is, we observe the random variables \(Y_i = \mathbf{1}[ 0 \leq X_i \leq 1]\). Find a reasonable estimator for \(\lambda\) and provide justification that your estimator is good.

2.) The area of a random triangle inside a disc [10 points]

Demonstrate by simulations that if three points are sampled independently and uniformly at random inside a disc of radius \(1\), then the expected area of the corresponding triangle is \(35/48\pi\).

3.) Markov chains [10 points]

Suppose \(X\) is irreducible and aperiodic finite state Markov chain on a state space \(S\), containing the states \(a,b\) and \(c\). Consider the associated sums given by \[T_n:= \frac{1}{n+1} \sum_{k=0} ^n \mathbf{I}[X_{i+2} =c, X_{i+1}=b =, X_{i} =a].\] Guess the limit for \(T_n\) as \(n\to \infty\). Provide evidence (not necessarily a proof) for your guess. [5 points]
Demonstrate your guess, via simulations, using the Markov chain with transition matrix \[ \begin{equation*} P = \begin{pmatrix} 2/6 & 3/6 & 1/6 \\ 1/4 & 1/4 & 1/2 \\ 1/8 & 1/4 & 5/8 \end{pmatrix}. \end{equation*} \] [5 points]

4.) Point processes [10 points]

Consider the point process on the unit interval \([0,1]\) that is a sampled by placing \(N=10\) points independently and uniformly on the interval. Let \(X\) and \(Y\) be the number of points on the two disjoint halves of the interval.

Are \(X\) and \(Y\) independent? Explain. [3 points]
What is the \(\mathbb{E} X\)? [2 points]
What is the probability mass function for \(X\)? [3 points]
What is the distribution of location of the point closest to the right endpoint of the interval? [2 points]

Endnotes

With solutions
Version: 16 December 2022
Rmd Source

Solutions

1.)

We have that \(p:=\mathbb{E}Y_i = 1- e^{-\lambda}\); thus \(\lambda = -\log(1-p)\). Note that the \(Y_i\) are an iid sequence of Bernoulli random variables with parameter \(p\), and the law of large numbers gives that their sample average converges to \(p\), which can be used to give an estimate for \(\lambda\).

2.)

We modify the code from the random triangles worksheet and our code on sampling on a disc.

To sample on a disc we have:

point <- function(){
  z=2
  while(length(z)==1){
  x = 2*runif(1) -1
  y = 2*runif(1) -1
if (x^2 + y^2 <1) { z<-c(x,y)}
  }
  z
}

We compute the area of a random sample using the determinant formulation

area <-function(){
  p1 = point()
  p2 = point()
  p3 = point()
  M = matrix( c( c(p1,1), c(p2,1),c(p3,1)), nrow=3)
  A = 0.5*abs(det(M))
A  
}

Then of course we replicate:

x = replicate(5000, area())
abs(mean(x) - (35 /(48 * pi)) )

## [1] 0.004330415

3.)

From our previous experience and convergence theorems, without loss of generality, we consider the chain started from stationarity. Let \(\pi\) be the stationary distribtion. Taking the expectation we obtain \[\mathbb{E}T_n = \mathbb{P}(X_2=c,X_1=b, X_0=a ) = \pi_a p_{ab}p_{bc}.\] Thus our experience with the law of large numbers for Markov chains suggest this is the limit.

With the given chain, we can simulate it in R, with the code from the course notes, and compute this average over a sample, and compare. We will start than chain at state \(1\), and take \(a=1\), \(b=2\), and \(c=3\).

P <- matrix(c(2/6, 3/6, 1/6, 1/4, 1/4,1/2, 1/8, 1/4, 5/8), nrow =3)
 P <- t(P)
 P

##           [,1] [,2]      [,3]
## [1,] 0.3333333 0.50 0.1666667
## [2,] 0.2500000 0.25 0.5000000
## [3,] 0.1250000 0.25 0.6250000

 tP = t(P)
 E = eigen(tP)
 EE = E$vectors[,1]
 F= EE /sum(EE)
 F  # stationary distribution

## [1] 0.2054795 0.3013699 0.4931507

 F[1]

## [1] 0.2054795

 step <- function(i){
  q = P[i,]
  x=-1
  u = runif(1)
  j=0
  cumq = cumsum(q)
  while(x==-1){
    j<-j+1
    if(u <= cumq[j]){x <-j}
  }
  x
}

steps <- function(n){
  x = 1
  for (i in 1:n){
    x <- c(x, step(x[i]))
  }
  x
}

mc=steps(10000) # sample path of alot of steps

L = length(mc)
L

## [1] 10001

onetwothree=0
for(i in 1:(L-2)){
if (mc[i]==1 && mc[i+1]==2 && mc[i+2]==3 ){
  onetwothree <- onetwothree +1
}
}

Tn = onetwothree/L 
Tn

## [1] 0.05339466

abs( Tn - ( F[1] * (3/6) * (1/2) ) )

## [1] 0.002024798

4.)

No, they cannot be independent, since if \(X=0\), then \(Y=10\).
Since each point independently has probability \(1/2\) of being on the ex-side, we must have \(\mathbb{E}X = 10/2 = 5\), by the linearity of expectation.
More specifically, we have a binomial distribution with parameters \(n=10\), and \(p=1/2\).
This is simply the maximum of \(10\) uniforms, an elementary calculation gives the pdf on unit interval given by: \[m \mapsto 10m^9.\]

Stat0009 ICA1 (2022)