Introduction

Plagiarism and collusion

Please familiarize yourself with the following excerpt on plagiarism and collusion from the student handbook.

By ticking the submission declaration box in Moodle you are agreeing to the following declaration:


Declaration: We are aware of the UCL Statistical Science Department’s regulations on plagiarism for assessed coursework. We have read the guidelines in the student handbook and understand what constitutes plagiarism. We hereby affirm that the work we are submitting for this in-course assessment is entirely our own.

Anonymous Marking

Please do not write your names anywhere on the submission. Please include only your student numbers as the proxy identifier.

1.) It’s your birthday. [10 points]

You might recall a proof that when there are twenty-three people in a room, the probability that the room contains two people with the same birthday is slightly more than \(1/2\), under the assumptions that there are \(365\) days in a year, birthdays are uniformly distributed throughout the year, and the birthdays of the occupants of the room are independent.

Under the same assumptions, let \(p(n)\) be the probability that a room of \(n\) people contains three people who share the same birthday. By running simulations, estimate \(p(n)\) for a range of \(n\), and find the smallest \(n\) for which \(p(n) \geq 1/2\).

Solution

This is easily accomplished using the sample function. We also use the sum function to count the birthdays falling on each date and to check whether a triple match exists, reporting \(1\) if it does and \(0\) otherwise.

match <- function(n){
  # Sample n birthdays uniformly at random from 365 days
  x <- sample(365, n, replace = TRUE)
  N <- 0
  date <- 0
  # Scan the calendar until some date carries at least three birthdays
  while (N < 3 && date < 365){
    date <- date + 1
    N <- sum(x == date)
  }
  # Report 1 if a triple match exists, 0 otherwise
  sum(N > 2)
}
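As a quick sanity check on the simulation machinery (not part of the assigned question), the classical two-birthday probability for a room of \(23\) can be computed exactly:

# Exact probability that at least two of 23 people share a birthday:
# one minus the probability that all 23 birthdays are distinct
1 - prod((365:343)/365)   # approximately 0.507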

We now call the match function for each \(n\) from \(3\) to \(388\), replicating the simulation 200 times for each \(n\), and padding the vector of estimates with zeros for \(n = 1, 2\).

prob <- c(0, 0)   # p(1) = p(2) = 0: a triple match needs at least 3 people
for (n in 3:388){
  prob <- c(prob, mean(replicate(200, match(n))))
}
numofpeople <- seq(1, 388, by = 1)
plot(numofpeople, prob)

Now we run a simple search, using the fact that \(p(n)\) is monotone increasing, to look for the smallest \(n\) that gives \(p(n) \approx 1/2\). (With only 200 replications per value of \(n\), the estimated crossing point is noisy.)

probb <- 0
argprob <- 0
while (probb < 0.5){
  argprob <- argprob + 1
  probb <- prob[argprob]
}

argprob
## [1] 83
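Equivalently, since prob[n] estimates \(p(n)\), the crossing point can be found without an explicit loop:

# First n at which the estimated probability reaches 1/2
which(prob >= 0.5)[1]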

2.) Pretending \(\pi\) has random digits and data processing. [10 points]

Solution

We import the suggested text file, and then delete the NA values arising from empty lines.

x <- read.delim("pi6.txt", header = FALSE, sep = "")
# Flatten the rows into a single vector, scaling each 10-digit block into [0, 1)
z <- 1e-10 * as.vector(t(as.matrix(x)))
length(z)
## [1] 100495
z[1]
## [1] 0.1415927
z[length(z)]
## [1] 0.5779458
z <- na.omit(z)
length(z)
## [1] 100000
format(z[1], nsmall=10)
## [1] "0.1415926535"
mean(z)
## [1] 0.4984943
var(z)
## [1] 0.08324761
hist(z, prob=TRUE)
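The sample mean and variance are close to \(1/2\) and \(1/12 \approx 0.0833\), the mean and variance of a Uniform\((0,1)\) random variable, and the histogram should look approximately flat. As a further optional check (a sketch; the truncation to ten digits could in principle produce a ties warning), one can run a Kolmogorov–Smirnov test against the uniform distribution:

# Compare the empirical distribution of z with Uniform(0,1)
ks.test(z, "punif")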


3.) Using \(\pi\) for continuous-time Markov chains. [20 points]

Consider the continuous-time Markov chain \(X\) with state space \(\{1,2,3\}\) and \(Q\)-matrix given by

Q <- matrix(c(-5, 2, 3,  2, -3, 1,  2, 6, -8), nrow = 3)
Q <- t(Q)   # entries were entered row by row
Q
##      [,1] [,2] [,3]
## [1,]   -5    2    3
## [2,]    2   -3    1
## [3,]    2    6   -8

In your coding, you may need a mechanism to keep track of which bits of the randomness from \(\pi\) have already been used. To facilitate this, you may need a function that can change the value of a variable that exists globally, outside function definitions; the following example illustrates how to accomplish this with the superassignment operator <<-:

counterr <- 0

example <- function(t){
  counterr <<- counterr + 1   # <<- updates the global counterr
  t + 5
}

example(1)
## [1] 6
example(10)
## [1] 15
counterr
## [1] 2

Solution

We have to load a package to do basic linear algebra, as was done in a previous homework.

library("pracma")
## Warning: package 'pracma' was built under R version 4.0.3
Y <- t(Q)
stat <- nullspace(Y)       # null space of Q^T gives pi with pi Q = 0
stat <- stat / sum(stat)   # normalize to a probability vector
stat
##           [,1]
## [1,] 0.2857143
## [2,] 0.5396825
## [3,] 0.1746032
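As a check, a stationary distribution satisfies \(\pi Q = 0\); treating stat as a row vector, the following should be the zero vector up to numerical error:

# Verify stationarity: stat^T Q should be numerically zero
t(stat) %*% Q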

We adapt code from HW5, deleting the values of the digits of \(\pi\) as they are used; we also mention that we were not very careful, and wasted some randomness in the clocks portion of the code (one of the three values consumed per jump corresponds to the diagonal entry and is immediately discarded).

randpi <- z


jump <- function(i){
  q <- Q[i, ]
  # Exponential clocks by inversion; the i-th clock uses the (negative)
  # diagonal rate and is immediately overwritten by Inf below
  clocks <- c(-log(1 - randpi[1]) / q[1],
              -log(1 - randpi[2]) / q[2],
              -log(1 - randpi[3]) / q[3])
  clocks[i] <- Inf
  j <- which.min(clocks)
  time <- clocks[j]
  randpi <<- randpi[-(1:3)]   # consume the three used values of pi
  c(j, time)
}
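One could go further and wrap the stream in a small helper (a sketch, with the hypothetical name nextrand) that returns and consumes one value per call:

# Hypothetical helper: return the first unused value of pi and consume it
nextrand <- function(){
  u <- randpi[1]
  randpi <<- randpi[-1]
  u
}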
runMC <- function(x){
  i <- x[1]   # initial state
  t <- x[2]   # time horizon
  totaltime <- 0
  while (totaltime < t){
    new <- jump(i)
    totaltime <- totaltime + new[2]
    if (totaltime < t){ i <- new[1] }
  }
  i   # state occupied at time t
}

We estimate the occupation probabilities at time \(t=35\), with \(n = 200\) replications, starting from state \(3\). We also check how much randomness is left.

y <- replicate(200, runMC(c(3, 35)))
av <- c(sum(y == 1)/200, sum(y == 2)/200, sum(y == 3)/200)
av
## [1] 0.245 0.585 0.170
length(randpi)
## [1] 4792

The convergence theory says that if we had used true randomness, then the values of the vector av should be close to the stationary distribution stat.

av-stat
##              [,1]
## [1,] -0.040714286
## [2,]  0.045317460
## [3,] -0.004603175

Hence, we observe that the digits of \(\pi\) provide sufficient randomization for the convergence theory of Markov chains, at least in this case.

4.) Response time in M/M/1 queue. [10 points]

Consider an \(M(\lambda) / M(\mu) /1\) queue, where \(\lambda < \mu\) and the queue is started at stationarity, so that the number of items in the system has the same distribution at every time \(t\). Consider an arrival at some time \(t\).

Solution

We already know from the brute-force exercise in HW8 that the time \(W\) an item waits until its service begins has cumulative distribution function

\[\mathbb{P}(W \leq x) = 1- \rho e^{-x(\mu - \lambda)},\]

where \(\rho = \lambda/\mu\).

After finally reaching the server, the item receives a service time \(S\), exponential with rate \(\mu\) and independent of \(W\). Thus it suffices to compute the density of \(Z=W+S\). This is routine, except for the atom at \(W=0\). We have

\[ \mathbb{P}(Z \leq z) = \mathbb{P}(Z \leq z, W=0) + \mathbb{P}(Z \leq z, W >0) =: A(z) + B(z),\] so that the case \(W=0\) can be handled separately.

We have by convolution that \[ \begin{eqnarray*} B'(z) &=& \int_0 ^ z \mu e^{ - \mu x} \rho (\mu- \lambda )e^{-(\mu - \lambda) (z-x)} dx \\ &=& (\mu - \lambda) e^{-z(\mu - \lambda)} \int_0 ^z \lambda e^{-x \lambda} dx \\ &=& (\mu - \lambda) e^{-z(\mu - \lambda)} \big(1- e^{-\lambda z} \big) \\ &=& (\mu - \lambda) e^{-z(\mu - \lambda)} - (\mu - \lambda) e^{-z\mu}. \end{eqnarray*} \]

Clearly,

\[ A(z) = \mathbb{P}(S \leq z) \mathbb{P}(W=0) = [1-e^{-z\mu}](1- \rho),\]

so that

\[A'(z) = \mu e^{-z\mu}(1- \rho) = e^{-z\mu}(\mu- \lambda).\]

Adding these, we obtain \[A'(z) + B'(z) = (\mu - \lambda) e^{-z(\mu - \lambda)},\] so that \(Z\) is exponential with rate \(\mu - \lambda\), and the desired result follows.
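A quick Monte Carlo check of this conclusion, with the illustrative values \(\lambda = 1\) and \(\mu = 2\) (our own choice, not part of the question): simulate \(W\) from its mixture representation, add an independent service time, and compare the sample mean with \(1/(\mu - \lambda)\).

# W = 0 with probability 1 - rho; otherwise W ~ Exponential(mu - lambda)
lambda <- 1; mu <- 2; rho <- lambda/mu
n <- 100000
w <- ifelse(runif(n) < rho, rexp(n, mu - lambda), 0)
s <- rexp(n, mu)   # independent service time
mean(w + s)        # should be close to 1/(mu - lambda) = 1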

  • Little’s law tells us that \(L = \lambda W\), where \(L\) is the long-term average number of items in the system, \(W\) is the long-term average waiting time (sorry for the clash of notation), and \(\lambda\) is the long-term average arrival rate. Here the long-term average arrival rate is just the Poisson rate \(\lambda\), and the number of items in the system at stationarity is geometric, with mean \(\rho/(1-\rho)\); thus

\[W = \frac{L}{\lambda} = \frac{\rho/(1-\rho)}{\lambda} = \frac{\mu^{-1}}{1- \rho} = \frac{1}{\mu - \lambda},\] which is exactly the mean of the exponential distribution computed above.

5.) Queues in series. [10 points]

Suppose that customers arrive at a queue with independent exponential service times of rate \(2\), and, when they depart, they enter another queue, also with independent exponential service times, but of rate \(5\). Assume that customers enter the first queue according to a Poisson process of rate \(1\). Let \(Q_1(t)\) and \(Q_2(t)\) be the number of customers in the respective queues at time \(t\). Verify by simulations that the stationary distribution for \((Q_1, Q_2)\) is given by a product of geometric distributions.

Solution

We begin by modifying the code from HW 7 for a single M/M/1 queue. We note that if an item has not left the first queue by time \(t\), then it will not have entered the second queue by time \(t\).

num <- function(t){
  # Arrival times at the first queue: Poisson process of rate 1,
  # generated until the first arrival after time t
  inter <- rexp(1, 1)
  arr <- inter
  while (t > arr[length(arr)]){
    inter <- c(inter, rexp(1, 1))
    arr <- cumsum(inter)
  }

  L <- length(inter)
  service <- rexp(L, 2)

  # Departure times from the first queue: item i + 1 begins service at
  # the later of its own arrival time and the previous departure time
  output <- arr[1] + service[1]
  for (i in 1:(L - 1)){
    if (arr[i + 1] < output[i]){
      output <- c(output, output[i] + service[i + 1])
    } else {
      output <- c(output, arr[i + 1] + service[i + 1])
    }
  }
  output <- output[-L]   # discard the item that arrived after time t

  # Departures from the first queue before time t are arrivals at the second
  arr2 <- output[output < t]
  L2 <- length(arr2)
  service2 <- rexp(L, 5)

  output2 <- arr2[1] + service2[1]
  for (i in 1:(L2 - 1)){
    if (arr2[i + 1] < output2[i]){
      output2 <- c(output2, output2[i] + service2[i + 1])
    } else {
      output2 <- c(output2, arr2[i + 1] + service2[i + 1])
    }
  }

  # Numbers still in each system at time t
  n1 <- sum(output > t)
  n2 <- sum(output2 > t)
  c(n1, n2)
}

We check at time \(t=600\), a thousand times.

x <- replicate(1000, num(600))
x1 <- x[1, ]
x2 <- x[2, ]
b1 <- seq(-1, max(x1) + 1, by = 1)
b2 <- seq(-1, max(x2) + 1, by = 1)
hist(x1, prob = TRUE, breaks = b1)

hist(x2, prob = TRUE, breaks = b2)

We run various marginal tests. At stationarity, \(Q_i\) is geometric with \(\mathbb{P}(Q_i = k) = (1-\rho_i)\rho_i^k\), which is dgeom(k, p_i) with \(p_i = 1 - \rho_i\); here \(\rho_1 = 1/2\) and \(\rho_2 = 1/5\).

p1 <- 1 - 1/2   # geometric parameter for Q1: 1 - rho1
p2 <- 1 - 1/5   # geometric parameter for Q2: 1 - rho2

mean(x1) - (  (1-p1)/p1 )
## [1] 0.047
testt1 <- c(sum(x1 == 0)/1000 - dgeom(0, p1), sum(x1 == 1)/1000 - dgeom(1, p1),
            sum(x1 == 2)/1000 - dgeom(2, p1), sum(x1 == 3)/1000 - dgeom(3, p1),
            sum(x1 == 4)/1000 - dgeom(4, p1))

mean(x2) - (  (1-p2)/p2 )
## [1] -0.034
testt2 <- c(sum(x2 == 0)/1000 - dgeom(0, p2), sum(x2 == 1)/1000 - dgeom(1, p2),
            sum(x2 == 2)/1000 - dgeom(2, p2), sum(x2 == 3)/1000 - dgeom(3, p2),
            sum(x2 == 4)/1000 - dgeom(4, p2))


testt1
## [1] -0.00900 -0.01200  0.01400  0.00250  0.00375
testt2
## [1]  0.02200 -0.01400 -0.00500 -0.00240 -0.00028

Finally, we check the product structure; the entries of the vector errors below should all be close to zero.

productc <- function(j){
  a <- j[1]
  b <- j[2]
  # Count joint occurrences of (Q1, Q2) = (a, b) in the sample
  num <- 0
  for (i in 1:1000){
    if (x1[i] == a && x2[i] == b){
      num <- num + 1
    }
  }
  # Difference between the empirical joint probability and the product pmf
  num/1000 - dgeom(a, p1) * dgeom(b, p2)
}

errors = c( productc(c(0,0)),productc(c(0,1)),productc(c(0,2)),productc(c(1,0)),productc(c(2,0)),productc(c(3,0)),
productc(c(2,2)),productc(c(1,1)),productc(c(2,3)),productc(c(2,4)) )

errors
##  [1]  5.000000e-03 -8.000000e-03 -5.000000e-03 -4.000000e-03  1.400000e-02
##  [6]  2.000000e-03  8.673617e-19 -7.000000e-03  2.000000e-04  8.400000e-04
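Equivalently, the empirical joint frequencies can be compared with the product pmf over a whole grid in one step; a sketch:

# Empirical joint pmf of (Q1, Q2) on the grid {0,...,4}^2
emp <- table(factor(x1, levels = 0:4), factor(x2, levels = 0:4)) / 1000
# Theoretical product pmf on the same grid
theo <- outer(dgeom(0:4, p1), dgeom(0:4, p2))
max(abs(emp - theo))   # should be small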

6.) Estimating the transition matrix for a Markov chain. [10 points]

Let \(X\) be an irreducible and aperiodic Markov chain on a finite state space \(S\). For each \(a, s \in S\), let

\[N_{as}^n = \sum_{i=0} ^n \mathbf{1}[X_i=a, X_{i+1} =s].\]

Show that \[T_{n} = \frac{N_{ab} ^n}{ \sum_{s \in S} N_{as}^n }\] gives a sequence of consistent estimators for \(p_{ab}\). [5 points]

Solution

  • We conjectured, but did not prove, that \[N_{ab} ^n/n \to \pi_a p_{ab},\] where \(\pi\) is the stationary distribution; assuming this fact, and using the finiteness of the state space to exchange the limit and the sum, we have
    \[ \sum_{s \in S} N_{as}^n/n \to \sum_{s \in S} \pi_a p_{as} = \pi_a.\] Hence the ratio in question converges to \(p_{ab}\).

  • Not graded, as it is more technical than I intended.

  • The value \(v := p_{ab} + p_{ac}\) is known by assumption. Thus we are left with a one-dimensional optimization problem; set \(\theta = p_{ab}\), so that \(p_{ac} = v - \theta\). Let \(\rho\) be the starting distribution for \(X\). The likelihood is given by

\[L(\theta; x) = \rho_{x_0} \prod_{i=0} ^{n} p_{x_i x_{i+1}}\]

and

\[\ell(\theta) = \log(\rho_{x_0}) + \sum_{i=0} ^n \log(p_{x_i x_{i+1}}).\]

Thus the only terms that are non-constant in \(\theta\) are those containing \(p_{ab}=\theta\) or \(p_{ac}=v - \theta\); specifically,

\[\ell'(\theta) = \frac{N_{ab} ^n}{\theta} - \frac{N_{ac} ^n}{v-\theta}.\]

Setting \(\ell'(\theta) =0\) and solving, we obtain the estimates

\[ \hat{p}_{ab} = \frac{vN_{ab}^n}{N_{ab}^n + N_{ac}^n }\]

and

\[ \hat{p}_{ac} = \frac{vN_{ac}^n}{N_{ab}^n + N_{ac}^n }.\]

Note that in the case that both \(N_{ab}^n = 0 = N_{ac}^n\), the likelihood does not depend on \(\theta\), and we may take

\[\hat{p}_{ab}= \hat{p}_{ac}= 0.\]
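As an illustration of the consistency of \(T_n\), we can simulate a chain with a known transition matrix and compare; the matrix P and the run length below are our own illustrative choices:

# Simulate a 3-state chain and estimate p_{12} by T_n
P <- matrix(c(0.2, 0.5, 0.3,
              0.4, 0.1, 0.5,
              0.3, 0.3, 0.4), nrow = 3, byrow = TRUE)
n <- 100000
x <- numeric(n + 1)
x[1] <- 1
for (i in 1:n){ x[i + 1] <- sample(3, 1, prob = P[x[i], ]) }
Nab <- sum(x[-(n + 1)] == 1 & x[-1] == 2)   # transitions 1 -> 2
Na  <- sum(x[-(n + 1)] == 1)                # all transitions out of state 1
Nab / Na   # should be close to P[1, 2] = 0.5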

Version: 15 March 2021