Please familiarize yourself with the following excerpt on plagiarism and collusion from the student handbook.
By ticking the submission declaration box in Moodle you are agreeing to the following declaration:
Declaration: We are aware of the UCL Statistical Science Department’s regulations on plagiarism for assessed coursework. We have read the guidelines in the student handbook and understand what constitutes plagiarism. We hereby affirm that the work we are submitting for this in-course assessment is entirely our own.
Please do not write your names anywhere on the submission. Please include only your student numbers as the proxy identifier.
You might recall a proof that when there are only twenty-three people in a room, the probability that two of them share a birthday is just over \(1/2\), under the assumptions that there are \(365\) days in a year, birthdays are uniformly distributed throughout the year, and the birthdays of the occupants of the room are independent.
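As a quick numerical reminder (not part of the assessed question), the exact pair-birthday probability for a room of twenty-three can be computed directly:
1 - prod((365 - 0:22)/365)  # roughly 0.507, just over 1/2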
Under the same assumptions, let \(p(n)\) be the probability that a room of \(n\) people contains three people who share the same birthday. By running simulations:
Plot an approximate graph of \(p(n)\). [8 points]
Find \(n\) such that \(p(n) \approx 1/2\). [2 points]
This is easily accomplished using the sample function. We also use the sum function to check if a triple match exists and report \(1\) if it does and \(0\) otherwise.
match <- function(n){
  # Sample n birthdays uniformly from the 365 days of the year.
  x <- sample(365, n, replace = TRUE)
  N <- 0
  date <- 0
  # Scan through the days until some day occurs at least three times.
  while(N < 3 && date < 366){
    date <- date + 1
    N <- sum(x == date)
  }
  # Report 1 if a triple match was found and 0 otherwise.
  sum(N > 2)
}
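For instance, averaging many calls estimates \(p(n)\); for a room of \(23\), the triple-birthday probability is far smaller than the pair-birthday probability:
mean(replicate(10000, match(23)))  # a small probability, roughly of order 0.01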
We now estimate \(p(n)\) with the match function for each \(n\) from \(3\) to \(388\), replicating the simulation \(200\) times for each \(n\); note that \(p(1) = p(2) = 0\).
# p(1) = p(2) = 0, so we seed the vector with two zeros.
prob <- c(0, 0)
for (n in 3:388){
  prob <- c(prob, mean(replicate(200, match(n))))
}
numofpeople <- 1:388
plot(numofpeople, prob)
Now we run a simple search, using the fact that \(p(n)\) is monotone, to look for the \(n\) that gives \(p(n) \approx 1/2\).
# Walk up the (monotone in n) estimated probabilities until we first pass 1/2.
probb <- 0
argprob <- 0
while(probb < 0.5){
  argprob <- argprob + 1
  probb <- prob[argprob]
}
argprob
## [1] 83
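Since each \(p(n)\) was estimated from only \(200\) runs, the location of the crossing is noisy. A sketch of a refinement (the window \(80{:}95\) and the \(10000\) replications are arbitrary choices) re-estimates \(p(n)\) near the crossing:
# More replications shrink the Monte Carlo error near p(n) = 1/2.
fine <- sapply(80:95, function(n) mean(replicate(10000, match(n))))
rbind(n = 80:95, p = round(fine, 3))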
Find or generate a million or so digits of \(\pi\); for example, you may use this link. Here it is okay to use the internet to find basic import functions or to use fancy formulas for the digits of \(\pi\). [2 points]
Process your digits so that they are in groups of \(10\), so that you can pretend each group of \(10\) is a realization of a number that is uniformly distributed in \([0,1]\). You might need to take special care of groups beginning with \(0\). [2 points]
Put everything into a single vector, where the first entry contains the number \(0.1415926535\). [2 points]
Plot a histogram to check whether they appear uniformly distributed. [2 points]
Find the mean and variance of the data to make sure the data has been imported properly. [2 points]
We import the suggested text file; we will also have to delete the NA values that arise where there was an empty line.
# Each row of the file holds five groups of ten digits.
x <- read.delim("pi6.txt", header = FALSE, sep = "")
# Flatten the rows in reading order and scale each ten-digit group by
# 10^-10, so that every group becomes a number in [0, 1).
z <- 1e-10 * as.vector(t(as.matrix(x)))
length(z)
## [1] 100495
z[1]
## [1] 0.1415927
z[length(z)]
## [1] 0.5779458
# Remove the NA entries that came from the blank lines in the file.
z <- na.omit(z)
length(z)
## [1] 100000
format(z[1], nsmall=10)
## [1] "0.1415926535"
mean(z)
## [1] 0.4984943
var(z)
## [1] 0.08324761
hist(z, prob=TRUE)
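As an additional, unrequested check, one can run a Kolmogorov–Smirnov test against the Uniform\([0,1]\) distribution; a moderate p-value is consistent with the digit groups looking uniform:
# One-sample KS test of z against the Uniform(0,1) cdf.
ks.test(z, "punif")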
Consider the continuous-time Markov chain \(X\) with three states \(\{1,2,3\}\) and \(Q\)-matrix given by
# matrix() fills column by column, so we enter the rows of Q and transpose.
Q <- matrix(c(-5, 2, 3, 2, -3, 1, 2, 6, -8), nrow = 3)
Q <- t(Q)
Q
## [,1] [,2] [,3]
## [1,] -5 2 3
## [2,] 2 -3 1
## [3,] 2 6 -8
Use linear algebra and R to find the stationary distribution \(\lambda\). You may need to import a package. [2 points]
Write code that uses only the randomization provided by the digits of \(\pi\) to obtain the state of the chain at time \(t\), starting at a state \(i\). [14 points]
For \(t=35\), and starting at state \(3\), find, by running a large number of simulations, the average number of times the Markov chain is in each of the states at time \(t\). You may find that you need to be economical, given the number of digits of \(\pi\) that were imported. [2 points]
Discuss your results as they relate to the theory you learned and to the randomization obtained from \(\pi\). [2 points]
In your coding, you may need a mechanism to keep track of which bits of the randomness from \(\pi\) have already been used. To facilitate this, you may need a function that can change the value of a variable existing globally, outside the definitions of functions; the following example illustrates how to accomplish this:
counterr <- 0
example <- function(t){
  # <<- modifies counterr in the global environment, not a local copy.
  counterr <<- counterr + 1
  t + 5
}
example(1)
## [1] 6
example(10)
## [1] 15
counterr
## [1] 2
We have to load a package to do basic linear algebra, as was done in a previous homework.
library("pracma")
## Warning: package 'pracma' was built under R version 4.0.3
# The stationary distribution solves lambda Q = 0; equivalently, lambda
# (as a column vector) lies in the nullspace of t(Q), normalized to sum to 1.
Y <- t(Q)
stat <- nullspace(Y)
stat <- stat / sum(stat)
stat
## [,1]
## [1,] 0.2857143
## [2,] 0.5396825
## [3,] 0.1746032
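As a quick sanity check, the stationary distribution should satisfy \(\lambda Q = 0\):
# stat is a column vector, so t(stat) %*% Q should vanish up to rounding error.
t(stat) %*% Q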
We adapt code from HW5. We delete any used values of the digits of \(\pi\); we also note that we were not very careful and wasted some randomness in the clocks portion of the code.
randpi <- z
jump <- function(i){
  q <- Q[i, ]
  # Inverse transform: -log(1 - U)/rate is exponential with the given rate.
  clocks <- c(-log(1 - randpi[1]) / q[1],
              -log(1 - randpi[2]) / q[2],
              -log(1 - randpi[3]) / q[3])
  # No clock for the current state.
  clocks[i] <- Inf
  j <- which.min(clocks)
  time <- clocks[j]
  # Discard the three digit groups that were just consumed.
  randpi <<- randpi[-(1:3)]
  c(j, time)
}
runMC <- function(x){
  i <- x[1]
  t <- x[2]
  totaltime <- 0
  # Jump from state to state until the next jump would overshoot time t.
  while(totaltime < t){
    new <- jump(i)
    totaltime <- new[2] + totaltime
    if(totaltime < t){ i <- new[1] }
  }
  # The state occupied at time t.
  i
}
We now find the average with \(t=35\) and \(n=200\) runs. We also check how much randomness is left.
y <- replicate(200, runMC(c(3, 35)))
# Proportion of runs ending in each of the three states at time t = 35.
av <- c(sum(y == 1)/200, sum(y == 2)/200, sum(y == 3)/200)
av
## [1] 0.245 0.585 0.170
length(randpi)
## [1] 4792
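For comparison, the exact distribution of the chain at time \(t=35\), started from state \(3\), is row \(3\) of the matrix exponential \(P(t) = e^{tQ}\); pracma also provides expm, and by time \(35\) this row should essentially coincide with stat:
# Row 3 of exp(35 * Q) is the time-35 distribution started from state 3.
expm(35 * Q)[3, ]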
The convergence theory says that if we used true randomness, then the values of the vector av should be close to the stationary distribution stat.
av-stat
## [,1]
## [1,] -0.040714286
## [2,] 0.045317460
## [3,] -0.004603175
Hence, we observe that the digits of \(\pi\) provide sufficient randomization for the convergence theory of Markov chains, at least in this case.
Consider an \(M(\lambda) / M(\mu) /1\) queue, where \(\lambda < \mu\) and the queue is started at stationarity, so that the number of items in the system has the same distribution at every time \(t\). Consider an arrival at some time \(t\).
Show that the total time it spends in the system is exponential with rate \(\mu - \lambda\), by using the brute force exercise in HW8. [8 points]
Show that this is consistent with Little’s law. [2 points]
We already know from the brute force exercise in HW8 that the time an item waits until finally being serviced is given by \(W\) with cumulative distribution function
\[\mathbb{P}(W \leq x) = 1- \rho e^{-x(\mu - \lambda)},\]
where \(\rho = \lambda/\mu\).
After finally reaching the server, the item receives a service time \(S\) that is exponential with rate \(\mu\) and independent of \(W\). Thus it suffices to compute the density of \(Z=W+S\). This is routine, except for the atom at \(W=0\). We have
\[ \mathbb{P}(Z \leq z) = \mathbb{P}(Z \leq z, W=0) + \mathbb{P}(Z \leq z, W >0) =: A(z) + B(z),\] so that the case \(W=0\) can be handled separately.
We have by convolution that \[ \begin{eqnarray*} B'(z) &=& \int_0 ^ z \mu e^{ - \mu x} \rho (\mu- \lambda )e^{-(\mu - \lambda) (z-x)} dx \\ &=& (\mu - \lambda) e^{-z(\mu - \lambda)} \int_0 ^z \lambda e^{-x \lambda} dx \\ &=& (\mu - \lambda) e^{-z(\mu - \lambda)} \big(1- e^{-\lambda z} \big) \\ &=& (\mu - \lambda) e^{-z(\mu - \lambda)} - (\mu - \lambda) e^{-z\mu}. \end{eqnarray*} \]
Clearly,
\[ A(z) = \mathbb{P}(S \leq z)\, \mathbb{P}(W=0) = [1-e^{-z\mu}](1- \rho),\]
so that
\[A'(z) = \mu e^{-z\mu}(1- \rho) = e^{-z\mu}(\mu- \lambda).\]
Summing, \[ A'(z) + B'(z) = (\mu - \lambda) e^{-z(\mu - \lambda)},\] which is the density of the exponential distribution with rate \(\mu - \lambda\); hence the desired result follows.
For Little's law, the mean number of items in the system at stationarity is \(L = \rho/(1-\rho)\), so Little's law \(L = \lambda\, \mathbb{E}Z\) gives \[\mathbb{E}Z = \frac{L}{\lambda} = \frac{\mu^{-1}}{1- \rho} = \frac{1}{\mu - \lambda},\] which is exactly the mean of the exponential distribution computed above.
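As an unrequested numerical check of the first part, we can simulate \(Z = W + S\) directly, using the mixture representation of \(W\) (an atom of mass \(1-\rho\) at \(0\), and Exp\((\mu-\lambda)\) otherwise); the values \(\lambda = 1\) and \(\mu = 2\) are arbitrary choices for illustration:
lambda <- 1; mu <- 2; rho <- lambda/mu
n <- 100000
# W has an atom at 0 with probability 1 - rho, and is Exp(mu - lambda) otherwise.
W <- ifelse(runif(n) < rho, rexp(n, mu - lambda), 0)
S <- rexp(n, mu)
Z <- W + S
c(mean(Z), 1/(mu - lambda))  # both should be close to 1
ks.test(Z, "pexp", mu - lambda)  # consistent with Exp(mu - lambda)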
Suppose that customers arrive at a queue with independent exponential service times of rate \(2\), and when they depart, they enter another queue, also with independent exponential service times, but at rate \(5\). Assume that customers enter the first queue as a Poisson process of rate \(1\). Let \(Q_1(t)\) and \(Q_2(t)\) be the number of customers in the respective queues at time \(t\). Verify by simulations that the stationary distribution for \((Q_1, Q_2)\) is given by a product of geometric distributions.
We begin by modifying the code from HW7 for a single M/M/1 queue. We note that by time \(t\), if an item has not left the first queue, it will not have entered the second queue.
num <- function(t){
  # Generate arrivals of the rate-1 Poisson process until one falls after time t.
  inter <- rexp(1, 1)
  arr <- inter
  while(t > arr[length(arr)]){
    inter <- c(inter, rexp(1, 1))
    arr <- cumsum(inter)
  }
  L <- length(inter)
  # Departure times from the first queue (service rate 2): each service starts
  # at the later of the arrival time and the previous departure time.
  service <- rexp(L, 2)
  output <- arr[1] + service[1]
  for (i in 1:(L-1)){
    if (arr[i+1] < output[i]){ output <- c(output, output[i] + service[i+1]) }
    if (arr[i+1] > output[i]){ output <- c(output, arr[i+1] + service[i+1]) }
  }
  # The last arrival falls after time t, so we drop its departure.
  output <- output[-L]
  # Departures from the first queue before time t are arrivals to the second.
  arr2 <- output[output < t]
  L2 <- length(arr2)
  # Service times for the second queue (rate 5); L >= L2, so a few draws go unused.
  service2 <- rexp(L, 5)
  output2 <- arr2[1] + service2[1]
  for (i in 1:(L2-1)){
    if (arr2[i+1] < output2[i]){ output2 <- c(output2, output2[i] + service2[i+1]) }
    if (arr2[i+1] > output2[i]){ output2 <- c(output2, arr2[i+1] + service2[i+1]) }
  }
  # Items still present in each queue at time t.
  n1 <- sum(output > t)
  n2 <- sum(output2 > t)
  c(n1, n2)
}
We sample at time \(t=600\), one thousand times.
x <- replicate(1000, num(600))
x1 <- x[1, ]
x2 <- x[2, ]
# Unit-width bins, one integer value per bin, so bar heights are probabilities.
b1 <- seq(-1, max(x1) + 1, by = 1)
b2 <- seq(-1, max(x2) + 1, by = 1)
hist(x1, prob = TRUE, breaks = b1)
hist(x2, prob = TRUE, breaks = b2)
We run various checks on the marginal distributions.
# At stationarity Q_i is geometric with success probability p_i = 1 - rho_i in
# R's parametrization: P(Q_i = k) = dgeom(k, p_i) = p_i (1 - p_i)^k.
p1 <- 1 - 1/2
p2 <- 1 - 1/5
mean(x1) - (1 - p1)/p1
## [1] 0.047
testt1 <- sapply(0:4, function(k) mean(x1 == k) - dgeom(k, p1))
mean(x2) - ( (1-p2)/p2 )
## [1] -0.034
testt2 <- sapply(0:4, function(k) mean(x2 == k) - dgeom(k, p2))
testt1
## [1] -0.00900 -0.01200 0.01400 0.00250 0.00375
testt2
## [1] 0.02200 -0.01400 -0.00500 -0.00240 -0.00028
Finally, we check the product structure.
productc <- function(j){
  a <- j[1]
  b <- j[2]
  # Empirical joint probability minus the product of the geometric pmfs.
  mean(x1 == a & x2 == b) - dgeom(a, p1) * dgeom(b, p2)
}
errors <- c(productc(c(0,0)), productc(c(0,1)), productc(c(0,2)),
            productc(c(1,0)), productc(c(2,0)), productc(c(3,0)),
            productc(c(2,2)), productc(c(1,1)), productc(c(2,3)),
            productc(c(2,4)))
errors
## [1] 5.000000e-03 -8.000000e-03 -5.000000e-03 -4.000000e-03 1.400000e-02
## [6] 2.000000e-03 8.673617e-19 -7.000000e-03 2.000000e-04 8.400000e-04
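The same comparison can be carried out in one shot with a two-way table; a sketch, restricted to the values \(0\) to \(3\) for display:
# Empirical joint pmf minus the product of the two geometric pmfs.
emp <- table(factor(x1, levels = 0:3), factor(x2, levels = 0:3)) / 1000
theo <- outer(dgeom(0:3, p1), dgeom(0:3, p2))
round(emp - theo, 3)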
Let \(X\) be an irreducible and aperiodic Markov chain on a finite state space \(S\). For each \(a, s \in S\), let
\[N_{as}^n = \sum_{i=0} ^n \mathbf{1}[X_i=a, X_{i+1} =s].\]
Show that \[T_{n} = \frac{N_{ab} ^n}{ \sum_{s \in S} N_{as}^n }\] gives a sequence of consistent estimators for \(p_{ab}\). [5 points]
Is \(T_n\) unbiased? If not, what additional assumptions are needed to ensure that it is? [2 points]
Suppose that you know all the values of the transition matrix, except the values \(p_{ab}\) and \(p_{ac}\). Also assume you know that the Markov chain has a starting distribution \(\rho\). Find the maximum likelihood estimate of the values \(p_{ab}\) and \(p_{ac}\), given a realization \[X=x=(x_0, x_1, \ldots, x_n).\]
It is permissible to review your previous module notes on maximum likelihood estimation. [3 points]
We conjectured, but did not prove, that \[N_{ab} ^n/n \to \pi_a p_{ab};\] assuming this fact and using the finite state assumption, we have
\[ \sum_{s \in S} N_{as}^n/n \to \sum_{s \in S} \pi_a p_{as} = \pi_a.\] Hence it follows that the ratio in question converges to \(p_{ab}\).
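To see the consistency concretely, here is a small simulation sketch with a hypothetical three-state transition matrix \(P\) (all values here are made up for illustration):
# A hypothetical transition matrix; rows sum to 1.
P <- matrix(c(0.5, 0.3, 0.2,
              0.2, 0.5, 0.3,
              0.3, 0.3, 0.4), nrow = 3, byrow = TRUE)
n <- 100000
path <- numeric(n + 1)
path[1] <- 1
for (i in 1:n) path[i + 1] <- sample(3, 1, prob = P[path[i], ])
# T_n for a = 1, b = 2: transitions 1 -> 2 divided by all transitions out of 1.
from1 <- which(path[1:n] == 1)
sum(path[from1 + 1] == 2) / length(from1)  # should be close to P[1, 2] = 0.3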
Not graded, as it is more technical than I intended.
The value \(v := p_{ab} + p_{ac}\) is known, since all the other entries of row \(a\) are given and the rows of a transition matrix sum to \(1\). Thus we are left with a one-dimensional optimization problem; set \(\theta = p_{ab}\), so that \(p_{ac} = v - \theta\). Let \(\rho\) be the starting distribution for \(X\). The likelihood is given by
\[L(\theta; x) = \rho_{x_0} \prod_{i=0} ^{n-1} p_{x_i x_{i+1}}\]
and
\[\ell(\theta) = \log(\rho_{x_0}) + \sum_{i=0} ^{n-1} \log(p_{x_i x_{i+1}}).\]
Thus the only terms that are non-constant with respect to \(\theta\) are those that contain \(p_{ab}=\theta\) or \(p_{ac}=v - \theta\); differentiating gives
\[\ell'(\theta) = \frac{N_{ab} ^n}{\theta} - \frac{N_{ac} ^n}{v-\theta}.\]
Setting \(\ell'(\theta) =0\), we obtain the estimates
\[ \hat{p}_{ab} = \frac{vN_{ab}^n}{N_{ab}^n + N_{ac}^n }\]
and
\[ \hat{p}_{ac} = \frac{vN_{ac}^n}{N_{ab}^n + N_{ac}^n }.\]
Note that in the case that \(N_{ab}^n = 0 = N_{ac}^n\), the likelihood does not depend on \(\theta\); the formulas above read \(0/0\), and any choice of \(\hat{p}_{ab} \in [0,v]\) with \(\hat{p}_{ac} = v - \hat{p}_{ab}\) maximizes the likelihood.