In this worksheet, we will consider the output of a Markov chain that is stored on a textfile. We will import this data, and do basics statistics to estimate the stationary distribution of corresponding Markov chain.
Suppose you are given the output of a \(100000\) steps of a irreducible and aperiodic finite state Markov chain. Carefully explain how you could estimate the stationary distribution for this Markov chain, and why you estimator is reasonable.
Import the data from the file markovchain.txt and use this data and your method above to estimate the stationary distribution.
Any ideas for the transition matrix?
\[\frac{1}{n} \sum_{i=1}^n \mathbf{1}[X_i = s] \to \pi(s).\] Thus for each state, we simply need to count the number of occurrences and divide by \(100000\), to get its approximate probability under \(\pi\).
z =read.table("markovchain.txt", sep=",")
z= z[,]
b1 = seq(-1,5, by=1)
hist(z, prob=TRUE, breaks=b1)
sum(z==1)/100000
## [1] 0.17023
sum(z==2)/100000
## [1] 0.24957
sum(z==3)/100000
## [1] 0.41172
sum(z==4)/100000
## [1] 0.16849
Next we show how this code might be done in Python; you may need to install the package pandas; if you are running this inside R, you may need to put py_install(“pandas”) where you load the library(reticulate).
import pandas as pd
import numpy as np
mydata = pd.read_csv("https://tsoo-math.github.io/ucl3/markovchain.txt", sep =",")
print(mydata)
## x
## 1 1
## 2 2
## 3 4
## 4 3
## 5 4
## ... ..
## 99997 3
## 99998 3
## 99999 2
## 100000 4
## 100001 3
##
## [100001 rows x 1 columns]
y = mydata.loc[:, 'x']
print(y)
## 1 1
## 2 2
## 3 4
## 4 3
## 5 4
## ..
## 99997 3
## 99998 3
## 99999 2
## 100000 4
## 100001 3
## Name: x, Length: 100001, dtype: int64
print(np.mean(y==1))
## 0.17022829771702283
print(np.mean(y==2))
## 0.24956750432495675
print(np.mean(y==3))
## 0.4117158828411716
print(np.mean(y==4))
## 0.16848831511684884