In this worksheet, we will consider the output of a Markov chain that is stored on a textfile. We will import this data, and do basics statistics to estimate the stationary distribution of corresponding Markov chain.
Suppose you are given the output of a \(100000\) steps of a irreducible and aperiodic finite state Markov chain. Carefully explain how you could estimate the stationary distribution for this Markov chain, and why you estimator is reasonable.
Import the data from the file markovchain.txt and use this data and your method above to estimate the stationary distribution.
Any ideas for the transition matrix?
\[\frac{1}{n} \sum_{i=1}^n \mathbf{1}[X_i = s] \to \pi(s).\] Thus for each state, we simply need to count the number of occurrences and divide by \(100000\), to get its approximate probability under \(\pi\).
z =read.table("markovchain.txt", sep=",")
z= z[,]
b1 = seq(-1,5, by=1)
hist(z, prob=TRUE, breaks=b1)sum(z==1)/100000## [1] 0.17023sum(z==2)/100000## [1] 0.24957sum(z==3)/100000## [1] 0.41172sum(z==4)/100000## [1] 0.16849Next we show how this code might be done in Python; you may need to install the package pandas; if you are running this inside R, you may need to put py_install(“pandas”) where you load the library(reticulate).
import pandas as pd
import numpy as np
mydata = pd.read_csv("https://tsoo-math.github.io/ucl3/markovchain.txt", sep =",")
print(mydata)##         x
## 1       1
## 2       2
## 3       4
## 4       3
## 5       4
## ...    ..
## 99997   3
## 99998   3
## 99999   2
## 100000  4
## 100001  3
## 
## [100001 rows x 1 columns]y = mydata.loc[:, 'x']
print(y)## 1         1
## 2         2
## 3         4
## 4         3
## 5         4
##          ..
## 99997     3
## 99998     3
## 99999     2
## 100000    4
## 100001    3
## Name: x, Length: 100001, dtype: int64print(np.mean(y==1))## 0.17022829771702283print(np.mean(y==2))## 0.24956750432495675print(np.mean(y==3))## 0.4117158828411716print(np.mean(y==4))
## 0.16848831511684884