In this worksheet, we will consider the output of a Markov chain that is stored on a textfile. We will import this data, and do basics statistics to estimate the stationary distribution of corresponding Markov chain.

Solutions

\[\frac{1}{n} \sum_{i=1}^n \mathbf{1}[X_i = s] \to \pi(s).\] Thus for each state, we simply need to count the number of occurrences and divide by \(100000\), to get its approximate probability under \(\pi\).

z =read.table("markovchain.txt", sep=",")
z= z[,]

b1 = seq(-1,5, by=1)
hist(z, prob=TRUE, breaks=b1)

sum(z==1)/100000
## [1] 0.17023
sum(z==2)/100000
## [1] 0.24957
sum(z==3)/100000
## [1] 0.41172
sum(z==4)/100000
## [1] 0.16849

Next we show how this code might be done in Python; you may need to install the package pandas; if you are running this inside R, you may need to put py_install(“pandas”) where you load the library(reticulate).

import pandas as pd
import numpy as np

mydata = pd.read_csv("https://tsoo-math.github.io/ucl3/markovchain.txt", sep =",")

print(mydata)
##         x
## 1       1
## 2       2
## 3       4
## 4       3
## 5       4
## ...    ..
## 99997   3
## 99998   3
## 99999   2
## 100000  4
## 100001  3
## 
## [100001 rows x 1 columns]
y = mydata.loc[:, 'x']
print(y)
## 1         1
## 2         2
## 3         4
## 4         3
## 5         4
##          ..
## 99997     3
## 99998     3
## 99999     2
## 100000    4
## 100001    3
## Name: x, Length: 100001, dtype: int64
print(np.mean(y==1))
## 0.17022829771702283
print(np.mean(y==2))
## 0.24956750432495675
print(np.mean(y==3))
## 0.4117158828411716
print(np.mean(y==4))
## 0.16848831511684884

Endnotes