IEOR 165 Discussion 7: Markov Models

In this discussion we will consider two different examples of markov models. First we will analyze the navigation of a website by estimating transition probabilities. Then we will use notions of filtering to come up with estimates of the movements of an object in one dimension.

Before we begin. Recall that markov models are models in which we assume the data has some notion of time. For instance when considering $X_1,X_2,...,X_n$ in this context we can think of $n$ different time intervals and $X_1$ being a measurement taken at time 1 $X_2$ as a measurment taken at time 2 and so on. The markov setting is more general than the i.i.d. setting since instead of requiring full independence we only require the following property: $$\mathbb{P}(X_i < x\, |\, X_1,X_2,...,X_{i-1}) = \mathbb{P}(X_i< x\,|\, X_{i-1}) $$ Where as if the data were i.i.d. we would have that: $$\mathbb{P}(X_i < x\, |\, X_1,X_2,...,X_{i-1}) = \mathbb{P}(X_i< x) $$ These kinds of models are especially useful when considering dynamical systems as well as considering time dependant effects like seasonality.

Example 1: Transition probability estimation for website navigation

Suppose we are manageing a website which has 3 pages comprising it. These are a main header page, an about page, a contact page and a service. Consider the following sample paths we have of useres who visited our site:

In [3]:
import numpy as np
import pandas as pd

website_data = pd.read_csv('website_data.csv')
website_data[:20]
Out[3]:
User_ID Page_1 Page_2 Page_3 Page_4 Page_5 Page_6 Page_7 Page_8 Page_9 Page_10 Page_11
0 1 2 3 1 2 3 2 1 1 2 3 2
1 2 0 1 2 3 1 2 0 2 0 3 1
2 3 2 0 3 1 2 0 1 2 3 2 3
3 4 2 3 1 0 2 3 2 3 2 1 2
4 5 3 1 2 1 2 1 0 1 0 3 1
5 6 0 1 2 1 2 3 2 3 2 3 1
6 7 2 3 2 3 2 0 1 0 1 2 1
7 8 0 1 2 1 2 1 2 0 1 2 3
8 9 3 1 2 3 2 3 2 1 2 3 1
9 10 2 1 2 1 2 3 1 2 3 2 3
10 11 1 2 3 2 3 1 0 2 1 1 2
11 12 3 1 2 1 2 3 2 3 2 3 0
12 13 0 2 3 2 0 3 2 3 1 2 3
13 14 0 1 2 1 2 3 1 2 3 2 1
14 15 1 2 3 2 1 2 1 0 1 2 3
15 16 1 1 2 1 2 1 1 2 3 2 3
16 17 2 3 2 3 1 2 3 0 1 3 2
17 18 3 1 0 2 0 2 3 1 2 3 2
18 19 1 2 3 2 0 2 1 1 3 1 2
19 20 2 1 0 3 1 2 1 0 2 1 2

Note that for each user we have a sequence of 10 pages they visited while on our website prior to leaving. Here the states correspond as follows:

  • 0: Header
  • 1: About
  • 2: Contact
  • 3: Service

We are interested in estimating the browsing behavior of our user which is the same as calculating the probability a user will visit a page given their current viewed page on the site. A reasonable for this estimation is clearly a Markov model with the model states corresponding to each of the different pages. Let's analyze this data by first forming a state transition matrix where each cell couts the number of times we transition from one state to another:

In [31]:
web_arr = pd.DataFrame.as_matrix(website_data.iloc[:,1:])
(ro, co) = web_arr.shape
trans_matrix = np.zeros((4,4))

for ind_i in range(ro):
    curr_use = web_arr[ind_i,:]
    for ind_j in range(co-1):
        cur_stat = curr_use[ind_j]
        fute_stat = curr_use[ind_j+1]
        trans_matrix[cur_stat,fute_stat] += 1

label_matrix = np.concatenate(([['Header'],['About'],['Contact'],['Service']],trans_matrix),1)
label_matrix = np.concatenate(([[' ','Header','About','Contact','Service']],label_matrix),0)
trans_df = pd.DataFrame(data=label_matrix[1:,1:],index=label_matrix[1:,0],columns=label_matrix[0,1:])
trans_df
Out[31]:
Header About Contact Service
Header 0.0 27.0 24.0 13.0
About 27.0 25.0 88.0 4.0
Contact 18.0 52.0 0.0 104.0
Service 8.0 47.0 63.0 0.0

Using these measurments how can we estimate the transition probabilities?

Recall the MLE estimate of the transition probabilities is given by: $$ \hat{p}_{ij} = \frac{N_{ij}}{\sum_k N_{ik}}$$

where $N_{ij}$ is the total count of times we transitioned from state $i$ to state $j$. Using this method we can construct the following probability transition matrix for our website:

In [62]:
sum_mat = np.dot(trans_matrix,np.ones((4,1)))
norm_mat = 1/sum_mat
#norm_mat

trans_prob_mat = norm_mat*trans_matrix
trans_prob_mat = np.around(trans_prob_mat,3)
label_matrix = np.concatenate(([['Header'],['About'],['Contact'],['Service']],trans_prob_mat),1)
label_matrix = np.concatenate(([[' ','Header','About','Contact','Service']],label_matrix),0)
trans_df = pd.DataFrame(data=label_matrix[1:,1:],index=label_matrix[1:,0],columns=label_matrix[0,1:])
trans_df
Out[62]:
Header About Contact Service
Header 0.0 0.422 0.375 0.203
About 0.188 0.174 0.611 0.028
Contact 0.103 0.299 0.0 0.598
Service 0.068 0.398 0.534 0.0

Example 2: Using Holt-Winters to track the movement of an Object

suppose we are tracking some object in 2d space, we periodically take measurments and obtain the following data set:

In [5]:
move_data = pd.read_csv('obj_2d_data.csv')
move_data[:20]
Out[5]:
Time X_Coord Y_Coord
0 0.000 1.003 0.005
1 0.010 0.975 0.000
2 0.020 0.975 0.021
3 0.030 0.981 0.027
4 0.040 0.972 0.045
5 0.051 0.961 0.052
6 0.061 0.933 0.046
7 0.071 0.931 0.079
8 0.081 0.916 0.076
9 0.091 0.906 0.080
10 0.101 0.894 0.099
11 0.111 0.889 0.111
12 0.121 0.885 0.123
13 0.131 0.887 0.135
14 0.141 0.876 0.135
15 0.152 0.853 0.149
16 0.162 0.844 0.168
17 0.172 0.806 0.159
18 0.182 0.803 0.187
19 0.192 0.822 0.189

In this case we are interested in tracking the current position of the object using the history of the previous positions. If we know that this object is experiencing a constant acceleration we can express the relationship between these points as: $$x_{t+1} = x_{t-1} + \frac{v_0 a}{2}(2t-1) + \epsilon_t$$

Where $\epsilon_t$ is i.i.d. normal noise with known variance. In this case we would be interested in estimating the product $\frac{v_0 a}{2}$ which we can simplify into a single parameter $\beta$ and represent it as follows $$x_{t+1} = x_{t-1} + 2\beta t - \beta + \epsilon_t$$.

Observe that if we knew $\beta$ then by knowing the initial condition (which we will call $x_0$ or the position at time 0) we can derive the location of the object at any point in the future. So to form our estimate we want to solve an optimization of the form: $$\min_{\beta,\hat{x}_0} \{ \sum_{t=0}^T\|x_{t} - \hat{x}_{t}\|^2 : \hat{x}_{t+1} = \hat{x}_{t-1} + 2\beta t - \beta \; \forall 1\leq t \leq T-1 \} $$

One useful techinque we can employ in this situation involves using a Kalman Filter such as the one implemented in the Holt-Winters Algorithm. These class of models are also considered Markov models since again $x_{t+1}$ only depends on the current time and position and not on any of the positions in the past. In this case however we have rather simple dynamics so we can solve this problem by directly solving the recursion as follows:

$$ \hat{x}_0 = \hat{x}_0$$$$ \hat{x}_1 = \hat{x}_0 + 2\beta - \beta = \hat{x}_0 + \beta$$$$\hat{x}_2 = \hat{x}_1 + 4\beta - \beta = \hat{x}_0 + 4\beta$$$$...$$$$\hat{x}_t = \hat{x}_0 + t^2\beta$$

Now after solving the recursion we see we can solve the following equivalent least squares problem to obtain our estimates:

$$\min_{\beta,\hat{x}_0} \{ \sum_{t=0}^T\|x_t - \hat{x}_0 - t^2 \beta\|^2\} $$

So adding a $t^2$ collumn to our data we can use an OLS solver to compute our estimates.