# Kalman filters (and how they relate to HMMs)

Kalman filters are insanely popular in many engineering fields, especially those involving sensors and motion tracking. Consider how to design a radar system to track military aircraft (or warships, submarines, … for that matter), how to track people or vehicles in a video stream, or how to predict the location of a vehicle carrying a GPS sensor… In all these cases, some (advanced) variant of the Kalman filter is probably what you need.

Learning and teaching Kalman filters is therefore quite challenging, not only because of the sheer complexity of the algorithms, but also because there are so many variations of them.

Coming from a Computer Science background, I encountered Kalman filters several years ago, but never managed to put them into the global context of the field. I had a chance to look at them again recently, and rediscovered yet another way to present and explain Kalman filters. It made a lot of sense to me, and hopefully it does to you too.

Note that a lot of details are missing from this post (if you are building a radar system to track military aircraft, look somewhere else!). I was just unhappy to see that much of the introductory material on Kalman filters is either too engineering-oriented or too simplified. I wanted something more Machine Learning-friendly, so this is my attempt.

Let’s say you want to track an enemy aircraft. All you have is a lame radar (bought from Russia, probably) which, when oriented properly, will give you a 3-tuple of range, angle and angular velocity $[r \;\phi\;\dot{\phi}]^{T}$ of the aircraft. This vector is called the observation $\mathbf{z}_k$ (subscript $k$ because it depends on time). The actual position of the aircraft, though, is a vector in Cartesian coordinates $\mathbf{x}_k = [x_1\;x_2\;x_3]^{T}$. Since it is an enemy aircraft, you can only observe $\mathbf{z}_k$, and you want to track the state vector $\mathbf{x}_k$ over time, every time you receive a measurement $\mathbf{z}_k$ from the radar.
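To get a feel for the state-to-observation mapping, here is a tiny sketch of the (noise-free) measurement function, simplified to a 2D slice and with the angular-velocity component dropped. This simplified `measure` is my own illustration, not the full radar model:

```python
import numpy as np

def measure(x):
    """Map a 2D Cartesian position x = (x1, x2) to the polar observation (r, phi)."""
    r = np.hypot(x[0], x[1])       # range: distance from the radar to the aircraft
    phi = np.arctan2(x[1], x[0])   # bearing angle
    return np.array([r, phi])

z = measure(np.array([3.0, 4.0]))
print(z)   # range 5.0, angle atan2(4, 3) ≈ 0.927 rad
```

Note that this mapping is nonlinear in $\mathbf{x}$, which is one reason practical radar trackers need the more advanced variants of the basic linear filter.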

Visualised as a Bayesian network, it looks like this:

With all the Markov properties holding, i.e. $\mathbf{x}_k$ only depends on $\mathbf{x}_{k-1}$ and $\mathbf{z}_k$ only depends on $\mathbf{x}_k$, does this look familiar?
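To make the analogy concrete, the textbook linear-Gaussian Kalman filter tracks a mean and covariance of the state and alternates a predict step with an update step every time a measurement arrives. The sketch below uses the standard equations with generic placeholder matrices `F`, `H`, `Q`, `R` and a toy 1D constant-velocity model, not the radar model of this post:

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict + update cycle of the standard linear Kalman filter.
    x: state mean (n,), P: state covariance (n, n), z: measurement (m,)."""
    # predict: propagate mean and covariance through the motion model
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # update: correct the prediction with the measurement z
    y = z - H @ x_pred                      # innovation
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# toy model: state = [position, velocity], we measure position only
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity motion
H = np.array([[1.0, 0.0]])              # observe position
Q = 0.01 * np.eye(2)                    # process noise
R = np.array([[0.5]])                   # measurement noise

x, P = np.zeros(2), np.eye(2)
for z in [1.0, 2.0, 3.0, 4.0]:          # noiseless measurements of x(t) = t
    x, P = kalman_step(x, P, np.array([z]), F, H, Q, R)
# x should now be close to [4.0, 1.0]: position 4, velocity 1
```

The structure mirrors the forward pass of an HMM: a transition model (`F`, `Q`) plays the role of the transition probabilities, and a measurement model (`H`, `R`) plays the role of the emission probabilities, with Gaussians replacing discrete distributions.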

# Sweet implementation of Viterbi in Python

An implementation of the Viterbi algorithm for an HMM in Python can be as short as about ten lines of code, like this:

```python
def Decode(self, obs):
    trellis = np.zeros((self.N, len(obs)))
    backpt = np.ones((self.N, len(obs)), 'int32') * -1
    trellis[:, 0] = np.squeeze(self.initialProb * self.Obs(obs[0]))
    for t in range(1, len(obs)):
        trellis[:, t] = (trellis[:, t-1, None].dot(self.Obs(obs[t]).T) * self.transProb).max(0)
        backpt[:, t] = (np.tile(trellis[:, t-1, None], [1, self.N]) * self.transProb).argmax(0)
    tokens = [trellis[:, -1].argmax()]
    for i in range(len(obs)-1, 0, -1):
        tokens.append(backpt[tokens[-1], i])
    return tokens[::-1]
```


This is the implementation of Viterbi I completed recently. Holy python, how sweet it is! The code is even shorter than the pseudo-code, which is taken from this book.

It takes no more than 30 lines for the complete class:

```python
import numpy as np

class Decoder(object):
    def __init__(self, initialProb, transProb, obsProb):
        self.N = initialProb.shape[0]
        self.initialProb = initialProb
        self.transProb = transProb
        self.obsProb = obsProb
        assert self.initialProb.shape == (self.N, 1)
        assert self.transProb.shape == (self.N, self.N)
        assert self.obsProb.shape[0] == self.N

    def Obs(self, obs):
        # column of emission probabilities P(obs | state), shape (N, 1)
        return self.obsProb[:, obs, None]

    def Decode(self, obs):
        trellis = np.zeros((self.N, len(obs)))
        backpt = np.ones((self.N, len(obs)), 'int32') * -1

        # initialization
        trellis[:, 0] = np.squeeze(self.initialProb * self.Obs(obs[0]))

        # recursion
        for t in range(1, len(obs)):
            trellis[:, t] = (trellis[:, t-1, None].dot(self.Obs(obs[t]).T) * self.transProb).max(0)
            backpt[:, t] = (np.tile(trellis[:, t-1, None], [1, self.N]) * self.transProb).argmax(0)

        # termination and backtracking
        tokens = [trellis[:, -1].argmax()]
        for i in range(len(obs)-1, 0, -1):
            tokens.append(backpt[tokens[-1], i])
        return tokens[::-1]
```


The full source code and a simple test case, as always, can be found on GitHub.
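As a quick sanity check, here is the decoder run on the classic "Healthy/Fever" toy HMM (the probabilities and expected path are the standard textbook numbers, not from this post). The class is repeated in full so the snippet runs on its own:

```python
import numpy as np

class Decoder(object):
    def __init__(self, initialProb, transProb, obsProb):
        self.N = initialProb.shape[0]
        self.initialProb = initialProb
        self.transProb = transProb
        self.obsProb = obsProb

    def Obs(self, obs):
        return self.obsProb[:, obs, None]

    def Decode(self, obs):
        trellis = np.zeros((self.N, len(obs)))
        backpt = np.ones((self.N, len(obs)), 'int32') * -1
        trellis[:, 0] = np.squeeze(self.initialProb * self.Obs(obs[0]))
        for t in range(1, len(obs)):
            trellis[:, t] = (trellis[:, t-1, None].dot(self.Obs(obs[t]).T) * self.transProb).max(0)
            backpt[:, t] = (np.tile(trellis[:, t-1, None], [1, self.N]) * self.transProb).argmax(0)
        tokens = [trellis[:, -1].argmax()]
        for i in range(len(obs)-1, 0, -1):
            tokens.append(backpt[tokens[-1], i])
        return tokens[::-1]

# states {0: Healthy, 1: Fever}, observations {0: normal, 1: cold, 2: dizzy}
initialProb = np.array([[0.6], [0.4]])
transProb = np.array([[0.7, 0.3],
                      [0.4, 0.6]])       # transProb[i, j] = P(next state j | state i)
obsProb = np.array([[0.5, 0.4, 0.1],
                    [0.1, 0.3, 0.6]])    # obsProb[i, k] = P(observation k | state i)

d = Decoder(initialProb, transProb, obsProb)
path = [int(s) for s in d.Decode([0, 1, 2])]   # observe: normal, cold, dizzy
print(path)   # [0, 0, 1] -> Healthy, Healthy, Fever
```

Note the convention: rows of `transProb` index the current state and columns the next state, which is what the `max(0)`/`argmax(0)` over axis 0 in the recursion assumes.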

# Markov models

Fortunately, I had another opportunity to complete my presentation on Markov models, part of which I published before. This time I have made some significant changes:

– Added a section discussing the motivation of Markov chains.

– Included the three classic HMM algorithms with detailed ideas and equations.

– Finally, to complete a talk on Markov models, I have also included a short review of Markov Random Fields and their application in image segmentation. However, I didn’t have enough time, so this section is pretty short and lacks some details. This is a fundamental subject, so you can read about it in various textbooks.

Well, honestly this is the longest presentation I have ever made (107 pages). Actually, I will have to present it this afternoon. I’m not sure how much time it will take to present these slides completely, but you can be sure it took me a lot of time (and many stay-up-late nights) to complete. So if you are going to adopt it for your own purposes, please handle it with care 😉

Because of these slides, the tutorial on the QR algorithm is temporarily interrupted. I will post the final section on the QR algorithm soon (hopefully before the weekend).

# Hidden Markov Models

This hasn’t been finished yet, and I am still working on it. Maybe I will upload a completed version tomorrow (Dec. 6th).

Updated March 30, 2011: You can view and download the completed version here.

Updated December 8, 2010: I have uploaded the latest version, though it is still incomplete. We will need about 10-20 slides for the three famous HMM tasks. But I’m too tired to finish this right now …