Implementing a Hidden Markov Model Toolkit

Summary Implement a toolkit for Hidden Markov Models (with discrete outputs), including (1) random sequence generation, (2) computing the marginal probability of a sequence with the forward and backward algorithms, (3) computing the best state sequence for an observation with the Viterbi algorithm, and (4) supervised and unsupervised maximum likelihood estimation of the model parameters from observations, using the Baum Welch algorithm for unsupervised learning.
Topics natural language processing, computational biology, unsupervised learning, dynamic programming, machine learning, hmms, hidden markov models, graphical models, expectation maximization, time series, sequences, baum-welch, maximum likelihood estimation.
Audience Students in an introductory course on natural language processing (NLP), but could also be used in computational biology or machine learning courses.
Difficulty Suitable for advanced undergraduates or beginning graduate students who are comfortable with programming. Can be completed within two weeks, or may be split up into a series of smaller assignments.
  • Extensive practice with two fundamental concepts: inference with dynamic programming and maximum likelihood estimation.
  • Exposure to empirical machine learning.
  • Provided applications are diverse, spanning "real-world" problems like part-of-speech tagging, linguistically interesting applications like consonant/vowel discovery in an unfamiliar script, and light-hearted ones like deciphering a secret message.
  • Students find that building an HMM from scratch is empowering, especially getting expectation maximization to work.
  • Some students may implement the algorithms by following directions from a textbook or notes, even without fully understanding the intuition. Therefore, a quiz or problem set to test theoretical understanding may be required as well.
  • There are open-source HMM implementations available online that pose a plagiarism risk. However, most of the algorithms are complex enough that blind copying won't usually work.
Dependencies Theory of HMMs and associated algorithms (example reading: Jurafsky & Martin's Speech and Language Processing (3rd ed), Ch. 8), competency in the assigned programming language. The starter code and assignment description uses Python. However, the assignment can easily be modified to use any other language.
  • Incorporate other sequence datasets from NLP and computational biology.
  • To make harder: Add MAP estimation or regularization for learning, generalize to HMMs over continuous observations such as speech, implement inference and learning algorithms in a distributed computing framework.
  • To make easier: Leave out or provide some of the methods.


  1. Assignment Description
  2. Sample data, model, and verification files
  3. Starter Code in Python
  4. Lecture/Reference Slides
  5. Jurfsky and Martin Reading