Summary 
Implement a toolkit for Hidden Markov Models (with discrete outputs),
including (1) random sequence generation,
(2) computing the marginal probability of a sequence with the forward and backward algorithms,
(3) computing the best state sequence for an observation with the Viterbi algorithm,
and (4) supervised and unsupervised maximum likelihood estimation
of the model parameters from observations,
using the Baum Welch algorithm for unsupervised learning.

Topics 
natural language processing, computational biology,
unsupervised learning, dynamic programming, machine learning,
hmms, hidden markov models, graphical models, expectation maximization,
time series, sequences, baumwelch,
maximum likelihood estimation.

Audience 
Students in an introductory course on natural language processing (NLP),
but could also be used in computational biology or machine learning courses.

Difficulty 
Suitable for advanced undergraduates or beginning graduate students
who are comfortable with programming.
Can be completed within two weeks, or
may be split up into a series of smaller assignments.

Strengths

 Extensive practice with two fundamental concepts: inference with dynamic programming
and maximum likelihood estimation.
 Exposure to empirical machine learning.
 Provided applications are diverse, spanning "realworld" problems like partofspeech tagging,
linguistically interesting applications like consonant/vowel discovery in an unfamiliar script,
and lighthearted ones like deciphering a secret message.
 Students find that building an HMM from scratch is empowering,
especially getting expectation maximization to work.

Weaknesses 
 Some students may implement the algorithms by following directions
from a textbook or notes, even without fully understanding the intuition.
Therefore, a quiz or problem set to test theoretical understanding
may be required as well.
 There are opensource HMM implementations available online
that pose a plagiarism risk.
However, most of the algorithms are complex enough
that blind copying won't usually work.

Dependencies 
Theory of HMMs and associated algorithms (example reading: Jurafsky & Martin's Speech and Language Processing (3rd ed), Ch. 8),
competency in the assigned programming language.
The starter code and assignment description uses Python.
However, the assignment can easily be modified to use any other language.

Variants 
 Incorporate other sequence datasets from NLP and computational biology.
 To make harder: Add MAP estimation or regularization for learning,
generalize
to HMMs over continuous observations such as speech,
implement inference and learning algorithms in a distributed computing framework.
 To make easier: Leave out or provide some of the methods.
