Summary |
Implement a toolkit for Hidden Markov Models (with discrete outputs),
including (1) random sequence generation,
(2) computing the marginal probability of a sequence with the forward and backward algorithms,
(3) computing the best state sequence for an observation sequence with the Viterbi algorithm,
and (4) supervised and unsupervised maximum likelihood estimation
of the model parameters from observations,
using the Baum-Welch algorithm for unsupervised learning.
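For concreteness, here is a minimal sketch of the forward recursion;
the array names pi, A, and B are illustrative assumptions, not names taken from the starter code:

    import numpy as np

    def forward(obs, pi, A, B):
        """Marginal probability P(obs) under a discrete-output HMM.
        obs : sequence of observation indices, length T
        pi  : (N,)   initial state distribution
        A   : (N, N) transition matrix, A[i, j] = P(state j | state i)
        B   : (N, M) emission matrix,   B[i, k] = P(symbol k | state i)
        """
        alpha = pi * B[:, obs[0]]                # alpha_1(i) = pi_i * b_i(o_1)
        for t in range(1, len(obs)):
            alpha = (alpha @ A) * B[:, obs[t]]   # alpha_t = (alpha_{t-1} A) * b(o_t)
            # a full implementation would rescale alpha here (or work in
            # log space) to avoid numerical underflow on long sequences
        return alpha.sum()                       # P(obs) = sum_i alpha_T(i)

The backward pass runs the analogous recursion in reverse,
and Viterbi has the same structure with the sum replaced by a max plus backpointers.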
|
Topics |
natural language processing, computational biology,
unsupervised learning, dynamic programming, machine learning,
hmms, hidden markov models, graphical models, expectation maximization,
time series, sequences, baum-welch,
maximum likelihood estimation.
|
Audience |
Students in an introductory course on natural language processing (NLP);
the assignment could also be used in computational biology or machine learning courses.
|
Difficulty |
Suitable for advanced undergraduates or beginning graduate students
who are comfortable with programming.
Can be completed within two weeks, or
may be split up into a series of smaller assignments.
|
Strengths |
- Extensive practice with two fundamental concepts: inference with dynamic programming
and maximum likelihood estimation.
- Exposure to empirical machine learning.
- The provided applications are diverse, spanning "real-world" problems like part-of-speech tagging,
linguistically interesting applications like consonant/vowel discovery in an unfamiliar script,
and light-hearted ones like deciphering a secret message.
- Students find that building an HMM from scratch is empowering,
especially getting expectation maximization to work.
|
Weaknesses |
- Some students may implement the algorithms by following directions
from a textbook or notes without fully understanding the intuition.
A quiz or problem set that tests theoretical understanding
may therefore be required as well.
- There are open-source HMM implementations available online
that pose a plagiarism risk.
However, most of the algorithms are complex enough
that blind copying won't usually work.
|
Dependencies |
Theory of HMMs and associated algorithms (example reading: Jurafsky & Martin's Speech and Language Processing (3rd ed.), Ch. 8),
and competency in the assigned programming language.
The starter code and assignment description use Python.
However, the assignment can easily be modified to use any other language.
|
Variants |
- Incorporate other sequence datasets from NLP and computational biology.
- To make harder: Add MAP estimation or regularization for learning (a minimal sketch follows this list),
generalize to HMMs over continuous observations such as speech,
or implement the inference and learning algorithms in a distributed computing framework.
- To make easier: Leave out some of the methods, or provide students with implementations of a few of them.
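As one way to realize the MAP-estimation variant, the ordinary M-step of Baum-Welch
could be replaced with a smoothed update that adds pseudocounts to the expected counts
(equivalent to a symmetric Dirichlet prior); the function name and arguments below are
illustrative assumptions, not part of the starter code:

    import numpy as np

    def map_transition_update(expected_counts, pseudocount=1.0):
        """Re-estimate the transition matrix from expected counts.
        expected_counts[i, j] = expected number of i -> j transitions
        accumulated during the E-step; pseudocount = 0 recovers the
        plain maximum likelihood (relative frequency) estimate."""
        smoothed = expected_counts + pseudocount
        return smoothed / smoothed.sum(axis=1, keepdims=True)

The same smoothing could be applied to the emission and initial-state updates.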
|