Nate Derbinsky <http://derbinsky.info>
Department of Computer Science and Networking
Wentworth Institute of Technology
Summary This is a term-long project for students in a CS2-level course. Through a series of scaffolded steps, students develop a flexible framework for evaluating classification algorithms. In each phase of the project, students are provided with documentation, unit tests, and supporting compiled code. The focus is on abstraction, so that the application can "mix and match" any classification algorithm with any evaluation dataset made up of a training/testing pair. The purpose of the project is for students to apply principles of object-oriented programming to a relatively large-scale, real-world problem.
Topics Machine learning, supervised learning, classification, knowledge representation
Audience CS2 or second-year level, depending on the program
Difficulty Difficulty is moderate-to-high: students must synthesize an understanding of machine learning while also developing a relatively large application. Students are given about six weeks to complete the project: the first four weeks are structured as scaffolded assignments (each independent of the others), and the remaining two weeks are for correcting mistakes and integrating the work.
Strengths The project requires students to bring together a variety of object-oriented concepts and implement a sizable, real-world-relevant application. The machine-learning concepts integrate well with the OO concepts and prepare students for later system-building or AI/ML coursework. Students find the ability to accurately classify real-world datasets (e.g., MNIST, DNA) motivating.
Weaknesses This project has been delivered only once, to a small cohort of 3, so no reliable evaluation or analysis has been performed. While the ideas and framework are not language-dependent, porting to another language would take considerable effort (particularly for the project scaffolding and unit tests). Empirically, students who have not adequately adapted to OO programming struggle with the project's level of difficulty. The project offers limited exposure to non-classification ML topics and to non-ML AI topics.
Dependencies Knowledge:
  • Object-oriented programming (encapsulation, ADT)
  • Collections (lists, maps), iteration
  • File I/O
  • Unit testing
  • Ability to implement an algorithm given a textual description
Requirements:
  • Java
  • Eclipse recommended (or comparable IDE)
Variants Each part of the project, including the project overview, provides relevant opportunities for extra credit and extensions. An example visual classification is provided, which could serve as an opportunity for students to learn about more advanced GUI programming.
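
To make the "mix and match" abstraction from the summary concrete, here is a minimal sketch of how a classifier interface and a dataset interface might be decoupled through a single evaluator. All names here (Example, Classifier, Dataset, Evaluator, MajorityClassifier, NearestNeighborClassifier, ToyDataset) are hypothetical and chosen for illustration. They are not the project's actual API, and the two algorithms are stand-ins, not necessarily the ones assigned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical labeled example: numeric features plus a class label. */
final class Example {
    final double[] features;
    final String label;
    Example(String label, double... features) {
        this.label = label;
        this.features = features;
    }
}

/** Hypothetical abstraction over any classification algorithm. */
interface Classifier {
    void train(List<Example> training);
    String predict(double[] features);
}

/** Hypothetical abstraction over any training/testing pair. */
interface Dataset {
    List<Example> training();
    List<Example> testing();
}

/** Baseline: always predicts the most frequent training label. */
final class MajorityClassifier implements Classifier {
    private String majority = null;
    public void train(List<Example> training) {
        Map<String, Integer> counts = new HashMap<>();
        int best = 0;
        for (Example e : training) {
            int c = counts.merge(e.label, 1, Integer::sum);
            if (c > best) { best = c; majority = e.label; }
        }
    }
    public String predict(double[] features) { return majority; }
}

/** 1-nearest-neighbor using squared Euclidean distance. */
final class NearestNeighborClassifier implements Classifier {
    private List<Example> memory = new ArrayList<>();
    public void train(List<Example> training) { memory = new ArrayList<>(training); }
    public String predict(double[] features) {
        String bestLabel = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Example e : memory) {
            double d = 0.0;
            for (int i = 0; i < features.length; i++) {
                double diff = features[i] - e.features[i];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; bestLabel = e.label; }
        }
        return bestLabel;
    }
}

/** Depends only on the two interfaces, so any pairing can be evaluated. */
final class Evaluator {
    static double accuracy(Classifier classifier, Dataset data) {
        classifier.train(data.training());
        List<Example> tests = data.testing();
        if (tests.isEmpty()) return 0.0;
        int correct = 0;
        for (Example e : tests) {
            if (e.label.equals(classifier.predict(e.features))) correct++;
        }
        return (double) correct / tests.size();
    }
}

/** Tiny in-memory dataset with made-up points, for illustration only. */
final class ToyDataset implements Dataset {
    public List<Example> training() {
        return Arrays.asList(
            new Example("a", 0, 0), new Example("a", 0, 1), new Example("a", 1, 0),
            new Example("b", 5, 5), new Example("b", 6, 5));
    }
    public List<Example> testing() {
        return Arrays.asList(
            new Example("a", 0.5, 0.5), new Example("a", 1, 1),
            new Example("b", 5.5, 5), new Example("b", 6, 6));
    }
}

class MixAndMatchDemo {
    public static void main(String[] args) {
        Dataset data = new ToyDataset();
        List<Classifier> classifiers =
            Arrays.asList(new MajorityClassifier(), new NearestNeighborClassifier());
        for (Classifier c : classifiers) {
            System.out.println(c.getClass().getSimpleName() + ": " + Evaluator.accuracy(c, data));
        }
    }
}
```

Because Evaluator sees only the Classifier and Dataset interfaces, adding a new algorithm or a new dataset (for example, one loaded from a file) requires no change to the evaluation code. On the toy data above, the majority baseline scores 0.5 and nearest-neighbor scores 1.0.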

Assignment Components

Example solutions and suggested rubrics will be made available to instructors upon request.

Project Overview

Part 1

Part 2

Part 3

Part 4