Instructor Resources
In addition to all the materials available to students, several additional items are available to instructors upon request.
Access to Instructor Materials
Please contact either:
Todd Neller, Gettysburg College, tneller@gettysburg.edu
Laura E. Brown, Michigan Technological University, lebrown@mtu.edu
# Student Materials
Exercises.docx
Exercises.pdf
*.dat # data files for Exercises
gap.pdf # reference paper for Gap statistic question
data2d.m # Octave script for visualizing input data
cluster2d.m # Octave script for visualizing output cluster data
doc/ # Javadoc documentation for sample solutions
iris/ # folder containing Weka tutorial, data, and Exercises
ppt/ # PowerPoint and PDF presentations for k-means
java/ # Java version of programming exercises with starter code
# Instructor Materials
Exercises - Instructor's Guide.docx
Exercises - Instructor's Guide.pdf # Sample answers and discussion
KMeans.java # Sample Java solution for first exercise
KMeansIterated.java # Sample Java solution for second exercise
KMeansIteratedGap.java # Sample Java solution for third exercise
ex1.sh # script to execute first exercise sample solution code
ex2.sh # script to execute second exercise sample solution code
ex3.sh # script to execute third exercise sample solution code
ex1/ # folder of output clustering data from ex1.sh + summary.txt
ex2/ # folder of output clustering data from ex2.sh + summary.txt
ex3/ # folder of output clustering data from ex3.sh + summary.txt
instructor-soln/ # text files with info on original data generation
Clustering Iris Data with Weka (Instructor Copy).docx
Clustering Iris Data with Weka (Instructor Copy).pdf # solutions for Weka tutorial
ppt/ # PowerPoint and PDF presentations for k-means
java/ # sample solution and JUnit test code for Java assignment
Weka example
A tutorial on simple clustering with Weka, Clustering_Iris_Data_with_Weka.pdf, is available.
This tutorial uses the well-known iris data set, provided in the ARFF file format. The tutorial and a copy of the iris data are in the iris/ folder of the student materials zip file; the data is also available at http://tunedit.org/repo/UCI/iris.arff.
Example Clustering data sets
PowerPoint Slides
The provided PowerPoint presentation describing the clustering method (k-Means Clustering.pptx and K-Means Clustering.pdf) shows several example data sets that illustrate where k-means fails. These images come from two main sources:
- the textbook Introduction to Data Mining by Tan, Steinbach, and Kumar and its related slides, specifically Chapter 8 (the slides are available at http://www-users.cs.umn.edu/~kumar/dmbook/index.php#item5)
- clustering data sets available at http://cs.joensuu.fi/sipu/datasets/, together with a MATLAB script, cluster_examples.m (the focus was on the Shape sets)
World Country Data
A subset of indicators for world countries was collected and merged into a single data set. It is available both without preprocessing (ex-country-data.csv) and in standardized form (ex-country-data-preproc.csv). A fuller description of the data and its use appears in the R example clustering-example.
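The exact preprocessing behind ex-country-data-preproc.csv is not described here, so the following Java sketch only assumes that "standardized" means per-column z-scores: each indicator is centered to mean 0 and divided by its sample standard deviation (the same convention as R's scale()). The class name Standardize and the toy values are hypothetical and are not taken from the country data.

```java
import java.util.Arrays;

public class Standardize {
    /**
     * Returns a copy of data in which every column has mean 0 and sample
     * standard deviation 1 (divisor n - 1, as in R's scale()).
     * Columns with zero variance (or a single row) are centered but not scaled.
     */
    public static double[][] zScore(double[][] data) {
        int n = data.length;
        int d = data[0].length;
        double[][] out = new double[n][d];
        for (int j = 0; j < d; j++) {
            // Column mean.
            double mean = 0.0;
            for (int i = 0; i < n; i++) mean += data[i][j];
            mean /= n;

            // Sum of squared deviations from the mean.
            double ss = 0.0;
            for (int i = 0; i < n; i++) {
                double diff = data[i][j] - mean;
                ss += diff * diff;
            }
            double sd = Math.sqrt(ss / (n - 1));

            // Center, then scale when the standard deviation is positive.
            for (int i = 0; i < n; i++) {
                double centered = data[i][j] - mean;
                out[i][j] = sd > 0 ? centered / sd : centered;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical toy rows: two indicators on very different scales.
        double[][] raw = { {1.0, 100.0}, {2.0, 200.0}, {3.0, 300.0} };
        for (double[] row : zScore(raw)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Running main prints [-1.0, -1.0], [0.0, 0.0] and [1.0, 1.0]: after standardization both indicators contribute equally. This matters for k-means because it uses Euclidean distance, which would otherwise be dominated by indicators measured on large scales.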
Website sources
The source code for this website is also available for instructors to use and modify. The site sources are written in Markdown and built with MkDocs, which requires Python and pip.