Syllabus
University of Mary Washington
Department of Computer Science
CPSC470u: Introduction to Data Mining
Fall 2010
TTh 11-12:15 Trinkle B6
Description
The course covers basic data mining techniques, including those used in collective intelligence applications. We will experiment with these methods using Python and an open source data mining tool called Weka. Over the summer I wrote a draft of an online data mining textbook, and we will be using it in this class.
Format
The course takes an active, hands-on approach to learning. Students will spend much of the class time exploring, experimenting with, and evaluating code on their own laptops. Class time is divided among short lectures, individual programming experiments, work on code with a partner, team projects, and quizzes. During the first week all students will be assigned to permanent teams. Depending on final enrollment, each team will have 5 or 6 people. Teams are constructed so that they contain people with a variety of skills, including those with l33t skillz.
Laptops
I am assuming that nearly everyone has a laptop. We will be working with laptops during a large percentage of our class time. It doesn’t matter whether your laptop runs Microsoft Windows, Mac OS X, or Ubuntu. Usually when I write this in a syllabus I add “It doesn’t matter if it is 5 years old. It also doesn’t matter how powerful it is; even a basic netbook will work.” In this class, it does matter. Data mining is processor intensive, even on small datasets. If you are using a netbook, you will need to work with reduced versions of the datasets.
Textbooks
There are two required textbooks for the class.
- A Programmer’s Guide to Data Mining: The Ancient Art of the Numerati. This is a draft of a book I wrote. It is available (and will always be available) free online. It goes through a number of data mining examples using Python. You can get extra credit by pointing out errors.
- Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems) by Ian H. Witten and Eibe Frank. 2005.
Grading
If you have had me for another class, you are familiar with my grading scheme in which the class decides how the final grade is determined. This semester I am trying out a new scheme, based on a method developed by Professor Lee Sheldon at Indiana University. It is built around earning experience points (XP). The number of XP you have determines your level. You start the class at Level 1 with 0 XP, and the level you reach by the end of the semester determines your final grade. Here is the chart:
I differ from Professor Sheldon in offering students options on how to accumulate XP. There will be opportunities to earn at least 2200XP during the course. This gives each individual some flexibility in what tasks to do. You gain XP working individually, with a partner, and with your team.
Activities
Quizzes – 400XP
There will be approximately 6 short multiple-choice quizzes during the course. Each quiz is first taken individually; immediately afterward, the same quiz is taken as a team. Each individual quiz is worth 30XP on average, and each team quiz is also worth 30XP on average. You will have advance notice of these quizzes. In addition, there will be unannounced mini-quizzes, which may be individual only or both individual and team.
Programming Practice – 400XP
Programming practice basically means doing the book exercises, although I might ask you to do a slight variation of what is in the book. There will be approximately 6 practices, each worth up to 75XP. Programming practice can be done individually or with a partner, but you may work with the same partner on at most two practices.
Program Demo – 150XP
You can elect to extend the code for a particular programming practice. During class I will offer possible extensions. You will then give a short demo of your code to the class, earning up to 150XP. This can be done individually or with a partner, and it can be repeated with a different partner.
Exam – 250XP
Throughout the semester I will post questions and problems. These make up the final exam for the course. You can complete each one any time between when it is posted and Thursday, Dec. 9th at 2:30pm. You will gain 10% more XP if you complete the work within a week of the problem being posted.
Team Reading Presentation – 75XP each team member
Teams will do a presentation on a section of one of the textbooks.
Team In-class Projects – at least 400XP
Team projects may be programming tasks, design tasks, or other work.
Team Participation – about 120XP
Each student will rate the helpfulness of all the other members of their team by distributing 100 points among them. The rater must differentiate their ratings (they cannot assign the same rating to every member). A student’s individual team participation score is the sum of the points they receive from the other members of their team. Because every member gives out exactly 100 points, the average team participation score will be 100 points.
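To see why the average always comes out to 100: a team of n members hands out 100n points in total, and those points are split among the same n members, so the mean score is 100n / n = 100. The Python sketch below tallies scores for a hypothetical five-person team. It assumes each rater's distribution is stored as a dictionary mapping teammate to points, and it reads "must differentiate" as "not every teammate gets the same number". The names, the ratings, and the helper functions are made up for illustration; this is not the actual grading script.

```python
# Sketch of the team participation scoring rule (hypothetical data and names).

def validate_ratings(rater, given):
    """Check one rater's distribution against the syllabus rules."""
    if rater in given:
        raise ValueError(f"{rater} cannot rate themselves")
    if sum(given.values()) != 100:
        raise ValueError(f"{rater} must distribute exactly 100 points")
    # Assumed reading of the rule: not every teammate may get the same rating.
    if len(given) > 1 and len(set(given.values())) == 1:
        raise ValueError(f"{rater} must differentiate their ratings")

def participation_scores(ratings):
    """Sum the points each member receives from the other members."""
    scores = {member: 0 for member in ratings}
    for rater, given in ratings.items():
        validate_ratings(rater, given)
        for teammate, points in given.items():
            scores[teammate] += points
    return scores

# Hypothetical five-person team; each inner dict sums to 100.
ratings = {
    "ana": {"ben": 30, "cy": 25, "dee": 25, "eli": 20},
    "ben": {"ana": 25, "cy": 25, "dee": 30, "eli": 20},
    "cy":  {"ana": 40, "ben": 20, "dee": 20, "eli": 20},
    "dee": {"ana": 30, "ben": 30, "cy": 20, "eli": 20},
    "eli": {"ana": 20, "ben": 35, "cy": 25, "dee": 20},
}

scores = participation_scores(ratings)
average = sum(scores.values()) / len(scores)
print(scores)   # {'ana': 115, 'ben': 115, 'cy': 95, 'dee': 95, 'eli': 80}
print(average)  # 100.0
```

Individual scores can land well above or below 100 (here from 80 to 115), but the team average is fixed at 100 no matter how the points are distributed.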
Book Corrections/Suggestions – 5-75XP per correction
The book, A Programmer’s Guide to Data Mining, is just a draft. I will give you XP for comments on the book website that point out typos, unclear sections (more XP if you suggest how to make them clearer), and other improvements.
Final Project – 400XP (extra XP possible)
I like final projects, but gaining 500XP during the last week defeats the idea of gaining levels by gradually accumulating XP. To fix this, you will start earning Final Project XP in the middle of the semester. At that point, each of you will write a 1-2 page project proposal (50XP) and present it to the class (50XP). The class will then self-organize into teams, each working on a different proposal. If your proposal is one of those chosen, you will get 25XP. Teams will use the Scrum development process and a version control system of their choice. There will be 3 iterative versions of the project demoed in class, and each version is worth up to 100XP to each member of the team.
Avatar names, pseudonyms, noms de plume
During the first week of class I will ask you for your avatar name, pseudonym, or whatever you prefer. This name will appear on the Experience Point Google Spreadsheet, which everyone in the class can view. If you wish to remain anonymous, don’t share your avatar name with anyone. On the other hand, if you would like recognition, for example for reaching level 10 (“a big shout out to tera miner for achieving level 10”), you can share your name. The decision is yours. To further protect the anonymity of those who wish to remain anonymous, the spreadsheet will also contain fictitious avatar names.
Accommodations for students with special needs
Any student with a documented disability may receive a special accommodation to complete any requirements of this course. If you have a disability or believe you have one, you may wish to self-identify. You may do so by providing documentation to the Office of Disability Services located in Room 203 of George Washington Hall (Phone: Voice 540-654-1266, Fax: 540-654-1163). Appropriate accommodations may then be provided for you. If you have a condition that may affect your ability to exit the premises in an emergency or that may cause an emergency during class, you are encouraged to discuss this in confidence with me and/or anyone at the Office of Disability Services. This office can also answer any questions you have about the Americans with Disabilities Act (ADA).