cs419 Data Mining
The course covers basic data mining techniques including those used for collective intelligence applications. We will experiment with these methods using Python.
- Collaborative filtering
- Item-based filtering
- Nearest-neighbor algorithms
- Distance methods (Manhattan, Euclidean, Minkowski)
- Evaluation of machine learning techniques
- Naïve Bayes classification methods
- Probability density functions
- Naïve Bayes and unstructured text
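As a taste of the kind of code we will write, here is a minimal sketch of the three distance methods listed above. It assumes each person is represented as a plain list of numeric ratings; the function names and the sample ratings are made up for illustration and are not taken from the textbook.

```python
def minkowski(a, b, r):
    """Minkowski distance of order r between two equal-length rating lists."""
    if len(a) != len(b):
        raise ValueError("both lists must have the same length")
    return sum(abs(x - y) ** r for x, y in zip(a, b)) ** (1 / r)


def manhattan(a, b):
    """Manhattan distance is the Minkowski distance with r = 1."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Euclidean distance is the Minkowski distance with r = 2."""
    return minkowski(a, b, 2)


# Hypothetical ratings by two users for the same three items.
alice = [5, 1, 4]
bob = [2, 5, 4]

print(manhattan(alice, bob))  # 7.0
print(euclidean(alice, bob))  # 5.0
```

Writing Manhattan and Euclidean as special cases of Minkowski keeps the three methods in one place; a nearest-neighbor search would call one of these on every pair of users.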
The course takes an active, hands-on approach to learning. Students will spend much of the class time exploring, experimenting, and evaluating code using their own laptops. Class time is divided between short lectures, individual experimentation with programming, working on code with a partner, team projects, and quizzes. During the first week all students will be assigned to permanent teams of around 5 people.
Every week you will read one chapter of the textbook. I will not regurgitate that material in a lecture. I will assume you have read and understood it. There will be ‘readiness assessment tests’ on textbook material not covered in class. Class time will be spent practicing what we have learned. The emphasis is not on theory but on learning development skills that can be used in the workplace.
Experimental Nature of Course
This is the first time this course (cs419) has been offered. As such, I may need to make adjustments to the course as we progress through the semester. I am striving to make the course challenging while keeping the grading scale relatively easy.
I am assuming that nearly everyone has a laptop. We will be working with laptops during a large percentage of our class. It doesn’t matter if your laptop runs Microsoft Windows, is a Mac, or an Ubuntu machine. Usually when I write this in a syllabus I add “It doesn’t matter if it is 5 years old. It also doesn’t matter how powerful it is–even a basic netbook will work.” In the case of this class, it does matter. Doing data mining even on small datasets is processor intensive. If you are using a netbook you will need to reduce the size of the data.
The textbook is A Programmer’s Guide to Data Mining: The Ancient Art of the Numerati, a free online book. You can get extra credit by pointing out errors in this book.
Grading is based on a method developed by Professor Lee Sheldon at Indiana University. It is based on obtaining experience points (XP). The number of XP determines what level you are at. You start the class at Level Zero and with 0 XP. The level you obtain at the end of the semester determines your final grade. Here is the chart:
If you are at Level Two or lower when mid-semester reports are due, I will report your work as unsatisfactory.
All activities are optional. The total number of XP equals 2650. You only need 2100 for an A.
Readiness Assessment Tests – 400XP
There will be approximately 7 short multiple-choice readiness assessment tests (RATs) given during the course. Each test will be taken individually; immediately afterward, the same test will be taken as a team. Each individual test is worth 30XP on average, and each team test is also worth 30XP on average. You will have advance notice of these tests. In addition, there may be unannounced mini-quizzes. They may be individual only, or individual and team.
Programming Practice – 400XP
Programming practice will typically be slight remixes of the book exercises. There will be approximately 7 practices each worth up to 60 points. Programming practice can be done individually or with a partner. You can only do two practices with the same partner.
Programming Demo – 150-300XP
You can elect to extend the code for a particular programming practice. During class I will offer possible extensions. You will give a short, 15-minute demo of the code to the class. You will gain up to 150XP for each demo project. This can be done individually or with a partner. It can be repeated with a different partner.
Class Presentation – 150-300XP
You can elect to give a short, 15-minute presentation on some machine learning topic (I will offer suggestions throughout the semester). You can do this individually or with a partner. It can be repeated.
Exam – 300XP
Throughout the last half of the semester I will post questions & problems. These are part of the final exam for the course. You can elect to complete the work anytime between the time the problem is posted and the final exam period. You will gain 10% more XP if you complete the work within a week of the problem being posted.
Team In-Class Projects and Worksheets – 400XP
Team projects may be programming tasks, design tasks, or other work.
Team Participation – 100XP
Each student will rate the helpfulness of all members of their team. Individual team participation scores will be the sum of the points they receive from other members of their team. Each team member distributes 100 points to other members of the team. The average team participation score will be 100 points. The rater must differentiate some of their ratings (they cannot assign the same rating to all members).
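The scoring rules above can be sketched in Python. This is only an illustration of the arithmetic, not the actual spreadsheet I will use; the function name, the data layout, and the avatar names are made up for the example.

```python
def participation_scores(ratings, budget=100):
    """Sum the points each member receives from teammates.

    ratings maps each rater to a dict of {teammate: points given}.
    This layout is an assumption made for the sketch.
    """
    team = set(ratings)
    totals = {member: 0 for member in ratings}
    for rater, given in ratings.items():
        # Each rater rates every teammate except themselves.
        if set(given) != team - {rater}:
            raise ValueError(f"{rater} must rate every other team member")
        # Each rater distributes exactly the full budget (100 points).
        if sum(given.values()) != budget:
            raise ValueError(f"{rater} must distribute exactly {budget} points")
        # A rater may not give the same rating to all members.
        if len(given) > 1 and len(set(given.values())) == 1:
            raise ValueError(f"{rater} gave everyone the same rating")
        for teammate, points in given.items():
            totals[teammate] += points
    return totals


# Hypothetical three-person team.
ratings = {
    "ana": {"ben": 60, "cy": 40},
    "ben": {"ana": 55, "cy": 45},
    "cy": {"ana": 70, "ben": 30},
}
scores = participation_scores(ratings)
print(scores)  # {'ana': 125, 'ben': 90, 'cy': 85}
print(sum(scores.values()) / len(scores))  # 100.0
```

Because every member hands out exactly 100 points, the team average always comes out to 100 regardless of team size.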
Final Project – 400XP
I like final projects but it seems like gaining 400XP during the last week defeats the idea of gaining levels by gradually increasing XP. To fix this, you will be gaining Final Project XP starting at the middle of the semester. At the middle of the semester each of you will come up with a 1-2 page written project proposal and present that proposal to the people in the class (50XP). The class will self-organize into teams to work on one of the proposals (each team works on a different proposal). If your proposal is one of those chosen you will get 25XP. Teams will use the Scrum development process and a versioning system of their choice. There will be 3 iterative versions of the project that will be demo’d in class. Each version is worth up to 100XP to each member of the team.
Book Corrections/Suggestions – 2-75XP per suggestion
Accommodations for Students with Special Needs
Any student with a documented disability may receive a special accommodation to complete any requirements of this course. If you have a disability or believe you have one you may wish to self-identify. You may do so by providing documentation to the Office of Disability Services located in Room 203 of George Washington Hall (Phone: Voice 540-654-1266, Fax: 540-654-1163). Appropriate accommodations may then be provided for you. If you have a condition that may affect your ability to exit the premises in an emergency or that may cause an emergency during class, you are encouraged to discuss this in confidence with me and/or anyone at the Office of Disability Services. This office can also answer any questions you have about the Americans with Disabilities Act (ADA).
I assume you are an ethical student and a person with integrity. I expect that you will follow the university honor code (see http://rosemary.umw.edu/CSHonorCode.html). Please use common sense and ask yourself what a person with integrity would do. To help you, I would like to make three comments related to this:
Plagiarism means presenting some other person’s work as your own. This can mean using some other person’s words without acknowledging their source, or using some other person’s ideas. Copying another student’s work (homework or exam) is also plagiarism. Plagiarism will minimally result in an automatic zero for that submission.
Collusion is unauthorized collaboration that produces work which is then presented as work completed independently by the student. Collusion includes participating in group discussions that develop solutions which everyone copies. Penalties for plagiarism and collusion include receiving a failing grade for that work.
I ask that you respect the other people in the class. I recognize that your life circumstances may require you to receive cell phone calls during class. If this is the case please set your cell phone on vibrate and discreetly leave the class to accept calls. During tests, if you walk out of the classroom, or consult/display your cell phone, I will assume you are done with the test and collect your grading sheet.
During the first week of class I will ask you for your avatar name. This is the name that will appear on the Experience Point Google Spreadsheet that will be viewable by everyone in the class. If you wish to remain anonymous, don’t share your avatar name with anyone. To further protect the anonymity of those who wish to remain anonymous, the spreadsheet may also be populated by fictitious avatar names.