cs419 Data Mining
The course covers basic data mining techniques including those used for collective intelligence applications. We will experiment with these methods using Python.
- machine learning pipeline
- basic concepts in machine learning
- decision trees
- dimension reduction
- ensemble methods (XGBoost)
- deep learning
- Collaborative filtering
- Item-based filtering
- Nearest-neighbor algorithms
- Distance methods (Manhattan, Euclidean, Minkowski)
- Evaluation of machine learning techniques
- Naïve Bayes classification methods
- Probability density functions
- Naïve Bayes and unstructured text
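As a small preview of the distance methods in the topic list above, here is a minimal sketch in plain Python. Manhattan and Euclidean distance are the Minkowski distance with order p = 1 and p = 2 respectively. The function names and the two rating vectors are hypothetical and chosen only for illustration; they are not taken from the textbook or the labs.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def manhattan(a, b):
    """Manhattan distance: Minkowski with p = 1."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Euclidean distance: Minkowski with p = 2."""
    return minkowski(a, b, 2)


# Hypothetical ratings two users gave to the same three items.
user_a = [1.0, 2.0, 3.0]
user_b = [4.0, 6.0, 3.0]

print(manhattan(user_a, user_b))  # 7.0
print(euclidean(user_a, user_b))  # 5.0
```

In a nearest-neighbor setting, the user with the smallest distance to you would be treated as your closest match.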
The course takes an active, hands-on approach to learning. Students will spend much of the class time exploring, experimenting, and evaluating code using their own laptops. Class time is divided between short lectures, individual experimentation with programming, working on code with a partner, team projects, and quizzes. During the first week all students will be assigned to permanent teams of around 5 people.
Every week you will read one chapter of the textbook. I will not regurgitate that material in a lecture. I will assume you have read and understood it. There will be ‘readiness assessment tests’ on textbook material not covered in class. Class time will be spent practicing what we have learned. The emphasis is not on theory but on learning development skills that can be used in the workplace.
I am assuming that nearly everyone has a laptop. We will be working with laptops during a large percentage of our class. It doesn’t matter if your laptop runs Microsoft Windows, is a Mac, or an Ubuntu machine. Usually when I write this in a syllabus I add “It doesn’t matter if it is 5 years old. It also doesn’t matter how powerful it is–even a basic netbook will work.” In the case of this class, it does matter. Doing data mining even on small datasets is processor intensive.
Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron.
Grading is based on a method developed by Professor Lee Sheldon at Indiana University. It is based on obtaining experience points (XP). The number of XP determines what level you are at. You start the class at Level Zero and with 0 XP. The level you obtain at the end of the semester determines your final grade. Here is the chart:
If you are at Level One or lower when mid-semester reports are due, I will report your work as unsatisfactory.
All activities are optional. The total number of XP exceeds 1775. You only need 1625 for an A.
Labs – 800XP
There will be 7-8 programming labs that can be done with a partner.
Quizzes – 300XP
There will be around 10 very short (5-10 minutes) quizzes.
Team Pencil exercises – 200XP
To help you understand various machine learning methods, there will be a set of worksheets designed to be completed by hand. For example, there is a worksheet that asks you to compute entropy for a particular problem set.
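Once you have worked an entropy problem by hand, you can check your answer with a few lines of Python. The following is a minimal sketch of Shannon entropy in bits; the function name and the 9 "yes" / 5 "no" label set are assumptions made for illustration and do not come from an actual worksheet.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    counts = Counter(labels)
    total = len(labels)
    # Sum -p * log2(p) over the proportion p of each class.
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Hypothetical data set: 9 positive and 5 negative examples.
labels = ["yes"] * 9 + ["no"] * 5
print(round(entropy(labels), 3))  # 0.94
```

A perfectly balanced two-class set has entropy 1 bit, and a set with only one class has entropy 0.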
Team Projects – 400XP
Throughout the semester I will be offering different data mining/machine learning challenges (for example, this one on Caterpillar Tractor tubes). Solve the problem–gain XP. Simple as that!
Team Participation – 75XP
Each student will rate the helpfulness of all members of their team. Individual team participation scores will be the sum of the points they receive from other members of their team. Each team member distributes 100 points to other members of the team. This will then be adjusted to make the average team participation score 75 points. The rater must differentiate some of their ratings (they cannot assign the same rating to all members).
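To make the arithmetic concrete, here is one possible reading of the scoring, sketched in Python. It is an assumption, not the official formula: the description above does not say how the adjustment is done, so the sketch scales every total by the same factor so that the team average becomes 75. The function name, the team member names, and the point values are all hypothetical.

```python
def participation_scores(ratings, target_mean=75.0):
    """ratings[rater][ratee] = points the rater gave to the ratee.

    Assumption: totals are scaled proportionally so the team mean
    equals target_mean. The real adjustment may differ.
    """
    received = {}
    for rater, given in ratings.items():
        if rater in given:
            raise ValueError(f"{rater} may not rate themselves")
        if sum(given.values()) != 100:
            raise ValueError(f"{rater} must distribute exactly 100 points")
        if len(set(given.values())) < 2:
            raise ValueError(f"{rater} must differentiate their ratings")
        for ratee, points in given.items():
            received[ratee] = received.get(ratee, 0) + points
    mean = sum(received.values()) / len(received)
    return {name: total * target_mean / mean for name, total in received.items()}


# Hypothetical three-person team.
ratings = {
    "ana": {"ben": 60, "cy": 40},
    "ben": {"ana": 55, "cy": 45},
    "cy": {"ana": 45, "ben": 55},
}
print(participation_scores(ratings))
# {'ben': 86.25, 'cy': 63.75, 'ana': 75.0}
```

Because every rater hands out exactly 100 points, the raw team average is always 100, so under this assumption the adjustment amounts to multiplying each total by 0.75.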
Announcements, discussions, and questions
I will communicate with the class via the UMW Deep Learning Institute Slack account. (Link to Invitation)
Accommodations for Students with Special Needs
Any student with a documented disability may receive a special accommodation to complete any requirements of this course. If you have a disability or believe you have one, you may wish to self-identify. You may do so by providing documentation to the Office of Disability Services located in Room 203 of George Washington Hall (Phone: Voice 540-654-1266, Fax: 540-654-1163). Appropriate accommodations may then be provided for you. If you have a condition that may affect your ability to exit the premises in an emergency or that may cause an emergency during class, you are encouraged to discuss this in confidence with me and/or anyone at the Office of Disability Services. This office can also answer any questions you have about the Americans with Disabilities Act (ADA).
I assume you are an ethical student and a person with integrity. I expect that you will follow the university honor code (see http://rosemary.umw.edu/CSHonorCode.html). Please use common sense and ask yourself: what would a person with integrity do? To help you, I would like to make three comments related to this:
Plagiarism means presenting some other person’s work as your own. This can mean using some other person’s words without acknowledging their source, or using some other person’s ideas. Copying another student’s work (homework or exam) is also plagiarism. Plagiarism will minimally result in an automatic zero for that submission.
Collusion is unauthorized collaboration that produces work which is then presented as work completed independently by the student. Collusion includes participating in group discussions that develop solutions which everyone copies. Penalties for plagiarism and collusion include receiving a failing grade for that work.
I ask that you respect the other people in the class. I recognize that your life circumstances may require you to receive cell phone calls during class. If this is the case, please set your cell phone on vibrate and discreetly leave the class to accept calls. During tests, if you walk out of the classroom, or consult/display your cell phone, I will assume you are done with the test and collect your grading sheet.
During the first week of class I will ask you for your avatar name. This is the name that will appear on the Experience Point Google Spreadsheet that will be viewable by everyone in the class. If you wish to remain anonymous, don’t share your avatar name with anyone. To further protect the anonymity of those who wish to remain anonymous, the spreadsheet may also be populated by fictitious avatar names.