Class Schedule
Week 1
Monday 13 January
- what is data mining
- course logistics
- a simple movie rating data set
Wednesday 15 January
- collaborative filtering
- nearest neighbor algorithm
Friday 17 January
- Pearson Correlation Coefficient
- athletes.xls
- team worksheet
- data.xls
Week 2
Monday 20 January
- no class: Martin Luther King Holiday
Wednesday 22 January
- snow day
Friday 24 January
- RAT 1: Chapters 1 & 2
- Team Task: demo recommender.py with the book database
- Partner Task: recommendation system for movie ratings
- possible presentations (please gmail me (ron.zacharski) if you are interested).
Week 3
Monday 27 January
- Partner Task II: Movie Lens Task
- demo Movie Lens Recommendation System.
- HOMEWORK: Finish Movie Lens Task
Wednesday 29 January
- In the news (will show a sampling of these):
- Amazon ships before you order
- IBM Watson
- Low energy Bluetooth
- Imagine the Future of Aging
- The Birth of a Word
- Movie Lens cont’d
Friday 31 January
- work on the wine dataset
- Movie Lens revisited.
Week 4
Monday 3 February
- RAT 2: Chapter 3 (one 3×5 card of notes; can use both sides)
Wednesday 5 February
- Announcements:
- No class 28th February – the day before spring break (I will be in New Mexico).
- Hackathon(s)?
- Presentation reminder.
- Summary: Here is what we covered so far.
- Weighted Slope One Worksheet
- Chapter 3 Coding Practice
Friday 7 February
- Finish up tasks (worksheet & programming)
Week 5
Monday 10 February
- research talk: Arabic Classification
Wednesday 12 February
- Decision Trees
Friday 14 February
- Decision Tree Lab
Week 6
Monday 17 February
- RAT 3: Chapter 4 (one 3×5 card of notes)
Wednesday 19 February
- Decision Trees cont’d
- Optional HW: create a decision tree for the contact lens data
Friday 21 February
- Lab to finish ch 3 tasks.
Week 7
Monday 24 February
- Covering Rules
- Ch 4 lab.
Wednesday 26 February
Friday 28 February
- no class (the day before spring break)
Week 8
Spring Break
Week 9
Monday 10 March
- dimensionality
- finish ch 4 lab
- single attribute classifier
Wednesday 12 March
- RAT 4: ch 5
- Chapter 5 Tasks
Friday 14 March
Week 10
Monday 17 March
- snow day
Wednesday 19 March
- Data Set Challenge. Can you create a classifier and evaluate its performance for the Wisconsin Breast Cancer Data Set (using wdbc.data and names)? (75xp)
- Linear Regression.
- online regression calculator (you try on your data 25xp)
- sample house data
- blog post about a simple regression method (implement for 100xp)
Friday 21 March
Week 11
Monday 24 March
- RAT 5: Chapter 6
- Bayes Team worksheet
Wednesday 26 March
- Presentation: Practical Applications of Bayes: Allie Cropp
- More on assessing the accuracy of classifiers (FN & FP)
- FP & FN partner task
- calculate the FP & FN for a few classifiers on the Pima data and/or the cancer data
- recall that the diabetes data records whether the person gets diabetes within 3 years; the cancer data records whether the tumor is malignant or benign
- for these 2 data sets, which measure (if any) is more important?
- Bayes Work – Chapter 6 tasks
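For the FP & FN partner task above, counting false positives and false negatives might look like the following minimal sketch. The function name `confusion_counts` and the label lists are hypothetical, made up for illustration; they do not come from the Pima or cancer data sets, and this is not necessarily how the course code does it.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, FN, TN, treating `positive` as the positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            # Classifier said "positive": right (TP) or a false alarm (FP).
            counts["TP" if truth == positive else "FP"] += 1
        else:
            # Classifier said "negative": a miss (FN) or right (TN).
            counts["FN" if truth == positive else "TN"] += 1
    return counts


# Hypothetical labels for illustration only (not real data).
actual = ["malignant", "benign", "malignant", "benign", "benign"]
predicted = ["malignant", "malignant", "benign", "benign", "benign"]

counts = confusion_counts(actual, predicted, positive="malignant")
print(counts)
```

Here a false negative is a malignant tumor classified as benign, which is one way to start thinking about which measure matters more for each data set.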
Friday 28 March
- Lift Charts
- Bayes Work cont’d
Week 12
Monday 31 March
- Information Retrieval
- Tokenization and Normalization
- term document incidence matrices
- inverted index (postings index)
Wednesday 2 April
- Information Retrieval review
- term frequency
- document frequency
- collection frequency
- inverse document frequency (IDF)
- log likelihood ratio
- Google Ngram viewer
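The term frequency, document frequency, collection frequency, and IDF topics above can be illustrated with a small sketch. The three-document corpus and the function name `term_stats` are hypothetical, and IDF is computed here as log10(N / df), which is one common definition and an assumption about the variant used in class.

```python
import math
from collections import Counter


def term_stats(docs):
    """docs is a list of token lists; returns (df, cf, idf)."""
    df = Counter()  # document frequency: number of documents containing the term
    cf = Counter()  # collection frequency: total occurrences across all documents
    for tokens in docs:
        cf.update(tokens)
        df.update(set(tokens))  # count each term at most once per document
    n = len(docs)
    # Assumed IDF variant: log base 10 of (number of documents / document frequency).
    idf = {term: math.log10(n / df[term]) for term in df}
    return df, cf, idf


# Hypothetical toy corpus, already tokenized and lowercased.
docs = [
    "the dog barks".split(),
    "the cat sleeps".split(),
    "the dog sleeps near the cat".split(),
]

df, cf, idf = term_stats(docs)
tf = Counter(docs[2])  # term frequency within the third document

print("tf('the') in doc 3:", tf["the"])
print("df('the'):", df["the"], "cf('the'):", cf["the"])
print("idf('the'):", idf["the"], "idf('barks'):", idf["barks"])
```

A term that appears in every document, such as "the", gets an IDF of zero, while a term found in only one document gets the highest IDF in the collection.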
Friday 4 April
- 2 quick videos
- Neural Network approaches to machine learning
- lab
Week 13
Monday 7 April
Wednesday 9 April
- more on Naive Bayes w/ text
- computers read the web
Friday 11 April
- Read the Web – David Heller
Week 14
Monday 14 April
- FINAL EXAM
- clustering
- dog distances sorted (20xp)
- people.csv (30xp)
- rough clustering code: cluster.py
- dogs2.csv
- Graph Theory and mining the web: Benjamin Blalock.
Wednesday 16 April
- Fighting Fires with Data Mining – Annika Lewis
- Hands on clustering
- new version of clusterer
- Breakfast cereal data
- New Mexico zip code data
Friday 18 April
- Data Mining in the Medical Field – Allie Cropp
- k-means clustering
- kmeans.py
Week 15
Monday 21 April
- Spatial Data Mining – Sean Healy
- Clustering unstructured text
- clusterText.py
- tiny.zip (a trivial example)
- 100xp to finish the clustering Python script
Wednesday 23 April
- Clustering: a summary
- Melanie Mitchell on Complex Systems (check out her free online courses at the Santa Fe Institute!!)
- More on complexity – a playlist
Friday 25 April
- GraphLab – a presentation by Elizabeth Greene and Erik Nosar