Schedule
Week 1
25 August
- what is data mining
- course logistics
- a simple musical artist data set
27 August
- getting started with IPython notebooks
- collaborative filtering
- nearest neighbor algorithm
- lab1 worksheet
Week 2
1 September
- Pearson Correlation Coefficient
- athletes.xls
- team worksheet
- data.xls
- Discussion of Chapter 2
- lab 2 worksheet
3 September
- RAT 1: Chapters 1 & 2
- lab time: lab 2
- homework 1 – due Sunday at 11:59pm.
- In the news (will show a sampling of these):
- Amazon ships before you order
- IBM Watson
- Low energy Bluetooth
- Imagine the Future of Aging
- The Birth of a Word
- MovieLens cont’d
Week 3
8 September
- Item Based Filtering
- Adjusted Cosine Similarity
- Slope One
- Weighted Slope One Worksheet
- Class Music Ratings
- Notebook 3 – Slope One
- work on the wine dataset
- MovieLens revisited
10 September
- Slope One Worksheet – solution
- finish up tasks
Week 4
15 September
- Chapter 4: Intro
- mostly a lab day
17 September
- Chapter 4: Normalization
- Titanic Task
Week 5
22 September
- RAT 2: Chapters 3 & 4
- Entropy
24 September
Week 6
29 September
1 October
- Still Strict No Talking on my part.
- Titanic lab
- Entropy and Decision Tree continued.
- Beta version of Entropy Notebook
Week 7
6 October
- still no talking
- New Piazza Policy
- Titanic – sinking Intensive Lab
8 October
- Brand Spanking New version of Decision Tree / Entropy notebook (150 xp possible)
- xp * 2 if demoed in class on 8 October (> 300 xp possible)
- checkpoint 11 xp doubled if demoed before 16 October
- Decision Trees
- Decision Tree Notebook continued
- Covering Rules
- dimensionality
Week 8
13 October – Break
15 October
- Finish Entropy / Decision Tree notebook
- Talk on Classifying text.
Week 9
20 October
- RAT 4: Chapter 5, entropy, decision trees, Chapter 6 through page 6-27.
- Introduction to Naive Bayes
22 October
- Lab Day
Week 10
27 October
29 October
- Naive Bayes and unstructured text
- demo book code with the 20 Newsgroups dataset (25 xp)
- demo with another dataset (75-125 xp)
- Twitter Sentiment Analysis dataset
Week 11
3 November
- short talks on the following:
- lift charts
- dimension reduction
- measures of accuracy
- Bayes tasks for XP
5 November
- Final Exam Posted
- Information Retrieval
- Tokenization and Normalization
- term document incidence matrices
- inverted index (postings index)
Week 12
10 November
- Final Exam Part 2 Posted
- clustering
- dog breed Google Sheet
- dog distances sorted
- Team Task – hierarchical clustering
- k-means clustering
12 November
- k-means clustering
- Building Watson
- Watson After Jeopardy
- information extraction and named entity recognition
Week 13
17 November
- Final Project discussed.
- Named Entity Recognition cont’d
19 November
Week 14
24 November
- last day to sign up for presentations
- Final Exam – final version posted
- Peer XP
- Matrix Factorization
26 November – Thanksgiving
Week 15
1 December
- Presentations – Section 1:
- Read the Web – Brian Will
- Neural Networks – Joshua Mwanda
- Deep Learning – Anna Corley, Matt O’Brien
- The Future of Coaching Sports – Chelsea Irizarry, Lindsey Green, Brittany Raze
- Dota 2 – Sepehr Sobhani
- Presentations – Section 2:
3 December – last day
- Final Project Presentations