Class Schedule
Week 1 – 31 August
- Intro to Computational Linguistics
- Intro to course logistics
- Stream – collaborative decision
- Text Analysis for Digital Humanities (see Resources for Digital Humanities)
Week 2 – 7 September
- Discussion of the following articles:
- The End of Theory: The Data Deluge Makes the Scientific Method Obsolete by Chris Anderson
- Visualizing Big Data: Bar Charts for Words by Mark Horowitz
- Scan This Book! by Kevin Kelly
- Google’s Moon Shot by Jeffrey Toobin
- Quantitative Analysis of Culture Using Millions of Digitized Books
- Matthew Jockers’s critique of Google’s n-gram analyzer
- Text Analysis Continued
- Discussion of results with tools
- Introduction to Morphology (PDF of slides)
- Introduction to Finite State Transducers
Week 3 – 14 September
- Southern Brazilian Portuguese Pronunciation
- How Natural Language Processing is Changing – a video
- Entropy and Entropy Worksheet on Rotokas.
- Discussion of the debate between Peter Norvig, Director of Research at Google, and Noam Chomsky.
- On Chomsky and the Two Cultures of Statistical Learning by Peter Norvig (required reading for everyone). First Responders: Brian, Shannon
- Norvig vs. Chomsky and the Fight for the Future of AI by Kevin Gold. First Responder: Samantha
- Discussion of the following article:
- Unsupervised Learning of the Morphology of a Natural Language by John Goldsmith. First Responder: Mary
- ToDo: Download and install Linguistica and test it on how well it learns morphology of a corpus of English that you find and a corpus of another language.
- Computational Morphology continued
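For the entropy work here and the joint entropy exercise due in Week 4, a minimal sketch of the standard definitions may help when checking worksheet answers. It computes the unigram entropy H(X) = -Σ p(x) log2 p(x) of a sequence of symbols, the joint entropy of adjacent character pairs, and the conditional entropy H(Y|X) = H(X,Y) - H(X) of the next character given the current one. Treating "joint entropy" as entropy over adjacent character pairs is an assumption about what the worksheet intends; the function names are illustrative only.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy in bits of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def joint_entropy_adjacent(text):
    """H(X, Y), where (X, Y) ranges over adjacent character pairs in `text`."""
    pairs = list(zip(text, text[1:]))
    return entropy(pairs)


def conditional_entropy_next(text):
    """H(Y | X) = H(X, Y) - H(X): uncertainty of the next character given the current one."""
    pairs = list(zip(text, text[1:]))
    return entropy(pairs) - entropy([first for first, _ in pairs])


if __name__ == "__main__":
    sample = "abab"  # stand-in for a real Rotokas or other-language corpus
    print(entropy(sample), joint_entropy_adjacent(sample), conditional_entropy_next(sample))
```

In a strictly alternating string like "abab", the next character is fully determined by the current one, so the conditional entropy is 0 even though the single-letter entropy is 1 bit. Comparing these numbers across languages is one way to frame the Week 4 discussion.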
Week 4 – 21 September
- Discussion of Morphology long-term HW – Finite State Transducers and Morphology (link to write-up)
- Discussion of Joint Entropy of Rotokas and other languages (please compute this before class)
- Southern Brazilian Portuguese FST (15 min)
- English Plurals FST (15 min)
- Unsupervised Morphology (20 min)
- Demo – Naive Bayes Classifier (zip file of classifier)
- Geographic Classification of Arabic.
- HW for next week (can do with a partner):
- try the classifier on your texts.
- try Linguistica on a Foreign Language Corpus. Analyze results.
- skim the Parts of Speech and Basic Syntax chapters of the book mentioned in Week 5.
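For trying a classifier on your own texts, here is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing, computed in log space to avoid underflow. It is not the classifier from the course zip file, whose interface may differ. The toy "sports"/"cooking" documents are hypothetical, and the choice to skip words never seen in training is one of several reasonable options.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label). Returns (class_counts, word_counts, vocab)."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)  # label -> token frequencies
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(tokens, model):
    """Return the label maximising log P(label) + sum of log P(token | label)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n_docs)  # log prior
        total = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # add-one smoothing over the whole training vocabulary
            score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best


# Hypothetical toy training data, for illustration only.
DOCS = [
    ("goal match team".split(), "sports"),
    ("team won match".split(), "sports"),
    ("bake flour oven".split(), "cooking"),
    ("oven recipe flour".split(), "cooking"),
]
MODEL = train_nb(DOCS)

if __name__ == "__main__":
    print(classify("team goal".split(), MODEL))
```

For a task like geographic classification of Arabic, the same structure applies with dialect labels in place of topics and, often, character n-grams in place of whole words.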
Week 5 – 28 September
- What we learned from 5 million books – TED Talk
- Wrap up and discussion
- Entropy – interested in hearing reports and impressions from people
- Southern Brazilian Portuguese
- English Plurals
- Linguistica – discussion
- Introduction to lexc, part of the Xerox finite-state toolkit (PDF of the relevant chapter of Finite State Morphology, 51 MB download)
- Esperanto Worksheet (2 tasks: first write an FSA, then an FST)
- Linguists in class:
- Parts of Speech (any intro to Linguistics book, or this chapter through section 5.3; username: compling, password: a 1337 version of chomsky with only one character changed) – Mary
- Basic Syntax (any intro to Linguistics book, or chapter 12 of this) – Shannon
- Phrase Structure Grammar and parsing (chapter 13 of this)
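As a warm-up for the Esperanto worksheet, here is a minimal sketch of a finite-state machine for Esperanto nouns, which take the ending -o, optionally plural -j, and optionally accusative -n, in that order (hundo, hundoj, hundon, hundojn). Each transition carries an output string, so the same table works as an FSA (accept or reject) and as an FST (surface form to analysis such as hund+Noun+Pl+Acc). The three roots, the tag names, and the state labels are hypothetical choices, not the worksheet's official answer, and adjectives (-a) are left out.

```python
def build_transducer(roots):
    """Return (transitions, finals), where transitions[(state, ch)] = (next_state, output).

    Roots share a letter trie starting at state 0. Assumes no root is a prefix
    of another root followed by the letter o (true for the toy lexicon below).
    """
    trans = {}
    n_states = 1  # state 0 is the start state
    for root in roots:
        state = 0
        for ch in root:
            if (state, ch) not in trans:
                trans[(state, ch)] = (n_states, ch)  # copy root letters to the output
                n_states += 1
            state = trans[(state, ch)][0]
        trans[(state, "o")] = ("N", "+Noun")  # -o marks a noun
    trans[("N", "j")] = ("PL", "+Pl")          # -j marks plural
    trans[("N", "n")] = ("ACC", "+Acc")        # -n marks accusative
    trans[("PL", "n")] = ("PL_ACC", "+Acc")    # -jn: plural accusative
    return trans, {"N", "PL", "ACC", "PL_ACC"}


def analyze(word, trans, finals):
    """Run the transducer; return the analysis string, or None if the word is rejected."""
    state, out = 0, []
    for ch in word:
        if (state, ch) not in trans:
            return None  # no transition for this character: reject
        state, piece = trans[(state, ch)]
        out.append(piece)
    return "".join(out) if state in finals else None


# Hypothetical mini-lexicon: hund (dog), kat (cat), dom (house).
TRANS, FINALS = build_transducer(["hund", "kat", "dom"])

if __name__ == "__main__":
    for w in ["hundojn", "katoj", "domon", "hundonj"]:
        print(w, analyze(w, TRANS, FINALS))
```

Used as an FSA, a word is accepted exactly when `analyze` returns something other than `None`; ignoring the outputs gives the recognizer from the first task.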
Week 6 – 5 October
- WordSeer – How Natural Language Processing is Changing Research
- Esperanto Worksheet – finish up FSA and FST
- Talk and demo of Porter Stemmer (info at Martin Porter’s Porter Stemming Algorithm webpage)
- Video of Peter Norvig talking about the effectiveness of data. Either
- Great Programming Shootout
- Parsing Intro.
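Here is a small piece of the Porter Stemmer to go with the talk and demo: the "measure" m of a stem and Step 1a of the algorithm. It is not a full stemmer. Following Porter's definitions, a consonant is a letter other than a, e, i, o, u, and other than y preceded by a consonant. Any word can then be written as [C](VC)^m[V], and m controls when later suffix-stripping rules apply. The function names are illustrative, and Martin Porter's webpage remains the authoritative description.

```python
from itertools import groupby


def _is_consonant(word, i):
    """Porter's consonant test: y counts as a vowel only when preceded by a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not _is_consonant(word, i - 1)
    return True


def measure(stem):
    """Return m in the form [C](VC)^m[V]."""
    forms = "".join("c" if _is_consonant(stem, i) else "v" for i in range(len(stem)))
    collapsed = "".join(key for key, _ in groupby(forms))  # e.g. "ccvvc" -> "cvc"
    return collapsed.count("vc")


def step1a(word):
    """Porter Step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (nothing)."""
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


if __name__ == "__main__":
    print([measure(w) for w in ["tree", "trouble", "private"]])
    print([step1a(w) for w in ["caresses", "ponies", "caress", "cats"]])
```

The test words match the examples Porter uses: m = 0 for tree and by, m = 1 for trouble and ivy, m = 2 for private and orrery.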
Week 7 – 12 October
- Great Classifier ShootOut
- Parsing
- Grammars, Parsers, & the CYK algorithm (pdf of slides)
- BUBS parser (please download)
- try it first with the included Berkeley grammar
- the simple2.gr.txt file
- rule format discussion
- parsing worksheet
- try entering the Wumpus grammar
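To accompany the Grammars, Parsers, & the CYK algorithm slides, here is a minimal CYK recognizer for a grammar in Chomsky Normal Form. The cell for span [i, j) holds every nonterminal that can derive words i through j-1. A sentence is grammatical if the start symbol ends up in the cell covering the whole input. The toy grammar below is hypothetical and does not follow the BUBS rule format or the Wumpus grammar. The sketch only recognizes sentences; a parser would also store backpointers to recover trees.

```python
from itertools import product

# Hypothetical toy CNF grammar: binary rules (B, C) -> {A} and lexical rules word -> {A}.
BINARY = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
LEXICAL = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}


def cyk(words, binary, lexical, start="S"):
    """Return True if `words` can be derived from `start` under the CNF grammar."""
    n = len(words)
    # table[i][j] = set of nonterminals spanning words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical.get(w, ()))
    for length in range(2, n + 1):          # span length
        for i in range(0, n - length + 1):  # span start
            j = i + length
            for k in range(i + 1, j):       # split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= binary.get((b, c), set())
    return start in table[0][n] if n else False


if __name__ == "__main__":
    print(cyk("the dog saw the cat".split(), BINARY, LEXICAL))
```

The three nested loops over span length, start, and split point give the familiar O(n^3) running time in sentence length, times a grammar-dependent factor.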
Week 8 – 19 October
- Game plan for rest of semester
- Introduction to the Final Project (pdf of slides)
- Great Classifier ShootOut
- BUBS parser lab
- Statistical Machine Translation I: Word Alignment Models (Kevin Knight’s A Statistical MT Tutorial Workbook (rtf))
- Language Model Optional Project (description)
- texts to unscramble: Walden-scrambled (for testing), Enron Email
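For the word alignment models session, here is a minimal sketch of expectation-maximization for IBM Model 1, the simplest of the alignment models. Each foreign word's expected count is spread over the English words in the same sentence pair, in proportion to the current translation probability t(f | e). The probabilities are then renormalized for each English word. This simplified version leaves out the NULL word that Model 1 normally includes, and the tiny French/English corpus is hypothetical.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """pairs: list of (foreign_tokens, english_tokens). Returns t[(f, e)] ~ P(f | e)."""
    f_vocab = {f for fs, _ in pairs for f in fs}
    # Uniform initialisation over every co-occurring (f, e) pair.
    t = {(f, e): 1.0 / len(f_vocab) for fs, es in pairs for f in fs for e in es}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) alignments
        total = defaultdict(float)  # expected count of e
        # E-step: distribute each foreign token over the English tokens it may align to.
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / z
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise so that the t[(f, e)] sum to 1 over f for each e.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


# Hypothetical toy parallel corpus.
CORPUS = [
    ("la maison".split(), "the house".split()),
    ("la maison bleue".split(), "the blue house".split()),
    ("la fleur".split(), "the flower".split()),
]
T = ibm_model1(CORPUS)

if __name__ == "__main__":
    for e in ["the", "house", "flower"]:
        best = max((p, f) for (f, e2), p in T.items() if e2 == e)
        print(e, "->", best[1], round(best[0], 3))
```

No alignments are given in the data. The model learns that "la" goes with "the" because they co-occur in every pair, and this in turn frees "maison" to pair with "house".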
Week 9 – 26 October
- Project elevator pitch
- 15-20 minute presentation: Ferrucci et al., Building Watson: An Overview of the DeepQA Project, AI Magazine, 2010 – Ryan and Shannon!
- 15-20 minute presentation: AskMSR question answering system (report on both AskMSR: Question Answering Using the Worldwide Web and An Analysis of the AskMSR Question-Answering System) – Ashby & Valerie!
- time to co-ordinate project
- Statistical Machine Translation II: Statistical Alignment Models (chapter 25 of this draft)
- Videos shown in class:
- The Bears discuss the state of the machine learning labor market.
- Machine learning of language – a lecture by Christopher Manning.
Week 10 – 2 November
- Project update reports
- Discussion of Manning Video
- Information Extraction presentation (chapter 22, from the introduction through 22.1 Named Entity Extraction, of this draft; same password as above). Student presentation, 10 minutes: Gray & Eric!
- Information Extraction presentation (chapter 22, section 22.2 of this draft) – Joe F and Jacob B!
- Introduction to Information Extraction Technology (old paper but still relevant) Amy Olson & Samantha Whay
- 1/2 hr. to co-ordinate project
- Odds and Ends re. Statistical MT
- Evaluating MT systems
- Hands-on MT lab (going through the basic tutorial (sec. 2.1) of the Moses Statistical Machine Translation System User Manual)
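For the Evaluating MT systems topic, here is a minimal sketch of BLEU, one widely used automatic metric. It uses the geometric mean of modified (clipped) n-gram precisions for n = 1 to 4, multiplied by a brevity penalty. This is an unsmoothed, single-sentence version written for illustration, not the scorer that ships with Moses. Real evaluations usually compute BLEU over a whole test corpus, because unsmoothed sentence-level scores are often zero.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Counter of the n-grams (as tuples) in `tokens`."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Candidate n-gram counts clipped by the maximum count in any single reference."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, c in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], c)
    clipped = sum(min(c, max_ref[gram]) for gram, c in cand.items())
    return clipped / sum(cand.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    c = len(candidate)
    # Effective reference length: the reference length closest to c (ties go to the shorter).
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


if __name__ == "__main__":
    ref = "the cat is on the mat".split()
    print(modified_precision("the the the the the the the".split(), [ref], 1))
    print(bleu(ref, [ref]))
```

Clipping is what stops a degenerate output like "the the the the the the the" from scoring a perfect unigram precision: "the" appears at most twice in the reference, so only 2 of the 7 candidate tokens count.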
Week 11 – 9 November
- Sprint 1 demo (5 – 10 minutes each team)
- Intro to Computational Semantics
- Computers versus Common Sense (The Cyc Project) Doug Lenat video
- Hands-on MT lab. (handout)
Week 12 – 16 November
- Sprint 2 demo (5 – 10 minutes each team)
- Computers versus Common Sense (The Cyc Project) Doug Lenat
- Semantic Role Labeling (if people need it, can make this a student presentation, let me know)
Week 13 – 23 November
- Thanksgiving Break
Week 14 – 30 November
- Cyc Discussion
- OpenCyc
- Open Cog Project (talk by Ben Goertzel)
- Formal Computational Semantics
- Soft AI: what we have now: Powerset Demo Video
- Hard AI: Embodied Conversational Agents – Justine Cassell’s research
- PBS Scientific American Frontiers (Friendly Characters)
- excerpts from The Human at the Heart of our Work (around the 26-minute mark for code switching and the 48-minute mark for work on autism)
- Apple Knowledge Navigator revisited (vs. annoying Microsoft Paper Clip)
- presentation: The MIT START question answering system. Draw presentation from several papers at http://groups.csail.mit.edu/infolab/publications/ and give demo. (2 presenters)
Week 15 – 7 December
- Final Sprint Demo – 20 minute presentations