Class Schedule

Week 1 – 31 August

Intro to Computational Linguistics
Intro to course logistics
Stream – collaborative decision
Text Analysis for Digital Humanities (see Resources for Digital Humanities)

Week 2 – 7 September

Discussion of the following articles:

The End of Theory: The Data Deluge Makes the Scientific Method Obsolete by Chris Anderson
Visualizing Big Data: Bar Charts for Words by Mark Horowitz
Scan This Book! by Kevin Kelly
Google’s Moon Shot by Jeffrey Toobin
Quantitative Analysis of Culture Using Millions of Digitized Books by Jean-Baptiste Michel et al.
Matthew Jockers’s critique of Google’s Ngram Viewer

Text Analysis Continued
Discussion of results with tools
Introduction to Morphology (PDF of slides)
Introduction to Finite State Transducers

Week 3 – 14 September

Southern Brazilian Portuguese Pronunciation

How Natural Language Processing is Changing – a video
Entropy and the Entropy Worksheet on Rotokas
Discussion of the debate between Peter Norvig, Director of Research at Google, and Noam Chomsky.

On Chomsky and the Two Cultures of Statistical Learning by Peter Norvig (everyone read). First responders: Brian, Shannon
Norvig vs. Chomsky and the Fight for the Future of AI by Kevin Gold. First responder: Samantha

Discussion of the following article:

Unsupervised Learning of the Morphology of a Natural Language by John Goldsmith. First responder: Mary

To do: Download and install Linguistica and test how well it learns the morphology of an English corpus of your choice and of a corpus in another language.
Computational Morphology continued
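For the Rotokas entropy worksheet and the joint-entropy computation due before the Week 4 class, a few lines of Python can help check hand calculations. This is a minimal sketch, not the worksheet's official method: it assumes character-level entropy with spaces ignored, and it assumes “joint entropy” means H(X, Y) over adjacent character pairs within a word. The function names and the sample string are hypothetical; paste in your own Rotokas text.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of the distribution given by a Counter of counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def char_entropy(text):
    """H(X): entropy of single characters, ignoring whitespace."""
    return entropy(Counter(ch for ch in text if not ch.isspace()))


def joint_entropy(text):
    """H(X, Y): entropy of adjacent character pairs.

    Assumption (hypothetical reading of the worksheet): pairs are taken
    inside each word only, never across a space.
    """
    pairs = Counter()
    for word in text.split():
        pairs.update(zip(word, word[1:]))
    return entropy(pairs)


if __name__ == "__main__":
    sample = "replace this placeholder with a rotokas text"  # hypothetical sample
    print(f"H(X)   = {char_entropy(sample):.3f} bits")
    print(f"H(X,Y) = {joint_entropy(sample):.3f} bits")
```

Comparing these numbers for Rotokas and for a language with a larger alphabet is one way into the Week 4 discussion; H(X, Y) − H(X) gives an approximate conditional entropy of a character given the one before it.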

Week 4 – 21 September

Discussion of Morphology long-term HW – Finite State Transducers and Morphology (link to write-up)
Discussion of the joint entropy of Rotokas and other languages (please compute this before class)
Southern Brazilian Portuguese FST (15 min)
English Plurals FST (15 min)
Unsupervised Morphology (20 min)
Demo – Naive Bayes Classifier (zip file of classifier)
Geographic Classification of Arabic.
HW for next week (can do with a partner):

try the classifier on your texts.
try Linguistica on a foreign-language corpus and analyze the results.
skim the Parts of Speech and Basic Syntax chapters of the book mentioned in Week 5.
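Before trying the demo classifier on your own texts, it may help to see the core idea in isolation. The sketch below is a generic multinomial Naive Bayes text classifier with add-one (Laplace) smoothing over lowercased, whitespace-split words. It is not the classifier in the course zip file, and the class name, labels, and training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words features with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of training documents
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def log_score(self, text, label):
        """log P(label) plus the sum of smoothed log P(word | label)."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocab)
        for word in text.lower().split():
            count = self.word_counts[label][word]  # Counter returns 0 for unseen words
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def classify(self, text):
        """Return the label with the highest log score."""
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train("the ball went into the net", "sports")        # hypothetical data
    nb.train("the team won the match", "sports")
    nb.train("the parser builds a tree", "linguistics")
    nb.train("morphology studies word structure", "linguistics")
    print(nb.classify("the team scored"))
```

Working in log space avoids underflow when multiplying many small probabilities; swapping in texts from different regions or authors is the same experiment the homework asks for.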

Week 5 – 28 September

What we learned from 5 million books – TED Talk
Wrap up and discussion

Entropy – interested in hearing people’s reports and impressions
Southern Brazilian Portuguese
English Plurals

Linguistica – discussion
Introduction to lexc, part of the Xerox finite-state toolkit (PDF of the relevant chapter of Finite State Morphology, 51 MB download)

Esperanto Worksheet (2 tasks: first write an FSA, then an FST)

Linguists in class:

Parts of Speech (any intro to linguistics book, or this chapter through section 5.3; user: compling, pw: a 1337 version of chomsky with only 1 character changed) Mary
Basic Syntax (any intro to linguistics book, or chapter 12 of this) Shannon
Phrase Structure Grammar and parsing (chapter 13 of this)

Week 6 – 5 October

WordSeer – How Natural Language Processing is Changing Research
Esperanto Worksheet – finish up FSA and FST
Talk and demo of Porter Stemmer (info at Martin Porter’s Porter Stemming Algorithm webpage)
Video of Peter Norvig talking about the effectiveness of data. Either

Great Programming Shootout

classify2.py

Parsing Intro.
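Martin Porter’s webpage has the full algorithm and reference implementations. To give a flavor of how its rules work before the demo, here is a sketch of only the first step (Step 1a, plural suffixes), written as an ordered list of suffix rewrites where the first matching rule wins. The function name is a hypothetical helper, and the later steps, which depend on the “measure” of the stem, are deliberately omitted.

```python
# Step 1a of the Porter stemmer only. Rules are tried in order; the first
# suffix that matches is rewritten and no further rules are applied.
STEP_1A_RULES = [
    ("sses", "ss"),  # caresses -> caress
    ("ies", "i"),    # ponies -> poni
    ("ss", "ss"),    # caress -> caress (stops the bare "s" rule from firing)
    ("s", ""),       # cats -> cat
]


def porter_step_1a(word):
    """Apply Step 1a of the Porter algorithm to a lowercase word."""
    for suffix, replacement in STEP_1A_RULES:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word


if __name__ == "__main__":
    for w in ["caresses", "ponies", "ties", "caress", "cats", "dog"]:
        print(w, "->", porter_step_1a(w))
```

Note that stems like “poni” are not dictionary words; the algorithm only aims to map related forms to the same string.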

Week 7 – 12 October

Great Classifier ShootOut

Parsing

Grammars, Parsers, & the CYK algorithm (pdf of slides)
BUBS parser (please download)

try it first with the included Berkeley grammar
the simple2.gr.txt file
rule format discussion
parsing worksheet
try entering the Wumpus grammar
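The CYK slides describe filling a triangular chart bottom-up. As a companion to the parsing worksheet, here is a minimal CYK recognizer sketch. It assumes a grammar already in Chomsky Normal Form and uses a hypothetical toy grammar written as Python dictionaries, not the BUBS parser or the rule format of simple2.gr.txt. It only answers whether a sentence is in the language; a full parser would also store backpointers to recover trees.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy grammar in Chomsky Normal Form.
BINARY_RULES = {            # (B, C) -> set of A such that A -> B C
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICAL_RULES = {           # word -> set of A such that A -> word
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "saw": {"V"},
}


def cyk_recognize(words, start="S"):
    """Return True if the CNF grammar derives the word list from `start`."""
    n = len(words)
    # table[(i, j)] holds every nonterminal that spans words[i:j].
    table = defaultdict(set)
    for i, word in enumerate(words):
        table[i, i + 1] = set(LEXICAL_RULES.get(word, ()))
    for span in range(2, n + 1):                # shorter spans are filled first
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # every split point inside the span
                for b, c in product(table[i, k], table[k, j]):
                    table[i, j] |= BINARY_RULES.get((b, c), set())
    return start in table[0, n]


if __name__ == "__main__":
    print(cyk_recognize("the dog saw the cat".split()))
    print(cyk_recognize("saw the dog the".split()))
```

The three nested loops over span, start, and split point give the familiar O(n³) running time times the grammar size.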

Week 8 – 19 October

Game plan for rest of semester
Introduction to the Final Project (pdf of slides)
Great Classifier ShootOut
BUBS parser lab
Statistical Machine Translation I: Word Alignment Models (Kevin Knight’s A Statistical MT Tutorial Workbook, RTF)
Language Model Optional Project (description)

texts to unscramble: Walden-scrambled (for testing), Enron Email
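Knight’s workbook builds up word alignment step by step. As a compact companion, the sketch below runs expectation-maximization for IBM Model 1 on a tiny hypothetical French–English corpus. It is simplified on purpose: there is no NULL word, initialization is uniform, and the function and variable names are illustrative only.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate t(f | e) with EM for IBM Model 1 (simplified: no NULL word).

    pairs: list of (foreign_words, english_words) sentence pairs.
    Returns a dict mapping (f, e) -> t(f | e).
    """
    foreign_vocab = {f for fs, _ in pairs for f in fs}
    # Uniform initialization over the foreign vocabulary.
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (f, e) links
        total = defaultdict(float)  # expected number of links from e
        for fs, es in pairs:
            for f in fs:
                # E-step: share one unit of count for f among the English words.
                norm = sum(t[f, e] for e in es)
                for e in es:
                    fraction = t[f, e] / norm
                    count[f, e] += fraction
                    total[e] += fraction
        # M-step: renormalize expected counts into probabilities t(f | e).
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


if __name__ == "__main__":
    corpus = [  # hypothetical toy corpus
        ("la maison".split(), "the house".split()),
        ("la maison bleue".split(), "the blue house".split()),
        ("la fleur".split(), "the flower".split()),
    ]
    table = ibm_model1(corpus)
    for e in ["the", "house", "blue", "flower"]:
        best = max((f for f, e2 in table if e2 == e), key=lambda f: table[f, e])
        print(e, "->", best, round(table[best, e], 3))
```

Because “the” co-occurs with “la” in every sentence, EM gradually assigns “la” to “the”, which frees “maison” to align with “house”; this is the same pigeonhole effect the workbook walks through by hand.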

Week 9 – 26 October

Project elevator pitch
15–20 minute presentation: Ferrucci et al., Building Watson: An Overview of the DeepQA Project, AI Magazine, 2010. Ryan and Shannon!
15–20 minute presentation: the AskMSR question-answering system (report on both “AskMSR: Question Answering Using the Worldwide Web” and “An Analysis of the AskMSR Question-Answering System”). Ashby & Valerie!
time to co-ordinate project
Statistical Machine Translation II: Statistical Alignment Models (chapter 25 of this draft)
Videos shown in class:

The Bears discuss the state of the machine learning labor market.
Machine learning of language – a lecture by Christopher Manning.

Week 10 – 2 November

Project update reports
Discussion of Manning Video
Information Extraction presentation (chapter 22 of this draft, from the intro through 22.1 Named Entity Extraction; same pw as above), 10-minute student presentation: Gray & Eric!
Information Extraction presentation (chapter 22 of this draft, section 22.2): Joe F and Jacob B!
Introduction to Information Extraction Technology (old paper but still relevant) Amy Olson & Samantha Whay
1/2 hr. to co-ordinate project
Odds and Ends re. Statistical MT
Evaluating MT systems
Hands-on MT lab (working through the basic tutorial in sec. 2.1 of the Moses Statistical Machine Translation System User Manual)

Week 11 – 9 November

Sprint 1 demo (5 – 10 minutes each team)
Intro to Computational Semantics
Computers versus Common Sense (The Cyc Project) Doug Lenat video
Hands-on MT lab. (handout)

Week 12 – 16 November

Sprint 2 demo (5 – 10 minutes each team)
Computers versus Common Sense (The Cyc Project) Doug Lenat
Semantic Role Labeling (if people need it, can make this a student presentation, let me know)

Week 13 – 23 November

Thanksgiving Break

Week 14 – 30 November

Cyc Discussion
OpenCyc
OpenCog Project (talk by Ben Goertzel)
Formal Computational Semantics
Soft AI: what we have now: Powerset Demo Video
Hard AI: Embodied Conversational Agents – Justine Cassell’s research

PBS Scientific American Frontiers (Friendly Characters)
excerpts from The Human at the Heart of our Work (around the 26-minute mark for code switching and the 48-minute mark for work on autism)
Apple Knowledge Navigator revisited (vs. annoying Microsoft Paper Clip)

presentation: the MIT START question-answering system. Draw the presentation from several papers at http://groups.csail.mit.edu/infolab/publications/ and give a demo (2 presenters).

Week 15 – 7 December

Final Sprint Demo – 20 minute presentations