Jeanette Gundel (University of Minnesota), Nancy Hedberg (Simon Fraser University), and I just submitted a paper to the new journal topiCS (Topics in Cognitive Science). Writing it pretty much consumed my entire spring break. The title of the paper is "Underspecification of Cognitive Status in Reference Production: Some Empirical Predictions." Here’s the abstract.
Monthly Archives: March 2010
Linguistic Dumpster Diving: Geographical Classification of Arabic Text
In many text analysis tasks, frequently occurring words are removed during pre-processing. While removing frequent words is appropriate for many tasks, it is not appropriate for all of them: in some, frequent words play a crucial role. In this paper we examine the use of frequent words to geographically classify Arabic news stories.
Zacharski, Ron, Ahmed Abdelali, Stephen Helmreich, and Jim Cowie. 2009. Linguistic Dumpster Diving: Geographical Classification of Arabic Text. Proceedings of the Chicago Colloquia on Digital Humanities and Computer Science. (pdf)
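To make the idea of keeping frequent words concrete, here is a minimal sketch in Python. It is not the method from the paper: the whitespace tokenizer, the nearest-centroid classifier, the function names, the English toy sentences, and the labels "A" and "B" are all assumptions made for illustration. Real Arabic text would need proper tokenization and far more data.

```python
from collections import Counter


def top_frequent_words(documents, k):
    """Return the k most frequent tokens across all documents."""
    counts = Counter()
    for text in documents:
        counts.update(text.split())  # naive whitespace tokenization (assumption)
    return [word for word, _ in counts.most_common(k)]


def frequency_features(text, vocabulary):
    """Relative frequency of each vocabulary word within one document."""
    tokens = text.split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty documents
    return [counts[word] / total for word in vocabulary]


def centroids(labelled_docs, vocabulary):
    """Average the feature vectors of the documents under each label."""
    sums = {}
    sizes = Counter()
    for text, label in labelled_docs:
        vector = frequency_features(text, vocabulary)
        acc = sums.setdefault(label, [0.0] * len(vocabulary))
        for i, value in enumerate(vector):
            acc[i] += value
        sizes[label] += 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def classify(text, label_centroids, vocabulary):
    """Pick the label whose centroid is closest (squared Euclidean distance)."""
    vector = frequency_features(text, vocabulary)

    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(vector, label_centroids[label]))

    return min(label_centroids, key=distance)


# Hypothetical toy data: the two "regions" differ only in a frequent word.
train = [
    ("the cat sat on the mat while the dog slept", "A"),
    ("the bird and the fish saw the cat", "A"),
    ("a cat sat on a mat while a dog slept", "B"),
    ("a bird and a fish saw a cat", "B"),
]
vocabulary = top_frequent_words([text for text, _ in train], k=2)
label_centroids = centroids(train, vocabulary)
prediction = classify("the dog saw the bird", label_centroids, vocabulary)
```

In this toy setup the only signal separating the two labels is a word that a stop-word filter would normally discard, which is the situation the abstract describes.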
Investigations on Standard Arabic Geographical Classification
This paper reports on a series of studies focused on the geographical classification of Standard Arabic. The aim of these studies was to automatically classify a document based on the author’s country of origin. The studies examined documents from newspapers in five countries. We evaluated ten classification algorithms on this task. The best-performing algorithms were bagging C4.5, a neural network with backpropagation, NBTree, and SMO with a polynomial kernel. These methods were over 99% accurate in geographically classifying the documents.
Abdelali, Ahmed, Steve Helmreich, and Ron Zacharski. 2009. Investigations on Standard Arabic Geographical Classification. Proceedings of the Computational Approaches to Arabic Script-based Languages Workshop, Ottawa, 26 August 2009. (pdf)