In many text analysis tasks, frequently occurring words are removed during pre-processing. While removing frequent words is appropriate for many tasks, it is not appropriate for all of them: in some analyses, frequent words play a crucial role. In this paper we examine the use of frequent words to geographically classify Arabic news stories.
Zacharski, Ron; Ahmed Abdelali; Stephen Helmreich; and Jim Cowie. 2009. Linguistic Dumpster Diving: Geographical Classification of Arabic Text. Proceedings of the Chicago Colloquia on Digital Humanities and Computer Science.