This paper reports on a series of studies on the geographical classification of Standard Arabic. The aim of these studies was to automatically classify a document by its author’s country of origin. The studies examined documents from newspapers in five countries. We evaluated ten classification algorithms on this task. The best-performing algorithms were bagging with C4.5, a neural network trained with backpropagation, NBTree, and SMO with a polynomial kernel. These methods classified the documents by country with over 99% accuracy.
Abdelali, Ahmed, Steve Helmreich, and Ron Zacharski. 2009. Investigations on Standard Arabic Geographical Classification. Proceedings of the Computational Approaches to Arabic Script-based Languages Workshop, Ottawa, 26 August 2009.