Ron Zacharski | Monthly Archives: December 2009

Over the Christmas break I have been looking at words in Standard Arabic that are more common in one region than in another. This is a continuation of work I have been doing with Ahmed Abdelali and Steve Helmreich. Ahmed has collected a corpus of Standard Arabic texts from newspapers in Egypt, Sudan, Libya, Syria, and the UK. In previous work we looked at distinguishing texts from different regions using the frequency of common words (the equivalent of common English words such as at, on, and in). Over the break, I was looking instead for differences in the frequency of content words (similar to Amazon’s ‘statistically improbable phrases’), that is, words that occur in one region's texts more frequently than you would expect by chance.
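To make "more frequently than you would expect by chance" concrete, here is a minimal sketch of one way to score words. It assumes a log-likelihood (G²) keyness measure, which is a common choice in corpus linguistics and not necessarily the measure used in this work. The function names and the idea of passing pre-tokenized, non-empty word lists are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def keyness(count_a, total_a, count_b, total_b):
    """Signed log-likelihood (G^2) score for one word.

    Positive means the word is relatively more frequent in corpus A,
    negative means it is relatively more frequent in corpus B.
    Assumes both corpora are non-empty.
    """
    combined = count_a + count_b
    # Expected counts if the word were equally likely in both corpora.
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)
    score = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # treat 0 * log(0) as 0
            score += observed * math.log(observed / expected)
    score *= 2
    return score if count_a / total_a >= count_b / total_b else -score


def distinctive_words(tokens_a, tokens_b, top_n=10):
    """Return the top_n words most overrepresented in tokens_a."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = len(tokens_a), len(tokens_b)
    scores = {
        word: keyness(counts_a[word], total_a, counts_b[word], total_b)
        for word in counts_a
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Calling `distinctive_words(egypt_tokens, syria_tokens)` on two tokenized regional corpora (hypothetical variable names) would list the words most characteristic of the first region. Function words that are equally common in both score near zero, so content words tend to rise to the top.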