Ron Zacharski | Identifying the source of Arabic documents

For the last few months I’ve been working on methods to identify the source of Arabic documents. For example, given a document, I would like to identify where it was written (Syria, Libya, Sudan, etc.). This task is part of a larger project to identify cyberterrorist threats, a collaboration involving New Mexico Tech, New Mexico State University, and the University of Mary Washington. I have over 4,000 Arabic documents from five different newspapers; most are around 15-25 KB in size. My method uses the sequential minimal optimization (SMO) algorithm to train a support vector machine (SVM). I have been evaluating the approach using 10-fold cross-validation and have been getting over 99% classification accuracy. I am currently writing several papers on this work. As soon as one is accepted I will post it here.
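
To make the evaluation protocol concrete, here is a minimal sketch of 10-fold cross-validation for source classification, in Python. It is not the code used in this project. The character-bigram features, the nearest-centroid classifier (a simple stand-in for the SVM, chosen so the example needs no external libraries), the toy strings, and every function name are assumptions made purely for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Count overlapping character n-grams (an assumed feature choice)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def stratified_folds(labels, k=10):
    """Deal document indices into k folds, class by class, so each fold
    holds roughly the same share of every source."""
    folds = [[] for _ in range(k)]
    per_class = {}
    for idx, label in enumerate(labels):
        per_class.setdefault(label, []).append(idx)
    for indices in per_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def train_centroids(docs, labels):
    """Stand-in trainer (NOT an SVM): sum the n-gram counts of each class."""
    centroids = {}
    for doc, label in zip(docs, labels):
        centroids.setdefault(label, Counter()).update(char_ngrams(doc))
    return centroids


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def predict(centroids, doc):
    """Pick the class whose centroid is most similar to the document."""
    vec = char_ngrams(doc)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


def cross_validate(docs, labels, k=10):
    """Train on k-1 folds, test on the held-out fold, and repeat for every
    fold. Returns overall accuracy across all held-out documents."""
    correct = 0
    for held_out in stratified_folds(labels, k):
        held = set(held_out)
        train_idx = [i for i in range(len(docs)) if i not in held]
        model = train_centroids([docs[i] for i in train_idx],
                                [labels[i] for i in train_idx])
        correct += sum(predict(model, docs[i]) == labels[i] for i in held_out)
    return correct / len(docs)


# Toy data: two made-up "sources" with disjoint character patterns.
docs = ["abab" + "ab" * i for i in range(10)] + \
       ["xyxy" + "xy" * i for i in range(10)]
labels = ["A"] * 10 + ["B"] * 10

accuracy = cross_validate(docs, labels, k=10)
print(f"10-fold accuracy on toy data: {accuracy:.2f}")
```

The point of the sketch is the protocol: every document is held out exactly once, and the model that classifies it never saw it during training. In a real experiment, `train_centroids` and `predict` would be replaced by an SVM trainer and its decision function, and the toy strings by the newspaper documents.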