ron.zacharski@gmail.com

Ron Zacharski

Language recognition for mono- and multi-lingual documents

Posted on March 5, 1999 by admin. Posted in Machine Translation, News.

In this paper we describe language recognition algorithms for mono- and multi-lingual documents that are based on mixed-order n-grams, Markov chains, maximum likelihood, and dynamic programming. We compare the monolingual algorithm to those suggested by other researchers; this comparison suggests that our algorithm significantly outperforms commonly used language recognition algorithms. We then describe the multilingual algorithm, which segments a multilingual document into single-language chunks and identifies the language of each chunk.
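The abstract names the ingredients but not their details, so the Python below is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes character-level Markov models whose orders (history lengths 0 to 2) are mixed by equal-weight linear interpolation of add-one smoothed estimates; the paper's actual mixing scheme may differ. Monolingual recognition picks the language with maximum log-likelihood. Multilingual segmentation uses a Viterbi-style dynamic program over words with a hypothetical fixed `switch_penalty` charged at each language change. The names `MixedOrderModel`, `identify`, `segment`, and the toy training strings are invented for illustration.

```python
import math
from collections import defaultdict


class MixedOrderModel:
    """Character Markov model mixing history lengths 0..max_order-1.

    Assumption: orders are combined by equal-weight interpolation of
    add-one smoothed estimates (not necessarily the paper's scheme).
    """

    def __init__(self, training_text, max_order=3):
        self.max_order = max_order
        self.pad = " " * (max_order - 1)
        self.counts = defaultdict(lambda: defaultdict(int))  # history -> next char -> count
        self.totals = defaultdict(int)                        # history -> total count
        self.vocab = len(set(training_text)) + 1              # +1 reserves mass for unseen chars
        padded = self.pad + training_text
        for i in range(max_order - 1, len(padded)):
            for k in range(max_order):                        # every history length 0..max_order-1
                hist = padded[i - k:i]
                self.counts[hist][padded[i]] += 1
                self.totals[hist] += 1

    def log_prob(self, text):
        """Log-likelihood of text: each character's probability is the
        average of the smoothed estimates from all history lengths."""
        padded = self.pad + text
        total = 0.0
        for i in range(self.max_order - 1, len(padded)):
            p = 0.0
            for k in range(self.max_order):
                hist = padded[i - k:i]
                seen = self.counts.get(hist, {}).get(padded[i], 0)
                p += (seen + 1) / (self.totals.get(hist, 0) + self.vocab)
            total += math.log(p / self.max_order)
        return total


def identify(text, models):
    """Monolingual case: maximum-likelihood choice among language models."""
    return max(models, key=lambda lang: models[lang].log_prob(text))


def segment(words, models, switch_penalty=5.0):
    """Multilingual case: label each word with a language by dynamic
    programming, paying switch_penalty per language change, then merge
    consecutive words with the same label into chunks."""
    if not words:
        return []
    langs = list(models)
    emit = [{lang: models[lang].log_prob(w) for lang in langs} for w in words]
    score, back = [emit[0]], [{}]
    for t in range(1, len(words)):
        score.append({})
        back.append({})
        for lang in langs:
            # Best predecessor label, penalizing a change of language.
            arrive = {prev: score[t - 1][prev] - (0.0 if prev == lang else switch_penalty)
                      for prev in langs}
            prev = max(arrive, key=arrive.get)
            score[t][lang] = arrive[prev] + emit[t][lang]
            back[t][lang] = prev
    # Trace back the best label sequence.
    lang = max(langs, key=score[-1].get)
    labels = [lang]
    for t in range(len(words) - 1, 0, -1):
        lang = back[t][lang]
        labels.append(lang)
    labels.reverse()
    chunks = []
    for word, lang in zip(words, labels):
        if chunks and chunks[-1][0] == lang:
            chunks[-1][1].append(word)
        else:
            chunks.append((lang, [word]))
    return [(lang, " ".join(ws)) for lang, ws in chunks]


# Toy usage with tiny, made-up training texts.
models = {
    "en": MixedOrderModel("the quick brown fox jumps over the lazy dog and the cat sat on the mat " * 10),
    "es": MixedOrderModel("el rapido zorro marron salta sobre el perro perezoso y el gato se sienta " * 10),
}
print(identify("the dog sat on the mat", models))
print(segment("the dog sat on the mat el perro se sienta".split(), models))
```

The per-switch penalty is what keeps the segmenter from flipping languages on every ambiguous word: a short run is relabeled only if its likelihood gain under another language exceeds the cost of switching there and back. The search costs time proportional to the number of words times the square of the number of languages.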

Cowie, Jim, Yevgeny Ludovik, and Ron Zacharski. 1999. Language recognition for mono- and multi-lingual documents. In Proceedings of the Vextal Conference, pp. 209-214. Venice, November 22-24, 1999.
