Publications

The following is a nearly complete list of my publications.

Underspecification of Cognitive Status in Reference Production

Within the Givenness Hierarchy framework of Gundel, Hedberg, & Zacharski (1993), lexical items included in referring forms are assumed to conventionally encode two kinds of information: conceptual information about the speaker’s intended referent and procedural information about the assumed cognitive status of that referent in the mind of the addressee, the latter encoded by various determiners and pronouns. The current work focuses on effects of underspecification of cognitive status, establishing that, while salience and degree of accessibility play an important role in reference processing, the Givenness Hierarchy itself is not a hierarchy of degrees of salience/accessibility, contrary to what has often been assumed. We thus show that the framework is able to account for a number of experimental results in the literature without making additional assumptions about form-specific constraints associated with different referring forms.

Gundel, Jeanette K., Nancy Hedberg, and Ron Zacharski. Forthcoming. Underspecification of Cognitive Status in Reference Production: Some Empirical Predictions. Topics in Cognitive Science (pdf)

Linguistic Dumpster Diving: Geographical Classification of Arabic Text

In many text analysis tasks, it is common to remove frequently occurring words during pre-processing. While removing frequent words is appropriate for many text analysis tasks, it is not appropriate for all of them: in some tasks, frequent words play a crucial role. In this paper we examine the use of frequent words to geographically classify Arabic news stories.

Zacharski, Ron; Ahmed Abdelali; Stephen Helmreich; and Jim Cowie. 2009. Linguistic Dumpster Diving: Geographical Classification of Arabic Text. Proceedings of the Chicago Colloquia on Digital Humanities and Computer Science. (pdf)

Investigations on Standard Arabic Geographical Classification

This paper reports on a series of studies focused on the geographical classification of Standard Arabic. The aim of these studies was to automatically classify a document based on the author’s country of origin. The studies examined documents from newspapers in five countries. We evaluated ten classification algorithms on this task. The best performing algorithms were bagging C4.5, neural network with back propagation, NBTree, and SMO with a polynomial kernel. These methods were over 99% accurate in geographically classifying the documents.

Abdelali, Ahmed, Steve Helmreich, and Ron Zacharski. 2009. Investigations on Standard Arabic Geographical Classification. Proceedings of the Computational Approaches to Arabic Script-based Languages Workshop, Ottawa, 26 August 2009. (pdf)

User choice as an evaluation metric for cross language IM

A method for evaluating MT performance embedded in Cross-Language Instant Messaging (CLIM) systems is presented. A web interface that provided concurrent real-time translation for instant messaging from multiple MT services was developed and used by paid participants to collaborate on a photo identification task. The method showed a task performance benefit due to the availability of multiple translation alternatives.

Ogden, William, Ron Zacharski, Sieun An and Yuki Ishikawa. 2009. User choice as an evaluation metric for web translation services in cross language instant messaging applications. Proceedings of the Machine Translation Summit XII. Ottawa, Canada. (pdf)

Directly and Indirectly Anaphoric Pronouns

This paper reports on a study of pronouns this, that, and it in articles in the New York Times, testing the following hypotheses: (1) the pronoun it requires its referent to be in the addressee’s focus of attention; demonstrative pronouns only require activation; (2) the anaphoric relation between it and its antecedent tends to be direct (co-referential); the relation between a demonstrative pronoun and its antecedent tends to be indirect (non-coreferential).

Gundel, Jeanette, Nancy Hedberg, and Ron Zacharski. 2007. Directly and Indirectly Anaphoric Demonstrative and Personal Pronouns in Newspaper Articles. Proceedings of the Sixth Annual Discourse Anaphora and Anaphora Resolution Colloquium. (pdf)

The Grammar-Pragmatics Interface

This book, edited by Nancy Hedberg and me, came out in mid-2007. The book celebrates the life and work of Jeanette K. Gundel with papers by her friends and colleagues. It has a strong theme, the grammar-pragmatics interface, and is oriented around Jeanette K. Gundel’s ideas on that theme. The first section is centrally concerned with the topic-comment distinction. The second section discusses aspects of reference directly related to the Givenness Hierarchy. The third section moves from describing a close relation between register and form of referring expression, through the discourse grammar of apologies, to the crucially social discussion of co-constructional utterances that are composed of hearer plus speaker contributions to the discourse.

Hedberg, Nancy, and Ron Zacharski. 2007. The Grammar-Pragmatics Interface: Essays in Honor of Jeanette K. Gundel. John Benjamins Publishing. [Amazon] (The table of contents and the introductory chapter are available in pdf)

Guarani: A case study in resource development for quick ramp-up MT

In this paper we describe a set of processes for the acquisition of resources for quick ramp-up machine translation (MT) from any language lacking significant machine-tractable resources into English, using the Paraguayan indigenous language Guarani, as well as Amharic and Chechen, as examples.

Abdelali, Ahmed; James Cowie; Steve Helmreich; Wanying Jin; Maria Pilar Milagros; Bill Ogden; Hamid Mansouri Rad; and Ron Zacharski. 2006. Guarani: A case study in resource development for quick ramp-up MT. Proceedings of the Seventh Biennial Conference of the Association for Machine Translation in the Americas. Boston, MA. (pdf)

Data-Centric Computing with the Netezza Architecture

While relational databases have become critically important in business applications and web services, they have played a relatively minor role in scientific computing, which has generally been concerned with modeling and simulation activities. However, massively parallel database architectures are beginning to offer the ability to quickly search through megabytes of data with hundred-fold or even thousand-fold speedup over server-based architectures. These new machines may enable an entirely new class of algorithms for scientific applications, especially when the fundamental computation involves searching through abstract graphs. Three examples are examined and results are reported for implementations on a novel, massively parallel database computer, which enabled very high performance. Promising results from (1) computation of bibliographic couplings, (2) graph searches for sub-circuit motifs within integrated circuit netlists, and (3) a new approach to word sense disambiguation in natural language processing all suggest that the computational science community might be able to make good use of these new database machines.

Davidson, George S., Kevin W. Boyack, Ron Zacharski, Stephen Helmreich, and Jim R. Cowie. 2006. Data-Centric Computing with the Netezza Architecture. Sandia Report SAND2006-1853. (pdf)

The Role of Ontologies in a Linguistic Knowledge Acquisition Task

This paper discusses the role of ontologies in a knowledge elicitation component of a natural language processing system. The system is intended to assist in the rapid development and deployment of a machine translation system from any so-called ‘low-density’ language (one lacking significant machine-tractable resources) into English. The elicitation component, called BOAS, is intended to guide non-expert informants to provide linguistic information in sufficient detail to automatically generate a machine translation system.

Helmreich, Steve, and Ron Zacharski. 2005. The role of ontologies in a linguistic knowledge acquisition task. Proceedings of The Electronic Metastructure for Endangered Languages Data Workshop on Linguistic Ontologies and Data Categories for Language Resources. Cambridge, MA. July 1-3, 2005. (pdf)

Pronouns without NP Antecedents

Pronouns without explicit noun phrase antecedents pose a problem for any theory of reference resolution. We report here on an empirical study of such pronouns in the Santa Barbara Corpus of Spoken American English, a corpus of spontaneous, casual conversation. In this paper we focus on some problematic subclasses of pronouns which could be analyzed as either referring to entities of various degrees of abstractness that were introduced by or implied in previous discourse, or as non-referential, including pleonastic.

Gundel, Jeanette, Nancy Hedberg, and Ron Zacharski. 2005. Pronouns without NP Antecedents: How do we know when a pronoun is referential? Anaphora Processing: Linguistic, Cognitive and Computational Modelling, ed. by Antonio Branco, Tony McEnery, and Ruslan Mitkov. John Benjamins, 351-364. (pdf)