Dissertation Information

Title: Syntactic Form and Discourse Function in Natural Language Generation
Author: Cassandre Creswell
Institution: University of Pennsylvania, Department of Linguistics
Completed in: 2003
Linguistic Subfield(s): Computational Linguistics; Pragmatics
Subject Language(s): English
Director(s): Aravind Joshi; Ellen Prince

Abstract: Previous research has shown that certain discourse conditions are necessary for the felicitous use of four non-canonical syntactic constructions in English: topicalizations, left-dislocations, wh-clefts, and it-clefts. However, the distribution of these forms does not correlate one-to-one with the presence of these necessary conditions. Speakers must choose to use these constructions for other reasons. Additionally, a natural language generation algorithm that selects these statistically rare forms based only on these conditions will overgenerate. If it selects clausal word order based only on frequency, however, these forms will never be selected or will be used in meaningless ways. The purpose of this dissertation is to devise a more complete model of when human speakers generate these constructions in order to further understanding of syntactic form selection and to better characterize these forms' conditions of use for purposes of NLG. The model of syntactic choice presented explicitly ties the goals of the communicative agent to the linguistic forms selected to achieve those goals. Three types of communicative goals that speakers achieve through the use of non-canonical syntax are argued for: (1) attention marking, (2) discourse relation, and (3) information-structure focus disambiguation. The evidence supporting the model is based on naturally occurring tokens from a corpus of spontaneous oral discourse. This same corpus, annotated with low-level properties of the discourse context surrounding utterances with non-canonical word order, is then used to train a statistical model that can approximate some aspects of the theoretical model. The statistical model supports the claim that communicative goals of signaling discourse relations do correlate significantly with the use of particular non-canonical forms. The statistical model is also used as a probabilistic classifier, which could be utilized as a stochastic method for selecting syntactic form based on discourse context as part of a natural language generation system. The probabilistic classifier shows improvement over a naive classifier when applied to training data. The probabilistic classifier is a first attempt to utilize more than just frequency counts as a basis for syntactic form selection and instead incorporate aspects of the semantic content of surrounding discourse context as a basis for using a particular form.
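
The comparison between a context-sensitive probabilistic classifier and a frequency-only baseline can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not say which statistical model the dissertation uses, so a small naive Bayes classifier with add-one smoothing stands in for it; the "naive classifier" is read here as a majority-class baseline; and the feature names (`relation`, `prior_mention`) and the toy examples are invented, not drawn from the dissertation's corpus.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data, invented for illustration only:
# (discourse-context features, syntactic form chosen).
TOY_DATA = [
    ({"relation": "contrast", "prior_mention": "yes"}, "topicalization"),
    ({"relation": "contrast", "prior_mention": "yes"}, "topicalization"),
    ({"relation": "contrast", "prior_mention": "no"}, "it-cleft"),
    ({"relation": "narration", "prior_mention": "no"}, "canonical"),
    ({"relation": "narration", "prior_mention": "yes"}, "canonical"),
    ({"relation": "narration", "prior_mention": "no"}, "canonical"),
    ({"relation": "elaboration", "prior_mention": "no"}, "canonical"),
    ({"relation": "elaboration", "prior_mention": "yes"}, "canonical"),
]


def train(data):
    """Collect the counts a naive Bayes classifier needs."""
    label_counts = Counter(label for _, label in data)
    feature_counts = defaultdict(Counter)  # (label, feature name) -> value counts
    values = defaultdict(set)              # feature name -> values seen
    for feats, label in data:
        for name, value in feats.items():
            feature_counts[(label, name)][value] += 1
            values[name].add(value)
    return label_counts, feature_counts, values


def predict(model, feats):
    """Return the form with the highest smoothed log-probability."""
    label_counts, feature_counts, values = model
    total = sum(label_counts.values())

    def score(label):
        s = math.log(label_counts[label] / total)
        for name, value in feats.items():
            count = feature_counts[(label, name)][value]
            # Add-one smoothing so unseen values do not zero out a form.
            s += math.log((count + 1) / (label_counts[label] + len(values[name])))
        return s

    return max(label_counts, key=score)


def majority_baseline(data):
    """Frequency-only baseline: always pick the most common form."""
    return Counter(label for _, label in data).most_common(1)[0][0]


def accuracy(predictions, data):
    return sum(p == label for p, (_, label) in zip(predictions, data)) / len(data)


model = train(TOY_DATA)
baseline_form = majority_baseline(TOY_DATA)
model_acc = accuracy([predict(model, f) for f, _ in TOY_DATA], TOY_DATA)
base_acc = accuracy([baseline_form] * len(TOY_DATA), TOY_DATA)
print(f"baseline: {base_acc:.3f}  context-sensitive: {model_acc:.3f}")
```

On this toy set the frequency-only baseline always selects the canonical order, which mirrors the abstract's point that frequency alone never selects the rare forms. Both scores are computed on the training data, as in the abstract, so they say nothing about performance on unseen discourse.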