LINGUIST List 14.1476

Thu May 22 2003

Diss: Pragmatics/Comp: Creswell: "Syntactic form..."

Editor for this issue: Naomi Fox


  1. creswell, Syntactic form and discourse function...

Message 1: Syntactic form and discourse function...

Date: Wed, 21 May 2003 15:13:16 +0000
From: creswell
Subject: Syntactic form and discourse function...

Institution: University of Pennsylvania
Program: Department of Linguistics
Dissertation Status: Completed
Degree Date: 2003

Author: Cassandre Yvonne Creswell 

Dissertation Title: Syntactic form and discourse function in natural
language generation

Dissertation URL:

Linguistic Field: Pragmatics
		 Computational Linguistics 

Subject Language: English (code: ENG)

Dissertation Director 1: Ellen F. Prince
Dissertation Director 2: Aravind K. Joshi

Dissertation Abstract: 

Previous research has shown that certain discourse conditions are
necessary for the felicitous use of four non-canonical syntactic
constructions in English: topicalizations, left-dislocations,
wh-clefts, and it-clefts. However, the distribution of these forms
does not correlate one-to-one with the presence of these necessary
conditions. Speakers must choose to use these constructions for other
reasons. Additionally, a natural language generation algorithm that
selects these statistically rare forms based only on these conditions
will overgenerate. If it selects clausal word order based only on
frequency, however, these forms will never be selected or will be used
in meaningless ways. The purpose of this dissertation is to devise a
more complete model of when human speakers generate these
constructions in order to further understanding of syntactic form
selection and to better characterize these forms' conditions of use
for purposes of NLG. The model of syntactic choice presented
explicitly ties the goals of the communicative agent to the linguistic
forms selected to achieve those goals. Three types of communicative
goals that speakers achieve through the use of non-canonical syntax
are argued for: (1) attention marking, (2) discourse relation, and (3)
information-structure focus disambiguation. The evidence supporting
the model is based on naturally occurring tokens from a corpus of
spontaneous oral discourse. This same corpus, annotated with
low-level properties of the discourse context surrounding utterances
with non-canonical word order, is then used to train a statistical
model that can approximate some aspects of the theoretical model. The
statistical model supports the claim that communicative goals of
signaling discourse relations do correlate significantly with the use
of particular non-canonical forms. The statistical model is also used
as a probabilistic classifier, which could be utilized as a stochastic
method for selecting syntactic form based on discourse context as part
of a natural language generation system. The probabilistic classifier
shows improvement over a naive classifier when applied to training
data. The probabilistic classifier is a first attempt to utilize more
than just frequency counts as a basis for syntactic form selection and
instead incorporate aspects of the semantic content of surrounding
discourse context as a basis for using a particular form.