Title: A Data-Driven Methodology for Motivating a Set of Coherence Relations
Author: Alastair Knott
Homepage: http://www.cs.otago.ac.nz/staffpriv/alik/
Degree Awarded: University of Edinburgh, School of Informatics
Degree Date: 1996
Linguistic Subfield(s): Discourse Analysis
Director(s): Robert Dale
Chris Mellish

Abstract:

The notion that a text is coherent in virtue of the 'relations' that hold between its component spans currently forms the basis for an active research programme in discourse linguistics. 'Coherence relations' feature prominently in many theories of discourse structure, and have recently been used with considerable success in text generation systems. However, while the concept of coherence relations is now common currency for discourse theorists, there remains much confusion about them, and no standard set of relations has yet emerged.

The aim of this thesis is to contribute towards the development of a standard set of relations. We begin from an explicitly empirical conception of relations: they are taken to model a collection of psychological mechanisms operative during the tasks of reading and writing. This conception is fleshed out with reference to psychological theories of skilled task performance, and to Rosch's notion of the basic level of categorisation.

A methodology for investigating these mechanisms is then presented, which takes as its starting point a study of cue phrases: the sentence/clause connectives by which they are signalled. Although it is conventional to investigate psychological mechanisms by studying human behaviour, it is argued here that evidence for the constructs modelled by relations can be sought in an analysis of the linguistic resources available for marking them explicitly in text.

The methodology is based on two simple linguistic tests: the 'test for cue phrases' and the 'test for substitutability'. Both tests are functional in inspiration: the former test identifies a heterogeneous class of phrases used for linking one portion of text to another; and the latter test is used to discover when a writer is willing to substitute one of these phrases for another. The tests are designed to capture the judgements of ordinary readers and writers, rather than the theoretical intuitions of specialised discourse analysts.

The test for cue phrases is used to analyse around 200 pages of naturally occurring text, from which a corpus of over 200 cue phrases is assembled. The substitutability test is then used to organise this corpus into a hierarchical taxonomy, representing the substitutability relationship between every pair of phrases.

The taxonomy of cue phrases lends itself neatly to a model of relations as feature-based constructs. Many cue phrases can be interpreted as signalling just some features of relations, rather than whole relations. Small extracts from the taxonomy can be used systematically to determine the alternative values of single features; complex relation definitions can then be formed by combining the values of many features.

The thesis delivers results on two levels. Firstly, it sets out a methodology for motivating a set of relation definitions, which rests on a systematic analysis of concrete linguistic data, and demands a minimum of theoretical assumptions. Secondly, it provides the relation definitions which result from applying the methodology. The new definitions give an interesting picture of the variation that exists amongst cue phrases, and offer a number of innovative insights into text coherence.
Page Updated: 29-Nov-2009