Review of  Current and New Directions in Discourse and Dialogue

Reviewer: Roser Morante
Book Title: Current and New Directions in Discourse and Dialogue
Book Author: Jan van Kuppevelt Ronnie W. Smith
Publisher: Kluwer
Linguistic Field(s): Computational Linguistics
Issue Number: 15.1169

Date: Thu, 08 Apr 2004 17:25:39 +0200
From: Roser Morante
Subject: Current and New Directions in Discourse and Dialogue

EDITOR: van Kuppevelt, Jan; Smith, Ronnie W.
TITLE: Current and New Directions in Discourse and Dialogue
SERIES: Text, Speech and Language Technology 22
PUBLISHER: Kluwer Academic Publishers
YEAR: 2004

Roser Morante, Computational Linguistics and AI section,
Faculty of Arts, University of Tilburg.

"Current and New Directions in Discourse and Dialogue" is a collection
of sixteen papers. Twelve of them are extended versions of the papers
presented in the Second SIGdial Workshop on Discourse and Dialogue held
in September 2001 in Aalborg, Denmark. The rest are invited papers. As
the editors point out, the three main themes addressed in the book are:
(i) corpus annotation and analysis: chapters 1, 3, 5; (ii)
methodologies for construction of dialogue systems: chapters 2, 6, 10,
12, 15; and (iii) perspectives on various theoretical issues: chapters
4 (communicative intention), 7 (human-computer versus human-human
dialogues), 8 (context-based generation), 9 and 11 (clarification
requests), 13 (conversational implicatures), 14 (modeling of discourse
structure), 16 (role of interruptions).

The fact that the book gathers papers on several research areas related
to discourse and dialogue makes it interesting not only for researchers
working on specific areas, but also for those who would like to have a
multidimensional view of the current dialogue and discourse oriented
research by reading some representative articles. However, the reader
should not expect to find machine learning studies in dialogue

Chapter 1, "Annotations and tools for an activity based spoken language
corpus", written by Jens Allwood, Leif Grönqvist, Elisabeth Ahlsén, and
Magnus Gunnarsson, describes the Spoken Language Corpus of Swedish
developed in the Department of Linguistics at Göteborg University
(GSLC) and the various types of tools and analysis that have been
developed for work on this corpus. The corpus contains 1.3 million
words of naturalistic spoken language data, its original feature being
the number of social activities recorded (25). The tools described are
the following: TransTool, a tool for transcribing spoken language
following a transcription standard (GTS 6.2, MSO6); Corpus Browser, a
web interface that makes it possible to search for words, word
combinations and phrases; Tractor, a coding tool to create new coding
schemas and annotate transcriptions; a toolbox for visualizing
coding schemas and coding directly in the transcription as a FrameMaker
document; TraSA, a tool that calculates 30 statistical measurements;
SyncTool, a tool that synchronizes transcriptions with digitized
audio/video recordings; and MultiTool, which is a general tool for
linguistic annotation and transcribing of dialogs, as well as browsing,
searching and counting. The tools allow synchronizing views pertaining
to the same point in time in order to show the same sequence from
different points of view. The corpus has been analyzed quantitatively
and qualitatively. As for the quantitative analysis, a set of
automatically derivable properties of the corpus (volume, ratios,
special descriptors, lemma, POS, collocations, frequency lists,
sequences of part of speech and similarities) has been defined using
the information provided by the transcriptions. The qualitative
analysis of the corpus has resulted in the development of coding
schemas, which include: social activity and communicative act related
coding, communication management related coding, grammatical coding and
semantic coding.

Chapter 2, "Using direct variant transduction for rapid development of
natural spoken interfaces", is written by Hiyan Alshawi and Shona
Douglas. The authors present a new approach to constructing interactive
spoken interfaces, the Direct Variant Transduction (DVT), that relies
on specifying an application with examples and on classification and
pattern-matching techniques. The method aims at addressing two
bottlenecks in the development of an interface with natural spoken
language: coping with language variation and linking natural language
to appropriate actions in the application back-end. The authors first
outline the characteristics of the method, which adopts several
(i) applications are constructed using a relatively small number of
example inputs from which robust interpretation models are compiled;
(ii) no intermediate semantic representations are needed;
(iii) confirmation queries posed by the system to the user are
constructed automatically from the examples;
(iv) dialog control should be simple to specify for simple
applications, while allowing the flexibility of delegating this control
to another module for more complex applications.

Two applications have been built using this method: an e-mail access
application and a call-routing application. The authors continue by describing
what needs to be provided by the application builder. An application
consists of a set of contexts. Each context provides the mapping
between user inputs and application actions that are meaningful in a
particular stage of interaction between the user and the system. Next
they explain variant expansion and the use of classifiers and matchers.
Their approach to handling language variation is twofold: first, they use
robust recognition, classification and matching techniques, instead of
rules. Second, they expand contexts to include variants not included in
the original set of examples provided by the application developer.
After that they describe their approach to specifying dialogue control.
Finally, the authors present quantitative data that, as they point out,
indicate that a very small number of training examples can provide
useful performance in a call routing application. The results suggest
that the DVT method is a viable option for constructing spoken language
applications without specialized expertise.
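
The example-driven idea can be illustrated with a minimal sketch. This
is not the authors' implementation: the context contents, the action
labels and the token-overlap score are all illustrative assumptions,
standing in for the classifiers and matchers discussed in the chapter.
A context maps a few example inputs to back-end actions, a new input is
routed to the action of its closest example, and a confirmation query
is built directly from that example.

```python
# Hypothetical illustration of example-based interpretation in one context.
# Nothing here is taken from the DVT system itself.

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two utterances (a stand-in matcher)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

# One context: example inputs paired with application actions (all made up).
MAIN_MENU = {
    "read my new messages": "READ_NEW_MAIL",
    "delete this message": "DELETE_MAIL",
    "i want to talk to billing": "ROUTE_BILLING",
}

def interpret(utterance: str, context: dict, threshold: float = 0.3):
    """Return (action, matched_example), or (None, None) if no example is close."""
    best = max(context, key=lambda example: jaccard(utterance, example))
    if jaccard(utterance, best) < threshold:
        return None, None
    return context[best], best

def confirmation(matched_example: str) -> str:
    """Build a confirmation query from the matched example, with no semantics."""
    return f"Did you mean: '{matched_example}'?"
```

Note that no intermediate semantic representation appears anywhere in
the sketch: the input is linked to an action only through its
similarity to a stored example.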

Chapter 3, "An interface for annotating natural interactivity", is
written by Niels Ole Bernsen, Laila Dybkjaer, and Mykola Kolodnytsky.
The article focuses on spoken dialogue data annotation. The goal of
their research is to build a user-friendly general-purpose tool for
coding natural interactive communicative behavior in the framework of
the NITE project. The authors start by pointing out the need for a
tool that enables cross-level and cross-modality coding for all the
levels of analysis involved in natural interactivity, since this kind
of tool is a key factor in the development of high-quality annotated
data and coding schemes. They continue by presenting a review of
existing natural interactivity coding tools and by introducing the NITE
project coding tool specification. Next they describe the NITE
WorkBench (NWB) annotation user interface. It aims to support
annotation of any kind of phenomena involved in natural interactive
communication. The main requirements of the tool are: supporting
working with annotation projects, including meta-data; enabling
flexible control of raw data files (audio and/or video); supporting
annotation of natural interactive communication at any analytical level
and across levels and modalities through the use of existing or new
coding schemes; enabling users to specify new coding schemes; and
enabling information extraction and analysis of annotated data.
Finally, they present the Audio-Visual Annotation Interface: the data
structure, where project files are used as organizers; the annotation
interface, and the coding scheme interface.

Chapter 4, "Managing communicative intentions with collaborative
problem solving", is written by Nate Blaylock, James Allen, and George
Ferguson. The work is done in the context of producing a conversational
agent. In this paper the authors concern themselves with the
intention/language interface level, where communicative intentions are
converted to and from a high-level semantic form. The paper proposes a
descriptive model of dialogue, based on collaborative problem solving,
which defines communicative intentions as attempts to modify a shared
collaborative problem-solving state between the user and the system.
According to the authors, modeling communicative intentions based on
collaborative problem solving allows the behavioral reasoning component
of a dialogue system to only worry about problem solving and not about
linguistic issues. The model covers a range of collaboration paradigms
and models dialogue involving planning and acting. The authors start by
describing previous work (models of collaborative planning and models
of dialogue). Next they introduce their model of dialogue, showing how
communicative intentions are defined. In the model they try to
enumerate the complete taxonomy of collaborative problem-solving
activities. They continue by exemplifying the model with several
dialogue examples and by discussing how the model is used in the TRIPS
dialogue system. Finally, they put forward some conclusions and future
work.

Chapter 5, "Building a discourse-tagged corpus in the framework of
rhetorical structure theory", is written by Lynn Carlson, Daniel Marcu,
and Mary Ellen Okurowski. The authors describe their experience in
developing the Rhetorical Structure Theory (RST) corpus. Their goal was
to conduct large-scale implementation within the framework of a single
discourse theory. The corpus contains 385 documents of American English
selected from the Penn Treebank, which are hierarchically annotated in
the framework of Rhetorical Structure Theory. The corpus can be mined
in order to study discourse related phenomena. The discourse structure
is built by means of 78 rhetorical relations and three additional
relations to impose structure on the tree. The paper describes the
selection of theoretical approach, annotation methodology, training,
and quality assurance. A main feature of the corpus is that it is
annotated at different levels: leaf-level, text-level, and mid-level
analysis, so that the mining of the corpus can be done at different
levels. The authors present three examples: comparison of discourse
structure at the leaf-level, comparisons of trees for different styles
of news reports at the text-level, and examination of relations at the
mid-level analysis.

Chapter 6, "An empirical study of speech recognition errors in human
computer dialogue", is written by Marc Cavazza. The author
investigates the impact of speech recognition errors on a fully
implemented dialogue prototype based on a speech acts formalism. The
system is a mixed-initiative conversational interface organized around
a human-like character, with which the user communicates through speech
recognition. The software architecture is a pipeline comprising speech
recognition, parsing and dialogue. The main goal of parsing is to
produce a semantic structure from which speech acts can be identified.
The system operates in a bottom-up fashion and does not include any
specific mechanism for error detection or explicit error handling. The
author reports findings on the consequences of speech recognition
errors on the identification of speech acts, and the conditions under
which the system can be robust to those errors. He provides an
empirical classification of system reaction to speech recognition
errors, and discusses methods for the specific evaluation of the
consequences of speech recognition errors. Cavazza proposes that the
method should include a combination of speech acts accuracy metrics and
concept efficiency metrics. He concludes by saying that there appears
to be no simple correlation between robustness to speech recognition
errors and the depth of parsing and interpretation, since there are a
number of factors that support the robustness of the system to speech
recognition errors, for example the fact that the dialogue control
mechanism triggered by the speech act recognition can contribute to
repairing the consequences of speech recognition errors.

Chapter 7, "Comparing several aspects of human-computer and human-human
dialogues", is written by Christine Doran, John Aberdeen, Laurie
Damianos, and Lynette Hirschman. The authors present results of the
experiments carried out to compare human-human (HH) and human-computer
(HC) interaction in the context of the Communicator Travel task, a
DARPA-funded program. In order to begin an empirical exploration of how
certain aspects of the dialogue shed light on differences between HC
and HH communication, they annotated dialogues (20 HH and 40 HC) from
the air travel domain with several sets of tags: dialogue act (CSTAR
consortium tags), initiative (which participant has control at the end
of the turn), and unsolicited information (this only for HC dialogues).
The paper describes the data, the coding of the corpus and the
analysis. The authors point out some findings. In general the
conversation is more balanced between traveler and expert in the HH
setting, in terms of amount of speech, types of dialogue acts and
sharing initiative. As for initiative, in the HC data the experts
massively dominated in taking the initiative, whereas in the HH data,
users and experts shared the initiative relatively equitably. The
results of the experiments show that one of the most salient
characteristics of the HC data is that they contain many
misunderstandings of a sort that are nearly absent in the HH data. They
find that misunderstandings prevalent in the HC data can be classified
into three groups (hallucinations, mismatches and prompt after fills),
and that these misunderstandings can be detected through the
combination of a semantic annotation and an automatic algorithm.

Chapter 8, "Full paraphrase generation for fragments in dialogue", is
written by Christian Ebert, Shalom Lappin, Howard Gregory, and Nicolas
Nicolov. The authors start by pointing out that one major challenge for
any dialogue interpretation system is the proper treatment of
fragments. In this paper they show how to generate paraphrases for
fragments of dialogues with SHARDS, which is a system for the
resolution of fragments in a dialogue, based on a version of HPSG which
integrates the situation semantics-based theory of dialogue context
given in KOS. The generator uses a template-filler approach and it does
not do any deep generation from an underlying semantic representation.
Instead it reuses the results of the parse and interpretation process
of SHARDS to dynamically compute the templates, and then to update the
filter. The innovation of their approach is that they situate
generation within the context of dialogue interpretation, specifically
fragment resolution. In doing so they are able to eliminate much of the
indeterminacy that characterizes classical generation systems by
exploiting the rich syntactic and phonological information produced in
the course of dialogue interpretation. The paper starts with the
presentation of the system and the grammatical background. Next the
proposal for generating fragment paraphrases with templates is
explained, and, finally, the implementation of SHARDS and the
generation component are described.

Chapter 9, "Disentangling public from non-public meaning", is written
by Jonathan Ginzburg. The author starts by claiming that analyses of
interaction need to characterize not solely 'success conditions', but
also 'clarification potential'. In this paper the author illustrates
how characterising certain classes of Clarification Requests (CR) can
shed light on the problem of distinguishing publicly expressed
communicative effects from non-public ones. First he considers the very
productive and effective ways of producing CRs relating to the
grammatically governed content of an utterance. Then he turns to CRs
that pertain to the non-public intentions of a conversational
participant, like Whymeta. He demonstrates that Whymeta shows distinct
behaviour from CRs that pertain to grammatically governed content, the
most prominent feature being that whereas the latter are almost
invariably adjacent to the utterances whose clarification they seek,
non-adjacency is quite natural for Whymeta. This leads him to establish
the distinction between the notion of utterer's content and utterer's
plan. The author provides data to reinforce the distinction between
Utterer's Content and Utterer's Plan. He provides some background
notions from the KOS framework required for his formalization and
applies this to explicate Utterer's Content. Finally he considers a
previous analysis of Whymeta and develops his own analysis, which
involves viewing it as an instance of a metadiscursive utterance,
instead of as a mechanism that clarifies a contextually instantiable
goals/plan parameter.

Chapter 10, "Adaptivity and response generation in a spoken dialogue
system", is written by Kristiina Jokinen and Graham Wilcock. The paper
addresses the issue of how to increase adaptivity in response
generation for a spoken dialogue system. Realization strategies for
dialogue responses depend on communicative confidence levels and
interaction management goals. They first discuss interaction models and
naturalness and give concrete examples from a spoken dialogue system in
which different forms of surface realization are required in order to
achieve interaction management goals. They continue by describing a
Java/XML-based generator which produces different realizations of
system responses based on agendas specified by the dialogue manager.
The way in which the generator chooses between the different
realizations is based on detailed specifications of the information
status of different concepts, given in an agenda by the dialogue
manager component. They then discuss how greater adaptivity can be
achieved by using a set of distinct generator agents, each of which is
specialized in its realization strategy. This allows a simpler design
of each generator agent while increasing the overall system adaptivity
to meet the requirements for flexible cooperation in incremental and
immediate interactive situations.

Chapter 11, "On the means of clarification in dialogue", is written by
Matthew Purver, Jonathan Ginzburg, and Patrick Healey. In this paper
they describe an attempt to exhaustively categorise Clarification
Requests (CR) forms and readings based on corpus work. CRs can take
various forms and can be used to request various types of clarification
information, but have in common the fact that they are in some sense
utterance-anaphoric. Thus the corpus work has the additional aim of
identification of the maximum distance between a CR and the utterance
being clarified. The authors start by discussing previous work on CR.
Then they list the possible CR forms (non-reprise clarifications, reprise
sentences, reprise sluices, reprise fragments, gaps, gap fillers, and
conventional) and readings (clausal, constituent, lexical, and
corrections) that they identify from corpus analysis. They continue by
describing the analysis of the corpus. Finally, they discuss the
implications of their results for a possible HPSG analysis of
clarification requests and for an ongoing implementation of a
clarification-capable dialogue system.

Chapter 12, "Plug and play spoken dialogue processing", is written by
Manny Rayner, Johan Boye, Ian Lewin, and Genevieve Gorrell. The paper
contains a description of a spoken language dialogue system
architecture which supports plug and playable networks of objects. The
discussion centres around a concrete prototype system, CANTONA. The
main point of plug and play spoken language dialogue is that at any
given time the system's dialogue capabilities are determined by the set
of devices currently connected; adding new devices dynamically changes
its ability to recognise, understand, and respond to commands. The
authors first introduce the top-level components and the key
interfaces of the CANTONA Plug and Play demonstrator, where all
processing is rule-based. Next they describe the general architectural
considerations for achieving plug and play functionality in rule-based
systems, the rules and hierarchies, the plug and play response
generation and the speech recognition and parsing.
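
The central plug and play idea, that the set of connected devices
determines what the system can understand, can be sketched as follows.
This is a deliberately simplified illustration and not the CANTONA
architecture: the class, the method names and the exact-match lookup
are assumptions made for the example, whereas the system described in
the chapter relies on rule-based recognition, parsing and response
generation.

```python
# Hypothetical sketch: dialogue capabilities follow the connected devices.

class DialogueSystem:
    def __init__(self):
        # Maps a command phrase to (device name, handler function).
        self._handlers = {}

    def plug_in(self, device_name, commands):
        """Register the command phrases a newly connected device contributes."""
        for phrase, handler in commands.items():
            self._handlers[phrase] = (device_name, handler)

    def unplug(self, device_name):
        """Remove every command contributed by the given device."""
        self._handlers = {
            phrase: (device, handler)
            for phrase, (device, handler) in self._handlers.items()
            if device != device_name
        }

    def respond(self, utterance):
        """Answer a command only if some connected device understands it."""
        entry = self._handlers.get(utterance.lower().strip())
        if entry is None:
            return "Sorry, I cannot do that."
        _device, handler = entry
        return handler()
```

The same utterance is rejected before the device is plugged in,
handled while it is connected, and rejected again after it is removed.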

Chapter 13, "Conversational implicatures and communication theory", is
written by Robert van Rooy. This paper presents an account of
implicatures in terms of a mathematical theory of communication.
According to the author, from a standard pragmatics perspective
conversational implicatures should be accounted for in terms of Grice's
maxims of conversation. Neo-Griceans seek to reduce those maxims to the
so-called Q and I-principles. In this paper the author argues that:
(i) there are major problems for reducing Gricean pragmatics to these
two principles, and
(ii) that, in fact, it is better to account for implicatures in terms
of the principles of (a) optimal relevance and (b) optimal coding.
To formulate these principles the author makes use of Shannon's
mathematical theory of communication.

Chapter 14, "Reconciling control and discourse structure", is written
by Susan E. Strayer, Peter A. Heeman, and Fan Yang. In this paper the
authors consider how control, in the sense of initiative, is managed in
task-oriented dialogues. They first describe previous work in discourse
structure and in control, and they present their coding of the corpus
(eight dialogues of the TRAINS corpus) with subdialogues and control
tags based on the DAMSL coding schema. Next they explore the
relationship between discourse structure and control: (i) they compare
control boundaries to subdialogue boundaries using recall and
precision; (ii) they look at control inside of discourse segments. Then
they explore how control can shift within a subdialogue and find two
types of contributions that a speaker can make in a discourse segment:
collaborative completions, in which the non-initiator helps the segment
initiator achieve their goal, and short contributions to the discourse
segment purpose. They find that collaborative completions and
co-contributions are exceptions to the general rule that control tends to
reside with the same speaker. Based on the results of their study they
propose that control is subordinate to the intentional structure.
Control is held by the segment initiator. They point out the
implications for dialogue management: a system only needs to model
intentional structure, from which control follows.
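
The boundary comparison mentioned above can be made concrete with a
short sketch. The utterance indices below are invented for
illustration and are not data from the chapter, and treating control
boundaries as the "predicted" set and subdialogue boundaries as the
"reference" set is an assumption about how the comparison is set up.

```python
# Hypothetical sketch: precision and recall between two sets of boundaries,
# each boundary given as the index of the utterance that starts a new unit.

def precision_recall(predicted: set, reference: set):
    """Return (precision, recall) of predicted boundaries against reference."""
    if not predicted or not reference:
        return 0.0, 0.0
    hits = len(predicted & reference)
    return hits / len(predicted), hits / len(reference)

# Invented example annotations for a single dialogue.
control_boundaries = {3, 7, 12, 15}
subdialogue_boundaries = {3, 7, 10, 15, 20}

precision, recall = precision_recall(control_boundaries, subdialogue_boundaries)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=0.75 recall=0.60"
```

High precision would mean that shifts of control mostly coincide with
subdialogue boundaries; high recall would mean that most subdialogue
boundaries are accompanied by a shift of control.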

Chapter 15, "The information state approach to dialogue management", is
written by David R. Traum and Staffan Larsson. In this paper the
authors introduce the information state approach to dialogue
management, and show how it can be used to formalize theories of
dialogue in a manner suitable for easy implementation. The authors
start by defining what dialogue management is. Then they propose two
contributions towards solving the problem of dialogue management
reuse: first, a unifying view of dialogue management that can help
organize the relationship between dialogue theories and
implementations. The unifying view includes a proposal to formalize
dialogue management functions in terms of information state update.
Second, software tools that can help to achieve reusable dialogue
systems. They continue by presenting the information state approach.
After that they show how the information state approach can be used to
help provide reusable components for dialogue system design, separating
three layers: basic software engineering layer, dialogue theory layer,
task/domain specific layer. Then they describe TrindiKit, a tool that
provides the basic software engineering glue that can be used to
implement a dialogue manager at a level closer to linguistic theories
than other existing toolkits. Next they illustrate some of the systems
that have been built using TrindiKit. Finally, they describe how the
separation of architecture layers previously defined has led to actual
reuse in a number of dialogue systems.
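
The notion of information state update can be illustrated with a small
sketch. It is not TrindiKit code and does not reproduce its interface:
the field names, the move format ("ask:<question>") and the single
rule are assumptions chosen for the example. The point is only that a
dialogue move is integrated by a rule consisting of a precondition on
the information state and an effect that updates it.

```python
# Hypothetical sketch of rule-based information state update.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class InfoState:
    agenda: List[str] = field(default_factory=list)  # pending system actions
    qud: List[str] = field(default_factory=list)     # questions under discussion
    latest_move: str = ""                            # last move, not yet integrated

@dataclass
class UpdateRule:
    name: str
    precondition: Callable[[InfoState], bool]
    effect: Callable[[InfoState], None]

def integrate_ask(state: InfoState) -> None:
    """Integrate a user question: raise it and plan to answer it."""
    question = state.latest_move[len("ask:"):]
    state.qud.insert(0, question)
    state.agenda.append("answer:" + question)
    state.latest_move = ""  # the move is now integrated

RULES = [
    UpdateRule("integrateAsk",
               lambda s: s.latest_move.startswith("ask:"),
               integrate_ask),
]

def update(state: InfoState, rules: List[UpdateRule]) -> List[str]:
    """Fire the first applicable rule repeatedly; return the names fired."""
    fired = []
    while True:
        for rule in rules:
            if rule.precondition(state):
                rule.effect(state)
                fired.append(rule.name)
                break
        else:
            return fired
```

In this sketch the rule set plays the role of the dialogue theory
layer, while the generic update loop corresponds to the kind of
reusable machinery a toolkit could provide.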

Chapter 16, "Visualizing spoken discourse", is written by Li-Chung Yang.
The goal of the study is to look at the distribution of interruptive
occurrences in natural speech, and investigate their respective
functions and characteristics. It is shown that interruptions are
important elements in the interactive character of discourse and in the
resolution of issues of cognitive uncertainty and planning. The author
analyses the different types of interruptions and the extent
to which prosodic-acoustic features are significant in distinguishing
between the different types of interruptions. He distinguishes between
cooperative and competitive interruptions. The specific pitch height of
the interruption varies with the expression of emotion, signals of
attention-getting, and signals of competitiveness. In general,
competitive interruptions are marked by a high pitch level and a loud
amplitude, expressing the participant's competition for the focus of
attention. By contrast, cooperative interruptions are more supportive
of the main speaker's floor rights. Because of their non-disruptive
nature, they often occur at low or medium pitch levels and they are
generally lower in pitch than competitive interruptions. Their
amplitude can vary. To conclude, the author analyses the implications
for dialogue systems.

Roser Morante is a PhD student in the Section of Computational
Linguistics and AI of Tilburg University. Her areas of research are
computational pragmatics and dialogue systems. In her current project
her goal is to define mechanisms for computing information states and
updates of information states in a dialogue system.
