LINGUIST List 15.1169

Sat Apr 10 2004

Review: Computational Ling: van Kuppevelt & Smith(2004)

Editor for this issue: Naomi Ogasawara <naomilinguistlist.org>


What follows is a review or discussion note contributed to our Book Discussion Forum. We expect discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in. If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for review." Then contact Sheila Dooley Collberg at collberglinguistlist.org.

Directory

  1. Roser Morante, Current and New Directions in Discourse and Dialogue

Message 1: Current and New Directions in Discourse and Dialogue

Date: Sat, 10 Apr 2004 00:43:07 -0400 (EDT)
From: Roser Morante <R.moranteuvt.nl>
Subject: Current and New Directions in Discourse and Dialogue

EDITOR: van Kuppevelt, Jan; Smith, Ronnie W.
TITLE: Current and New Directions in Discourse and Dialogue
SERIES: Text, Speech and Language Technology 22
PUBLISHER: Kluwer Academic Publishers
YEAR: 2004

ANNOUNCED IN: http://linguistlist.org/issues/15/15-138.html


Roser Morante, Computational Linguistics and AI section, 
Faculty of Arts, University of Tilburg. 

''Current and New Directions in Discourse and Dialogue'' is a
collection of sixteen papers. Twelve of them are extended versions of
the papers presented at the Second SIGdial Workshop on Discourse and
Dialogue held in September 2001 in Aalborg, Denmark. The rest are
invited papers. As the editors point out, the three main themes
addressed in the book are: (i) corpus annotation and analysis:
chapters 1, 3, 5; (ii) methodologies for construction of dialogue
systems: chapters 2, 6, 10, 12, 15; and (iii) perspectives on various
theoretical issues: chapters 4 (communicative intention), 7 (human-
computer versus human-human dialogues), 8 (context-based generation),
9 and 11 (clarification requests), 13 (conversational implicatures),
14 (modeling of discourse structure), 16 (role of interruptions).

Because the book gathers papers from several research areas
related to discourse and dialogue, it is of interest not only to
researchers working in those specific areas, but also to readers who
want a broad view of current discourse- and dialogue-oriented
research through a set of representative articles. However, the
reader should not expect to find machine learning studies in dialogue
modelling.

Chapter 1, ''Annotations and tools for an activity based spoken
language corpus'', written by Jens Allwood, Leif Grönqvist,
Elisabeth Ahlsén, and Magnus Gunnarsson, describes the Spoken
Language Corpus of Swedish developed in the Department of Linguistics
at Göteborg University (GSLC) and the various types of tools and
analysis that have been developed for work on this corpus. The corpus
contains 1.3 million words of naturalistic spoken language data, its
original feature being the number of social activities recorded
(25). The tools described are the following: TransTool, a tool for
transcribing spoken language following a transcription standard (GTS
6.2, MSO6); Corpus Browser, a web interface that makes it possible to
search for words, word combinations and phrases; Tractor, a coding
tool to create new coding schemas and annotate transcriptions; a
toolbox for visualizing coding schemas and codings directly in
the transcription as a FrameMaker document; TraSA, a tool that
calculates 30 statistical measurements; SyncTool, a tool that
synchronizes transcriptions with digitized audio/video recordings; and
MultiTool, which is a general tool for linguistic annotation and
transcribing of dialogs, as well as browsing, searching and
counting. The tools allow synchronizing views pertaining to the same
point in time in order to show the same sequence from different points
of view. The corpus has been analyzed quantitatively and
qualitatively. As for the quantitative analysis, a set of
automatically derivable properties of the corpus (volume, ratios,
special descriptors, lemma, POS, collocations, frequency lists,
sequences of part of speech and similarities) has been defined using
the information provided by the transcriptions. The qualitative
analysis of the corpus has resulted in the development of coding
schemas, which include: social activity and communicative act related
coding, communication management related coding, grammatical coding
and semantic coding.

Chapter 2, ''Using direct variant transduction for rapid development
of natural spoken interfaces'', is written by Hiyan Alshawi and Shona
Douglas. The authors present a new approach to constructing
interactive spoken interfaces, Direct Variant Transduction (DVT),
that relies on specifying an application with examples and on
classification and pattern-matching techniques. The method aims at
addressing two bottlenecks in the development of an interface with
natural spoken language: coping with language variation and linking
natural language to appropriate actions in the application
back-end. The authors first outline the characteristics of the method,
which adopts several constraints: (i) applications are constructed
using a relatively small number of example inputs from which robust
interpretation models are compiled automatically; (ii) no intermediate
semantic representations are needed; (iii) confirmation queries posed
by the system to the user are constructed automatically from the
examples; (iv) dialog control should be simple to specify for simple
applications, while allowing the flexibility of delegating this
control to another module for more complex applications.

Two applications have been built using this method: an e-mail access
application and a call-routing application. The authors continue by
describing what needs to be provided by the application builder. An
application consists of a set of contexts. Each context provides the
mapping between user inputs and application actions that are
meaningful in a particular stage of interaction between the user and
the system. Next they explain variant expansion and the use of
classifiers and matchers. Their approach to handling language variation
is twofold. First, they use robust recognition, classification and
matching techniques, instead of rules. Second, they expand contexts to
include variants not included in the original set of examples provided
by the application developer. After that they describe their approach
to specifying dialogue control. Finally, the authors present
quantitative data that, as they point out, indicate that a very small
number of training examples can provide useful performance in a call
routing application. The results suggest that the DVT method is a
viable option for constructing spoken language applications without
specialized expertise.
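
As a rough illustration of the example-driven idea summarized above,
the following Python sketch maps a user input to an application action
by finding the closest example in a context. It is not the authors'
implementation: the word-overlap (Jaccard) similarity, the threshold,
the context contents and all function names are assumptions made for
this illustration only.

```python
# Hypothetical sketch of example-based interpretation within one context.
# The similarity measure and all names are illustrative assumptions,
# not the DVT method as published.

def tokens(text):
    """Lower-case a string and split it into a set of words."""
    return set(text.lower().split())

def similarity(a, b):
    """Jaccard word overlap between two utterances (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

# One context: example inputs supplied by the application builder,
# each tied directly to an application action (no semantic layer).
MAIL_CONTEXT = {
    "read new messages": "list_new_mail",
    "delete this message": "delete_current",
    "reply to the sender": "reply_current",
}

def interpret(utterance, context, threshold=0.3):
    """Return (action, matched_example), or (None, None) if nothing is close."""
    best = max(context, key=lambda example: similarity(utterance, example))
    if similarity(utterance, best) < threshold:
        return None, None
    return context[best], best

def confirmation(matched_example):
    """Build a confirmation query from the matched example itself."""
    return "Do you want to " + matched_example + "?"

action, example = interpret("please read the new messages", MAIL_CONTEXT)
print(action, "|", confirmation(example))
```

The point of the sketch is only that the examples do double duty: they
drive interpretation and also supply the wording of confirmation
queries, so no intermediate semantic representation is needed.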

Chapter 3, ''An interface for annotating natural interactivity'', is
written by Niels Ole Bernsen, Laila Dybkjaer, and Mykola Kolodnytsky.
The article focuses on spoken dialogue data annotation. The goal of
their research is to build a user-friendly general-purpose tool for
coding natural interactive communicative behavior in the framework of
the NITE project. The authors start by pointing out the need for a
tool that enables cross-level and cross-modality coding for all the
levels of analysis involved in natural interactivity, since this kind
of tool is a key factor in the development of high quality annotated
data and coding schemes. They continue by presenting a review of
existing natural interactivity coding tools and by introducing the
NITE project coding tool specification. Next they describe the NITE
WorkBench (NWB) annotation user interface. It aims to support
annotation of any kind of phenomena involved in natural interactive
communication. The main requirements of the tool are: supporting
working with annotation projects, including meta-data; enabling
flexible control of raw data files (audio and/or video); supporting
annotation of natural interactive communication at any analytical
level and across levels and modalities through the use of existing or
new coding schemes; enabling users to specify new coding schemes; and
enabling information extraction and analysis of annotated data.
Finally, they present the Audio-Visual Annotation Interface: the data
structure, where project files are used as organizers; the annotation
interface, and the coding scheme interface.

Chapter 4, ''Managing communicative intentions with collaborative
problem solving'', is written by Nate Blaylock, James Allen, and
George Ferguson. The work is done in the context of producing a
conversational agent. In this paper the authors concern themselves
with the intention/language interface level, where communicative
intentions are converted to and from a high-level semantic form. The
paper proposes a descriptive model of dialogue, based on collaborative
problem solving, which defines communicative intentions as attempts to
modify a shared collaborative problem-solving state between the user
and the system. According to the authors, modeling communicative
intentions based on collaborative problem solving allows the
behavioral reasoning component of a dialogue system to only worry
about problem solving and not about linguistic issues. The model
covers a range of collaboration paradigms and models dialogue
involving planning and acting. The authors start by describing
previous work (models of collaborative planning and models of
dialogue). Next they introduce their model of dialogue, showing how
communicative intentions are defined. In the model they try to
enumerate the complete taxonomy of collaborative problem-solving
activities. They continue by exemplifying the model with several
dialogue examples and by discussing how the model is used in the TRIPS
dialogue system. Finally, they put forward some conclusions and future
work.

Chapter 5, ''Building a discourse-tagged corpus in the framework of
rhetorical structure theory'', is written by Lynn Carlson, Daniel
Marcu, and Mary Ellen Okurowski. The authors describe their experience
in developing the Rhetorical Structure Theory (RST) corpus. Their goal
was to conduct large-scale implementation within the framework of a
single discourse theory. The corpus contains 385 documents of American
English selected from the Penn Treebank, which are hierarchically
annotated in the framework of Rhetorical Structure Theory. The corpus
can be mined in order to study discourse related phenomena. The
discourse structure is built by means of 78 rhetorical relations and
three additional relations to impose structure on the tree. The paper
describes the selection of theoretical approach, annotation
methodology, training, and quality assurance. A main feature of the
corpus is that it is annotated at different levels: leaf-level,
text-level, and mid-level analysis, so that the mining of the corpus
can be done at different levels. The authors present three examples:
comparison of discourse structure at the leaf-level, comparisons of
trees for different styles of news reports at the text-level, and
examination of relations at the mid-level analysis.

Chapter 6, ''An empirical study of speech recognition errors in human
computer dialogue'', is written by Marc Cavazza. The author
investigates the impact of speech recognition errors on a fully
implemented dialogue prototype based on a speech acts formalism. The
system is a mixed-initiative conversational interface organized around
a human-like character, with which the user communicates through
speech recognition. The software architecture is a pipeline comprising
speech recognition, parsing and dialogue. The main goal of parsing is
to produce a semantic structure from which speech acts can be
identified. The system operates in a bottom-up fashion and does not
include any specific mechanism for error detection or explicit error
handling. The author reports findings on the consequences of speech
recognition errors on the identification of speech acts, and the
conditions under which the system can be robust to those errors. He
provides an empirical classification of system reaction to speech
recognition errors, and discusses methods for the specific evaluation
of the consequences of speech recognition errors. Cavazza proposes
that the evaluation method should combine speech act accuracy
metrics and concept efficiency metrics. He concludes that
there appears to be no simple correlation between robustness to speech
recognition errors and the depth of parsing and interpretation, since
a number of factors support the robustness of the system to such
errors. For example, the dialogue control mechanism triggered by
speech act recognition can contribute to repairing the consequences
of speech recognition errors.

Chapter 7, ''Comparing several aspects of human-computer and
human-human dialogues'', is written by Christine Doran, John Aberdeen,
Laurie Damianos, and Lynette Hirschman. The authors present results of
the experiments carried out to compare human-human (HH) and
human-computer (HC) interaction in the context of the Communicator
Travel task, a DARPA-funded program. In order to begin an empirical
exploration of how certain aspects of the dialogue shed light on
differences between HC and HH communication, they annotated dialogues
(20 HH and 40 HC) from the air travel domain with several sets of
tags: dialogue act (CSTAR consortium tags), initiative (which
participant has control at the end of the turn), and unsolicited
information (this only for HC dialogues). The paper describes the
data, the coding of the corpus and the analysis. The authors point out
some findings. In general the conversation is more balanced between
traveler and expert in the HH setting, in terms of amount of speech,
types of dialogue acts and sharing initiative. As for initiative, in
the HC data the experts massively dominated in taking the initiative,
whereas in the HH data, users and experts shared the initiative
relatively equitably. The results of the experiments show that one of
the most salient characteristics of the HC data is that they contain
many misunderstandings of a sort that is nearly absent in the HH
data. They find that misunderstandings prevalent in the HC data can be
classified into three groups (hallucinations, mismatches and prompt
after fills), and that these misunderstandings can be detected through
the combination of a semantic annotation and an automatic algorithm.

Chapter 8, ''Full paraphrase generation for fragments in dialogue'',
is written by Christian Ebert, Shalom Lappin, Howard Gregory, and
Nicolas Nicolov. The authors start by pointing out that one major
challenge for any dialogue interpretation system is the proper
treatment of fragments. In this paper they show how to generate full
paraphrases for fragments of dialogues with SHARDS, which is a system for
the resolution of fragments in a dialogue, based on a version of HPSG
which integrates the situation semantics-based theory of dialogue
context given in KOS. The generator uses a template-filler approach
and it does not do any deep generation from an underlying semantic
representation. Instead it reuses the results of the parse and
interpretation process of SHARDS to dynamically compute the templates,
and then to update the filter. The innovation of their approach is
that they situate generation within the context of dialogue
interpretation, specifically fragment resolution. In doing so they are
able to eliminate much of the indeterminacy that characterizes
classical generation systems by exploiting the rich syntactic and
phonological information produced in the course of dialogue
interpretation. The paper starts with the presentation of the system
and the grammatical background. Next the proposal for generating
fragment paraphrases with templates is explained, and, finally, the
implementation of SHARDS and the generation component are described.

Chapter 9, ''Disentangling public from non-public meaning'', is
written by Jonathan Ginzburg. The author starts by claiming that
analyses of interaction need to characterize not solely 'success
conditions', but also 'clarification potential'. In this paper the
author illustrates how characterising certain classes of Clarification
Requests (CR) can shed light on the problem of distinguishing publicly
expressed communicative effects from non-public ones. First he
considers the very productive and effective ways of producing CRs
relating to the grammatically governed content of an utterance. Then
he turns to CRs that pertain to the non-public intentions of a
conversational participant, like Whymeta. He demonstrates that Whymeta
shows distinct behaviour from CRs that pertain to grammatically
governed content, the most prominent feature being that whereas the
latter are almost invariably adjacent to the utterances whose
clarification they seek, non-adjacency is quite natural for
Whymeta. This leads him to establish the distinction between the
notion of utterer's content and utterer's plan. The author provides
data to reinforce the distinction between Utterer's Content and
Utterer's Plan. He provides some background notions from the KOS
framework required for his formalization and applies this to explicate
Utterer's Content. Finally he considers a previous analysis of Whymeta
and develops his own analysis, which involves viewing it as an
instance of a metadiscursive utterance, instead of as a mechanism
that clarifies a contextually instantiable goals/plan parameter.

Chapter 10, ''Adaptivity and response generation in a spoken dialogue
system'', is written by Kristiina Jokinen and Graham Wilcock. The
paper addresses the issue of how to increase adaptivity in response
generation for a spoken dialogue system. Realization strategies for
dialogue responses depend on communicative confidence levels and
interaction management goals. They first discuss interaction models
and naturalness and give concrete examples from a spoken dialogue
system in which different forms of surface realization are required in
order to achieve interaction management goals. They continue by
describing a Java/XML-based generator which produces different
realizations of system responses based on agendas specified by the
dialogue manager. The way in which the generator chooses between the
different realizations is based on detailed specifications of the
information status of different concepts, given in an agenda by the
dialogue manager component. They then discuss how greater adaptivity
can be achieved by using a set of distinct generator agents, each of
which is specialized in its realization strategy. This allows a
simpler design of each generator agent while increasing the overall
system adaptivity to meet the requirements for flexible cooperation in
incremental and immediate interactive situations.

Chapter 11, ''On the means of clarification in dialogue'', is written
by Matthew Purver, Jonathan Ginzburg, and Patrick Healey. In this
paper they describe an attempt to exhaustively categorise
Clarification Request (CR) forms and readings based on corpus
work. CRs can take various forms and can be used to request various
types of clarification information, but have in common the fact that
they are in some sense utterance-anaphoric. Thus the corpus work has
the additional aim of identification of the maximum distance between a
CR and the utterance being clarified. The authors start by discussing
previous work on CRs. Then they list the possible CR forms (non-reprise
clarifications, reprise sentences, reprise sluices, reprise fragments,
gaps, gap fillers, and conventional) and readings (clausal,
constituent, lexical, and corrections) that they identify from corpus
analysis. They continue by describing the analysis of the
corpus. Finally, they discuss the implications of their results for a
possible HPSG analysis of clarification requests and for an ongoing
implementation of a clarification-capable dialogue system.

Chapter 12, ''Plug and play spoken dialogue processing'', is written
by Manny Rayner, Johan Boye, Ian Lewin, and Genevieve Gorrell. The
paper contains a description of a spoken language dialogue system
architecture which supports plug and play networks of objects. The
discussion centres around a concrete prototype system, CANTONA. The
main point of plug and play spoken language dialogue is that at any
given time the system's dialogue capabilities are determined by the
set of devices currently connected; adding new devices dynamically
changes its ability to recognise, understand, and respond to
commands. The authors first introduce the top-level components and
the key interfaces of the CANTONA Plug and Play demonstrator, where
all processing is rule-based. Next they describe the general
architectural considerations for achieving plug and play functionality
in rule-based systems, the rules and hierarchies, the plug and play
response generation and the speech recognition and parsing.
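
The central plug and play property, that the set of connected devices
determines what the system can understand at any given time, can be
pictured with a small Python sketch. It is a deliberately simplified
assumption for illustration: the class, the exact-match command lookup
and the device names are hypothetical and do not reflect CANTONA's
rule-based architecture.

```python
# Hypothetical sketch: dialogue capabilities follow the connected devices.
# Names and the exact-match lookup are assumptions for illustration only.

class DialogueSystem:
    def __init__(self):
        # device name -> {command phrase: device action}
        self.devices = {}

    def connect(self, name, commands):
        """Plug in a device; its commands become understandable."""
        self.devices[name] = dict(commands)

    def disconnect(self, name):
        """Unplug a device; its commands stop being understood."""
        self.devices.pop(name, None)

    def understood_commands(self):
        """All command phrases currently covered, in sorted order."""
        return sorted(p for cmds in self.devices.values() for p in cmds)

    def handle(self, utterance):
        """Return (device, action) for a known command, else None."""
        for name, cmds in self.devices.items():
            if utterance in cmds:
                return name, cmds[utterance]
        return None

system = DialogueSystem()
system.connect("lamp", {"switch on the lamp": "on", "switch off the lamp": "off"})
print(system.handle("switch on the lamp"))
system.disconnect("lamp")
print(system.handle("switch on the lamp"))
```

In the system described in the chapter, a device contributes
recognition, parsing and response-generation rules rather than fixed
phrases; the sketch only shows the dynamic coupling between connected
devices and coverage.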

Chapter 13, ''Conversational implicatures and communication theory'', is
written by Robert van Rooy. This paper presents an account of
implicatures in terms of a mathematical theory of communication.
According to the author, from a standard pragmatics perspective
conversational implicatures should be accounted for in terms of
Grice's maxims of conversation. Neo-Griceans seek to reduce those
maxims to the so-called Q and I-principles. In this paper the author
argues that: (i) there are major problems for reducing Gricean
pragmatics to these two principles, and (ii) that, in fact, it is
better to account for implicatures in terms of the principles of (a)
optimal relevance and (b) optimal coding. To formulate these
principles the author makes use of Shannon's mathematical theory of
communication.

Chapter 14, ''Reconciling control and discourse structure'', is
written by Susan E. Strayer, Peter A. Heeman, and Fan Yang. In this
paper the authors consider how control, in the sense of initiative, is
managed in task-oriented dialogues. They first describe previous work
in discourse structure and in control, and they present their coding
of the corpus (eight dialogues of the TRAINS corpus) with subdialogues
and control tags based on the DAMSL coding schema. Next they explore
the relationship between discourse structure and control: (i) they
compare control boundaries to subdialogue boundaries using recall and
precision; (ii) they look at control inside of discourse
segments. Then they explore how control can shift within a subdialogue
and find two types of contributions that a speaker can make in a
discourse segment: collaborative completions, in which the
non-initiator helps the segment initiator achieve their goal, and
short contributions to the discourse segment purpose. They find that
collaborative completions and co-contributions are exceptions to the
general rule that control tends to reside with the same speaker. Based
on the results of their study they propose that control is subordinate
to the intentional structure. Control is held by the segment
initiator. They point out the implications for dialogue management: a
system only needs to model intentional structure, from which control
follows.
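
The boundary comparison mentioned above can be made concrete with a
short Python sketch. The boundary positions below are invented for
illustration and are not data from the chapter; treating subdialogue
boundaries as the reference set and requiring exact position matches
are likewise assumptions.

```python
# Hypothetical sketch: precision and recall of control boundaries
# measured against subdialogue boundaries. The utterance indices are
# invented; exact-position matching is an assumption.

def boundary_precision_recall(control_boundaries, subdialogue_boundaries):
    """Return (precision, recall) of control boundaries vs. the reference."""
    control = set(control_boundaries)
    reference = set(subdialogue_boundaries)
    matched = control & reference
    precision = len(matched) / len(control) if control else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    return precision, recall

# Utterance indices at which a boundary occurs (made-up example).
control_shifts = [3, 7, 12, 15]
subdialogue_starts = [3, 7, 10, 15, 20]

precision, recall = boundary_precision_recall(control_shifts, subdialogue_starts)
print(precision, recall)  # 0.75 0.6
```

Here three of the four control shifts coincide with a subdialogue
boundary (precision 0.75), and three of the five subdialogue boundaries
are marked by a control shift (recall 0.6).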

Chapter 15, ''The information state approach to dialogue management'',
is written by David R. Traum and Staffan Larsson. In this paper the
authors introduce the information state approach to dialogue
management, and show how it can be used to formalize theories of
dialogue in a manner suitable for easy implementation. The authors
start by defining what dialogue management is. Then they propose two
contributions towards solving the problem of dialogue management
reuse: first, a unifying view of dialogue management that can help
organize the relationship between dialogue theories and
implementations. The unifying view includes a proposal to formalize
dialogue management functions in terms of information state update.
Second, software tools that can help to achieve reusable dialogue
systems. They continue by presenting the information state approach.
After that they show how the information state approach can be used to
help provide reusable components for dialogue system design,
separating three layers: basic software engineering layer, dialogue
theory layer, task/domain specific layer. Then they describe
TrindiKit, a tool that provides the basic software engineering glue
that can be used to implement a dialogue manager at a level closer to
linguistic theories than other existing toolkits. Next they illustrate
some of the systems that have been built using TrindiKit. Finally,
they describe how the separation of architecture layers previously
defined has led to actual reuse in a number of dialogue systems.
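
For readers unfamiliar with the idea, a minimal Python sketch of
information state update follows. It is not TrindiKit's interface and
does not reproduce any theory from the chapter: the fields of the
state, the two update rules and the move format are all assumptions
chosen only to show the general shape of the approach, in which
dialogue moves trigger rules that modify a structured state.

```python
# Hypothetical sketch of information state update. The state fields,
# rules and move format are illustrative assumptions, not TrindiKit.
from dataclasses import dataclass, field

@dataclass
class InformationState:
    agenda: list = field(default_factory=list)                      # pending system actions
    questions_under_discussion: list = field(default_factory=list)  # open questions, newest first
    shared_commitments: set = field(default_factory=set)            # (question, answer) pairs

def integrate_ask(state, move):
    """An 'ask' move raises a question and obliges a response."""
    if move["type"] != "ask":
        return False
    state.questions_under_discussion.insert(0, move["content"])
    state.agenda.append(("respond", move["content"]))
    return True

def integrate_answer(state, move):
    """An 'answer' move resolves the most recent open question."""
    if move["type"] != "answer" or not state.questions_under_discussion:
        return False
    question = state.questions_under_discussion.pop(0)
    state.shared_commitments.add((question, move["content"]))
    return True

UPDATE_RULES = [integrate_ask, integrate_answer]

def update(state, move):
    """Apply the first rule whose precondition holds; return its name or None."""
    for rule in UPDATE_RULES:
        if rule(state, move):
            return rule.__name__
    return None

state = InformationState()
update(state, {"type": "ask", "content": "destination?"})
update(state, {"type": "answer", "content": "Paris"})
print(state.shared_commitments)
```

Separating the state definition and the rules (the dialogue theory
layer) from the generic update loop mirrors, in miniature, the layering
that the authors argue makes reuse possible.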

Chapter 16, ''Visualizing spoken discourse'', is written by Li-Chung
Yang. The goal of the study is to look at the distribution of
interruptive occurrences in natural speech, and investigate their
respective functions and characteristics. It is shown that
interruptions are important elements in the interactive character of
discourse and in the resolution of issues of cognitive uncertainty and
planning. The author analyses the different types of
interruptions and the extent to which prosodic-acoustic features
are significant in distinguishing between them.
He distinguishes between cooperative and competitive
interruptions. The specific pitch height of the interruption varies
with the expression of emotion, signals of attention-getting, and
signals of competitiveness. In general, competitive interruptions are
marked by a high pitch level and a loud amplitude, expressing the
participant's competition for the focus of attention. By contrast,
cooperative interruptions are more supportive of the main speaker's
floor rights. Because of their non-disruptive nature, they often occur
at low or medium pitch levels and they are generally lower in pitch
than competitive interruptions. Their amplitude can vary. To conclude,
the author analyses the implications for dialogue systems.

ABOUT THE REVIEWER

Roser Morante is a PhD student in the Section of Computational
Linguistics and AI of Tilburg University. Her areas of research are
computational pragmatics and dialogue systems. In her current project
her goal is to define mechanisms for computing information states and
updates of information states in a dialogue system.