LINGUIST List 14.2512

Mon Sep 22 2003

Review: Computational Ling: Mitkov (2002)

Editor for this issue: Naomi Ogasawara <naomilinguistlist.org>


What follows is a review or discussion note contributed to our Book Discussion Forum. We expect discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in. If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for review." Then contact Simin Karimi at siminlinguistlist.org.

Directory

  1. Peter Kühnlein, Anaphora Resolution

Message 1: Anaphora Resolution

Date: Mon, 22 Sep 2003 14:01:52 +0000
From: Peter Kühnlein <puni-bielefeld.de>
Subject: Anaphora Resolution

Mitkov, Ruslan (2002) Anaphora Resolution, Longman, Studies in
Language and Linguistics.

Announced at http://linguistlist.org/issues/14/14-1388.html


Peter Kühnlein, SFB 360, Univ. Bielefeld

The book is suitable for everybody interested in the topic of anaphora
resolution, ''including ... researchers, lecturers, students and NLP
software developers'', as the preface announces correctly. The book is
divided up into an introduction and nine chapters, which will be
treated in turn below. A detailed index at the end of the book
contains all the central keywords.

The layout of the book is as follows: after an introduction to the
linguistic fundamentals (Ch.1) concerning anaphora, some difficulties
for automatic (i.e., computer based) resolution are highlighted in
Ch.2. Ch.3 reviews briefly a number of theories and formalisms that
are related to anaphora resolution. A historical overview (Ch.4),
spanning the 60s, 70s and 80s, is followed by a discussion of what the
author calls the main trends in recent anaphora resolution (Ch.5). The
role of corpora in that area is treated in Ch.6, while Ch.7 is devoted
to the explication of the author's own approach, characterized as a
robust, knowledge-poor algorithm. A chapter on evaluation (Ch.8) and
one on outstanding issues (Ch.9) close the book. Each chapter is
closed by a summary and endnotes. The chapters are to some degree
self-contained, so that it is sufficient to read only specific parts
of the book because the details needed are repeated. Chs.1-4 are
introductory and suitable for students without previous
knowledge. Chs.5-8 contain a discussion of recent work on the
subject. Ch.9 gives an impression of future research that will have to
be done. From Ch.3 onward, the focus is put on the resolution of
nominal anaphora.

Ch.1, which is devoted to the introduction of linguistic fundamentals,
is a good primer for students who have to start from
scratch. Cohesion, co-reference, and the notion of a discourse entity
are related to various forms of anaphora. Mitkov frequently quotes
examples from different languages to sustain his claims. Intra- and
extra-sentential anaphora are distinguished, followed by a short
discussion of indirect anaphora. A distinction is drawn between
identity-of-sense and identity-of-reference anaphora. Types of
antecedents are introduced and the effects of their different
locations discussed. Anaphora are related to other linguistic
phenomena (cataphora, deixis, ambiguity). The question of when
anaphora are resolved in human language processors is touched upon.

Ch.2 contains a discussion of different sources of knowledge that have
to be drawn upon by automatic anaphora resolution and relates them to
the linguistic basics that were introduced in the first chapter. The
knowledge sources mentioned are: morphology, lexicon, syntax,
semantics, discourse and common-sense knowledge. Tools and resources
that are needed to implement the introduced anaphora resolution
factors are listed.

Ch.3 introduces some of the theories that have been used in anaphora
resolution, mainly Centering Theory, Binding Theory, the work on focus
done by Grosz and Sidner in the 70s, and Discourse Representation
Theory. Here, the discussion of Centering Theory and Binding Theory
gives a good overview of some of the recent developments. The
discussion of ''other related work'' introduces only the fundamentals
of the respective theories, e.g., Kamp & Reyle's (1993) basic
framework.

Ch.4 is intended as a historical excursion. This comes as something of
a surprise, since the author points out in his preface that he will
not cover work prior to 1986 in detail, referring instead to Hirst's
(1981) book ''Anaphora in Natural Language Understanding'' and
Carter's ''Interpreting Anaphora in Natural Language'' (1987). Even
so, this chapter revisits earlier work in at least some detail. As a
concise description of past developments, the chapter is certainly of
use to an impatient student. The chapter covers STUDENT, SHRDLU,
LUNAR, Hobbs's algorithm, BFP, SPAR, as well as distributed
architectures as suggested by Rich & LuperFoy and Carbonell & Brown. A
section on other work briefly summarizes alternative solutions. In
total, the work done in the early period of automatic anaphora
resolution is characterized as dominated by knowledge-rich, i.e.,
costly, strategies.

Chapters 1 - 4 are obviously intended as introductions to the
respective topics. The following chapters 5 - 8 picture the state of
the art in anaphora resolution.

Ch.5, in contrast to Ch.4, deals with present-day research that is
considered as oriented toward knowledge-poor and corpus-based
work. The first section identifies the main trends in present
research; the following sections elaborate on that work. Here, the
book follows a mixed strategy of presenting the strategies partly
according to themes (''Collocation patterns-based approach'', etc.),
partly according to researchers (Lappin and Leass, etc.). The relevant
algorithms are explained, and the evaluations, wherever possible,
discussed.

Ch.6 motivates the use of corpora in anaphora resolution and surveys
recent corpora that are appropriately annotated. The survey is
followed by an overview of annotation schemes that are in use (UCREL,
MUC, DRAMA, Bruneseaux & Romary, Poesio & Vieira, MATE, Tutin et al.,
Rocha, Botley). The use of each in tagging texts is exemplified. The
author adds a comparison of tools that are available or prospective
for the task of actually tagging texts. They comprise XANADU, DTTool,
Alembic Workbench, Referee, CLinkA, FAST, the tools to be implemented
in the ATLAS group, and a set of tools that has been suggested by Day
et al. He discusses the necessity of settling on an adequate
annotation strategy and gives some examples of resulting coding
guidelines. The chapter ends in a discussion of the topic of
inter-annotator agreement and respective measures.
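
The review does not say which agreement measures the chapter covers.
Purely as an illustration, the sketch below computes Cohen's kappa, one
widely used chance-corrected agreement measure, for two annotators who
have labelled the same markables. The function name and the labels are
hypothetical and are not taken from the book.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators.

    labels_a and labels_b are equally long sequences holding the label
    each annotator assigned to the same item (hypothetical input format).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")

    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement: chance of agreeing if each annotator labelled
    # at random according to their own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Degenerate case: both annotators used a single identical label.
        return 1.0
    return (observed - expected) / (1 - expected)


# Two annotators marking four markables as coreferent ("c") or not ("n").
annotator_a = ["c", "c", "n", "n"]
annotator_b = ["c", "n", "n", "n"]
kappa = cohen_kappa(annotator_a, annotator_b)
```

A value of 1 means perfect agreement and 0 means agreement no better
than chance.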

In Ch.7 the author presents his own algorithm which he describes as
robust and knowledge-poor. The domain for which the algorithm is
developed is that of manuals for hard- or software. The presentation
is made in two broad steps.
 
First, the ''original'' algorithm is introduced and discussed at some
length. The pre-processing of the data and the strategy for anaphora
resolution are presented, followed by a description of the algorithm,
an example and an evaluation. As indicators for candidates for
antecedents, (i) a class of Indicating Verbs is defined; (ii) lexical
reiteration counts as an indicator; (iii) NPs in section headings are
given a bonus; (iv) Collocation patterns are matched; (v) NPs in
coordinate constructions are assigned higher plausibility; (vi) in
certain (''sequential'') constructions, primacy counts as an
indicator; (vii) for the domain of manuals, indefiniteness is counted
against a candidate, as well as (viii) the status of being a
prepositional noun phrase.
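
To make the idea of indicator-based scoring concrete, here is a minimal
sketch, not Mitkov's implementation. It assumes that each candidate NP
carries a set of indicator flags, that each indicator contributes a
fixed score, and that the candidate with the highest total is chosen.
The weights, the feature names and the tie-breaking by recency are all
assumptions made for illustration; the review does not give them.

```python
from dataclasses import dataclass

# Illustrative weights only: the review names the indicators but gives
# no numeric scores, so every value below is an assumption.
WEIGHTS = {
    "follows_indicating_verb": 1,   # (i)
    "lexically_reiterated": 1,      # (ii)
    "in_section_heading": 1,        # (iii)
    "matches_collocation": 2,       # (iv)
    "in_coordination": 1,           # (v)
    "first_in_sequence": 1,         # (vi)
    "indefinite": -1,               # (vii) counted against a candidate
    "prepositional_np": -1,         # (viii) counted against a candidate
}


@dataclass(frozen=True)
class Candidate:
    text: str
    position: int          # token offset; larger means closer to the pronoun
    features: frozenset    # names of the indicators that fire for this NP


def score(candidate):
    """Sum the weights of all indicators that fire for the candidate."""
    return sum(WEIGHTS[name] for name in candidate.features)


def resolve(candidates):
    """Return the single highest-scoring candidate.

    Ties are broken in favour of the most recent candidate; this
    tie-break is an assumption of the sketch.
    """
    if not candidates:
        raise ValueError("no candidate antecedents")
    return max(candidates, key=lambda c: (score(c), c.position))


candidates = [
    Candidate("the printer", 3,
              frozenset({"in_section_heading", "lexically_reiterated"})),
    Candidate("a cable", 8, frozenset({"indefinite", "prepositional_np"})),
]
best = resolve(candidates)
```

Because `resolve` returns a result for any non-empty candidate list,
the sketch never leaves a pronoun unresolved.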
 
The algorithm always identifies a single antecedent as the most
plausible candidate, which accounts for its robustness. The chapter
describes modifications for the treatment of anaphora in multiple
languages and corresponding evaluations. Mitkov stresses the point
that a bilingual implementation of his algorithm is superior to a
monolingual one and surveys the work done here.
 
In the second step, a modified, fully automated version of the
resolver, called MARS, is introduced. MARS is a fully automatic
implementation of an improved version of the original resolver, where
''fully automatic'' means that there is no human intervention at any
stage of the resolution of the anaphora. MARS uses a Functional
Dependency Grammar parser as pre-processing tool. As for indicators,
three more are used than in the original implementation: (ix) pronouns
are allowed as possible antecedents and given a bonus; (x) syntactic
parallelism is awarded a boosting score; (xi) frequent candidates are
preferred. The chapter describes the algorithm which uses these
indicators, and a genetic optimization algorithm is introduced that
leads to an improvement in performance. The resolver is then evaluated
according to different criteria, e.g., with and without optimization
by the genetic algorithm. As with the previous implementation, a
version for non-English anaphora resolution is described, this time
for Bulgarian, along with its evaluation.

Ch.8 contains a discussion of evaluation in anaphora resolution. A
distinction is drawn between the evaluation of the resolution
algorithm and of the system as a whole. A number of measures are
introduced and their applicability to algorithm and system is
discussed. The measures are introduced in contrast to earlier
proposals by Aone and
Bennett (1995) and Baldwin (1997). For evaluation of the algorithm,
''success rate'', ''critical success rate'' and ''non-trivial success
rate'' are distinguished. They are proposed for measuring the
performance of the algorithm. Once this is achieved, comparative
evaluations are envisaged (and indeed a couple of comparisons
made). Finally, the possibility of establishing the ''decision power''
or ''relative importance'' of individual components of the algorithm
is discussed. The measures ''(non-trivial/critical) success rate'' then
are applied to the resolution system as a whole. Additionally,
''resolution etiquette'' is proposed as an indicator for the
efficiency of determining non-nominal anaphora. The topic of
reliability of an evaluation in the context of anaphora resolution is
discussed. The author proposes an evaluation workbench (which has
already been implemented by Catalina Barbu) as a tool for ''fair''
evaluation. At the end of this chapter, other work on evaluation is
surveyed.
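
The review names the three success rates without defining them. The
sketch below assumes one common reading: the plain success rate is
computed over all anaphors, the non-trivial success rate only over
anaphors with more than one candidate antecedent, and the critical
success rate only over anaphors that still have more than one candidate
after gender and number agreement filtering. The record layout and the
function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    correct: bool             # was the anaphor resolved correctly?
    n_candidates: int         # candidate antecedents before any filtering
    n_after_agreement: int    # candidates left after agreement filters


def _rate(outcomes):
    """Share of correct outcomes, or None if there is nothing to count."""
    outcomes = list(outcomes)
    if not outcomes:
        return None
    return sum(o.correct for o in outcomes) / len(outcomes)


def success_rate(outcomes):
    # All anaphors count.
    return _rate(outcomes)


def non_trivial_success_rate(outcomes):
    # Assumed reading: only anaphors with more than one candidate.
    return _rate(o for o in outcomes if o.n_candidates > 1)


def critical_success_rate(outcomes):
    # Assumed reading: only anaphors still ambiguous after agreement.
    return _rate(o for o in outcomes if o.n_after_agreement > 1)


outcomes = [
    Outcome(True, 1, 1),
    Outcome(True, 3, 1),
    Outcome(False, 4, 2),
    Outcome(True, 5, 3),
]
```

On this toy data the three rates differ, which shows why a single
overall figure can flatter an algorithm tested on easy cases.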

The previous chapters 1 - 8 are presentations of previous and recent
work. The last chapter, Ch.9, concentrates on outstanding issues.

Ch.9 briefly summarizes central topics of the book. This is followed
by a detailed discussion of three issues the author views as central
for the future development of the field: research in the factors that
are used by resolution algorithms, improvement of pre-processing, and
the need for annotated corpora. A number of other outstanding issues
are raised in the last section of the book. The author hints at the
freely available material that can be obtained from his project's URL
(which has meanwhile changed to http://clg.wlv.ac.uk ).

EVALUATION

The chapters are to some extent self-contained, i.e., most of the
knowledge one needs to comprehend each chapter is introduced right
there. This has both the advantage that each chapter (and, in some
cases - e.g., the surveys of implementations in Ch.4 - each section
within a chapter) can be read in isolation and the disadvantage of
being redundant to the same degree.

As to the layout of the book, it is somewhat strange that Ch.8 on
evaluation does not precede Ch.7, which introduces Mitkov's own
approach and puts much emphasis on evaluation. Given the degree of
self-containedness of the chapters, however, there is no actual loss
of readability.

With regard to future work in the area of automatic anaphora
resolution, it could be added that more diverse domains would be
desirable than those which are currently treated. I would like to
take an extreme example: There are difficulties in, e.g., spoken
language that never occur in written text like manuals. Here, it
would be interesting to see which strategy would have to be pursued in
order to come to grips with multi-speaker sequences such as the
following
taken from our own corpus. (aX) marks utterance X from speaker a, (bY)
utterance Y from speaker b.

(a1) Well, now you take
(b1) a bolt
(a2) an orange one with a slit
(b2) yes
(a3) and you put it through there
(b3) from above
(a4) from above
 so that the three get fixed then
(b4) yes

One of the problems with this short stretch of discourse is that the
utterances in the dialog do not consist of complete sentences or
illocutionary acts. As a result, many of the indicators used in the
accounts detailed in the book cannot be applied to this case. But
note that the pronoun ''it'' in a3 has either ''a bolt''
(b1) or ''an orange one...'' (a2) as an antecedent. Another
interesting feature of the sample discourse is the connection between
a3 and b3, which is clearly anaphoric in that b wants to know in which
direction the bolt has to be put through some hole in a bar. (This,
of course, is not a case of NP anaphora.) At the moment, it seems
clear that resolving anaphora in spoken language would be too big a
task.

The desire for more diversity by no means diminishes the value of the
book under review. To modify a quote from the book that is intended to
exemplify a case of anaphora: ''|The book| is not merely a survey of
anaphora resolution: |it| also presents the latest research by the
author.'' I would add: ''Every library should have a copy of |it|.''

ABOUT THE REVIEWER

Peter Kühnlein is a PhD student at the University of Bielefeld. His
main interests are theories of reference and multi-modal dialog as
well as philosophy of language and philosophy of science. He is a
research assistant at the collaborative research center SFB 360.