Date: Mon, 22 Sep 2003 17:26:32 +0200
From: Peter Kühnlein <p@uni-bielefeld.de>
Subject: Anaphora Resolution
Mitkov, Ruslan (2002) Anaphora Resolution, Longman, Studies in Language and Linguistics.
Peter Kühnlein, SFB 360, Univ. Bielefeld
The book is suitable for everybody interested in the topic of anaphora resolution, "including ... researchers, lecturers, students and NLP software developers", as the preface announces correctly. The book is divided up into an introduction and nine chapters, which will be treated in turn below. A detailed index at the end of the book contains all the central keywords.
The layout of the book is as follows: after an introduction to the linguistic fundamentals (Ch.1) concerning anaphora, some difficulties for automatic (i.e., computer-based) resolution are highlighted in Ch.2. Ch.3 briefly reviews a number of theories and formalisms that are related to anaphora resolution. A historical overview (Ch.4), spanning the 60s, 70s and 80s, is followed by a discussion of what the author calls the main trends in recent anaphora resolution (Ch.5). The role of corpora in that area is treated in Ch.6, while Ch.7 is devoted to the explication of the author's own approach, characterized as a robust, knowledge-poor algorithm. A chapter on evaluation (Ch.8) and one on outstanding issues (Ch.9) close the book. Each chapter ends with a summary and endnotes. The chapters are to some degree self-contained, so that it is sufficient to read only specific parts of the book because the details needed are repeated. Chs.1-4 are introductory and suitable for students without previous knowledge. Chs.5-8 contain a discussion of recent work on the subject. Ch.9 gives an impression of the research that remains to be done. From Ch.3 onward, the focus is on the resolution of nominal anaphora.
Ch.1, which is devoted to the introduction of linguistic fundamentals, is a good primer for students who have to start from scratch. Cohesion, co-reference, and the notion of a discourse entity are related to various forms of anaphora. Mitkov frequently quotes examples from different languages to sustain his claims. Intra- and extra-sentential anaphora are distinguished, followed by a short discussion of indirect anaphora. A distinction is drawn between identity-of-sense and identity-of-reference anaphora. Types of antecedents are introduced and the effects of their different locations discussed. Anaphora are related to other linguistic phenomena (cataphora, deixis, ambiguity). The question of when anaphora are resolved in human language processors is touched upon.
Ch.2 contains a discussion of different sources of knowledge that have to be drawn upon by automatic anaphora resolution and relates them to the linguistic basics that were introduced in the first chapter. The knowledge sources mentioned are: morphology, lexicon, syntax, semantics, discourse and common-sense knowledge. Tools and resources that are needed to implement the introduced anaphora resolution factors are listed.
Ch.3 introduces some of the theories that have been used in anaphora resolution, mainly Centering Theory, Binding Theory, the work on focus done by Grosz and Sidner in the 70s, and Discourse Representation Theory. Here, the discussion of Centering Theory and Binding Theory gives a good overview of some of the recent developments. The discussion of "other related work" introduces only the fundamentals of the respective theories, e.g., Kamp & Reyle's (1993) basic framework.
Ch.4 is intended as a historical excursion. This comes as something of a surprise, as the author pointed out in his preface that he would not cover work prior to 1986 in detail, referring instead to Hirst's (1981) book "Anaphora in Natural Language Understanding" and Carter's "Interpreting Anaphora in Natural Language" (1987). Yet this chapter revisits earlier work in at least some detail. However, being a concise description of past developments, the chapter is certainly of use to an impatient student. The chapter covers STUDENT, SHRDLU, LUNAR, Hobbs's algorithm, BFP, SPAR, as well as distributed architectures as suggested by Rich & LuperFoy and Carbonell & Brown. A section on other work briefly summarizes alternative solutions. Overall, the work done in the early period of automatic anaphora resolution is characterized as dominated by knowledge-rich, i.e., costly, strategies.
Chapters 1 - 4 are obviously intended as introductions to the respective topics. The following chapters 5 - 8 picture the state of the art in anaphora resolution.
Ch.5, in contrast to Ch.4, deals with present-day research that is considered as oriented toward knowledge-poor and corpus-based work. The first section identifies the main trends in present research, the following sections elaborate on that work. Here, the book follows a mixed strategy of presenting the strategies partly according to themes ("Collocation patterns-based approach", etc.), partly according to researchers (Lappin and Leass, etc.). The relevant algorithms are explained, and the evaluations, wherever possible, discussed.
Ch.6 motivates the use of corpora in anaphora resolution and surveys recent corpora that are appropriately annotated. The survey is followed by an overview of annotation schemes that are in use (UCREL, MUC, DRAMA, Bruneseaux & Romary, Poesio & Vieira, MATE, Tutin et al., Rocha, Botley). The use of each in tagging texts is exemplified. The author adds a comparison of tools that are available or planned for the task of actually tagging texts. They comprise XANADU, DTTool, Alembic Workbench, Referee, CLinkA, FAST, the tools to be implemented in the ATLAS group, and a set of tools that has been suggested by Day et al. He discusses the necessity of settling on an adequate annotation strategy and gives some examples of resulting coding guidelines. The chapter ends with a discussion of inter-annotator agreement and the corresponding measures.
In Ch.7 the author presents his own algorithm, which he describes as robust and knowledge-poor. The domain for which the algorithm is developed is that of hardware and software manuals. The presentation is made in two broad steps. First, the "original" algorithm is introduced and discussed at some length. The pre-processing of the data and the strategy for anaphora resolution are presented, followed by a description of the algorithm, an example, and an evaluation. As indicators for candidates for antecedents, (i) a class of Indicating Verbs is defined; (ii) lexical reiteration counts as an indicator; (iii) NPs in section headings are given a bonus; (iv) collocation patterns are matched; (v) NPs in coordinate constructions are assigned higher plausibility; (vi) in certain ("sequential") constructions, primacy counts as an indicator; (vii) for the domain of manuals, indefiniteness is counted against a candidate, as well as (viii) the status of being a prepositional noun phrase. The algorithm always identifies a single antecedent as the most plausible candidate, which accounts for its robustness. The chapter describes modifications for the treatment of anaphora in multiple languages and corresponding evaluations. Mitkov stresses the point that a bilingual implementation of his algorithm is superior to a monolingual one and surveys the work done here. In the second step, a modified version of the resolver, called MARS, is introduced. MARS is a fully automatic implementation of an improved version of the original resolver, where "fully automatic" means that there is no human intervention at any stage of the resolution of the anaphora. MARS uses a Functional Dependency Grammar parser as a pre-processing tool. As for indicators, three more are used than in the original implementation: (ix) pronouns are allowed as possible antecedents and given a bonus; (x) syntactic parallelism is awarded a boosting score; (xi) frequent candidates are preferred.
The chapter describes the algorithm which uses these indicators, and a genetic optimization algorithm is introduced that leads to an improvement in performance. The resolver is then evaluated according to different criteria, e.g., with and without optimization by the genetic algorithm. As with the previous implementation, a version for non-English anaphora resolution is described, this time for Bulgarian, together with its evaluation.
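The indicator-based selection described above can be illustrated with a minimal sketch. It is not Mitkov's implementation: the review names the indicators but not their scores, so the weights below, the identifier names, and the tie-breaking rule (prefer the candidate closest to the pronoun) are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field

# Hypothetical weights: the review lists the indicators but gives no scores.
# Positive values boost a candidate, negative values count against it.
WEIGHTS = {
    "indicating_verb": 1,
    "lexical_reiteration": 1,
    "section_heading": 1,
    "collocation_match": 1,
    "coordinate_np": 1,
    "sequential_primacy": 1,
    "indefinite": -1,
    "prepositional_np": -1,
}


@dataclass
class Candidate:
    text: str
    position: int  # offset in the text; larger means closer to the pronoun
    indicators: set = field(default_factory=set)  # names of indicators that fired


def score(candidate):
    """Sum the (assumed) weights of all indicators that apply to the candidate."""
    return sum(WEIGHTS[name] for name in candidate.indicators)


def resolve(candidates):
    """Always return exactly one antecedent: the highest-scoring candidate.

    Ties are broken in favour of the most recent candidate; this rule is an
    assumption of the sketch, not something stated in the review.
    """
    if not candidates:
        raise ValueError("no candidate antecedents")
    return max(candidates, key=lambda c: (score(c), c.position))


candidates = [
    Candidate("the printer", 3, {"section_heading", "lexical_reiteration"}),
    Candidate("a cable", 9, {"indefinite"}),
    Candidate("the tray", 12, {"prepositional_np"}),
]
print(resolve(candidates).text)  # prints "the printer"
```

Because `resolve` always returns a single candidate, however weak the evidence, the sketch mirrors the property the review singles out as the source of the algorithm's robustness.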
Ch.8 contains a discussion of evaluation in anaphora resolution. A distinction is drawn between the evaluation of the resolution algorithm and that of the system as a whole. A number of measures are introduced, along with their applicability to algorithm and system. The measures are introduced in contrast to earlier proposals by Aone and Bennett (1995) and Baldwin (1997). For evaluation of the algorithm, "success rate", "critical success rate" and "non-trivial success rate" are distinguished. They are proposed for measuring the performance of the algorithm. Once this is achieved, comparative evaluations are envisaged (and indeed a couple of comparisons are made). Finally, the possibility of establishing the "decision power" or "relative importance" of individual components of the algorithm is discussed. The measures "(non-trivial/critical) success rate" are then applied to the resolution system as a whole. Additionally, "resolution etiquette" is proposed as an indicator for the efficiency of determining non-nominal anaphora. The topic of reliability of an evaluation in the context of anaphora resolution is discussed. The author proposes an evaluation workbench (which has already been implemented by Catalina Barbu) as a tool for "fair" evaluation. At the end of this chapter, other work on evaluation is surveyed.
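The three algorithm-level measures can be sketched as follows. The review does not define them, so the sketch rests on an assumed reading: "success rate" is the share of all anaphors resolved correctly, "non-trivial success rate" restricts this to anaphors with more than one candidate antecedent, and "critical success rate" restricts it further to anaphors that still have more than one candidate after gender and number filtering. All names in the code are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    correct: bool           # was the anaphor resolved to the right antecedent?
    n_candidates: int       # candidates before any filtering
    n_after_agreement: int  # candidates left after gender/number filtering


def rate(outcomes):
    """Fraction of correct outcomes, or None if there is nothing to evaluate."""
    outcomes = list(outcomes)
    if not outcomes:
        return None
    return sum(o.correct for o in outcomes) / len(outcomes)


def success_rate(outcomes):
    return rate(outcomes)


def non_trivial_success_rate(outcomes):
    # Assumed definition: only anaphors with more than one candidate.
    return rate(o for o in outcomes if o.n_candidates > 1)


def critical_success_rate(outcomes):
    # Assumed definition: only anaphors still ambiguous after agreement filtering.
    return rate(o for o in outcomes if o.n_after_agreement > 1)


outcomes = [
    Outcome(True, 1, 1),   # trivial: a single candidate
    Outcome(True, 3, 1),   # decided by gender/number agreement alone
    Outcome(False, 4, 2),
    Outcome(True, 5, 3),
]
print(success_rate(outcomes))           # 0.75
print(critical_success_rate(outcomes))  # 0.5
```

Under these assumptions the stricter measures discount the easy cases, which is why they can fall well below the plain success rate on the same data.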
Chapters 1 - 8 present previous and recent work. The last chapter, Ch.9, concentrates on outstanding issues.
Ch.9 briefly summarizes central topics of the book. This is followed by a detailed discussion of three issues the author views as central for the future development of the field: research into the factors that are used by resolution algorithms, improvement of pre-processing, and the need for annotated corpora. A number of other outstanding issues are raised in the last section of the book. The author points to the freely available material that can be obtained from his project's URL (which has meanwhile changed to http://clg.wlv.ac.uk).
EVALUATION
The chapters are to some extent self-contained, i.e., most of the knowledge one needs to comprehend each chapter is introduced right there. This has both the advantage that each chapter (and, in some cases - e.g., the surveys of implementations in Ch.4 - each section within a chapter) can be read in isolation and the disadvantage of being redundant to the same degree.
As to the layout of the book, it is somewhat strange that Ch.8 on evaluation does not precede Ch.7, which introduces Mitkov's own approach and puts much emphasis on evaluation. Given the degree of self-containedness of the chapters, there is no actual loss of readability.
With regard to future work in the area of automatic anaphora resolution, it could be added that more diverse domains would be desirable than those which are currently treated. I would like to take an extreme example: There are difficulties in, e.g., spoken language that never occur in written text like manuals. Here, it would be interesting to see which strategy would have to be pursued in order to come to grips with multi-speaker sequences such as the following, taken from our own corpus. (aX) marks utterance X from speaker a, (bY) utterance Y from speaker b.
(a1) Well, now you take
(b1) a bolt
(a2) an orange one with a slit
(b2) yes
(a3) and you put it through there
(b3) from above
(a4) from above so that the three get fixed then
(b4) yes
One of the problems with this short stretch of discourse is that the utterances in the dialog do not consist of complete sentences or illocutionary acts. As a result, many of the indicators used in the accounts detailed in the book cannot be applied to this case. But note that the pronoun "it" in a3 has either "a bolt" (b1) or "an orange one..." (a2) as an antecedent. Another interesting feature of the sample discourse is the connection between a3 and b3, which is clearly anaphoric in that b wants to know in which direction the bolt has to be put through some hole in a bar. (This, of course, is not a case of NP anaphora.) At the moment, it seems clear that resolving anaphora in spoken language would be too big a task.
The desire for more diversity by no means diminishes the value of the book under review. To modify a quote from the book that is intended to exemplify a case of anaphora: "|The book| is not merely a survey of anaphora resolution: |it| also presents the latest research by the author." I would add: "Every library should have a copy of |it|."