Building Natural Language Generation Systems, by Ehud Reiter and Robert Dale. Studies in Natural Language Processing. Cambridge University Press, 2000, 248 pages
Reviewed by Constantin Orasan
Computers are very good at processing large amounts of data in a relatively short time. They can store and analyse data about daily temperature or rainfall in a region and produce reports and forecasts from it. Computers in the form of expert systems have been widely used for making decisions on the basis of their inputs. For example, using a smoker's answers to a questionnaire about his/her smoking habits, a program can suggest ways of giving up smoking. However, in most cases the raw results of an algorithm are very difficult, if not impossible, for humans to understand properly. Therefore, in order to take full advantage of these results, it is necessary to display them in a format which is appropriate for the results and can be easily understood by humans. Such formats include histograms for values that change over time (e.g. share prices), text for weather reports, etc.
This book tackles the problem of building systems which generate natural language on the basis of some input information, whether produced by an algorithm, stored in databases, etc. Natural language generation (NLG) is a field which developed at the confluence of artificial intelligence and computational linguistics. Even though the field started in the late sixties and has been investigated by numerous researchers, there have been no books which discuss the practical issues involved in building complete NLG systems. This book tries to fill this gap by describing the architecture of a hypothetical NLG system, WeatherReporter. In addition, comparisons are made with well-known existing systems (two of them being the above-mentioned ones for generating weather reports and smoking-cessation letters).
SYNOPSIS
In chapter one, natural language generation is defined as a task which "typically starts from a non-linguistic representation of information as input ... uses knowledge about language and the application domain to automatically produce reports, documents, explanations and other kind of texts". Natural language generation and natural language understanding are compared and presented as inverse tasks. This view has led some researchers to hope that systems can be built which both understand and generate language using common resources. Although this idea is very appealing, the authors draw attention to the fact that it is very difficult to build effective systems which perform both tasks, because the problems to be solved are quite different in each case. In the second half of the chapter, a short history of the field is presented and successful systems are introduced. Most of these systems are used later in the book for exemplification and comparison.
Natural language generation is a field in which difficult problems arise. The second chapter of the book discusses some of these problems. For example, the cost of building an NLG system has to be considered. As the authors point out, there are cases when mail merge or human authoring is a cheaper or more appropriate solution. Moreover, they emphasise that people are sometimes reluctant to use an NLG system, especially when errors in the text can have serious consequences (e.g. in health-related contexts). When an NLG system is built, however, a corpus has to be assembled in order to determine the user requirements. The issues involved in building this corpus and in evaluating NLG systems are also discussed in the second chapter.
Whereas the first two chapters are rather general introductions to the natural language generation field, chapter three presents the main concern of the book: the building of NLG systems. Given the complexity of this task, the advantages of a modular pipeline architecture are discussed. As the authors point out, such an architecture is easier to develop and debug, and it also makes the modules reusable. The chosen architecture is compared with other existing designs, with both its advantages and its weaknesses highlighted. A positive aspect of this discussion is that it is relevant not only to building NLG systems, but also to all kinds of natural language systems.
The generation process is decomposed into three modules: document planner, microplanner and surface realiser, presented in chapters 4, 5 and 6, respectively. The second part of chapter 3 introduces these modules, giving an overview of the architecture of WeatherReporter. Because all the topics discussed here are reanalysed in much more detail in the subsequent chapters, I found this part a bit too long and too detailed. However, it could be useful for readers who do not have much time, giving them a fairly detailed overview of the system.
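To make the three-stage decomposition concrete, here is a minimal sketch of such a pipeline. It is not code from the book or from WeatherReporter: the `Message` class, the `PHRASES` table, the function names and the input format are all assumptions chosen for illustration, and each stage is reduced to a toy version of its real task.

```python
from dataclasses import dataclass

# Hypothetical message type: one unit of information selected for the text.
@dataclass
class Message:
    kind: str
    value: float

# Hypothetical lexicon: (subject, verb, complement pattern) for each message kind.
PHRASES = {
    "temperature": ("the average temperature", "was", "{:.1f} degrees"),
    "rainfall": ("total rainfall", "was", "{:.1f} mm"),
}

def document_planner(data):
    """Decide which information to communicate and in which order."""
    order = ("temperature", "rainfall")
    return [Message(kind, data[kind]) for kind in order if kind in data]

def microplanner(messages):
    """Choose words for each message (lexicalisation only, in this toy version)."""
    specs = []
    for m in messages:
        subject, verb, pattern = PHRASES[m.kind]
        specs.append((subject, verb, pattern.format(m.value)))
    return specs

def surface_realiser(specs):
    """Turn each abstract clause into a capitalised, punctuated sentence."""
    sentences = []
    for spec in specs:
        sentence = " ".join(spec)
        sentences.append(sentence[0].upper() + sentence[1:] + ".")
    return " ".join(sentences)

def generate(data):
    """Run the pipeline: each module only sees the output of the previous one."""
    return surface_realiser(microplanner(document_planner(data)))

print(generate({"temperature": 12.34, "rainfall": 40.0}))
```

Because each stage communicates with the next only through a simple data structure, any one of them could be replaced or tested in isolation, which is the practical benefit of the pipeline design mentioned above.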
Chapter 4 details the first module of an NLG system: the document planner. The document planner is responsible for deciding which information is communicated and determining how this information should be structured for presentation. Two submodules are used: content determination and document structuring. The document planner is the most application-dependent module, relying on knowledge specific to the input data. WeatherReporter is used to exemplify the way in which the information communicated by the system (referred to in the book as messages) is selected and represented. Important issues, such as the granularity and the level of abstraction of the messages, are discussed. The large number of examples makes the chapter easy to understand, although, as I point out later, some readers would have found it helpful if the formalisms used for these examples had been explained.
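The question of message granularity can be illustrated with a small sketch. The code below is an assumption-laden toy, not the book's message definitions: the function name, the `MonthlyRainfallMsg` label, the dictionary fields and the 10% tolerance are all invented for the example. It shows one month-level message being abstracted from raw daily readings, rather than one message being produced per day.

```python
def monthly_rainfall_message(daily_mm, normal_mm, tolerance=0.1):
    """Abstract raw daily rainfall readings into a single month-level message.

    daily_mm  : list of daily rainfall totals in millimetres
    normal_mm : the long-term average total for this month
    tolerance : fraction of normal_mm within which the month counts as average
                (an arbitrary threshold chosen for this illustration)
    """
    total = sum(daily_mm)
    deviation = total - normal_mm
    if abs(deviation) <= tolerance * normal_mm:
        comparison = "average"
    elif deviation > 0:
        comparison = "above average"
    else:
        comparison = "below average"
    return {
        "type": "MonthlyRainfallMsg",  # hypothetical message type name
        "total_mm": total,
        "rainy_days": sum(1 for d in daily_mm if d > 0),
        "comparison": comparison,
    }

print(monthly_rainfall_message([0, 5, 10, 0], normal_mm=10))
```

The resulting message no longer mentions individual days at all, which is the kind of abstraction decision that has to be made with knowledge of the application domain.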
The document planner produces a general structure of the document. As the authors argue in chapter 5, this structure is too general to produce text directly from it. As a result, they introduce an intermediate module, microplanning, which involves lexicalisation, aggregation and generation of referring expressions. Some researchers do not include a microplanner in their systems, but in this book it is argued that it is better to include it given that the document planner relies on domain knowledge and the surface realiser on linguistic knowledge; therefore it is necessary to include a module which pays attention to interactions between domain and linguistic knowledge.
The first task of the microplanner is to choose the words and syntactic structures which communicate the information in a document plan. Several ways of choosing the words and discourse relations are discussed using a large number of examples. Once lexicalisation has decided how different concepts are expressed in words and syntactic structures, the aggregation module improves the readability of the generated text by combining similar sentences. Several aggregation rules are specified. Another way to improve the readability of the text is by using referring expressions. A referring expression is a way of referring to a previously introduced entity without describing the entity again. Pronouns and noun phrases are the most common ways of referring to an already introduced entity. Generating such expressions raises particular problems: when and how should a pronoun be used to refer to an entity, and when a noun phrase is used as a referring expression, should it be realised fully or in a reduced form? Correct generation of referring expressions is very important for the overall readability of a text. On the one hand, if there are ambiguities, the text will not be easily understood. On the other hand, a text which always specifies full NPs will be too repetitive.
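The two readability devices described above, aggregation and referring expressions, can be sketched in a few lines. This is a deliberately naive illustration under stated assumptions: clauses are represented as (subject, verb, complement) tuples, the only aggregation rule is conjoining complements of adjacent clauses with the same subject and verb, and a pronoun is used whenever the subject repeats the previous one. None of this is taken from the book's own rules or algorithms.

```python
def aggregate(clauses):
    """Merge consecutive clauses that share subject and verb into one clause."""
    merged = []
    for subject, verb, complement in clauses:
        if merged and merged[-1][0] == subject and merged[-1][1] == verb:
            merged[-1][2].append(complement)  # extend the previous clause
        else:
            merged.append((subject, verb, [complement]))
    return [(s, v, " and ".join(cs)) for s, v, cs in merged]

def add_referring_expressions(clauses, pronoun="it"):
    """Replace a subject with a pronoun when it repeats the previous subject."""
    result = []
    previous = None
    for subject, verb, complement in clauses:
        chosen = pronoun if subject == previous else subject
        result.append((chosen, verb, complement))
        previous = subject
    return result

clauses = [
    ("the month", "was", "cooler than average"),
    ("the month", "was", "drier than average"),
    ("the month", "had", "three rainy days"),
    ("rainfall", "was", "low"),
]
for clause in add_referring_expressions(aggregate(clauses)):
    print(" ".join(clause))
```

Even this toy version shows the trade-off mentioned in the text: the pronoun rule avoids repeating the full noun phrase, but it would produce ambiguous output as soon as two different entities could both be referred to by "it".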
In order to show how referring expressions can be generated, the authors draw on linguistic theories to explain the phenomenon. I found it surprising that the authors do not mention at all the constraints which government and binding theory and centering theory place on the relation between an initial and a subsequent reference, or how these constraints can be used to ensure the grammatical correctness and discourse coherence of the generated text.
The sixth chapter describes the process of mapping abstract text specifications into a surface text by the surface realiser. It consists of two processes: structure realisation and linguistic realisation. The first is in charge of the layout of the document, mapping the internal representation into the mark-up specific to the medium in which the generated text is displayed (e.g. paragraphs, the way the headings are displayed, etc.). The second process produces the actual text of the document and relies on linguistic knowledge. Given the complexity of this task, the authors opt to use an existing realiser rather than building one from scratch. Three possible choices are presented: KPML, based on systemic functional grammar; SURGE, based on functional unification grammar; and RealPro, based on meaning-text theory, a form of dependency grammar. The implications of choosing one realiser instead of another are discussed. Whereas the previous three chapters were very practical, the sixth one is rather theoretical, explaining the theories behind these systems.
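The structure realisation step can be pictured as rendering one abstract document in several media. The sketch below is only an illustration of that idea: the function name, the two supported media and the choice of an upper-case title for plain text are assumptions made for the example, not the book's design. It uses only `html.escape` from the Python standard library.

```python
import html

def realise_structure(title, paragraphs, medium="text"):
    """Map an abstract document (a title plus paragraphs) to medium-specific mark-up."""
    if medium == "html":
        body = "".join("<p>{}</p>".format(html.escape(p)) for p in paragraphs)
        return "<h1>{}</h1>{}".format(html.escape(title), body)
    if medium == "text":
        # Plain text: upper-case title, blank line between blocks.
        return "\n\n".join([title.upper()] + list(paragraphs))
    raise ValueError("unsupported medium: {}".format(medium))

document = ("Weather summary", ["The month was cold.", "Rainfall was low."])
print(realise_structure(*document, medium="html"))
print(realise_structure(*document, medium="text"))
```

The sentences themselves are untouched by this step; only the layout conventions change, which is why it can be separated from linguistic realisation.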
The final chapter of the book focuses on generating documents rather than simple texts, reflecting the general tendency in natural language processing to process and produce multimedia documents. As the authors point out, the main difference between texts and documents is that the latter include not only text, but also graphics, hyperlinks, sound, etc. Different issues involved in the generation of documents are discussed. This chapter is general and does not go into much detail about ways of generating such material. This is understandable, however, given that almost every one of these topics could be the subject of a book in its own right.
COMMENTS
The book is a pleasant and relatively easy read, especially because of the large number of examples. However, a weak point of the book is the authors' assumption about the reader's knowledge. The formalisms used for the examples are not explained. It is true that the attribute-value matrix is a well-known format for representing data, but a short explanation would still have been helpful. Less familiar and less straightforward is the pseudocode used throughout the book, especially that used to define the message types (pp. 61-70). Written in a C++-like style, it could confuse readers unfamiliar with object-oriented programming. One solution would be to explain uncommon notions in an appendix at the end of the book. In this way the flow of reading would not be interrupted, and a reader unfamiliar with any of these concepts could look them up. The authors also compare simple lexicalisation with localisation classes in Java, but for a reader unfamiliar with Java this is not very helpful.
A positive point worth noting is the way the authors develop their arguments. They do not try to persuade the reader that their choice is the correct one. Instead, they present a good balance of arguments for and against their choice. In most cases their choice is based on pragmatic considerations (as it should be, given that the book addresses practical issues in building NLG systems).
Although multilingual natural language generation is not the topic of this book, several times the authors point out how the described techniques can be used for generation of multilingual text.
I recommend this book to everyone involved in building natural language generation systems and other kinds of language processing systems. Although the book does not concentrate on describing algorithms for generation, the extensive further reading section at the end of each chapter can be used as a starting point for finding information about useful algorithms.
Constantin Orasan is doing a PhD in Automatic Summarisation at the University of Wolverhampton, U.K. In addition to automatic summarisation, his current research interests are anaphora resolution, corpus building and analysis, and machine learning techniques for natural language.