Editor for this issue: Tomoko Okuno <tomoko@linguistlist.org>

Androutsopoulos, Ion (2002) Exploring Time, Tense and Aspect in Natural Language Database Interfaces. John Benjamins (Natural Language Processing Series, Vol. 6), x+307pp, hardback, ISBN 9027249903 and 1588112691, US$104, EUR116.

http://www.linguistlist.org/issues/13/13-2681.html

Pablo A. Duboue, Computer Science Department, Columbia University, USA

SYNOPSIS

This book, an unabridged version of Dr. Androutsopoulos' doctoral dissertation (Edinburgh, 1996), studies Natural Language Interfaces to Databases (NLIDBs). Specifically, it deals with mapping the temporal aspects of English questions onto temporal extensions of the most widespread database query language (SQL). Even though the dissertation precedes the book by a good six years, the book is still worth reading for a wide audience.

Researchers on NLIDBs, and most importantly on their temporal aspects, are the readers who will profit from the book as a whole. However, active researchers in that area will surely be acquainted with more recent advances. The book is therefore best considered fundamental reading, ideal for researchers new to the field.

Beyond these obvious candidates, the book can be of particular interest to linguists working on representing temporal issues (time, tense and aspect) and to instructors of HPSG courses. The theoretical nature of the book makes it accessible to people with no computer science background, while its empiricist tendency greatly helps to ground certain abstract concepts such as aspectual taxonomies and meaning representation languages. Moreover, the author's example HPSG grammar plugs directly into the theory of Pollard and Sag (1994), allowing an instructor teaching HPSG to include a ''real world'' example.

Finally, a developer working on adding temporal capabilities to an NLIDB may want to look at the book.
A word of caution is in order in this case, as ''Exploring...'' can be pretty terse reading for developers, and the prototype system is coded directly in Prolog and generates TSQL2, a temporal extension of SQL that did not prove persuasive enough for the standards committees.

DETAILED ANALYSIS

The book can be roughly divided into three parts:

* Linguistic temporal issues (Chapters 1-3)
* From English to a meaning representation (TOP) (Chapter 4)
* From the meaning representation to SQL (Chapters 5-6)

Each of these parts has a different focus and can be of interest to a different audience. However, the later parts normally use material from earlier chapters, so reading them in isolation may be difficult.

The discussion in the book is grounded in a fictional ''airport'' domain. The domain contains 20 relations, which express the different temporal phenomena the book aims to cover.

The first part describes the aspectual taxonomy employed by the author. The four classes (states, activities, culminating activities and points) are inspired by the work of Vendler (1967) and are similar to the classes employed by Moens (1987), Passonneau (1988), and Blackburn, Gardent and de Rijke (1994). From there, the English tense system is introduced, with an explanation at some length of how the different aspectual classes interact with it. The discussion is well motivated, and this part of the book is by far the easiest to understand on a first reading. Moreover, in a classroom setting, it can be of interest to present the interaction between aspect and tense by means of questions taken from the ''airport'' domain.

Different linguistic phenomena are presented, and the discussion of them seems very comprehensive. However, only a subset of them is actually implemented in the rest of the book. This decision is sensible, in the sense that computational systems are always of limited coverage. However, the rationale behind each decision is not empirically motivated.
At times the choice seems to be driven more by simplicity, or by the fact that the state of the art captures some phenomena more easily than others. The author seems to work on the assumption that full coverage of every linguistic phenomenon is an attainable and desirable goal. A more bottom-up approach, working from actual questions posed to real temporal databases, could be extremely interesting for the sake of comparison.

The second part of the book defines a meaning representation language (TOP) and a methodology to transform English questions into it. The meaning representation language is very rich and is designed to capture temporal expressions easily. It is a language based on temporal operators, similar to the operators of Prior (1967). However, TOP is a formal language, not a logic. No inference rules for TOP are provided, and the author claims the language is only suitable as an intermediate language for translation into SQL or other database access languages. Nevertheless, the language is thoroughly defined, it seems easy to understand when written down (it is unclear whether the same goes for temporal logics, for instance), and a considerable part of the book is devoted to transforming English into TOP.

I would like to think TOP can be applied in other settings as a means to capture the temporal meaning of English expressions. If that were the case, the book's potential contribution would be much more significant, as the description of both TOP and the transformation from English to TOP is very detailed. Such effort seems worth reusing whenever possible.

In particular, the book provides a very nice example of the application and extension of off-the-shelf HPSG theory, as defined by Pollard and Sag (1994). The author relies on a simplified semantic analysis, without the situation-theoretic approach of Pollard and Sag (1994). A new ''aspect'' feature is added to the HPSG signs and an ''aspect principle'' is added to the theory.
Domain information is represented as an extension to the sort hierarchy. The author analyzes the grammar in full detail, together with the mechanism to extract the TOP formula from the sign. All in all, Chapter 4 is an appropriate synthesis of computational linguistics: a sound linguistic discussion, carried to the level of detail required by a computational implementation.

The third part is the tersest segment of the book. It is mostly intended for computer scientists, as it defines the methodology for transforming TOP expressions into machine-executable instructions (SQL). The actual transformation rules are fairly easy to grasp, but the discussion is very thorough, and a formalization of both Temporal SQL and the transformation mechanism is provided. Moreover, the mechanism is proved correct. At first glance this level of formality seemed unnecessary, but a closer inspection shows that it is a requirement, given that the generated SQL is not plugged into a real system, which rules out any empirical evaluation of correctness. Given its complexity, this third part should be of interest to implementers of NLIDBs, and more precisely to researchers developing experimental NLIDBs.

With regard to real systems, the author points out several extra requirements that are missing from the system as described in the book:

* An input pre-processing module, in the form of appropriate tokenization and domain terminology detection and conflation.
* A disambiguation module among parse trees, as the HPSG grammar returns two or three parse trees for a good number of cases.
* A Natural Language Generation module to generate cooperative responses in the event of ambiguity, or to convey the right answer to the user when false implications are detected (e.g., the user may pose a question such as ''Does plane BA737 circle?'', which implies that 'circle' has a habitual reading; if that is not the case in the domain, the system will respond just 'no' instead of explaining to the user that the question is not possible).

As generation is of particular interest to this reviewer, I include here some specific observations on the author's treatment of this issue. Cooperative response generation is discussed in a good dozen places throughout the book. While its need is remarked upon and the places where it is needed are highlighted, it is unclear whether the overall framework can be easily extended to cope with response generation. In particular, it may be the case that not enough information is kept after parsing, in the form of a TOP formula, to build such a cooperative response. It would have been interesting for the author to investigate this issue further, but from the very beginning he made clear that response generation was not going to be dealt with in this work.

The book concludes with a discussion of related and further work. Since Dr. Androutsopoulos finished his dissertation at Edinburgh, Dr. Nelken defended a related dissertation at the Technion (Israel) in 2001. Dr. Nelken made several comments and observations (sometimes negative) on the author's work. The book offers one of those rare opportunities to read in print, inside the work itself, comments made after it was first completed. That discussion, together with the roughly 15% of bibliographic items added since the dissertation was defended, keeps the book up to date.

The book is accompanied by all the source code (in the ALE grammar workshop) on a companion website.
The website has no broken links and contains not only the necessary source code, but also pointers to the language (SWI Prolog) and the grammar interpreter (ALE workshop). Downloading the language, the interpreter and the code was a matter of minutes, and all the examples from the book executed correctly.

OVERALL ANALYSIS

The book is true to its name; it explores the issues of time, tense and aspect, keeping NLIDBs as an empirical grounding for an otherwise theoretical discussion. However, the discussion in the book leads the state of the art in NLIDBs towards more empirical and less exploratory work. Moreover, the book contains clear place-holders for research on issues such as evaluation and cooperative response generation. In that respect, the book is a necessary milestone on that worthy path.

REFERENCES

Blackburn, P., Gardent, C., and de Rijke, M. (1994). Back and forth through time and events. In D.M. Gabbay (Ed.), Proceedings of the First International Conference on Temporal Logic (pp. 225-237). Bonn, Germany. Springer-Verlag.

Moens, M. (1987). Tense, Aspect and Temporal Reference. Ph.D. thesis, Centre for Cognitive Science, University of Edinburgh, U.K.

Passonneau, R.J. (1988). A computational model of the semantics of tense and aspect. Computational Linguistics, 14(2), 44-60.

Prior, A. (1967). Past, Present and Future. Oxford University Press.

Pollard, C. and Sag, I.A. (1994). Head-Driven Phrase Structure Grammar. University of Chicago Press and Center for the Study of Language and Information, Stanford.

Vendler, Z. (1967). Verbs and times. In Linguistics in Philosophy, Chapter 4 (pp. 97-121). Cornell University Press.

ABOUT THE REVIEWER

Pablo Ariel Duboue is a PhD candidate working under the supervision of Dr. Kathleen McKeown at the Natural Language Processing group, Columbia University in the City of New York (USA).
His research interests lie in the area of Natural Language Generation, mainly the automatic construction of content planners from aligned corpora. More information about Pablo is available at http://www.cs.columbia.edu/~pablo