Editor for this issue: Andrew Carnie <carnie@linguistlist.org>
Klavans J L, Resnik P (1996) The Balancing Act: Combining Symbolic and Statistical Approaches to Language, MIT Press, Cambridge Mass., London England. Pp. xii and 186. Pbk $17.50

Reviewed by Sam Salt <D.W.Salt@derby.ac.uk>

Synopsis

This book brings together a collection of papers first presented at an ACL workshop in 1994, the purpose of which was "to provide a forum in which to explore combined symbolic and statistical approaches to natural language." For anyone with a particular interest in how natural language processing (NLP) systems can be engineered, these papers provide an excellent overview of some practical approaches to the problem. The difference here is that not only have the authors made a concerted effort to use both statistical and symbolic techniques in a hybrid manner but they have also been at pains to evaluate the costs of such an endeavour. Although the research described is essentially of a practical nature, the papers also raise some interesting theoretical questions about the nature of language.

Review

In the 1950s Artificial Intelligence made many extravagant claims regarding the solution of problems in human cognition. A good example is Artificial Neural Networks which, it was claimed, would model human thinking processes. However, both the models used and the computational power available proved inadequate and Neural Networks languished, not to be revived until the mid 1980s when sufficiently powerful machines became available. As with Neural Networks, so it was with the statistical analysis of language. Shannon had suggested a number of possible techniques as early as the late 1940s but again lack of computing power ensured that no real progress was made. Symbolic computational techniques quickly moved in to fill the gap and gained kudos through association with Chomsky's theories.
It was not until the mid 1980s that statistical techniques once more began to find favour. But by this time the symbolic approach was so well established that there has been a tendency for the symbolic and statistical camps to go their separate ways. It would be no exaggeration to say that the two camps have often been quite hostile towards each other. The papers contained in this book try to show how the two techniques can be mutually complementary and lead to improved analyses.

To start in the middle of the collection, Vasileios Hatzivassiloglou's paper gives a good overview of the history of this situation and describes work which tries to show that using both approaches can pay off in the long run and actually produces quantifiable improvements. He even tells us that coding for his system took seven person-months for the statistical components and five person-months for the linguistic components and goes on to evaluate the relative contributions of the two components to the system. Although he recognises that knowledge-based approaches have scaling problems, he nevertheless feels that there is an intuitive belief that the linguistic models which they often represent are likely to lead to improved performance when merged with statistical approaches. He is able to report plausibly, of his finished system, that "many forms of linguistic knowledge make a significant positive contribution to the performance of the system." I would recommend his paper as a good starting point for newcomers as it both outlines the problem and illustrates a plausible route to solutions. The fact that timescales are discussed comes as a bonus as it is often difficult to gauge whether work described in papers has taken days, years or decades to produce results. This is as good an example of how the two techniques can be combined as you are likely to find.
Steven Abney's paper on "Statistical Methods and Linguistics" brought home the fact that linguistics so often deals with competence and not performance. Computational models often distinguish just between grammatical and ungrammatical structures but this distinction alone is insufficient for a real natural language where we might ask whether a sentence "sounds" right. He advances a system for weighting sentences according to how natural they sound. Rose and Waibel also begin with a discussion of the performance versus competence problem and quite rightly state that it is currently impossible to conceive of a parser that would deal with all the complexities of performance. They go on to describe a language-to-language translator which tries to resolve ambiguities by entering into a dialogue with users. I was mostly unconvinced by this, as sample transcripts of the dialogue were so lengthy that I concluded it would be cheaper and quicker to employ a human translator. In fairness they do recognise this weakness and hope to simplify matters in future versions of the software. Beatrice Daille contributes a useful paper on working with domain-specific terminology "using shallow syntactic relationships to define the co-occurrences over which statistical methods operate..", which shows how collocation can be used to good effect in this field. Patti Price raises the question of speech recognition and describes how this "traditionally" has been much more amenable to statistical analysis than the written word but points out that the success of such approaches depends on the analysis ignoring the social aspects and effects of language. Ramshaw and Marcus discuss a corpus-based method using Brill's tagger.

The papers in this collection are of variable quality, as is so often the case in such volumes.
My notes record that I found at least one of the papers had odd sentence structure and eccentric punctuation which made it difficult for me to parse, let alone a machine. Overall though I can recommend this as a good starting point for anyone prepared to admit that statistical and symbolic approaches can be usefully combined. I am convinced that both techniques have contributions to make to the field and I hope this collection is just the first of many more to come.

About the Reviewer

David Salt is Head of the Division of Computing, University of Derby, England. His specialist interests are Artificial Intelligence, Consciousness and Computational Linguistics.

*********************************
Sam Salt
Head of Division of Computing
University of Derby
Kedleston Road
Derby DE22 1GB
01332-622222 Ext:1753
e-mail: d.w.salt@derby.ac.uk
********************************