Review of A Way with Words: Recent Advances in Lexical Theory and Analysis
EDITOR: De Schryver, Gilles-Maurice
TITLE: A Way with Words: Recent Advances in Lexical Theory and Analysis
SUBTITLE: A Festschrift for Patrick Hanks
SERIES TITLE: Menha Linguistics Series
PUBLISHER: Menha Publishers
YEAR: 2010
Esa Penttilä, English Language and Translation, University of Eastern Finland
This book commemorates the work and ideas of Patrick Hanks, a prominent and long-time contributor to the fields of lexicography, corpus linguistics and lexical theory, who has been responsible for editing some of the major dictionaries of English and continues his research for various academic institutions around the world. As is often the case with Festschrifts, it is not obvious for which audience the book is intended. Naturally, it is directed at the dedicatee himself, but since this would make for a very small audience, it should be of interest to others as well. The blurb at the back claims that the volume ''is essential reading for everyone interested in meaning, the lexicon, dictionaries and corpus analysis'', but I would say this is slightly exaggerated. I find the most fruitful readership for this book to be either scholars who are eager to learn what the current issues in lexicography are, or students and researchers at the early stages of their careers. I return to this in the evaluation section below.
The book opens with the editor Gilles-Maurice de Schryver's short introduction to the career of Patrick Hanks. After this, the volume is divided into three parts, each dealing with different areas of lexicography to which Hanks has contributed. The first part contains five highly theoretical papers on various aspects of lexical meaning by eminent linguists and philosophers. The second part concentrates on corpus linguistics and the computational aspects of lexical meaning and contains seven articles that deal with various languages and the problems of (semi-)automatic analysis of digitized corpora. The last seven papers in the book take a somewhat less technical and more traditional approach to lexicography, concentrating on some of the more practical aspects of dictionary compilation -- without forgetting the crucial theoretical perspective either. In various ways, the papers nicely reflect Wittgenstein's (1953: §43) famous slogan, ''the meaning of a word is its use in the language'', which has been crucial for Hanks's own work, thus linking the texts with the philosophical tradition that extends from Frege to Quine's (1960) radical translation and further to Davidson's (1984) radical interpretation, although these notions are scarcely mentioned in the book.
Part I: Theoretical Aspects and Background
The first article in this section, ''Defining the Definiendum'', is the last one that John Sinclair is known to have been working on before his death. It presents a radical version of Sinclair's collocational approach to language, arguing for the need to acknowledge the significance of multi-word semantic units and to extend their treatment in lexicography and dictionary writing beyond the role they have traditionally had. The claim is backed up with corpus evidence for the word 'sever'. The position of Sinclair's unfinished draft at the beginning of the compilation emphasizes his long-lasting friendship with Patrick Hanks, who joined Sinclair's COBUILD project in the early 1980s.
''Very Large Lexical Entries and the Boundary Between Linguistic and Knowledge Structures'' by Yorick Wilks includes the text of a conference paper dating back to 1977, which has not been widely circulated, although it has been published. In it, Wilks discusses how extended lexical entries could be interpreted computationally by incorporating them into the Preference Semantics system as pseudo-texts, which are one of the types of frame in Minsky's (1975) sense.
In ''Mechanisms of Sense Extension'', James Pustejovsky and Anna Rumshisky investigate the creative aspect of lexical meaning and examine the way different extended senses of a predicate can be analyzed in the framework of the Generative Lexicon. The analysis is facilitated by including degrees of metaphoricity in the model, and the idea is based on the assumption that metaphorical meaning is structured and scalar in nature. The suggestion is illustrated with the help of case studies dealing with motion predicates and locative relations.
Igor Mel'čuk's ''The Government Pattern in the Explanatory Combinatorial Dictionary'' is a very technical and succinct account of lexical government in an Explanatory Combinatorial Dictionary (ECD), which is one of the main components in the framework of Meaning-Text theory (MTT). This chapter requires that one understand the basics of both MTT and ECD, as the author cordially points out at the beginning of the paper, referring the reader to helpful material, if that is needed.
''The Paradox of Analysis and the Paradox of Synonymy'' by David Wiggins takes the reader to the philosophical areas of linguistics by discussing the paradox of analysis first worded by C.H. Langford and the closely related paradox of synonymy. Wiggins discusses the topic by entertaining ideas from Leibniz, Frege, and Putnam, and shows how these philosophical questions are also relevant to the daily work of lexicographers, who have to solve similar problems in practice, although practicing dictionary writers and philosophers of language may at times seem to be missing each other's point.
Part II: Computing Lexical Relations
The first article in this section, ''More is More'' by Kenneth W. Church, is a brief response to Kilgarriff's (2007) criticism of so-called Googleology. It discusses the ideal corpus size and quality with the conclusion that the bigger the corpus the better it is for research, and that the more data there is economically available for everyone in the field the better it is for the whole research community.
Gregory Grefenstette's ''Estimating the Number of Concepts'' also deals with quantity. The main aim is to estimate the number of concepts that Natural Language Processing (NLP) systems will have to deal with once lexicography takes the step toward analyzing multiword expressions and the multiword concepts they express. On the basis of Web queries, Grefenstette comes up with a rough estimate of c. 233 million two-word combinations that are commonly used on the Web, thus indicating the scale at which computational lexicography will need to tackle this problem in the future.
The next three articles concentrate on empirical studies of multiword expressions in three different languages. In ''Identifying Adjectives that Predict Noun Classes'', David Guthrie and Louise Guthrie develop methods to help automatically identify the semantic class of the head noun in noun phrases on the basis of preceding adjectives. They base their examination on three machine-readable English corpora and show that adjectives actually contain valuable information about the nouns they modify, and that this information can be used in automatic tagging.
Alexander Geyken's ''Statistical Variations of German Support Verb Constructions in Very Large Corpora'' reports a study on three German light verbs and their coexisting noun-verb combinations in two different corpora to determine how important corpus size is for the results of lexicographic analysis. Geyken concludes that, up to a point, corpus size does indeed matter, but after it exceeds 500 million tokens there is fairly little new knowledge to be gained -- at least, with respect to the studied constructions.
In their paper ''A Case Study in Word Sketches -- Czech Verb vidět 'see''', Karel Pala and Pavel Rychlý apply a tool called the Sketch Engine (see Kilgarriff et al. 2004) to analyze the grammatical information provided by the Czech verb vidět 'see' and its environment to arrive at word sketches that should show how the word functions in Czech grammar. The errors found in the automatic analysis help the authors make suggestions for the improvement of both the tool and the method.
The last two papers in this section deal with two of Patrick Hanks's (2004, 2007) developments: the ''Pattern Dictionary of English Verbs'' (PDEV) and Corpus Pattern Analysis (CPA). In ''The Lexical Population of Semantic Types in Hanks's PDEV'', Silvie Cinková, Martin Holub and Lenka Smejkalová describe work in progress at Charles University in Prague, where PDEV is constantly being developed further with the help of the CPA technique. The pilot study discussed here ends with a suggestion that it might be useful to create manually annotated testing data, in which collocates would be annotated with Semantic Types, since this would be likely to make PDEV more usable for NLP in the future.
Elisabetta Jezek and Francesca Frontini's ''From Pattern Dictionary to Patternbank'' reports a study in which the PDEV approach is applied to Italian and thus describes the first attempts at creating a Patternbank for Italian. At the same time Jezek and Frontini show how the general reliability of the PDEV technique can be improved by extending it to include ''the annotation of verb patterns onto the corpus instances that instantiate them'' (p. 215), which makes it more useful for analyzing phenomena related to the syntax/semantics interface as well as for various NLP applications.
Part III: Lexical Analysis and Dictionary Writing
In ''Words that Spring to Mind: Idiom, Allusion, and Convention'', Rosamund Moon investigates the phraseological reality of 'spring to mind' in a corpus study, in which she compares her observations with dictionary definitions, coming to the conclusion that it is indeed the usage of phrases that should overrule dictionary definitions whenever we aim to understand their true meanings.
Sue Atkins' ''The DANTE Database: Its Contribution to English Lexical Research, and in Particular to Complementing the FrameNet Data'' compares the two databases mentioned in the title, DANTE and FrameNet, and makes a suggestion about how to semi-automatically map their semantic analyses together to be a step closer toward realizing the lexicographer's dream.
Adam Kilgarriff and Pavel Rychlý combine philosophy of language with computational linguistics in their paper ''Semi-Automatic Dictionary Drafting'', in which they point out how the Theory of Norms and Exploitation developed by Hanks links philosophical ideas to concrete data retrieval in corpora. They present a software solution called Semi-Automatic Dictionary Drafting, which should solve some of the problems related to so-called Word Sense Disambiguation, which continues to make the dictionary writer's life difficult.
In ''Lexicography: Science without Theory?'', Paul Bogaards ponders whether there actually exists a true lexicographic theory or not, and takes the idea further by questioning whether we even need one. According to him, it is obvious that no unitary theory has yet come into existence. After all, there is no agreement on what such a theory should deal with. However, there are various theories that are useful for lexicography and they should all be utilized and developed further to help improve the craft of dictionary writing in the future, without forgetting that pure serendipity also has its place in this development.
Mirosław Bańko's ''The Polish COBUILD and its Influence on Polish Lexicography'' describes the creation process behind the Polish general-purpose dictionary that he edited in 2000 and that was modeled after the COBUILD English dictionaries. Although the dictionary did not prove to be a commercial success, its general influence on Polish lexicography shows how ideas in one country can be implemented in slightly different domains than they were originally intended for.
In his article ''ARGOT: The Flesh Made Word'', Jonathan Green delves into the history of the French occupational slang of criminal classes by extending a paper published earlier in Critical Quarterly. Green shows, in an interesting account, how crucial various trials and literary works have been for making the general audience familiar with the vocabulary of this particular form of jargon, which has now become extinct.
The compilation ends with Michael Rundell's ''Defining Elegance'', where the author discusses the rationale behind the lexicographer's solutions when the aim is to create a dictionary that is useful for its ordinary users rather than for theoreticians or other lexicographers. Although computers have enabled us to store and present more information in an easily accessible form than ever before, many of the basic ideas related to the elegance of dictionary writing that were already usable at the time of Johnson (1755) are still topical today -- and most likely will remain so in the future as well.
EVALUATION
''A Way with Words'' provides an extensive view into both the questions discussed in modern lexicography and the methods with which answers to these questions are sought at the moment. The book touches on most of the areas relevant in the field. At the same time it shows the current trends and the direction in which research is at present heading. For example, the role of large digital corpora and tools for analyzing them is enormous. Although some things can still be done without corpora, full-fledged modern lexicography is simply impossible without corpus linguistic methods and computers -- but this of course applies to most other fields of linguistics as well.
As a coursebook, this book is not the most usable one. Although the articles cover a wide range of topics and are fairly short, their content is dense and often requires prior understanding of concepts and theoretical ideas developed earlier. Moreover, the ideas entertained in the book have been discussed more extensively elsewhere, which means that there are more comprehensive accounts of these themes available in various other publications. So, the book is clearly written for scholars who are somewhat familiar with the field and now have a chance to update their information on its recent advances and future prospects.
As a Festschrift, the book provides a very illustrative insight into the work and ideas of Patrick Hanks and the contribution he has made to lexicographical research. Although Hanks did not write a single word in the book, its contents reflect his original intuitions and make use of his ideas and theorizations in a way that nicely introduces them to the readers. There is no doubt that this book pays homage to its dedicatee. The authors also point out how their own ideas got inspiration from Hanks and recite amusing anecdotes about him, so that after reading the book one feels as if one personally knew the man on the cover of this book.
This is also the greatest value of the book. By showing how the authors came to know Patrick Hanks and how the ideas that they discuss relate to his ideas, the texts turn this book into a valuable view into the development of scientific ideas. Each paper adds to the mosaic that reflects the sociological reality behind academic research. This is also why I think that MA students, postgraduate students and early-career researchers would make an ideal readership for this book; they would find it illuminating, even fascinating, to learn how the research community functions and how theories evolve and ideas develop through communication and -- sometimes unexpected and incidental -- contacts that we make with various other people in the field. This is something that old-timers are already familiar with, but for newcomers in academia this information would be worthwhile.
Davidson, D. (1984) Inquiries into Truth and Interpretation. Oxford: Clarendon Press.
Hanks, P. (2004) Corpus Pattern Analysis. In G. Williams & S. Vessier (eds.), Proceedings of the Eleventh EURALEX International Congress, EURALEX 2004, Lorient, France, July 6-10, 2004. Lorient: Faculté des Lettres et des Sciences Humaines, Université de Bretagne Sud, 87-97.
Hanks, P. (2007) Pattern Dictionary of English Verbs (PDEV) -- Project Page. Online at http://deb.fi.muni.cz/pdev/.
Johnson, S. (1755) A Dictionary of the English Language. London.
Kilgarriff, A. (2007) Googleology is Bad Science. Computational Linguistics 33.1: 147-151.
Kilgarriff, A., P. Rychlý, P. Smrž & D. Tugwell (2004) The Sketch Engine. In G. Williams & S. Vessier (eds.), Proceedings of the Eleventh EURALEX International Congress, EURALEX 2004, Lorient, France, July 6-10, 2004. Lorient: Faculté des Lettres et des Sciences Humaines, Université de Bretagne Sud, 105-116. (See also http://www.sketchengine.co.uk.)
Minsky, M. (1975) A Framework for Representing Knowledge. In P. Winston (ed.), The Psychology of Computer Vision. New York: McGraw-Hill, 211-277.
Quine, W. V. O. (1960) Word and Object. Cambridge, MA: MIT Press.
Wittgenstein, L. (1953) Philosophical Investigations. Translated by G. E. M. Anscombe. Oxford: Blackwell.
ABOUT THE REVIEWER
Esa Penttilä is a postdoctoral researcher at the University of Eastern
Finland (Department of English Language and Translation). He received his
PhD from the University of Joensuu in 2006. His research interests include
idioms and idiomaticity, figurative language and metaphors,
culture-specific translation, the syntax/semantics interface, and
philosophy of language.