LINGUIST List 22.2648
Fri Jun 24 2011
Review: Lexicography; Linguistic Theories: De Schryver (2010)
Editor for this issue: Monica Macaulay
This LINGUIST List issue is a review of a book published by one of our supporting publishers, commissioned by our book review editorial staff. We welcome discussion of this book review on the list, and particularly invite the author(s) or editor(s) of this book to join in. If you are interested in reviewing a book for LINGUIST, look for the most recent posting with the subject "Reviews: AVAILABLE FOR REVIEW", and follow the instructions at the top of the message. You can also contact the book review staff directly.
From: Esa Penttilä <esa.penttilauef.fi>
Subject: A Way with Words: Recent Advances in Lexical Theory and Analysis
Announced at http://linguistlist.org/issues/21/21-3066.html
EDITOR: De Schryver, Gilles-Maurice
TITLE: A Way with Words: Recent Advances in Lexical Theory and Analysis
SUBTITLE: A Festschrift for Patrick Hanks
SERIES TITLE: Menha Linguistics Series
PUBLISHER: Menha Publishers
Esa Penttilä, English Language and Translation, University of Eastern Finland
This book commemorates the work and ideas of Patrick Hanks, a prominent and
long-time contributor to the fields of lexicography, corpus linguistics and
lexical theory, who has been responsible for editing some of the major
dictionaries of English and continues his research for various academic
institutions around the world. As is often the case with Festschrifts, it is not
obvious for which audience the book is intended. Naturally, it is directed at
the dedicatee himself, but since this would make for a very small audience, it should be
of interest to others as well. The blurb at the back claims that the volume ''is
essential reading for everyone interested in meaning, the lexicon, dictionaries
and corpus analysis'', but I would say this is slightly exaggerated. I find the
most fruitful readership for this book to be either scholars who are eager to
learn what the current issues in lexicography are, or students and researchers
at the early stages of their career. I return to this in the evaluation section.
The book opens with the editor Gilles-Maurice de Schryver's short introduction
to the career of Patrick Hanks. After this, the volume is divided into three
parts, each dealing with different areas of lexicography to which Hanks has
contributed. The first part contains five highly theoretical papers on various
aspects of lexical meaning by eminent linguists and philosophers. The second
part concentrates on corpus linguistics and the computational aspects of lexical
meaning and contains seven articles that deal with various languages and the
problems of (semi-)automatic analysis of digitized corpora. The last seven
papers in the book take a somewhat less technical and more traditional approach
to lexicography, concentrating on some of the more practical aspects of
dictionary compilation -- without forgetting the crucial theoretical perspective
either. In various ways, the papers nicely reflect Wittgenstein's (1953: §43)
famous slogan, ''the meaning of a word is its use in the language'', which has
been crucial for Hanks's own work, thus linking the texts with the philosophical
tradition that extends from Frege to Quine's (1960) radical translation and
further to Davidson's (1984) radical interpretation, although these notions are
scarcely mentioned in the book.
Part I: Theoretical Aspects and Background
The first article in this section, ''Defining the Definiendum'', is the last one
that John Sinclair is known to have been working on before his death. It
presents a radical version of Sinclair's collocational approach to language,
arguing for the need to acknowledge the significance of multi-word semantic
units and to extend their treatment in lexicography and dictionary writing beyond
the role they have traditionally had. The claim is backed up with corpus
evidence for the word 'sever'. The position of Sinclair's unfinished draft at the
beginning of the compilation emphasizes his long-lasting friendship with Patrick
Hanks, who joined Sinclair's COBUILD project in the early 1980s.
''Very Large Lexical Entries and the Boundary Between Linguistic and Knowledge
Structures'' by Yorick Wilks includes the text of a conference paper dating back
to 1977, which has not been widely circulated, although it has been published.
In it, Wilks discusses how extended lexical entries could be interpreted
computationally by incorporating them into the Preference Semantics system as
pseudo-texts, which are one of the types of frame in Minsky's (1975) sense.
In ''Mechanisms of Sense Extension'', James Pustejovsky and Anna Rumshisky
investigate the creative aspect of lexical meaning and examine the way different
extended senses of a predicate can be analyzed in the framework of the
Generative Lexicon. The analysis is facilitated by including degrees of
metaphoricity in the model, and the idea is based on the assumption that
metaphorical meaning is structured and scalar in nature. The suggestion is
illustrated with the help of case studies dealing with motion predicates and
Igor Mel'čuk's ''The Government Pattern in the Explanatory Combinatorial
Dictionary'' is a very technical and succinct account of lexical government in an
Explanatory Combinatorial Dictionary (ECD), which is one of the main components
in the framework of Meaning-Text theory (MTT). This chapter requires that one
understand the basics of both MTT and ECD, as the author cordially points out at
the beginning of the paper, referring the reader to helpful material, if that is
''The Paradox of Analysis and the Paradox of Synonymy'' by David Wiggins takes the
reader to the philosophical areas of linguistics by discussing the paradox of
analysis first worded by C.H. Langford and the closely related paradox of
synonymy. Wiggins discusses the topic by entertaining ideas from Leibniz, Frege,
and Putnam, and shows how these philosophical questions are also relevant to
the daily work of lexicographers, who have to solve similar problems in
practice, although practicing dictionary writers and philosophers of
language may at times seem to be missing each other's point.
Part II: Computing Lexical Relations
The first article in this section, ''More is More'' by Kenneth W. Church, is a
brief response to Kilgarriff's (2007) criticism of so-called Googleology. It
discusses the ideal corpus size and quality with the conclusion that the bigger
the corpus the better it is for research, and that the more data there is
economically available for everyone in the field the better it is for the whole
Gregory Grefenstette's ''Estimating the Number of Concepts'' also deals with
quantity. The main aim is to estimate the possible number of concepts that
Natural Language Processing (NLP) systems will have to deal with in the future,
once the step is taken toward analyzing multiword expressions and the resulting
multiword concepts for the purposes of lexicography. On the basis
of Web queries, Grefenstette comes up with a rough estimate of c. 233 million
two-word combinations that are commonly used on the Web, thus indicating the
scale on which computational lexicography will need to tackle this problem in
The next three articles concentrate on empirical studies of multiword
expressions in three different languages. In ''Identifying Adjectives that
Predict Noun Classes'', David Guthrie and Louise Guthrie develop methods to help
automatically identify the semantic class of the head noun in noun phrases on the
basis of preceding adjectives. They base their examination on three
machine-readable English corpora and show that adjectives actually contain
valuable information about the nouns they modify, and this information can be
used in automatic tagging.
Alexander Geyken's ''Statistical Variations of German Support Verb Constructions
in Very Large Corpora'' reports a study on three German light verbs and their
coexisting noun-verb combinations in two different corpora to determine how
important corpus size is for the results of lexicographic analysis. Geyken
concludes that, up to a point, corpus size does indeed matter, but after it
exceeds 500 million tokens there is fairly little new knowledge to be gained --
at least, with respect to the studied constructions.
In their paper ''A Case Study in Word Sketches -- Czech Verb vidět 'see''', Karel
Pala and Pavel Rychlý apply a tool called the Sketch Engine (see Kilgarriff et
al. 2004) to analyze the grammatical information provided by the Czech verb
vidět 'see' and its environment to arrive at word sketches that should show how
the word functions in Czech grammar. The errors found in the automatic analysis
help the authors make suggestions for the improvement of both the tool and the
The last two papers in this section deal with two of Patrick Hanks' (2004, 2007)
developments: the ''Pattern Dictionary of English Verbs'' (PDEV) and Corpus
Pattern Analysis (CPA). In ''The Lexical Population of Semantic Types in Hanks's
PDEV'', Silvie Cinková, Martin Holub and Lenka Smejkalová describe work in
progress at Charles University in Prague, where PDEV is constantly being
developed further with the help of the CPA technique. The pilot study discussed here
ends with a suggestion that it might be useful to create manually annotated
testing data, in which collocates would be annotated with Semantic Types, since
this would be likely to make PDEV more usable for NLP in the future.
Elisabetta Jezek and Francesca Frontini's ''From Pattern Dictionary to
Patternbank'' reports a study in which the PDEV approach is applied to Italian
and thus describes the first attempts at creating a Patternbank for Italian. At
the same time Jezek and Frontini show how the general reliability of the PDEV
technique can be improved by extending it to include ''the annotation of verb
patterns onto the corpus instances that instantiate them'' (p. 215), and this
makes it more useful for analyzing phenomena related to the syntax/semantics
interface as well as for various NLP applications.
Part III: Lexical Analysis and Dictionary Writing
In ''Words that Spring to Mind: Idiom, Allusion, and Convention'', Rosamund Moon
investigates the phraseological reality of 'spring to mind' in a corpus study,
in which she compares her observations with dictionary definitions, coming to
the conclusion that it is indeed the usage of phrases that should overrule
dictionary definitions whenever we aim to understand their true meanings.
Sue Atkins' ''The DANTE Database: Its Contribution to English Lexical Research,
and in Particular to Complementing the FrameNet Data'' compares the two databases
mentioned in the title, DANTE and FrameNet, and suggests how their semantic
analyses could be mapped onto each other semi-automatically, bringing us a step
closer to realizing the lexicographer's dream.
Adam Kilgarriff and Pavel Rychlý combine philosophy of language with
computational linguistics in their paper ''Semi-Automatic Dictionary Drafting'',
in which they point out how the Theory of Norms and Exploitation developed by
Hanks links philosophical ideas to concrete data retrieval in corpora. They
present a software solution called Semi-Automatic Dictionary Drafting, which
should solve some of the problems related to so-called Word Sense
Disambiguation, which continues to make the dictionary writer's life difficult.
In ''Lexicography: Science without Theory?'', Paul Bogaards ponders whether there
actually exists a true lexicographic theory or not, and takes the idea further
by questioning whether we even need one. According to him, it is obvious that no
unitary theory has yet come into existence. After all, there is no agreement on
what such a theory should deal with. However, there are various theories that
are useful for lexicography and they should all be utilized and developed
further to help improve the craft of dictionary writing in the future, without
forgetting that pure serendipity also has its place in this development.
Mirosław Bańko's ''The Polish COBUILD and its Influence on Polish Lexicography''
describes the creation process behind the Polish general-purpose dictionary that
he edited in 2000 and that was modeled after the COBUILD English dictionaries.
Although the dictionary did not prove to be a commercial success, its general
influence on Polish lexicography shows how ideas in one country can be
implemented in slightly different domains than they were originally intended for.
In his article ''ARGOT: The Flesh Made Word'', Jonathan Green delves into the
history of the French occupational slang of criminal classes by extending a
paper published earlier in Critical Quarterly. Green shows, in an interesting
account, how crucial various trials and literary works have been for making the
general audience familiar with the vocabulary of this particular form of jargon,
which has now become extinct.
The compilation ends with Michael Rundell's ''Defining Elegance'', where the author
discusses the rationale behind the lexicographer's solutions when the aim is to
create a dictionary that is useful for its ordinary users rather than for
theoreticians or other lexicographers. Although computers have enabled us to
store and present more information in an easily accessible form than ever
before, many of the basic ideas related to the elegance of dictionary writing
that were already usable at the time of Johnson (1755) are still topical today
-- and most likely will remain so in the future as well.
''A Way with Words'' provides an extensive view into both the questions discussed
in modern lexicography and the methods with which answers to these questions are
sought at the moment. The book touches on most of the areas relevant in the
field. At the same time it shows the current trends and the direction in which
research is at present heading. For example, the role of large digital corpora
and tools for analyzing them is enormous. Although some things can still be done
without corpora, full-fledged modern lexicography is simply impossible without
corpus linguistic methods and computers -- but this of course applies to most
other fields of linguistics as well.
As a coursebook, this book is not the most usable one. Although the articles
cover a wide range of topics and are fairly short, their content is dense and
often requires prior understanding of concepts and theoretical ideas developed
earlier. Moreover, the ideas entertained in the book have been discussed more
extensively elsewhere, which means that there are more comprehensive accounts of
these themes available in various other publications. So, the book is clearly
written for scholars who are somewhat familiar with the field and now have a
chance to update their information on its recent advances and future prospects.
As a Festschrift, the book provides a very illustrative insight into the work
and ideas of Patrick Hanks and the contribution he has made to lexicographical
research. Although Hanks did not write a single word in the book, its contents
reflect his original intuitions and make use of his ideas and theorizations in a
way that nicely introduces them to the readers. There is no doubt that this book
pays homage to its dedicatee. The authors also point out how their own ideas were
inspired by Hanks and recount amusing anecdotes about him, so that after
reading the book one feels as if one personally knew the man on the cover of
This is also the greatest value of the book. By showing how the authors came to
know Patrick Hanks and how the ideas that they discuss relate to his ideas, the
texts turn this book into a valuable view into the development of scientific
ideas. Each paper adds to the mosaic that reflects the sociological reality
behind academic research. This is also why I think that MA students,
postgraduate students and early-career researchers would make an ideal
readership for this book; they would find it illuminating, even fascinating, to
learn how the research community functions and how theories evolve and ideas
develop through communication and -- sometimes unexpected and incidental --
contacts that we make with various other people in the field. This is something
that old-timers are already familiar with, but for newcomers in academia this
information would be worthwhile.
Davidson, D. (1984) Inquiries into Truth and Interpretation. Oxford: Clarendon
Hanks, P. (2004) Corpus Pattern Analysis. In G. Williams & S. Vessier (eds.),
Proceedings of the Eleventh EURALEX International Congress, EURALEX 2004,
Lorient, France, July 6-10, 2004. Lorient: Faculté des Lettres et des Sciences
Humaines, Université de Bretagne Sud, 87-97.
Hanks, P. (2007) Pattern Dictionary of English Verbs (PDEV) -- Project Page.
Online at http://deb.fi.muni.cz/pdev/.
Johnson, S. (1755) A Dictionary of the English Language. London.
Kilgarriff, A. (2007) Googleology is Bad Science. Computational Linguistics
Kilgarriff, A., P. Rychlý, P. Smrž & D. Tugwell (2004) The Sketch Engine. In G.
Williams & S. Vessier (eds.), Proceedings of the Eleventh EURALEX International
Congress, EURALEX 2004, Lorient, France, July 6-10, 2004. Lorient: Faculté des
Lettres et des Sciences Humaines, Université de Bretagne Sud, 105-116. (See also
Minsky, M. (1975) A Framework for Representing Knowledge. In P. Winston (ed.),
The Psychology of Computer Vision. New York: McGraw-Hill, 211-277.
Quine, W. V. O. (1960) Word and Object. Cambridge, MA: MIT Press.
Wittgenstein, L. (1953) Philosophical Investigations. Translated by G. E. M.
Anscombe. Oxford: Blackwell.
ABOUT THE REVIEWER
Esa Penttilä is a postdoctoral researcher at the University of Eastern Finland (Department of English Language and Translation). He received his PhD from the University of Joensuu in 2006. His research interests include idioms and idiomaticity, figurative language and metaphors, culture-specific translation, the syntax/semantics interface, and philosophy of language.
Page Updated: 24-Jun-2011