LINGUIST List 13.1670

Thu Jun 13 2002

Review: Corpus Ling: Tognini-Bonelli (2001)

Editor for this issue: Naomi Ogasawara

What follows is another discussion note contributed to our Book Discussion Forum. We expect these discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in. If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for discussion." (This means that the publisher has sent us a review copy.) Then contact Simin Karimi or Terry Langendoen.


  1. Anna-Maria De Cesare, Tognini-Bonelli (2001) Corpus Linguistics at Work

Message 1: Tognini-Bonelli (2001) Corpus Linguistics at Work

Date: Wed, 12 Jun 2002 13:20:06 -0400 (EDT)
From: Anna-Maria De Cesare
Subject: Tognini-Bonelli (2001) Corpus Linguistics at Work

Tognini-Bonelli, Elena (2001) Corpus Linguistics at Work.
Benjamins, 224 pp, hardback ISBN 90 272 2276 2, EUR 63.00,
1 58811 061 3, USD 57.00.

Anna-Maria De Cesare, Université de Neuchâtel (Switzerland)


The general goal of Elena Tognini-Bonelli's book "Corpus linguistics
at work" is, as she states, to explore "the new approach brought about
by corpus linguistics and relating it to the fields of language
teaching and translation" (p. 1).

The more specific purpose of Tognini-Bonelli's book is outlined in the
conclusion: "this book presents a case, through argument and example,
for the establishment of a new discipline within linguistics, and
within corpus linguistics" (p. 177). The provisional name given to the
new discipline is 'Corpus-driven Linguistics' (CDL). The book, then,
can be seen as a defense of this new field, as it contrasts CDL with
other disciplines, and in particular with what Tognini-Bonelli calls
'Corpus-based Linguistics'.


Tognini-Bonelli's book is divided into ten chapters (pp. 1-186), which
are followed by a reference section (pp. 187-201), nine appendixes
(pp. 203-218) and an index (pp. 219-223). The book is organized as
follows: ch. 1 (Introduction), ch. 2 (Corpus at work: language
teaching), ch. 3 (Corpus issues), ch. 4 (The Corpus-based approach),
chs. 5-9 (The Corpus-driven approach), ch. 10 (Conclusion).

Ch. 1 Introduction (pp. 1-13)

This first chapter discusses general problems related to corpus
linguistics as a discipline, and it explains the concept of 'corpus',
language as a function in context, and the input of new
technologies. Tognini-Bonelli distinguishes between 'translation
corpus' and 'comparable corpus', and within the translation corpus
between 'parallel corpora' and 'free translation corpora'
(pp. 6ff.). The end of the chapter outlines the structure of the book.

Ch. 2 Language teaching (pp. 14-46)

This fairly lengthy chapter shows corpus linguistics at work. It
"plunges directly" into an application where the corpus can be put to
work in the context of language teaching (p. 10). On the basis of
several examples, ch. 2 demonstrates that traditional reference texts
commonly used in teaching (i.e. grammars and dictionaries) do not
provide much help in guiding students. Rules presented in grammars and
definitions offered in dictionaries do not necessarily reflect the
evidence of language, which is a problem that corpus linguistics can
help redress. By presenting the ways a corpus can be used in class,
this chapter presents a new way of teaching that does not focus on
words in isolation but rather on 'extended units of meaning', or
'multi-word units'. Tognini-Bonelli claims that this new communicative
approach to language allows students both to disambiguate the meaning
of two words regarded as synonyms in reference works and to identify
the different meanings of a polysemous word. The apparent synonyms
'fickle' and 'flexible', as well as 'largely' and 'broadly' are
provided as examples to illustrate this point. Furthermore,
Tognini-Bonelli demonstrates how corpus data can help to identify the
two meanings of the structure 'all but', and she also reveals the
"communicative functions of exceptions" based on 'except that'.

Ch. 3 Corpus issues (pp. 47-64)

Chapter 3 adopts a more theoretical perspective and addresses the main
issues related to corpus work in general. After a brief section on the
way a corpus has been defined in the past, this chapter considers the
theoretical stances behind the definitions presented. In particular,
the issues of the 'authenticity' of the texts included in the corpus,
their 'representativeness', and the 'sampling criteria' used in the
selection of the texts are discussed in detail and evaluated (p. 10).

Ch. 4 The Corpus-based approach (pp. 65-83)

This chapter presents the 'Corpus-based' approach as a methodology
that uses corpus evidence mainly as "a repository of examples to
expound, test, or exemplify given theoretical statements" (p. 10). For
Tognini-Bonelli, this methodology does not realize the potential of
corpus data since it does not allow for the shaping of the descriptive
and theoretical statements that should ideally explain the richness of
language usage. Three instances of discrepancies between theory and
practice are discussed: a) insulation (when the insights derived from
corpus data are accepted only as an "ancillary extension to the
theoretical statement", p. 182), b) standardisation (when the corpus
is "sieved through" pre-existing theoretical parameters before being
analyzed, p. 182) and c) instantiation (when corpus evidence "again
provides a probabilistic extension to a pre-existing system", but does
not affect the system as such, p. 182).

Ch. 5 The Corpus-driven approach (pp. 84-100)

Chapters 5 and 6 discuss and exemplify the approach adopted in
Tognini-Bonelli's book and in earlier works (as well as in those of
J. Sinclair). One of the major assumptions of the Corpus-driven
approach is that, unlike in the Corpus-based approach, theoretical
statements do not exist prior to corpus observation, but rather derive
from the presence and observation of corpus evidence. Consequently,
all the theoretical statements directly reflect the evidence provided
by the corpus (p. 84). Furthermore, in a Corpus-driven approach, the
linguist is committed to the integrity of the data as a whole, and
descriptions aim to be comprehensive with respect to corpus evidence
(p. 84). In considering one of the major findings of Corpus-driven
linguistics, Tognini-Bonelli also demonstrates that the similarity
between 'lemma' and 'inflected forms' no longer holds. This claim is
illustrated by two examples (the English 'face' and 'facing',
pp. 93ff., and the Italian 'saper' and 'sapere', pp. 96ff.).

Ch. 6 Item and environment (pp. 101-130)

This chapter presents other issues raised by and concepts related to
the Corpus-driven approach adopted in the book. Chapter 6 presents the
close relationship, or overlap, between an item's meaning and its
environment. Tognini-Bonelli claims that this merging of item and
environment becomes clear with the alphabetized concordance. She also
discusses and illustrates the concept of an 'extended unit of meaning'
on the basis of both English and Italian examples. In particular, she
maintains that an extended unit of meaning can be identified by
scanning the patterns on the vertical axis of the concordance
line. This chapter then discusses three particular issues related to
the Corpus-driven approach: semantic prosodies, delexicalisation and

Ch. 7 Working with corpora across languages (pp. 131-156)

This chapter proposes a methodology to work with corpora across
languages. Three applied fields are discussed in detail: language
teaching, contrastive linguistics and translation. The discussion
provided mainly shows how to identify comparable units of meaning in
two languages (here English and Italian), through three distinct
steps: 1) from formal patterning of L1 to function of L1, 2)
identification of a 'prima facie' translation equivalent for each
function (function L2), 3) from function in L2 to formal patterning of
L2.

Ch. 8 The contextual theory of meaning (pp. 157-164)

Although proposed before the advent of computerized corpora, the
Contextual Theory of Meaning proposed by J. R. Firth (1890-1960)
underlies the Corpus-driven approach adopted by
Tognini-Bonelli. Consequently, chapter 8 presents the main points and
concepts of Firth's theory. Most significantly, Tognini-Bonelli
discusses the notion of 'language events' as "typical, recurrent and
repeatedly observable" (Firth 1957: 35). Two central concepts related
to 'language events' proposed by Firth and also adopted in
Tognini-Bonelli's book are 'collocation' (the "recurrent co-occurrence
of words", p. 89) and 'colligation' ("the grammatical patterning in
which the node word, as member of its class, is embedded", "the
interrelations of the syntactical categories", p. 89).
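
To make the notion of collocation more concrete, here is a minimal Python sketch of how recurrent co-occurrence around a node word might be counted, position by position (the N-1, N+1 style of labelling). It is only an illustration under my own assumptions, not the procedure used in the book: the function names, the simple regular-expression tokenizer and the toy sentences are all invented for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (a crude assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def collocates_by_position(tokens, node, span=3):
    """Count the words found at each offset from -span to +span around `node`."""
    counts = {off: Counter() for off in range(-span, span + 1) if off != 0}
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for off in counts:
            j = i + off
            # Skip offsets that fall outside the token list.
            if 0 <= j < len(tokens):
                counts[off][tokens[j]] += 1
    return counts


# Invented toy text, for illustration only; not drawn from any real corpus.
toy_text = (
    "The plan was largely ignored. The plan was largely forgotten. "
    "The two reports are broadly similar. The views are broadly similar."
)
tokens = tokenize(toy_text)
positions = collocates_by_position(tokens, "largely", span=2)

for offset in sorted(positions):
    # Label each slot relative to the node, e.g. N-1 or N+1.
    print(f"N{offset:+d}", positions[offset].most_common(2))
```

On a real corpus the same counts, taken over thousands of occurrences, are what would let recurrent lexical and grammatical patterning emerge; a word-class tagger would be needed in addition to approach colligation.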

Ch. 9 Historical landmarks in meaning (pp. 165-176)

This chapter gives an historical overview of the concept of 'meaning'
over the last century (starting with Bréal and including Saussure,
Bloomfield, Harris and Chomsky), discussing the frameworks that are
"seen as relevant to understanding the theoretical premises of
Corpus-driven work" (pp. 11-12).

Ch. 10 Conclusion (pp. 177-186)

This concluding chapter is in fact an argument for the establishment
of Corpus-Driven Linguistics (CDL) as a new and independent discipline
within corpus linguistics. The goals, standpoint, methodology,
categories and body of knowledge of this new discipline are briefly
presented and discussed. Finally, the end of the chapter provides a
summary of the main issues and assumptions of the CDL approach
presented in the whole book.


Tognini-Bonelli's 'Corpus linguistics at work' is a very interesting
and inspiring book both for linguists and non-linguists (language
teachers and translators). The interest of Tognini-Bonelli's book
stems from its richness, as she draws upon her experience as both
linguist and language teacher as well as her bilingual background. In
what follows I will address just some of the many important issues
raised in the book.

First of all, this book is significant in that it presents both the
major assumptions and issues of corpus linguistics and the theoretical
statements of a new discipline, called Corpus-driven linguistics
(which, as she claims, is less than twenty years old; the first
corpus-driven study is said to be the Cobuild project directed by
Sinclair). The differences between the Corpus-driven approach adopted
by Tognini-Bonelli and the alternative approaches, such as the
Corpus-based approach, are clearly presented in the book. It should
also be noted that the book cannot be seen as an introduction to
corpus linguistics. It would best be characterized as a defense of
Corpus-driven linguistics and an illustration of the potential of this
new discipline. Specifically, Tognini-Bonelli very convincingly
shows how, through the advent of computerized corpora and corpus
observation, new information is available and new theoretical stances
are possible. The advantages of the Corpus-driven approach over
other approaches to linguistics (among others, Chomsky's Generative
Grammar), and over Corpus-based linguistics in particular, are quite clear
both from a theoretical and an applied point of view.

From a theoretical perspective, the Corpus-driven approach has
changed the way meaning is identified and defined (p. 85). It leads to
the conclusion that "we have to abandon the fiction that each word is
some kind of independent selection, and accept that the choice
patterns of words in text can create new, large and complex units of
meaning" (p. 101). This new discipline is therefore a new way of
dealing with semantics in general and lexical semantics in
particular. Let me point out that D. A. Cruse's work, although not
cited in Tognini-Bonelli's book, is very close to the analysis
outlined by Tognini-Bonelli. As Cruse states, for example, "The full
set of normality relations which a lexical item contracts with all
conceivable contexts will be referred to as its 'contextual
relations'. We shall say, then, that the meaning of a word is fully
reflected in its contextual relations; in fact, we can go further, and
say that, for present purposes, the meaning of a word is constituted
by its contextual relations." (Cruse 1986:15ff.)

Synonymy, a central question in semantics, is another issue
addressed by the Corpus-driven approach. For the Corpus-driven
approach, synonyms do not exist (p. 34): two apparently similar words
(like the English 'broadly' and 'largely' or even the Italian 'saper'
and 'sapere') must be understood as different forms which can be
distinguished by examining 1) their frequency in a corpus and 2) the
context in which they appear (their colligation and
collocation). Since a word or an extended unit of meaning is dependent
on the context, "it seems that when the linguistic system offers a set
of apparently parallel or equivalent expressions, these, if they ever
were truly equivalent, do not stay so for long and they tend to find
their own very specialized role in a specific domain." (p. 137).
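
The two criteria just mentioned, frequency and context, can be sketched in a few lines of Python. This is a rough illustration under my own assumptions rather than the author's tooling: the helper names, the window width and the toy sentences are invented, and the concordance is alphabetized on the right-hand context simply so that recurring patterns line up vertically.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (a crude assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, node, width=3):
    """Return (left, node, right) triples for every occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    # Alphabetize on the right context so repeated patterns sit together.
    return sorted(lines, key=lambda line: line[2])


def per_thousand(tokens, node):
    """Relative frequency of `node` per thousand tokens."""
    return 1000 * tokens.count(node) / len(tokens)


# Invented toy text, for illustration only; not drawn from any real corpus.
toy_text = (
    "The plan was largely ignored by the board. "
    "The report was largely ignored by the press. "
    "The two proposals are broadly similar in scope. "
    "Broadly speaking the results agree."
)
tokens = tokenize(toy_text)

for word in ("largely", "broadly"):
    print(f"{word}: {per_thousand(tokens, word):.1f} per thousand tokens")
    for left, node, right in kwic(tokens, word):
        print(f"{left:>22} [{node}] {right}")
```

With a toy text of this size the figures mean nothing in themselves; the point is only that the two forms can be compared on the same footing, first by how often they occur and then by what regularly surrounds them.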

It is also worth noting another implicit advantage of Corpus-driven
linguistics for the description of a language and the construction of
a theory of language. By basing the analysis on a (very large) corpus
of texts, the Corpus-driven approach solves the difficult problem of
accounting for both a language system and human linguistic competence
on the basis of intuition and introspection alone. In other words,
because the theoretical statements provided by Corpus-driven
linguistics are entirely based on the observation and analysis of a
corpus of texts, Corpus-driven linguistics overcomes the difficulty of
creating and assessing the grammaticality of sentences which are
artificially constructed by the linguist in order to provide, confirm,
or deny theoretical statements. Since the theory of meaning or
language described by Tognini-Bonelli is based entirely on authentic
and thus possible evidence, the employment of unattested and
impossible examples in constructing a theory must be
abandoned. Corpus-driven linguistics can thus be understood as
providing a reconciliation between linguistic theory and practice (for
a recent discussion of this issue also see Beaugrande 2002).

From a more practical point of view, since corpus-driven methodology
allows for the disambiguation of meanings that would otherwise be very
difficult to distinguish without access to a corpus, it provides
important tools not only for language description but also for
lexicography and translation. Specifically, the book demonstrates how
the information gathered on the basis of the methodology adopted by
Corpus-driven linguistics is different from the information available
in traditional reference books (dictionaries and grammars). It thus
illustrates how to improve the information provided in those works. I
find the comparison between the information available in reference
works and what can be derived from a corpus analysis to be very
convincing. As Tognini-Bonelli herself states "this is where the
contribution of corpus-driven work is felt most positively"
(p. 22). It should be noted that some interesting results based on the
Corpus-driven framework are already available (see for instance the
Collins Cobuild dictionary 1995 and grammar 1990 as well as Hunston
and Francis 2000).

Very useful for language teachers are the possible exercises provided
as examples in Appendixes 2 and 3. Appendix 2, for instance, is
designed to familiarize students with the grammatical and
communicative functions of 'all but' using corpus data. It is worth
noting that the exercises provided stem from Tognini-Bonelli's own
experience as a teacher of English to Italian students. It is an
excellent idea to examine the words in context as well as to take into
account a multi-word unit. However, understanding and speaking about
the criteria necessary for distinguishing two apparent synonyms
requires a solid grammatical knowledge that students often do not
possess. Asking students to identify the formal realizations of a
particular function (p. 29) could therefore be difficult (especially,
as Tognini-Bonelli herself notes, at the beginning stages of language

While the book is both interesting and important, it is worth
considering some minor weaknesses. First, I find the structure of the
book to be somewhat awkward in that it presents illustrations of the
Corpus-driven approach as early as chapter 2 before discussing the
methodology and concepts of this approach (which is done starting at
chapter 5). Second, I find some ideas within chapters (especially in
the first chapters) difficult to understand due to the fact that
important concepts are sometimes employed before they are actually
defined. The concept of 'semantic prosody', for example, is employed
as early as p. 19 but only explained on pp. 111ff. In addition, the
notation N-1, N-2, N-3 and N+1, N+2, N+3, is explained on p. 42, after
it is employed on p. 35. Another difficulty for understanding certain
concepts arises from the fact that, occasionally, abbreviations are
not explained (see 'LSP', p. 8 and 'SL' and 'TL' p. 132). Some
potential readers, in particular non-computational linguists, also
may not understand some of the technical terminology employed (see for
instance the terms 't-score', referred to on p. 19 and used on p. 20,
'tagging' and 'parsers', p. 74). Additionally, in contrast to what is
stated on pp. 52ff., not a lot of information is provided about the
Italian corpus used in the book (see p. 12). For instance, I would
have liked to know more precisely from which sources the texts were
drawn. Finally, while perhaps not the choice of the author, I find the
placing of the notes at the end of every chapter to be rather unwieldy
and disruptive to the flow of the text.

Despite these minor criticisms, the writing is generally very clear
and the main thesis of the book is convincingly presented. I would
highly recommend the book to everyone interested in corpus
linguistics, (lexical) semantics, and applied linguistics. As
mentioned at the beginning of this review, the methodology adopted and
the results achieved are very promising and sure to inspire a great
deal of future research.


Beaugrande, R. de (2002), "Descriptive Linguistics at the Millennium:
Corpus Data as Authentic Language", in Journal of Language and
Linguistics 1, 2, pp. 91-131.

Collins Cobuild English Language Dictionary (1995, 2nd ed.)

Collins Cobuild English Grammar (1990)

Cruse, D.A. (1986), Lexical semantics. Cambridge: Cambridge University Press.

Firth, J. R. (1957), Papers in linguistics 1934-51. London: Oxford
University Press.

Hunston, S. and G. Francis (2000), Pattern Grammar. A Corpus-driven
Approach to the Lexical Grammar of English. Amsterdam and
Philadelphia: Benjamins.


Anna-Maria De Cesare holds a Ph.D. in linguistics from the University
of Geneva, Switzerland and is currently Chargée d'enseignement de
l'institut d'italien at the University of Neuchâtel. Her academic
interests mainly include lexical semantics, lexicography, corpus
linguistics, and contrastive linguistics. Both in her book
"Intensification, modalisation, focalisation: les différents effets
des adverbes 'proprio', 'davvero' et 'veramente'" (in print by Lang
Press) and in a recent article "'Davvero' vs. 'veramente': une analyse
contrastive à la lumière des méthodes de la linguistique statistique"
(to appear in Vox romanica), she describes commonly used and
semantically similar Italian adverbs in light of an analysis of a
corpus of 20th century Italian texts.