LINGUIST List 13.1670

Thu Jun 13 2002

Review: Corpus Ling: Tognini-Bonelli (2001)

Editor for this issue: Naomi Ogasawara

What follows is another discussion note contributed to our Book Discussion Forum. We expect these discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in.

If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for discussion." (This means that the publisher has sent us a review copy.) Then contact Simin Karimi or Terry Langendoen.


  • Anna-Maria De Cesare, Tognini-Bonelli (2001) Corpus Linguistics at Work

    Message 1: Tognini-Bonelli (2001) Corpus Linguistics at Work

    Date: Wed, 12 Jun 2002 13:20:06 -0400 (EDT)
    From: Anna-Maria De Cesare
    Subject: Tognini-Bonelli (2001) Corpus Linguistics at Work

    Tognini-Bonelli, Elena (2001) Corpus Linguistics at Work. Benjamins, 224 pp, hardback ISBN 90 272 2276 2 (EUR 63.00), ISBN 1 58811 061 3 (USD 57.00). Book Announcement on Linguist:

    Anna-Maria De Cesare, Université de Neuchâtel (Switzerland)


    The general goal of Elena Tognini-Bonelli's book "Corpus linguistics at work" is, as she states, to explore "the new approach brought about by corpus linguistics and relating it to the fields of language teaching and translation" (p. 1).

    The more specific purpose of Tognini-Bonelli's book is outlined in the conclusion: "this book presents a case, through argument and example, for the establishment of a new discipline within linguistics, and within corpus linguistics" (p. 177). The provisional name given to the new discipline is 'Corpus-driven Linguistics' (CDL). The book can thus be seen as a defense of this new field, which it contrasts with other disciplines, and in particular with what Tognini-Bonelli calls 'Corpus-based Linguistics'.


    Tognini-Bonelli's book is divided into ten chapters (pp. 1-186), which are followed by a reference section (pp. 187-201), nine appendixes (pp. 203-218) and an index (pp. 219-223). The book is organized as follows: ch. 1 (Introduction), ch. 2 (Corpus at work: language teaching), ch. 3 (Corpus issues), ch. 4 (The Corpus-based approach), chs. 5-9 (The Corpus-driven approach), ch. 10 (Conclusion).

    Ch. 1 Introduction (pp. 1-13)

    This first chapter discusses general problems related to corpus linguistics as a discipline and explains the concept of 'corpus', the view of language as function in context, and the contribution of new technologies. Tognini-Bonelli distinguishes between 'translation corpus' and 'comparable corpus', and within the translation corpus between 'parallel corpora' and 'free translation corpora' (pp. 6ff.). The end of the chapter outlines the structure of the book.

    Ch. 2 Language teaching (pp. 14-46)

    This fairly lengthy chapter shows corpus linguistics at work. It "plunges directly" into an application where the corpus can be put to work in the context of language teaching (p. 10). On the basis of several examples, ch. 2 demonstrates that traditional reference texts commonly used in teaching (i.e. grammars and dictionaries) do not provide much help in guiding students. Rules presented in grammars and definitions offered in dictionaries do not necessarily reflect the evidence of language, which is a problem that corpus linguistics can help redress. By showing the ways a corpus can be used in class, this chapter proposes a new way of teaching that does not focus on words in isolation but rather on 'extended units of meaning', or 'multi-word units'. Tognini-Bonelli claims that this new communicative approach to language allows students both to disambiguate the meanings of two words regarded as synonyms in reference works and to identify the different meanings of a polysemous word. The apparent synonyms 'fickle' and 'flexible', as well as 'largely' and 'broadly', are provided as examples to illustrate this point. Furthermore, Tognini-Bonelli demonstrates how corpus data can help to identify the two meanings of the structure 'all but', and she also reveals the "communicative functions of exceptions" expressed by 'except that'.

    Ch. 3 Corpus issues (pp. 47-64)

    Chapter 3 adopts a more theoretical perspective and addresses the main issues related to corpus work in general. After a brief section on the way a corpus has been defined in the past, this chapter considers the theoretical stances behind the definitions presented. In particular, the issues of the 'authenticity' of the texts included in the corpus, their 'representativeness', and the 'sampling criteria' used in the selection of the texts are discussed in detail and evaluated (p. 10).

    Ch. 4 The Corpus-based approach (pp. 65-83)

    This chapter presents the 'Corpus-based' approach as a methodology that uses corpus evidence mainly as "a repository of examples to expound, test, or exemplify given theoretical statements" (p. 10). For Tognini-Bonelli, this methodology does not realize the potential of corpus data, since it does not allow corpus evidence to shape the descriptive and theoretical statements that should ideally explain the richness of language usage. Three instances of discrepancies between theory and practice are discussed: a) insulation (when the insights derived from corpus data are accepted only as an "ancillary extension to the theoretical statement", p. 182), b) standardisation (when the corpus is "sieved through" pre-existing theoretical parameters before being analyzed, p. 182) and c) instantiation (when corpus evidence "again provides a probabilistic extension to a pre-existing system", but does not affect the system as such, p. 182).

    Ch. 5 The Corpus-driven approach (pp. 84-100)

    Chapters 5 and 6 discuss and exemplify the approach adopted in this book and in Tognini-Bonelli's earlier works (as well as in those of J. Sinclair). One of the major assumptions of the Corpus-driven approach is that, unlike in the Corpus-based approach, theoretical statements do not exist prior to corpus observation, but rather derive from the presence and observation of corpus evidence. Consequently, all the theoretical statements directly reflect the evidence provided by the corpus (p. 84). Furthermore, in a Corpus-driven approach, the linguist is committed to the integrity of the data as a whole, and descriptions aim to be comprehensive with respect to corpus evidence (p. 84). In considering one of the major findings of Corpus-driven linguistics, Tognini-Bonelli also demonstrates that the assumed equivalence between a 'lemma' and its 'inflected forms' no longer holds. This claim is illustrated by two examples (the English 'face' and 'facing', pp. 93ff., and the Italian 'saper' and 'sapere', pp. 96ff.).

    Ch. 6 Item and environment (pp. 101-130)

    This chapter presents further issues and concepts related to the Corpus-driven approach adopted in the book. It focuses on the close relationship, or overlap, between an item's meaning and its environment. Tognini-Bonelli claims that this merging of item and environment becomes clear with the alphabetized concordance. She also discusses and illustrates the concept of an 'extended unit of meaning' on the basis of both English and Italian examples. In particular, she maintains that an extended unit of meaning can be identified by scanning the patterns on the vertical axis of the concordance lines. The chapter then discusses three particular issues related to the Corpus-driven approach: semantic prosodies, delexicalisation and ideology.
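    The idea of an alphabetized concordance read along its vertical axis can be illustrated with a minimal KWIC (key word in context) sketch. This is not the software used in the book; it assumes plain whitespace tokenization, and the function names `kwic`, `sort_by_right` and `show` as well as the sample sentences are hypothetical. Sorting on the right-hand context (N+1, then N+2, and so on) groups recurrent patterns so that they line up one beneath the other.

```python
from typing import List, Tuple

# One concordance line: (left context, node word, right context).
Line = Tuple[str, str, str]


def kwic(tokens: List[str], node: str, span: int = 4) -> List[Line]:
    """Collect every occurrence of `node` with up to `span` tokens on each side."""
    lines: List[Line] = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append((left, tok, right))
    return lines


def sort_by_right(lines: List[Line]) -> List[Line]:
    """Alphabetize word by word on the right-hand context (N+1 first, then N+2, ...)."""
    return sorted(lines, key=lambda line: line[2].lower().split())


def show(lines: List[Line], width: int = 30) -> None:
    """Print each line with the left context right-aligned, so the node words form a column."""
    for left, node, right in lines:
        print(f"{left:>{width}}  {node}  {right}")


if __name__ == "__main__":
    # Hypothetical toy text invented for illustration, not taken from the book.
    text = ("the results were largely due to chance . this was largely "
            "a matter of taste . costs rose largely because of inflation .")
    show(sort_by_right(kwic(text.split(), "largely")))
```

    With a real corpus the same sort would bring, for instance, all the lines where the node is followed by 'due to' together, which is the kind of vertical pattern the chapter describes.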

    Ch. 7 Working with corpora across languages (pp. 131-156)

    This chapter proposes a methodology for working with corpora across languages. Three applied fields are discussed in detail: language teaching, contrastive linguistics and translation. The discussion mainly shows how to identify comparable units of meaning in two languages (here English and Italian) through three distinct steps: 1) from formal patterning of L1 to function of L1, 2) identification of a 'prima facie' translation equivalent for each function (function L2), 3) from function in L2 to formal patterning of L2.

    Ch. 8 The contextual theory of meaning (pp. 157-164)

    Although it predates the advent of computerized corpora, the Contextual Theory of Meaning developed by J. R. Firth (1890-1960) underlies the Corpus-driven approach adopted by Tognini-Bonelli. Consequently, chapter 8 presents the main points and concepts of Firth's theory. Most significantly, Tognini-Bonelli discusses the notion of 'language events' as "typical, recurrent and repeatedly observable" (Firth 1957: 35). Two central concepts related to 'language events' proposed by Firth and also adopted in Tognini-Bonelli's book are 'collocation' (the "recurrent co-occurrence of words", p. 89) and 'colligation' ("the grammatical patterning in which the node word, as member of its class, is embedded", "the interrelations of the syntactical categories", p. 89).

    Ch. 9 Historical landmarks in meaning (pp. 165-176)

    This chapter gives a historical overview of the concept of 'meaning' over the last century (starting with Bréal and including Saussure, Bloomfield, Harris and Chomsky), discussing the frameworks that are "seen as relevant to understanding the theoretical premises of Corpus-driven work" (pp. 11-12).

    Ch. 10 Conclusion (pp. 177-186)

    This concluding chapter is in fact a case for the establishment of Corpus-Driven Linguistics (CDL) as a new and independent discipline within corpus linguistics. The goals, standpoint, methodology, categories and body of knowledge of this new discipline are briefly presented and discussed. Finally, the end of the chapter provides a summary of the main issues and assumptions of the CDL approach presented in the whole book.


    Tognini-Bonelli's 'Corpus linguistics at work' is a very interesting and inspiring book for linguists and non-linguists (language teachers and translators) alike. Its interest stems from its richness, as the author draws on her experience as both a linguist and a language teacher, as well as on her bilingual background. In what follows I will address just some of the many important issues raised in the book.

    First of all, this book is significant in that it presents both the major assumptions and issues of corpus linguistics and the theoretical statements of a new discipline, called Corpus-driven linguistics (which, as she claims, is less than twenty years old; the first corpus-driven study is said to be the Cobuild project directed by Sinclair). The differences between the Corpus-driven approach adopted by Tognini-Bonelli and alternative approaches, such as the Corpus-based approach, are clearly presented in the book. It should also be noted that the book cannot be seen as an introduction to corpus linguistics. It would best be characterized as a defense of Corpus-driven linguistics and an illustration of the potential of this new discipline. Specifically, Tognini-Bonelli very convincingly shows how, through the advent of computerized corpora and corpus observation, new information is available and new theoretical stances are possible. The advantages of the Corpus-driven approach, and of Corpus-driven linguistics in particular, over other frameworks in linguistics (among them Chomsky's Generative Grammar) are quite clear from both a theoretical and an applied point of view.

    From a theoretical perspective, the Corpus-driven approach has changed the way meaning is identified and defined (p. 85). It leads to the conclusion that "we have to abandon the fiction that each word is some kind of independent selection, and accept that the choice patterns of words in text can create new, large and complex units of meaning" (p. 101). This new discipline therefore offers a new way of dealing with semantics in general and lexical semantics in particular. Let me point out that D. A. Cruse's work, although not cited in Tognini-Bonelli's book, is very close to her analysis. As Cruse states, for example, "The full set of normality relations which a lexical item contracts with all conceivable contexts will be referred to as its 'contextual relations'. We shall say, then, that the meaning of a word is fully reflected in its contextual relations; in fact, we can go further, and say that, for present purposes, the meaning of a word is constituted by its contextual relations." (Cruse 1986: 15ff.)

    The central semantic question of synonymy is another issue addressed by the Corpus-driven approach. For this approach, synonyms do not exist (p. 34): two apparently similar words (like the English 'broadly' and 'largely', or even the Italian 'saper' and 'sapere') must be understood as different forms which can be distinguished by examining 1) their frequency in a corpus and 2) the contexts in which they appear (their colligation and collocation). Since the meaning of a word or an extended unit of meaning depends on its context, "it seems that when the linguistic system offers a set of apparently parallel or equivalent expressions, these, if they ever were truly equivalent, do not stay so for long and they tend to find their own very specialized role in a specific domain." (p. 137).
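    The two criteria mentioned above, frequency and collocation, can be sketched for a pair of apparent synonyms. This is a minimal illustration, not the procedure used in the book: it assumes whitespace-tokenized text, counts raw co-occurrences in a window from N-3 to N+3 (using the positional notation referred to in the book), and applies no association measure such as the t-score; colligation is not covered. The function names `collocates` and `profile` and the toy sentences are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def collocates(tokens: List[str], node: str, window: int = 3) -> Counter:
    """Count the words found at positions N-window..N-1 and N+1..N+window
    around each occurrence of `node` (raw co-occurrence counts only)."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node itself
                counts[tokens[j]] += 1
    return counts


def profile(tokens: List[str], node: str, window: int = 3, top: int = 5) -> Dict[str, object]:
    """Criterion 1 (frequency) and criterion 2 (most common collocates) for one word form."""
    return {
        "frequency": tokens.count(node),
        "collocates": collocates(tokens, node, window).most_common(top),
    }


if __name__ == "__main__":
    # Hypothetical toy corpus; a real comparison would need a large corpus.
    tokens = ("broadly speaking the plan is broadly similar and largely due "
              "to cost the delay was largely due to weather").split()
    for word in ("broadly", "largely"):
        print(word, profile(tokens, word))
```

    Comparing the two profiles side by side is the point of the exercise: even in this tiny sample, 'largely' attracts 'due to' while 'broadly' does not. Real studies would use far more data and a statistical measure of association rather than raw counts.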

    It is also worth noting another implicit advantage of Corpus-driven linguistics for the description of a language and the construction of a theory of language. By basing the analysis on a (very large) corpus of texts, the Corpus-driven approach avoids the difficult problem of accounting for both a language system and human linguistic competence on the basis of intuition and introspection alone. In other words, because the theoretical statements of Corpus-driven linguistics are entirely based on the observation and analysis of a corpus of texts, Corpus-driven linguistics overcomes the difficulty of creating, and assessing the grammaticality of, sentences which are artificially constructed by the linguist in order to propose, confirm, or refute theoretical statements. Since the theory of meaning or language described by Tognini-Bonelli is based entirely on authentic, and therefore possible, evidence, the use of unattested and impossible examples in constructing a theory must be abandoned. Corpus-driven linguistics can thus be understood as providing a reconciliation between linguistic theory and practice (for a recent discussion of this issue see also Beaugrande 2002).

    From a more practical point of view, since the corpus-driven methodology allows for the disambiguation of meanings that would otherwise be very difficult to distinguish without access to a corpus, it provides important tools not only for language description but also for lexicography and translation. Specifically, the book demonstrates how the information gathered with the methodology of Corpus-driven linguistics differs from the information available in traditional reference books (dictionaries and grammars). It thus illustrates how the information provided in those works could be improved. I find the comparison between the information available in reference works and what can be derived from a corpus analysis very convincing. As Tognini-Bonelli herself states, "this is where the contribution of corpus-driven work is felt most positively" (p. 22). It should be noted that some interesting results based on the Corpus-driven framework are already available (see for instance the Collins Cobuild dictionary 1995 and grammar 1990 as well as Hunston and Francis 2000).

    Particularly useful for language teachers are the sample exercises provided in Appendixes 2 and 3. Appendix 2, for instance, is designed to familiarize students with the grammatical and communicative functions of 'all but' using corpus data. It is worth noting that these exercises stem from Tognini-Bonelli's own experience as a teacher of English to Italian students. Examining words in context and taking multi-word units into account is an excellent idea. However, understanding and discussing the criteria necessary for distinguishing two apparent synonyms requires a solid grammatical knowledge that students often do not possess. Asking students to identify the formal realizations of a particular function (p. 29) could therefore be difficult (especially, as Tognini-Bonelli herself notes, at the early stages of language acquisition).

    While the book is both interesting and important, it has some minor weaknesses. First, I find the structure of the book somewhat awkward in that it presents illustrations of the Corpus-driven approach as early as chapter 2, before discussing the methodology and concepts of this approach (which is done starting in chapter 5). Second, I find some ideas within chapters (especially the early ones) difficult to understand because important concepts are sometimes employed before they are actually defined. The concept of 'semantic prosody', for example, is employed as early as p. 19 but only explained on pp. 111ff. In addition, the notation N-1, N-2, N-3 and N+1, N+2, N+3 is explained on p. 42, after it is employed on p. 35. Another difficulty arises from the fact that abbreviations are occasionally not explained (see 'LSP', p. 8, and 'SL' and 'TL', p. 132). Some potential readers, in particular those who are not computational linguists, may also not understand some of the technical terminology employed (see for instance the term 't-score', referred to on p. 19 and used on p. 20, and 'tagging' and 'parsers', p. 74). Additionally, contrary to what is stated on pp. 52ff., little information is provided about the Italian corpus used in the book (see p. 12). For instance, I would have liked to know more precisely from which sources the texts were drawn. Finally, while perhaps not the author's choice, I find the placement of the notes at the end of every chapter rather unwieldy and disruptive to the flow of the text.

    Despite these minor criticisms, the writing is generally very clear and the main thesis of the book is convincingly presented. I would highly recommend the book to everyone interested in corpus linguistics, (lexical) semantics, and applied linguistics. As mentioned at the beginning of this review, the methodology adopted and the results achieved are very promising and sure to inspire a great deal of future research.


    Beaugrande, R. de (2002), "Descriptive Linguistics at the Millennium: Corpus Data as Authentic Language", Journal of Language and Linguistics 1(2), pp. 91-131.

    Collins Cobuild English Language Dictionary (1995, 2nd ed.)

    Collins Cobuild English Grammar (1990)

    Cruse, D. A. (1986), Lexical Semantics. Cambridge: Cambridge University Press.

    Firth, J. R. (1957), Papers in linguistics 1934-51. London: Oxford University Press.

    Hunston, S. and G. Francis (2000), Pattern Grammar. A Corpus-driven Approach to the Lexical Grammar of English. Amsterdam and Philadelphia: Benjamins.


    Anna-Maria De Cesare holds a Ph.D. in linguistics from the University of Geneva, Switzerland, and is currently Chargée d'enseignement at the Institut d'italien of the University of Neuchâtel. Her academic interests mainly include lexical semantics, lexicography, corpus linguistics, and contrastive linguistics. Both in her book "Intensification, modalisation, focalisation: les différents effets des adverbes 'proprio', 'davvero' et 'veramente'" (in press, Lang) and in a recent article, "'Davvero' vs. 'veramente': une analyse contrastive à la lumière des méthodes de la linguistique statistique" (to appear in Vox Romanica), she describes commonly used and semantically similar Italian adverbs in light of an analysis of a corpus of 20th-century Italian texts.