Review of Corpus Linguistics and the Description of English

Reviewer: Marlies Gabriele Prinzl
Book Title: Corpus Linguistics and the Description of English
Book Author: Hans Lindquist
Publisher: Edinburgh University Press
Linguistic Field(s): Text/Corpus Linguistics
Discipline of Linguistics
Subject Language(s): English
Issue Number: 21.3562

Series Title: Edinburgh Textbooks on the English Language
Year: 2009

Marlies Gabriele Prinzl, Centre for Intercultural Studies, University College London


With 'Corpus Linguistics and the Description of English', Hans Lindquist offers
another introductory book on corpus linguistics, but aims it specifically at
''university students of English at intermediate to advanced levels who have a
certain background in grammar and linguistics, but who have not had the
opportunity to use computer corpora to any great extent'' (p. xvi). He suggests
that the book, especially certain sections of it, may also be of interest to
students of literature. 'Corpus Linguistics and the Description of English'
consists of ten chapters. Chapters 1-5 cover the basics, introducing corpus
linguistics as a discipline, discussing its methods and explaining key terms;
chapters 6-10 delve into more specialised subject matter, ranging from
corpus-based metaphor studies to the applications of corpora in
sociolinguistics. Readers new to corpus linguistics would therefore benefit from
reading the first part of the book in full, but might choose to read only those
chapters of the second part that are relevant to their studies. That said,
chapters 6-10 also provide a valuable overview of the different possibilities
within corpus linguistics for anyone new to the field. All chapters follow the
same structure: in addition to a discussion of the topic covered, each includes
a chapter summary, study questions, suggestions for further reading and online
corpus exercises on the book's supplementary webpage.
The first chapter introduces corpus linguistics as a field. It is established
from the beginning that the name indicates not so much what is being studied as
the methodology being used. Lindquist, however, also notes that ''it cannot be
denied that corpus linguistics is also frequently associated with a certain
outlook in language'' (p. 1). Furthermore, the author emphasises that the book's
focus is on conveying the ''joy and fascination that lie in the description of
the English language'' (p. 1). As in other introductory books, such as Kennedy's
'An Introduction to Corpus Linguistics' (1998) or Meyer's 'English Corpus
Linguistics: An Introduction' (2002), Lindquist begins with a historical
overview of the field and lists the first corpora. All the essential basics are
covered: concordances, frequency, the distinction between corpus-based and
corpus-driven approaches, and the different types of corpora (spoken, general,
specialised, historical, parallel). The use of dictionaries, text archives and
the web as corpora is also mentioned.
In the second preparatory chapter, ''Counting, calculating and annotating,''
Lindquist lays further foundations for corpus research, delineating quantitative
and qualitative methods. He raises the indispensable question of what counts as
a word. A significant part of the chapter is devoted to managing and comparing
frequency data using statistical methods such as significance testing and
measures of the strength of lexical association. Lindquist notes that the use of
such methods varies greatly across the field, as language scholars have not
traditionally been trained in statistics, and that some researchers (Gries
2006) have called for ''greater sophistication'' (p. 37) in this area. Finally,
the chapter also introduces students to corpus annotation.
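The review mentions significance testing of frequency data without naming a particular test. As an illustration only, the sketch below computes the log-likelihood (G2) statistic, one measure commonly used in corpus linguistics to compare how often a word occurs in two corpora. The choice of test, the function name `log_likelihood` and the counts are assumptions made for this example and are not taken from Lindquist's book. The sketch uses the simplified two-cell form of G2, which compares observed and expected frequencies of the word only.

```python
import math

def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Simplified log-likelihood (G2) for one word's frequency in corpus A vs corpus B.

    freq_a, freq_b: occurrences of the word in each corpus (hypothetical inputs)
    size_a, size_b: total number of word tokens in each corpus
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the word were equally common in both corpora
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing (0 * log 0 is taken as 0)
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Hypothetical counts: 120 hits in one 1-million-word corpus vs 40 in another
g2 = log_likelihood(120, 1_000_000, 40, 1_000_000)
# With one degree of freedom, G2 > 3.84 corresponds to p < 0.05
print(round(g2, 2), g2 > 3.84)  # → 41.86 True
```

A full 2x2 contingency-table version would also include the non-occurrences of the word in each corpus. Association measures such as mutual information address the related question of how strongly two words are attracted to each other.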
Chapter 3, ''Looking for Lexis,'' discusses the uses of corpora for
lexicographers. It explores the different meanings of words through the example
of 'squeeze,' concluding that ''the meaning of a word can only be ascertained by
looking at the contexts in which it occurs'' (p. 57). This observation subtly
hints at the ''certain outlook in language'' associated with corpus linguistics
to which Lindquist had already alluded in chapter 1. As part of this discussion,
several more important terms are introduced ('collocation', 'colligation',
'semantic preference' and 'semantic prosody'), all of which are carefully
defined and illustrated with examples. Lindquist also notes that some of these
terms are controversial. The chapter further considers lexical change over time,
a topic that is picked up again in chapter 9. Finally, it gives an account of
how corpus techniques can be used not only to study language as a whole but also
to examine how it is used by a specific writer or within a single work.
As the title ''Checking collocations and colligations'' indicates, chapter 4
explores in more depth two of the terms introduced in the previous chapter,
focusing predominantly on collocations. Lindquist opens with a discussion of
native-like fluency, stating that ''[t]he ability to combine words in the right
way is the key to native-like fluency'' (p. 71). This leads into collocations and
the challenge they pose to Chomskyan (generative) linguistics.
Lindquist's treatment of ''collocation'' reaches far back in time: he attributes
the first use of the term to the educationalist H.E. Palmer in 1933, who defined
it as ''a succession of two or more words that must be learnt as an integral
whole and not pieced together from its component parts'' (title page). The
author then turns to the more familiar Firthian use of the term, which
emphasises how the meaning of an individual word is influenced by the other
words that frequently occur with it, and offers a second definition: ''The
more-frequent-than-average co-occurrence of two lexical items within five words
of the texts'' (Krishnamurthy 2004: xiii). Lindquist thus observes that Palmer's
and Firth's definitions point to different concepts, but that linguists often
use ''collocation'' without making a distinction.
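Krishnamurthy's definition can be operationalised fairly directly. The minimal sketch below, assuming simple lower-cased word tokens and a window of five words on either side of a node word, counts the words that co-occur with the node. The function name `collocates` and the sample sentences are hypothetical and merely echo the book's 'squeeze' example. The sketch does not compare against corpus-wide frequencies, so it captures only the co-occurrence part of the definition, not the ''more-frequent-than-average'' part.

```python
import re
from collections import Counter

def collocates(text: str, node: str, span: int = 5) -> Counter:
    """Count words occurring within `span` words to the left or right of `node`."""
    # Crude tokenisation: lower-cased runs of letters and apostrophes
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            # Up to `span` tokens on each side, excluding the node occurrence itself
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts

# Hypothetical sample text, not taken from any corpus
sample = ("She gave the lemon a squeeze. He tried to squeeze the lemon dry. "
          "A tight squeeze on the budget.")
print(collocates(sample, "squeeze").most_common(3))  # → [('the', 5), ('a', 4), ('lemon', 3)]
```

As the output shows, raw window counts are dominated by function words such as 'the'; deciding which co-occurrences are more frequent than average requires comparing them with the words' overall frequencies, for example with an association measure. The sketch also ignores sentence boundaries, so a window can span two sentences.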
The final general chapter, ''Finding Phrases,'' continues the discussion of
language patterns by focusing on phrases, that is, ''more or less fixed strings
which are used over and over again'' (p. 91). After naming some of the many
terms used to refer to phrases, Lindquist briefly explains John Sinclair's
open-choice principle and idiom principle (1991: 100), emphasising the view of
language that has emerged through corpora: that there is a significant amount of
linguistic repetitiveness and that language users frequently rely on
conventionalised utterances even when other possibilities exist. Most of the
chapter is devoted to examining examples of idioms and recurrent phrases, giving
readers insight into what kinds of corpus research can be done and how.
Lindquist uses both more established methods (querying different types of
corpora for complete as well as incomplete phrase units and n-grams) and an
emerging one (using Google to search for country-specific variants of ''storm in
a teacup''), with the latter serving as an introduction to a method explored
more thoroughly in chapter 10.
As the first of the more specialised chapters, chapter 6, ''Metaphor and
Metonymy,'' opens with a reference to Lakoff and Johnson's influential book
'Metaphors We Live By', which has stimulated increased interest in metaphor
since its publication in 1980. Lindquist provides definitions of metaphor and
the related concepts of simile and metonymy, but the chapter's focus is really
on metaphor. Three different procedures for investigating metaphors are
presented (starting from the source domain, starting from the target domain, and
starting from a manual analysis), giving students a good understanding of the
options in corpus-based metaphor studies.
Chapter 7 illustrates the possibilities that corpora offer for studying grammar
and presents a range of sample studies, mostly diachronic in nature and some
comparing American and British usage, on pronouns, get-passives and other
topics. Although most of these studies were originally carried out by other
researchers, Lindquist replicates them and provides helpful step-by-step
instructions as well as a critical discussion of any differences in results.
The next chapter, ''Male and Female,'' investigates the application of corpora in
sociolinguistics, specifically in relation to gender-specific differences in
language. The usefulness of corpus metadata such as a speaker's social class,
educational level and age is highlighted. Lindquist immediately notes, however,
that such data are lacking in most corpora and that the possibilities for
sociolinguistic research are therefore still quite limited. The chapter then
looks at a number of studies investigating gender, both in terms of how men and
women talk and how they are talked about, focusing, as in the previous chapter,
on diachrony.
Language change, already illustrated by many examples throughout 'Corpus
Linguistics and the Description of English', is the exclusive focus of chapter
9. The chapter opens with an explanation of the synchronic and diachronic
perspectives. The focus is, of course, on the latter, and a distinction is drawn
between the two major ways of studying language change through corpora: the
study of 'change in real time' and 'change in apparent time'. The difficulty of
identifying the causes of language change, whether internal or external, is
considered, and many more sample studies are discussed. Perhaps the most notable
of these is the study presented in section 9.4, which, instead of relying on
modern and historical corpora, uses the online Oxford English Dictionary (OED)
as a source of data.
The final chapter ventures into territory not covered in most older
introductory books on the subject: the World Wide Web. This is an area that has
only recently started to gain traction, but one that, as Lindquist notes, fills
certain gaps, since for some kinds of linguistic research ''standard corpora,
even if they contain 100 million words or more, do not provide enough data''
(p. 187). A useful distinction is made between ''web as corpus'' (using search
engines to trawl the web as a corpus) and ''web for corpus'' (using the web as a
resource to create a corpus). The chapter covers ways of using the web in corpus
linguistics and includes sample studies such as Mair's 2007 research on
preposition use in different regional varieties of English. While the advantages
of the web for corpus linguistics (text types not found elsewhere, quantity of
results, et cetera) are pointed out, Lindquist also considers drawbacks and
issues (replication, biased random sampling, lack of linguistic annotation),
concluding that ''an important part of corpus linguistics in the future will be
web-based in one way or another'' (p. 205).


'Corpus Linguistics and the Description of English' provides an introduction to
the subject that is highly accessible to university students of English at
different levels. The book meets the goals it sets for itself and is very much a
hands-on guide, with a multitude of sample studies and clear step-by-step
instructions. Exercises in every chapter allow readers to check their
understanding of the concepts introduced and give them the opportunity to query
corpora themselves. In terms of content, the book, a slim volume of no more than
219 pages, manages to be surprisingly comprehensive, presenting a wide range of
topics, including some options (the OED as corpus, the web as corpus) that make
it an up-to-date introduction to a still-evolving field.
Commendably, linguistic as well as literary applications of specific methods are
discussed. On occasion, a more thorough exploration of topics would have been
useful. For example, the chapter entitled ''Metaphor and Metonymy'' currently
serves only readers interested in the former rhetorical device, as not even the
'Further Reading' section includes any recommendations for students wanting to
find out more about corpus-based studies of metonymy. However, there is little
else to criticise, and the following suggestions are more of a wish list. A
glossary would be a welcome addition to future editions: many terms are
introduced in 'Corpus Linguistics and the Description of English', which can
quickly feel overwhelming, and students new to the field would surely find a
checklist of all the specialist terms very helpful. A key to the exercises
should always be included, even when the tasks set are as straightforward as in
this volume. The book's companion website is already a wonderful supplement to
the book, but fuller use could be made of it: all web-based resources mentioned
by Lindquist should be listed there, including the online statistics tutorials
that are currently referenced only vaguely. In print editions, lists of online
resources can be problematic, as links quickly become outdated or inactive;
however, the supplementary website for this book means that it should be fairly
easy to keep recommendations current.
All things considered, Lindquist’s 'Corpus Linguistics and the Description of
English' is an excellent book for anyone wishing to become acquainted with
corpus linguistics and its wide range of applications. As it can be read either
from cover to cover or perused selectively, it is suitable for many types of
readers. Without doubt, the book will be appreciated by individuals with little
or even no background in the subject, and because it succeeds in conveying that
''joy and fascination'' (p. 1) and provides plenty of ideas for new projects, it
is bound to inspire students to explore the field further in exactly the way
that suits them best.


