Review of Words and Phrases: Corpus Studies of Lexical Semantics

Reviewer: Mayumi Masuko
Book Title: Words and Phrases: Corpus Studies of Lexical Semantics
Book Author: Michael Stubbs
Publisher: Wiley-Blackwell
Linguistic Field(s): Computational Linguistics
Issue Number: 13.972

Stubbs, Michael (2001) Words and Phrases: Corpus Studies of Lexical
Semantics. Blackwell Publishers, xix+267pp, paperback ISBN
0-631-20833-X, USD 39.95 / GBP 16.99.
Mayumi Masuko, Waseda University

As the title suggests, this book focuses on the meanings of units that
are larger than individual words. Drawing upon publicly available
corpora, Stubbs tries to explicate which units can recur and which
cannot and what such recurring expressions mean.

The book is divided into three parts. Two chapters comprise Part I:
Introduction. Chapter 1, ''Words in Use: Introductory Examples'',
introduces the basis of the author's discussion. Stubbs uses 'text' and
'discourse' interchangeably, and they cover ''naturally occurring,
connected, spoken or written language, which has occurred in some real
context, independently of the linguist'' (p.5). That is, he uses data
from corpora as examples and supplements them with invented examples if
that is absolutely necessary. Many utterances are indirect, so the
hearer has to infer what the speaker has intended. Stubbs emphasizes
that although many of these inferences rely on social convention, some
make use of linguistic convention.

Chapter 2, ''Words, Phrases and Meanings: Basic Concepts'', defines key
words. 'Phrase' refers to a string of words, and 'collocation' means a
lexical relation between co-occurring words (i.e. a phrase). ''Corpus is
a collection of texts'' (p.25); 'text' presumably is used in the same
sense as in Chapter 1. In other words, a corpus exemplifies 'attested
language'. 'Word forms' occur in actual texts. 'Lemmas' (or lexemes),
on the other hand, are abstract and a list of lemmas is usually used as
a representation of a vocabulary: a dictionary lists lemmas.
'Collocation' simply refers to co-occurring words, and corpus linguists
are interested in frequent co-occurrence. Co-occurrence here, however,
does not necessarily mean words have to occur next to each other. ''A
'span' is the number of word-forms, before and/or after the node (e.g.
4:4, 0:3), within which collocates are studied'' (p.29). A span of 3:3
or 4:4 is widely used by corpus linguists. Sometimes, a unit longer
than a single word is listed in a dictionary if the meaning is not
predictable from individual word-forms (e.g. nuclear family).
Similarly, there are cases where a meaning of a word-form cannot be
determined in the absence of its collocates: e.g. 'heavy' has different
meanings in 'a heavy smoker' and 'heavy weather'. In addition to
these lexical relations, reference and denotation are essential as in
any other discussion of meaning. 'Reference' is the relation between a
linguistic expression and a particular object that it refers to;
'denotation', on the other hand, signifies the referential range: what
may be referred to by a given expression. 'Connotation' is another
important concept, and often paired with denotation. It conveys the
speaker's feeling or attitude towards the object, and may be called
'emotive meaning' (cf. Lyons 1977). The vocabulary of a given language
can be regarded as sets of words, where words in the same set, or
'semantic field', share some aspect of meaning.
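
The notion of a span can be made concrete with a minimal sketch. The
function name, the toy sentence and the whitespace tokenisation below are
my own assumptions for illustration; they are not taken from Stubbs' book
or from any particular corpus tool.

```python
from collections import Counter

def collocates(tokens, node, left=4, right=4):
    """Count word-forms found within a left:right span of each occurrence of node."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)  # do not run past the start of the text
        window = tokens[start:i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return counts

# Invented example text (not attested corpus data), split on whitespace.
text = "he was a heavy smoker and she was a heavy drinker but the heavy rain stopped"
tokens = text.split()

# A 0:1 span looks only at the word-form immediately after the node.
print(collocates(tokens, "heavy", left=0, right=1))
```

With a real corpus the raw counts would normally be compared against
overall word frequencies, which this sketch does not attempt.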

Part II: Case Studies consists of six chapters. The first two, Chapters
3 and 4, examine phrases. They are idiomatic expressions but usually
not idioms. Concordance (or Key Words in Context) is a simple tool for
processing corpora, which can display all occurrences of a given word
in a text with surrounding words. Stubbs henceforth uses concordance
data, employing Sinclair's (1991) four types of co-occurrence
relations: collocation, colligation, semantic preference, and discourse
prosody. Chapter 3 mainly explains key words and the method of his
analyses. 'Collocation' was introduced in Chapter 2. ''Colligation is
the relation between a pair of grammatical categories'' (p.65).
'Semantic preference' is the relation between a word-form or lemma and
words in the same semantic field. 'Discourse prosodies express speaker
attitude' (p.65; cf. Lyons' (1977) 'evaluative meaning'). In the rest
of Chapter 4, Stubbs examines the Cobuild collocations database. Six
expressions - 'resemblance', 'reckless', 'backdrop', 'doses', 'undergo'
and 'chopped' - are chosen to illustrate the fact that collocates
frequently co-occur in certain (almost fixed) combinations and that the
collocates share the same discourse prosody.
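
For readers unfamiliar with concordances, the idea can be sketched in a
few lines. This is only an illustration under my own assumptions (the
function name, the context width and the invented sentence are
hypothetical); it is not the software or the data-base that Stubbs uses.

```python
def kwic(tokens, node, width=3):
    """Return one line per occurrence of node with `width` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() != node.lower():
            continue
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        # Right-align the left context so that the node words line up in a column.
        lines.append(f"{left:>30}  [{tok}]  {right}")
    return lines

# Invented example text (not attested corpus data).
sample = "gangs of youths roam the streets at night while tourists roam the old town freely".split()

for line in kwic(sample, "roam"):
    print(line)
```

Aligning the node word in a column is what lets an analyst scan the
neighbouring words for recurrent collocates.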

Chapters 5 and 6 consider a larger unit, texts. Antonyms and synonyms
often create text cohesion, though it should be noted that what counts
as an antonym of a given word may depend on the context. Similarly,
discourse prosody is context-dependent: the same expression may convey
a favourable or unfavourable connotation depending on which other words
co-occur with it. Chapter 5 analyses a short story, 'Eveline' by Joyce
(1914), using Youmans' (1991) software and shows that the type-token
ratio confirms literary critics' segmentation of the story.
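
The type-token ratio itself is simple to compute. The following is a
simplified moving-window sketch under my own assumptions (function names,
window size and the invented sentence are hypothetical); it is not
Youmans' software, whose details are given in Youmans (1991).

```python
def type_token_ratio(tokens):
    """Distinct word-forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def moving_ttr(tokens, window=5):
    """Type-token ratio for each consecutive window of `window` tokens."""
    return [type_token_ratio(tokens[i:i + window])
            for i in range(len(tokens) - window + 1)]

# Invented example text (not attested corpus data).
words = "the cat sat on the mat and the dog sat on the rug".split()

print(type_token_ratio(words))  # ratio for the whole text
print(moving_ttr(words, window=5))  # one ratio per window position
```

A rise in the windowed ratio indicates that new vocabulary is being
introduced, which is the kind of signal that could be compared with a
critic's segmentation of a text.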

Chapters 7 and 8 examine a perhaps more controversial topic: the cultural
significance of words and phrases. One of the cases Stubbs considers in
Chapter 7 is the tripartite set 'ethnic', 'racial' and 'tribal'; he
argues that (a) all of them share the connotation of 'violence'; (b)
'ethnic' is academic, whilst 'racial' may be bureaucratic; and (c) each
tends to be used to refer to a certain group: e.g. 'African tribes' and
'ethnic groups in the former Yugoslavia'. Chapter 8 presents analyses
of loan words. Stubbs notes that native speakers often are not aware of
historical changes in the meaning(s) of an individual word and have
wrong ideas about etymology.

The final section, Part III: Implications, discusses fundamental issues
in corpus linguistics and philosophical issues in linguistics at large.
Chapter 9, ''Words, Phrases and Connotations: On Lexico-grammar and
Evaluative Language'', stresses the importance of connotations in
(lexical) semantics. Although connotations may appear personal, they
are often shared among native speakers. This suggests that non-native
speakers need to be aware of and master this aspect of meaning.
Unfortunately, however, they are often not included in dictionary
definitions. Stubbs uses examples to show that some verbs share the
same discourse prosody and convey the speaker's point of view. One case
involves three verbs: 'accost', 'lurk' and 'loiter'. All three have negative
connotations and are used when making accusations or complaints about
other people's actions. These verbs appear in different typical
syntactic structures, however. 'Accost' often is used in the passive,
whereas 'lurk' is not. Such information, however, is not usually
included in dictionaries. In Chapter 10, ''Data and Dualisms: On Corpus
Methods and Pluralist Models'', Stubbs rejects monism and adopts
pluralism. This is different from the two-way distinctions proposed by
Saussure ('langue' and 'parole') and by Chomsky ('competence' and
'performance'). The main thrust of his argument is that linguistic
theory must account for (a) the linguistic behaviour of an individual
speaker, (b) linguistic knowledge, or ''the 'mental lexicon''' (p.232) of
a native speaker and (c) language as a system. This he argues is not
much different from the four-way distinction proposed by Hymes (1972).

Stubbs' main argument is '[i]t makes little sense to describe the
meaning of individual words in isolation, since words are co-selected
with other words, and meanings are distributed across larger units'
(p.100). This is not new. It is practically the same as what Frege
claimed in 1884, which is known as the 'context principle': 'Only in the
context of a sentence does a word stand for anything' (Dummett 1981: 192).
In his attempt to prove this, Stubbs extensively uses corpora. This is
because he thinks analysing publicly available data with replicable
methods is what linguistics requires and this in turn is because he
believes linguistics to be an empirical science. This does not mean he
rejects native speakers' intuitions and invented data as completely
unreliable: they are reliable in some areas and not in others (p.72). I
have always held a similar view of meaning, so I may be too biased to
judge objectively. Nevertheless, I should think that Stubbs provides
sufficient evidence to show his claim is valid.

There are two aspects of the book that I find particularly appealing.
The first is Stubbs' attempt to explain why an expected collocation
does not actually occur very often. One such case is a perhaps
unexpected non-co-occurrence of 'kick' and 'foot': as the use of the
former normally implies the use of the latter, they do not co-occur
very often. Another involves differences in collocation. Verbs such as
'bump' and 'smash' refer to the same action but they differ in
connotations. As one of the main claims of the book is that most work
in semantics/pragmatics ignores or pays little attention to
connotations, this is rather a neat way to highlight their importance.

The other is the analysis of the cultural significance of certain expressions
discussed in Chapters 7 and 8. Chapter 7 illustrates Stubbs' point
about the importance of the context, here perhaps more largely
construed than usual, with expressions such as 'ethnic' vs. 'racial',
'care', 'proper', etc. Chapter 8 discusses loanwords from German and
points out that native speakers' conception of 'proper' language use
might be wrong. The chapter functions also as a neat illustration of
historical change in meanings across languages.

This book contains ample data drawn from publicly available corpora and
provides a convincing case for the author's main claim that context
plays an essential role in determining the meaning of words or phrases.
The author's style renders his arguments comprehensible. I have,
however, a couple of quibbles.

The first is the lack of precise definitions of some of the key
technical terms. I am not sure who the target audience would be, but
from the easy-to-follow style of the book, I should guess this might be
intended as an introductory textbook for corpus semantics. If so, I
would have been happier to see key words/phrases more clearly and
explicitly defined. I shall just give two examples. One is 'discourse
prosody', which is a ''descriptor of speaker attitude and discourse
function'' (p.88). I can understand what this intuitively means, but it
would have been helpful to provide a fuller explanation because the use
of 'prosody' may suggest this is limited to phonological issues when it
is not. Another is ''inter-collocations'' whose definition I cannot find
in the book. From the discussion of the phrase 'roam the streets'
(pp.203-5), I would guess that this probably means the collocation of a
phrase consisting of more than one word. It is not clear to me how this
is to be obtained or computed, for 'roam' on its own may have a
positive or a negative connotation whilst 'roam the streets', according
to Stubbs, is ''almost always negative'' (p.203). His argument seems to
be that this is because the phrase 'the streets' is predominantly
negative. Some of the examples he gives for this argument do not seem
to be wholly convincing (e.g. do the negative connotations of ''visions
of rubbish piled high in the streets'' arise from 'the streets' as
Stubbs argues to be the case (pp.204-5) or from 'rubbish'?). And does a
word with negative connotations in a phrase always make the phrase as a
whole have negative connotations?

The second is perhaps inevitable, but his discussion of the cultural
significance of linguistic expressions is cast in a predominantly
British context. This in itself is not a bad thing, but makes it
difficult for some readers to appreciate some arguments. In his
discussion of the cultural significance of 'care' in Chapter 7, Stubbs
cites an utterance made by ''Dame Edna Everage (Barry Humphries)'', whom
he uses as an example of ''parodies of psycho-babbles by social
satirists''. Perhaps this is enough, but I would have thought that what
this signifies might be grasped only by those with some knowledge of
(some areas of) British popular culture in the 1980s/1990s.

Such minor issues aside, this book presents the author's arguments
fairly convincingly in a style accessible to undergraduate students. I
recommend this book also to postgraduate students of
semantics/pragmatics who may have a narrower conception of what
'meaning' means. Putting on my language teacher's hat, I would like to
see a dictionary, or still better, an on-line interface, which EFL
students could use to find out if the words they put together really
collocate with one another or not. As Stubbs rightly points out, ''many
connotations for which there is strong corpus evidence are not recorded
in dictionaries'' (p.198) and currently available ''[d]ictionaries have
no systematic way of relating words which have shared connotations''
(p.203). One of Stubbs' main claims is that connotations are a central
part of meaning as ''the whole point of utterance may be to express the
speaker's attitude, evaluation and point of view'' (p.198). If this
appears tenable, and he provides ample evidence for it, then it is
essential for EFL learners to come to grips with connotations.

Dummett, M. (1981) Frege: Philosophy of Language 2nd ed., Duckworth.

Frege, G. (1884) Die Grundlagen der Arithmetik: eine logisch-
mathematische Untersuchung ueber den Begriff der Zahl, Breslau.

Hymes, D. (1972) ''On communicative competence''. In J. Pride and J.
Holmes (eds.) Sociolinguistics, Penguin, pp. 269-293.

Lyons, J. (1977) Semantics, 2 Vols., Cambridge University Press.

Sinclair, J. (1991) Corpus, Concordance, Collocation, Oxford University
Press.

Youmans, G. (1991) ''A new tool for discourse analysis: the vocabulary
management profile'', Language 67:4, pp.763-789.

Mayumi Masuko did her postgraduate studies at the University of
Cambridge, where she received an MPhil and a PhD in linguistics. She is
an Associate Professor of English at Waseda University, where she
teaches English and linguistics. Her main research interest lies in the
interaction between semantics (broadly conceived) and morphosyntax.

