LINGUIST List 13.972

Mon Apr 8 2002

Review: Corpus Ling/Lexicography: Stubbs (2001)

Editor for this issue: Terence Langendoen <terry@linguistlist.org>


What follows is another discussion note contributed to our Book Discussion Forum. We expect these discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in.

If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for discussion." (This means that the publisher has sent us a review copy.) Then contact Simin Karimi at simin@linguistlist.org or Terry Langendoen at terry@linguistlist.org.


Directory

  • Mayumi Masuko, Words and Phrases: Corpus Studies of Lexical Semantics

    Message 1: Words and Phrases: Corpus Studies of Lexical Semantics

    Date: Mon, 08 Apr 2002 16:34:36 +0900
    From: Mayumi Masuko <mayumimn.waseda.ac.jp>
    Subject: Words and Phrases: Corpus Studies of Lexical Semantics


    Stubbs, Michael (2001) Words and Phrases: Corpus Studies of Lexical Semantics. Blackwell Publishers, xix+267pp, paperback ISBN 0-631-20833-X, USD 39.95 / GBP 16.99. Announced in http://linguistlist.org/issues/13/13-436.html

    Mayumi Masuko, Waseda University

    OVERVIEW As the title suggests, this book focuses on the meanings of units that are larger than individual words. Drawing upon publicly available corpora, Stubbs tries to explicate which units can recur and which cannot and what such recurring expressions mean.

    The book is divided into three parts. Part I: Introduction comprises two chapters. Chapter 1, "Words in Use: Introductory Examples", introduces the basis of the author's discussion. Stubbs uses 'text' and 'discourse' interchangeably, and they cover "naturally occurring, connected, spoken or written language, which has occurred in some real context, independently of the linguist" (p.5). That is, he uses data from corpora as examples and supplements them with invented examples only when absolutely necessary. Many utterances are indirect, so the hearer has to infer what the speaker intended. Stubbs emphasizes that although many of these inferences rely on social convention, some make use of linguistic convention.

    Chapter 2, "Words, Phrases and Meanings: Basic Concepts", defines key terms. 'Phrase' refers to a string of words, and 'collocation' means a lexical relation between co-occurring words (i.e. a phrase). "Corpus is a collection of texts" (p.25); 'text' presumably is used in the same sense as in Chapter 1. In other words, a corpus exemplifies 'attested language'. 'Word forms' occur in actual texts. 'Lemmas' (or lexemes), on the other hand, are abstract, and a list of lemmas is usually used as a representation of a vocabulary: a dictionary lists lemmas. 'Collocation' simply refers to co-occurring words, and corpus linguists are interested in frequent co-occurrence. Co-occurrence here, however, does not necessarily mean that words have to occur next to each other. "A 'span' is the number of word-forms, before and/or after the node (e.g. 4:4, 0:3), within which collocates are studied" (p.29). A span of 3:3 or 4:4 is widely used by corpus linguists. Sometimes, a unit longer than a single word is listed in a dictionary if its meaning is not predictable from the individual word-forms (e.g. nuclear family). Similarly, there are cases where the meaning of a word-form cannot be determined in the absence of its collocates: e.g. 'heavy' has different meanings in 'a heavy smoker' and 'heavy weather'. In addition to these lexical relations, reference and denotation are essential, as in any other discussion of meaning. 'Reference' is the relation between a linguistic expression and a particular object that it refers to; 'denotation', on the other hand, signifies the referential range: what may be referred to by a given expression. 'Connotation' is another important concept, and is often paired with denotation. It conveys the speaker's feeling or attitude towards the object, and may be called 'emotive meaning' (cf. Lyons 1977). The vocabulary of a given language can be regarded as sets of words, where words in the same set, or 'semantic field', share some aspect of meaning.
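
    The notion of a span can be made concrete with a minimal sketch. The Python below is not taken from the book: the function names `tokenize` and `collocates`, the crude tokenizer, and the toy text are all assumptions made for illustration. It counts the word-forms that fall within a given number of positions to the left and right of a node word.

```python
from collections import Counter
import re


def tokenize(text):
    """Lower-case the text and split it into word-forms (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def collocates(tokens, node, left=4, right=4):
    """Count word-forms within `left` positions before and `right` positions after each `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)
        # The node itself is excluded; only its neighbours are counted.
        window = tokens[start:i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return counts


# Invented toy text, for illustration only (not corpus data).
text = "He was a heavy smoker. Heavy rain fell all day. She carried a heavy bag."
tokens = tokenize(text)

# A span of 0:3 looks only at the three word-forms after the node.
print(collocates(tokens, "heavy", left=0, right=3).most_common(3))
```

    Note that this sketch lets the window run across sentence boundaries and counts raw word-forms rather than lemmas; a real study would have to decide both points explicitly.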

    Part II: Case Studies consists of six chapters. The first two, Chapters 3 and 4, examine phrases. They are idiomatic expressions but usually not idioms. A concordance (or Key Words in Context) is a simple tool for processing corpora, which can display all occurrences of a given word in a text with surrounding words. Stubbs henceforth uses concordance data, employing Sinclair's (1991) four types of co-occurrence relations: collocation, colligation, semantic preference, and discourse prosody. Chapter 3 mainly explains key terms and the method of his analyses. 'Collocation' was introduced in Chapter 2. "Colligation is the relation between a pair of grammatical categories" (p.65). 'Semantic preference' is the relation between a word-form or lemma and words in the same semantic field. 'Discourse prosodies express speaker attitude' (p.65; cf. Lyons' (1977) 'evaluative meaning'). In Chapter 4, Stubbs examines the Cobuild collocations database. Six expressions - 'resemblance', 'reckless', 'backdrop', 'doses', 'undergo' and 'chopped' - are chosen to illustrate the fact that collocates frequently co-occur in certain (almost fixed) combinations and that the collocates share the same discourse prosody.
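
    For readers unfamiliar with concordances, the following is a minimal Key Words in Context sketch in Python. It is an illustration only, not the software Stubbs used: the function name `kwic`, the context width, and the two sample sentences are invented for this note.

```python
import re


def kwic(text, node, width=3):
    """Return one display line per occurrence of `node`, with `width` words of context per side."""
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so that the node words line up vertically.
            lines.append(f"{left:>25}  [{tok}]  {right}")
    return lines


# Invented sample sentences, for illustration only (not corpus data).
sample = "Patients may undergo surgery next week. Few people undergo such a change willingly."
for line in kwic(sample, "undergo", width=2):
    print(line)
```

    Aligning the node word in a column is what lets recurrent patterns to its left and right be seen at a glance.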

    Chapters 5 and 6 consider a larger unit, texts. Antonyms and synonyms often create text cohesion, though it should be noted that what counts as an antonym of a given word may depend on the context. Similarly, discourse prosody is context-dependent: the same expression may convey a favourable or unfavourable connotation depending on which other words co-occur with it. Chapter 5 analyses a short story, 'Eveline' by Joyce (1914), using Youmans' (1991) software and shows that the type-token ratio confirms literary critics' segmentation of the story.
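
    The idea behind using the type-token ratio to find segment boundaries can be sketched as follows. This is not Youmans' software, and it is not a reimplementation of his vocabulary management profile: it is a simple moving-window type-token ratio, with the function name `moving_ttr`, the window size, and the toy token sequence all assumed for illustration. A rise in the ratio indicates a stretch in which more distinct word-forms are appearing, which may coincide with a new episode or topic.

```python
def moving_ttr(tokens, window):
    """Type-token ratio (distinct word-forms / window size) for each consecutive window."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]


# Invented toy sequence, for illustration only: repetition first, then new vocabulary.
toy_tokens = "she sat she sat she sat then evening came down".split()
for position, ratio in enumerate(moving_ttr(toy_tokens, window=4)):
    print(position, ratio)
```

    On real texts the window would be far larger than four tokens, and the choice of window size affects how smooth the resulting curve is.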

    Chapters 7 and 8 examine a perhaps more controversial topic: the cultural significance of words and phrases. One of the cases Stubbs considers in Chapter 7 is the tripartite set 'ethnic', 'racial' and 'tribal'. He argues that (a) all of them share the connotation of 'violence'; (b) 'ethnic' is academic, whilst 'racial' may be bureaucratic; and (c) each tends to be used to refer to a certain group: e.g. 'African tribes' and 'ethnic groups in the former Yugoslavia'. Chapter 8 presents analyses of loan words. Stubbs notes that native speakers often are not aware of historical changes in the meaning(s) of an individual word and have wrong ideas about etymology.

    The final section, Part III: Implications, discusses fundamental issues in corpus linguistics and philosophical issues in linguistics at large. Chapter 9, "Words, Phrases and Connotations: On Lexico-grammar and Evaluative Language", stresses the importance of connotations in (lexical) semantics. Although connotations may appear personal, they are often shared among native speakers. This suggests that non-native speakers need to be aware of and master this aspect of meaning. Unfortunately, however, connotations are often not included in dictionary definitions. Stubbs uses examples to show that some verbs share the same discourse prosody and convey the speaker's point of view. One case involves three verbs, 'accost', 'lurk' and 'loiter'. All three have negative connotations and are used when making accusations or complaints about other people's actions. These verbs typically appear in different syntactic structures, however: 'accost' is often used in the passive, whereas 'lurk' is not. Such information, too, is not usually included in dictionaries. In Chapter 10, "Data and Dualisms: On Corpus Methods and Pluralist Models", Stubbs rejects monism and adopts a pluralist position. This is different from the two-way distinctions proposed by Saussure ('langue' and 'parole') and by Chomsky ('competence' and 'performance'). The main thrust of his argument is that linguistic theory must account for (a) the linguistic behaviour of an individual speaker, (b) linguistic knowledge, or "the 'mental lexicon'" (p.232) of a native speaker and (c) language as a system. This, he argues, is not much different from the four-way distinction proposed by Hymes (1972).

    CRITICAL EVALUATION Stubbs' main argument is '[i]t makes little sense to describe the meaning of individual words in isolation, since words are co-selected with other words, and meanings are distributed across larger units' (p.100). This is not new. It is practically the same as what Frege claimed in 1884, which is known as the 'context principle': 'Only in the context of a sentence does a word stand for anything' (Dummett 1981: 192). In his attempt to prove this, Stubbs uses corpora extensively. This is because he thinks that analysing publicly available data with replicable methods is what linguistics requires, and this in turn is because he believes linguistics to be an empirical science. This does not mean he rejects native speakers' intuitions and invented data as completely unreliable: they are reliable in some areas and not in others (p.72). I have always held a similar view of meaning, so I may be too biased to judge objectively. Nevertheless, I should think that Stubbs provides sufficient evidence to show his claim is valid.

    There are two aspects of the book that I find particularly appealing. The first is Stubbs' attempt to explain why an expected collocation does not actually occur very often. One such case is a perhaps unexpected non-co-occurrence of 'kick' and 'foot': as the use of the former normally implies the use of the latter, they do not co-occur very often. Another involves differences in collocation. Verbs such as 'bump' and 'smash' refer to the same action but they differ in connotations. As one of the main claims of the book is that most work in semantics/pragmatics ignores or pays little attention to connotations, this is rather a neat way to highlight their importance.

    The other is the analysis of the cultural significance of certain expressions in Chapters 7 and 8. Chapter 7 illustrates Stubbs' point about the importance of the context, here perhaps more broadly construed than usual, with expressions such as 'ethnic' vs. 'racial', 'care', 'proper', etc. Chapter 8 discusses loanwords from German and points out that native speakers' conception of 'proper' language use might be wrong. The chapter functions also as a neat illustration of historical change in meanings across languages.

    This book contains ample data drawn from publicly available corpora and provides a convincing case for the author's main claim that context plays an essential role in determining the meaning of words or phrases. The author's style renders his arguments comprehensible. I have, however, a couple of quibbles.

    The first is the lack of precise definitions of some of the key technical terms. I am not sure who the target audience is, but from the easy-to-follow style of the book, I should guess it might be intended as an introductory textbook for corpus semantics. If so, I would have been happier to see key words/phrases more clearly and explicitly defined. I shall just give two examples. One is 'discourse prosody', which is a "descriptor of speaker attitude and discourse function" (p.88). I can understand intuitively what this means, but it would have been helpful to provide a fuller explanation because the use of 'prosody' may suggest this is limited to phonological issues when it is not. Another is "inter-collocations", whose definition I cannot find in the book. From the discussion of the phrase 'roam the streets' (pp.203-5), I would guess that this probably means the collocation of a phrase consisting of more than one word. It is not clear to me how this is to be obtained or computed, for 'roam' on its own may have a positive or a negative connotation whilst 'roam the streets', according to Stubbs, is "almost always negative" (p.203). His argument seems to be that this is because the phrase 'the streets' is predominantly negative. Some of the examples he gives for this argument do not seem wholly convincing (e.g. do the negative connotations of "visions of rubbish piled high in the streets" arise from 'the streets', as Stubbs argues to be the case (pp.204-5), or from 'rubbish'?). And does a word with negative connotations in a phrase always make the phrase as a whole have negative connotations?

    The second is perhaps inevitable, but his discussion of the cultural significance of linguistic expressions is cast in a predominantly British context. This in itself is not a bad thing, but it makes it difficult for some readers to appreciate some arguments. In his discussion of the cultural significance of 'care' in Chapter 7, Stubbs cites an utterance made by "Dame Edna Everage (Barry Humphries)", whom he uses as an example of "parodies of psycho-babbles by social satirists". Perhaps this is enough, but I would have thought that what this signifies might be grasped only by those with knowledge of (some areas of) British popular culture in the 1980s/1990s.

    Such minor issues aside, this book presents the author's arguments fairly convincingly in a style accessible to undergraduate students. I recommend this book also to postgraduate students of semantics/pragmatics who may have a narrower conception of what 'meaning' means. Putting on my language teacher's hat, I would like to see a dictionary, or still better, an on-line interface, which EFL students could use to find out if the words they put together really collocate with one another or not. As Stubbs rightly points out, "many connotations for which there is strong corpus evidence are not recorded in dictionaries" (p.198) and currently available "[d]ictionaries have no systematic way of relating words which have shared connotations" (p.203). One of Stubbs' main claims is that connotations are a central part of meaning as "the whole point of utterance may be to express the speaker's attitude, evaluation and point of view" (p.198). If this appears tenable, and he provides ample evidence for it, then it is essential for EFL learners to come to grips with connotations.

    REFERENCES Dummett, M. (1981) Frege: Philosophy of Language 2nd ed., Duckworth.

    Frege, G. (1884) Die Grundlagen der Arithmetik: eine logisch-mathematische Untersuchung ueber den Begriff der Zahl, Breslau.

    Hymes, D. (1972) "On communicative competence". In J. Pride and J. Holmes (eds.) Sociolinguistics, Penguin, pp. 269-293.

    Lyons, J. (1977) Semantics, 2 Vols., Cambridge University Press.

    Sinclair, J. (1991) Corpus, Concordance, Collocation, Oxford University Press.

    Youmans, G. (1991) "A new tool for discourse analysis: the vocabulary management profile", Language 67:4, pp.763-789.

    ABOUT THE REVIEWER Mayumi Masuko did her postgraduate studies at the University of Cambridge, where she received an MPhil and a PhD in linguistics. She is an Associate Professor of English at Waseda University, where she teaches English and linguistics. Her main research interest lies in the interaction between semantics (broadly conceived) and morphosyntax.