Review of Corpora in Cognitive Linguistics

Reviewer: Rolf Michael Kreyer
Book Title: Corpora in Cognitive Linguistics
Book Author: Stefan Th. Gries; Anatol Stefanowitsch
Publisher: De Gruyter Mouton
Linguistic Field(s): Text/Corpus Linguistics; Cognitive Science
Issue Number: 18.243

EDITORS: Gries, Stefan Th.; Stefanowitsch, Anatol
TITLE: Corpora in Cognitive Linguistics
SUBTITLE: Corpus-based Approaches to Syntax and Lexis
PUBLISHER: Mouton de Gruyter
YEAR: 2006

Rolf Kreyer, Department of English, American and Celtic Studies, University
of Bonn

The volume under review is a collection of nine papers spanning 344 pages, all of
which aim to show how issues of cognitive linguistics can benefit from the
extensive use of corpus data and from the application of objective
statistical methods. Although the volume is not divided into separate
'parts', the papers can broadly be subsumed under three major groups,
namely 1) four papers on the study of semantic similarity, 2) three papers
on the linguistic manifestation of causation and transitivity, and 3) two
papers on the role of image-schemas in cognitive linguistics and their
analysis with corpus-based methods.

The following synopsis summarizes the key points of each of the
articles. The review concludes with a critical evaluation.


In her paper ''Ways of intending: Delineating and structuring
near-synonyms'', Dagmar Divjak analyses meaning differences in ''five Russian
near-synonymous verbs that, in combination with an infinitive, express the
concept INTEND TO CARRY OUT AN ACTION'' (19). Her study falls into two
parts, the first being based on elicitation, the second on corpus data. The
aim of the former is to show that the degree of similarity between lexemes
can be measured reliably with the help of ''precise syntactic and semantic
data on the distribution of the potentially near-synonymous lexemes over
constructions and of their collocates over the slots of those
constructions'' (22). On the basis of an elicitation experiment that taps
knowledge of realizations of the underlying pattern [Vfin Vinf], the author
identifies five verbs that show constructional similarities, namely
'namerevat'sja', 'sobirat'sja', 'predpolagat'', 'dumat'' and 'xotet''. One
verb that, on purely semantic grounds, is usually considered to belong to
the same meaning group of intention, namely 'planirovat'', is excluded from
this list, since it does not show the same constructional properties.
Interestingly, this difference in constructional realization reflects a
semantic distinction between 'planirovat'' and the other verbs, which
apparently is difficult to recognize through purely meaning-based
methodologies. Such methodologies ''seem to lack a precise enough measure to
determine the degree of similarity; the proposed solutions may thus be
influenced by the authors' opinion on what an intention should be as well as
by prototype effects typical of human categorization'' (32). Accordingly,
''[a]n approach that builds on distribution parameters from argument- and
event-structure offers a viable alternative'' (32).

In the second part of her study, Divjak sets out to explore the
near-synonyms in more detail. More specifically, she analyses four of the
above verbs, 'namerevat'sja', 'sobirat'sja', 'dumat'' and 'xotet'', with
regard to 47 parameters which fall into two broad groups, the first being
concerned with the formal instantiations of the slots provided by the
pattern [Vfin Vinf], the second focussing on semantic paraphrases for the
subject and the infinitive in the construction. A hierarchical
agglomerative cluster analysis (HAC) shows that, with regard to the formal
realizations, 'dumat'' and 'xotet'' are most similar, while a focus on the
semantic paraphrases reveals strong similarities between 'dumat'' and
'namerevat'sja'. Surprisingly, an HAC over all 47 parameters yields a
pattern similar to the one that emerges when only the formal aspects are
taken into account. It follows that the semantic variables seem to have
rather little overall impact. In contrast,
''the way constructional slots are formed can be decisive in determining the
degree of closeness between near synonyms'' (46). Accordingly, the
constructional approach to near-synonyms applied in this paper is advocated
as ''a valid [sic!] verifiable and repeatable alternative to meaning-based,
introspective methods'' (46).
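For readers unfamiliar with HAC, the following minimal Python sketch illustrates the basic procedure: each verb is represented as a vector of relative frequencies, and the two closest clusters are merged repeatedly until only one remains. The four vectors are invented toy values over three hypothetical variables, not Divjak's 47 parameters, and the choice of Euclidean distance and average linkage is an assumption, not a report of the settings used in the paper.

```python
import math
from itertools import combinations

# Hypothetical toy profiles: relative frequencies over three invented variables
# (NOT Divjak's data).
PROFILES = {
    "dumat'": [0.80, 0.10, 0.10],
    "xotet'": [0.70, 0.20, 0.10],
    "namerevat'sja": [0.20, 0.60, 0.20],
    "sobirat'sja": [0.25, 0.45, 0.30],
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hac_average(profiles):
    """Agglomerative clustering with average linkage (an assumed setting).

    Returns the merges in order as (cluster1, cluster2, distance) triples.
    """
    # Start with every verb in its own cluster: tuple of names -> list of vectors.
    clusters = {(name,): [vec] for name, vec in profiles.items()}
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            pts1, pts2 = clusters[c1], clusters[c2]
            # Average linkage: mean distance over all cross-cluster pairs.
            d = sum(euclidean(p, q) for p in pts1 for q in pts2) / (len(pts1) * len(pts2))
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        merged = tuple(sorted(c1 + c2))
        clusters[merged] = clusters.pop(c1) + clusters.pop(c2)
        merges.append((c1, c2, d))
    return merges


if __name__ == "__main__":
    for left, right, dist in hac_average(PROFILES):
        print(f"merge {left} + {right} at distance {dist:.3f}")
```

With these toy values the two most similar verbs are merged first; on real data the resulting tree (dendrogram) is what the cluster analyses described above are read from.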

In his article ''Corpus-based methods and cognitive
semantics: The many senses of 'to run''', Stefan Th. Gries tries to bridge
the gap between corpus linguistics and cognitive linguistics by
''demonstrating how cognitive linguistics can benefit from methodologies
from corpus linguistics and computational linguistics'' (57-8). To this end,
the author illustrates, in a number of case studies, how a
'traditional' cognitive analysis of the individual meanings of 'to run' can
be supplemented by corpus-based statistical methods. Underlying all
analyses are the 815 instances of the verb in the British component of the
International Corpus of English (ICE-GB) and the Brown corpus of American
English. The cognitive investigations result in a radial network of a total
of 56 senses of 'to run', with five central senses around which the others
cluster, namely 'fast pedestrian motion', 'fast motion', 'motion',
'abstract motion', and 'to cause motion'. The author points out that
traditional cognitive analysis would usually treat the sense 'motion' as
prototypical since it ''is the sense from which most others can be (most
economically) derived'' (75). However, both simple and more sophisticated
corpus-linguistic evidence indicates that 'fast pedestrian motion' is more
central. As for the simpler kind of evidence, the CHILDES corpus shows that
this sense is the most frequent in early stages of acquisition. Also,
ICE-GB yields 60 instances of the noun 'run', about three quarters of which
refer to the sense 'fast pedestrian motion' or closely related senses. More
sophisticated corpus-based methods like analyses of the behavioural profile
of the verb's occurrences (based, among others, on morphological, syntactic
and semantic properties) provide further evidence for the centrality of the
sense 'fast pedestrian motion'. For instance, this sense seems to be ''the
formally least constrained sense and can, thus, be considered unmarked and
prototypical'' (76). Likewise, the fact that ''it exhibits most variation
across all formal and semantic characteristics which were coded'' points
in the same direction. This case study on the prototypical sense of
'to run' and the other case studies (for instance, on the
distinctiveness of senses and on the question of how and where senses in a
network should be connected) thus show how cognitive approaches can
benefit from corpus-based methods, in particular approaches that are based
on behavioural profiles: ''[A] behavioural profile is [...] the most
rewarding starting point that will hopefully be utilized more fully in
future work'' (90).
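What a behavioural profile amounts to can be sketched in a few lines, assuming that each corpus token of the verb has been annotated for its sense and for a set of formal and semantic variables (ID tags). The tokens and tag names below are hypothetical, and counting the attested tag levels per sense is offered only as a crude stand-in for the notion of 'most variation' quoted above, not as Gries's actual measure.

```python
from collections import Counter, defaultdict

# Hypothetical annotated tokens of 'run': one dict per corpus hit.
TOKENS = [
    {"sense": "fast pedestrian motion", "tense": "past", "transitivity": "intransitive"},
    {"sense": "fast pedestrian motion", "tense": "present", "transitivity": "intransitive"},
    {"sense": "fast pedestrian motion", "tense": "past", "transitivity": "transitive"},
    {"sense": "fast pedestrian motion", "tense": "past", "transitivity": "intransitive"},
    {"sense": "to cause motion", "tense": "past", "transitivity": "transitive"},
    {"sense": "to cause motion", "tense": "past", "transitivity": "transitive"},
]


def behavioural_profile(tokens, sense_key="sense"):
    """For each sense, the relative frequency of every (tag, level) pair,
    computed within its tag so that the levels of one tag sum to 1."""
    counts = defaultdict(Counter)  # sense -> Counter over (tag, level)
    totals = defaultdict(Counter)  # sense -> Counter over tag
    for tok in tokens:
        sense = tok[sense_key]
        for tag, level in tok.items():
            if tag == sense_key:
                continue
            counts[sense][(tag, level)] += 1
            totals[sense][tag] += 1
    return {
        sense: {(tag, level): n / totals[sense][tag] for (tag, level), n in c.items()}
        for sense, c in counts.items()
    }


def attested_levels(profile):
    """Crude variation proxy (an assumption): distinct (tag, level) pairs per sense."""
    return {sense: len(levels) for sense, levels in profile.items()}


if __name__ == "__main__":
    bp = behavioural_profile(TOKENS)
    for sense, vector in bp.items():
        print(sense, vector)
    print(attested_levels(bp))
```

The resulting per-sense vectors are the kind of input on which sense similarity or clustering can then be computed.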

Stefanie Wulff, in her paper ''Go-V vs. go-and-V in English: A case of
constructional synonymy?'', studies instances of the two superficially
similar double verb patterns that are exemplified below.

(1) Go find the books and show me.
(2) Now, just keep polishing those glasses while I go and check the drinks.

In contrast to generative-transformational approaches that treat the first
pattern as merely being a truncated surface form of the second one, the
author shows that both patterns should be regarded as constructions (in a
construction-grammar sense) in their own right. This claim is substantiated
by a number of smaller studies that analyse the behaviour of the
constructions at issue with the help of ''statistical methods such as
collocational overlap estimation, collostructional analysis, and
distinctive collexeme analysis'' (102). For instance, the latter technique,
which ''measures the dissimilarity of semantically similar constructions on
the basis of their significant collexemes'' (119), shows that the 'go-and-V'
pattern mostly occurs with stative verbs (although these may sometimes
receive a dynamic reading in this pattern, e.g. 'I might go and see
Aunt Violet' (114)). The 'go-V' pattern, in contrast, usually attracts
motion verbs or verbs which imply activity. This and similar results lead
Wulff to conclude that ''while whatever action is denoted by the 'go-and-V'
gains an event-like interpretation and is meant to embrace the whole
sequence cascade of a typical event with a beginning and an end, the
meaning of 'go-V' only denotes the initiation of an action and is
inherently atelic, which invites process verbs to occupy the V2 slot'' (121).
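The core computation of a distinctive collexeme analysis can be sketched as follows, assuming the test commonly used in collostructional work: a one-tailed Fisher-Yates exact test on a 2x2 table (the verb vs. all other verbs, construction A vs. construction B), with -log10(p) as the measure of distinctiveness. The verb labels and frequencies in the usage example are hypothetical and are not Wulff's figures.

```python
from math import comb, log10


def hypergeom_pmf(k, n_a, n_b, n_verb):
    """Probability that k of the verb's n_verb tokens fall into construction A,
    given n_a tokens of A and n_b tokens of B overall."""
    return comb(n_a, k) * comb(n_b, n_verb - k) / comb(n_a + n_b, n_verb)


def distinctive_collexeme(a, b, n_a, n_b):
    """a, b: the verb's frequency in constructions A and B;
    n_a, n_b: total frequencies of the two constructions.
    Returns (preferred construction, -log10 of the one-tailed Fisher p-value)."""
    n_verb = a + b
    expected_a = n_verb * n_a / (n_a + n_b)
    lo = max(0, n_verb - n_b)  # smallest possible count in A
    hi = min(n_verb, n_a)      # largest possible count in A
    if a >= expected_a:
        # Attraction to A: probability of a count at least as high as observed.
        p = sum(hypergeom_pmf(k, n_a, n_b, n_verb) for k in range(a, hi + 1))
        return "A", -log10(p)
    # Attraction to B: probability of a count in A at least as low as observed.
    p = sum(hypergeom_pmf(k, n_a, n_b, n_verb) for k in range(lo, a + 1))
    return "B", -log10(p)


if __name__ == "__main__":
    # Hypothetical counts: (frequency in A = go-V, frequency in B = go-and-V).
    TOY_COUNTS = {"verb_x": (20, 5), "verb_y": (5, 20)}
    for verb, (a, b) in TOY_COUNTS.items():
        pref, strength = distinctive_collexeme(a, b, n_a=100, n_b=100)
        print(f"{verb}: prefers {pref}, distinctiveness {strength:.2f}")
```

Ranking all verbs by this value for each construction yields the lists of distinctive collexemes on which interpretations like Wulff's are based.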

In their article ''Syntactic leaps or lexical variation? - More on
'Creative Syntax''', the authors Beate Hampe and Doris Schönefeld analyse the
creative use of verbs in untypical complementation patterns, as exemplified
in clauses like 'He supported them through the entrance door' or 'She bore
them stupid'. Such cases of 'caused-motion' or 'resultative' constructions
have been studied at length in construction-grammar approaches such as
Goldberg (1995), where a fusion model is advocated to account for
the apparent change in the complementational behaviour of the verb: ''the
verb 'inherits' a syntactic slot from an argument-structure construction
(ASC) it is usually not associated with. [...T]he ASC provides both a very
generic meaning and a syntactic template [...] which gets fused with the
semantic and syntactic frame of the verb at issue [... and thus] licenses
both the semantic change incurred and the appearance of additional
syntactic slots'' (129-30). The authors agree with Goldberg in attributing a
central role in creative verb use to the ASC, but they suggest that the ASC
plays a different role, namely acting ''as a trigger to the activation of
another verb [...] as input to a blending process'' (130). The creative use
of 'fear' in resultative constructions (e.g. 'Hundreds of people are feared
dead after a mining disaster'), for instance, could thus be explained by
reference to lexical influence. 'Fear' in its creative use shows a very
high collocational strength with 'dead' and occurs only in passive
constructions, a situation that is similar to the use of the main verb 'find',
as in 'the bookseller was found dead'. In the case of 'fear', then, the
creative use might have originated in ''lexically filled model collocations,
such as 'X (be) found dead', from which a specific ''creative'' collocation
like 'X (be) feared dead' may be formed by lexically manipulating the model
pattern in only one slot'' (148). Similarly, all other creatively used verbs
studied by Hampe and Schönefeld show strong collocational restrictions
with regard to the newly acquired argument slot, a finding that is neither
predicted nor explained by Goldberg's account. Accordingly, the authors
conclude that ''what is treated as a merely syntactic [...] type of creativity in
Goldberg's 'fusion model' may be governed to variable extents, by lexical
processes'' (150).

Gaëtanelle Gilquin's paper ''The place of prototypicality in corpus
linguistics: Causation in the hot seat'' investigates the relation between
cognitive prototypes and corpus-linguistic frequencies. More specifically,
she explores to what extent authentic periphrastic causative constructions
(for instance, 'get your father to run us out') can be interpreted as
realizations of one of three cognitive models of prototypical causation
(namely the notion of iconic sequencing with the order 'causer - causee -
patient', the billiard-ball model (both Langacker 1991), and Lakoff's
(1987) direct manipulation model). Surprisingly, ''the models of
prototypical causation described in the cognitive literature account for an
astonishingly small proportion of the data'' (175), i.e. only 45%. Although
the author adduces some qualifications that may (at least to some extent)
''reduce the distance between cognitive salience and frequency [...] this
lack of overlap nonetheless questions our deepest intuitions and calls for
explanation'' (181). One such explanation may lie in the fact that the
cognitive ''models proposed in the literature are not valid descriptions of
prototypical causation'' (178). In particular, all three models
discussed seem to be based merely on the intuitions of their originators,
which, as corpus-linguistic studies have shown in many other
contexts, may be rather unreliable. Furthermore, Gilquin, following
Geeraerts (1989), claims that the concept of prototypicality itself is
prototypical and, hence, may be too fuzzy to be applied satisfactorily.
What is needed is ''a refined and more detailed description of this concept,
which might involve multi-faceted characterisation and/or additional
adjustments, such as assigning particular weight to each parameter defining
the prototype'' (181). In this respect, for instance, corpus linguistic data
on frequently occurring patterns of use may be helpful.

The paper ''Passivisability of English periphrastic causatives'' by Willem
Hollmann is an attempt ''to account for the differences in passivisability
of English periphrastic causatives'' (193), as exemplified in sentences like
'Recruits were made to hop on the spot' or 'People in their work roles are
caused to respond from their unconscious world of internal objects.'
Hollmann restricts himself to an empirical analysis of instances of 'to
make', since semantically it is the most general causative, and,
accordingly, ''results may be extended to other causatives'' (193). On the
basis of Hopper and Thompson's (1980) work on transitivity the author
suggests several scales that are supposed to capture the transitivity of
the constructions under scrutiny. For instance, scales like 'full
affectedness < partial affectedness (of the object)' or 'inducive <
volitional < affective < physical (causation)' capture the causality aspect
of transitivity (with scales showing a decrease of transitivity from left
to right); the scale 'unity of space and time < absence of unity of
space/time < absence of unity of space and time' is supposed to capture the
aspect of directness of the cause, and so on. The application of the
descriptive framework to 400 instances of active/passive causative uses of
'make' in the past/present (i.e. 100 tokens for each configuration) shows
that causation type has a strong influence on the passivisation of
'make' causatives, while the affectedness of the object does not yield
significant results. Similarly, directness of the cause seems to have
only a marginal influence on passivisability. The results of
the empirical study lead Hollmann to posit a number of ''implicational
universals that may be proposed to capture the relation between the
semantics of causatives and their degree of passivisability'' (213). The
influence of causation type, for instance, is described as follows: ''If a
language allows passivisation of causative constructions towards the lower,
less transitive end of the causation type scale then the constructions
toward the higher, more transitive end of the scale will also be
passivisable (all other things being equal).'' (213) These implicational
universals are then tested against a small set of other causative verbs
like 'get', 'force', or 'persuade' (reported more fully in Hollmann 2003).
This comparison shows that the individual factors or scales applied in this
study do not reliably predict the frequency of passivisation, if they are
considered equally influential. Rather, the data suggest that the
different factors need to be weighted. Here, again, corpus-based methods
that ''assess the relevance of the factors in question'' might prove useful.

John Newman and Sally Rice explore the ''Transitivity schemas of English EAT
and DRINK in the BNC''. In particular, the authors analyse how the two verbs
are used transitively and intransitively within spoken and written English
and what kinds of nouns occur as subjects and objects. Also, they aim to
show how usage patterns of the two verbs depend on the form of the verb
that is actually used. For instance, the lexeme EAT occurs more frequently
than DRINK in both the written and the spoken material and usually comes
first in coordinated pairs ('ate and drank' rather than 'drank and ate'). In the view
of the authors this might indicate ''experiential salience: when we eat and
drink, the drinking is an accompaniment to the eating, rather than the
other way round'' (236). Other findings include the nature of objects that
usually occur in transitive uses of the two verbs: the most frequent object
with EAT, for instance, is 'food'. In addition, many of the 20 most frequent
objects denote particular kinds of meals, such as
'breakfast', 'lunch', or 'dinner'. In this context the authors note that
while it is good practice in dictionaries ''to recognize a 'food' and 'meal'
kind of understood object on intransitive EAT [... their] results show that
these two categories are a feature of the 'transitive' use of EAT as well''
(246-7). The authors report similar findings with regard to DRINK: as in
the case of EAT, intransitive uses are usually described with reference
to an understood object denoting some kind of alcohol. Again, their corpus
study on the transitive use of the verb DRINK shows that ''[t]he occurrence
of names for alcoholic beverages is striking'' (248). If lexicographers
leave this fact unmentioned in their descriptions of the verbs' uses,
this might be taken to mirror a difference between transitive and
intransitive uses that is not actually borne out by authentic usage data. A
fuller integration of corpus-linguistic findings could thus help to make
apparent ''the full extent of inferences and collocational properties
associated with a verb [...] and the ensuing description becomes more
observationally adequate'' (248). Finally, the authors stress the importance
of the word forms in studies of the kind they conduct, since syntactic
and/or semantic properties of the usage of a word are usually tied to
particular word forms and do not necessarily hold true for the complete
lemma. Accordingly, the authors claim that ''the notion of a dictionary entry
based on a lemma is still inadequate'' (255).

Maarten Lemmens's paper on ''Caused posture: Experiential patterns emerging
from corpus research'' investigates the relation between the three Dutch cardinal
posture verbs 'zitten' 'sit', 'liggen' 'lie', and 'staan' 'stand' and their
causative counterparts 'zetten' 'set', 'leggen' 'lay', 'steken/stoppen'
'stick (into)' and 'doen' 'do'. On the basis of an analysis of 7550 tokens,
the author finds that usually there is no ''direct link between the
causatives and the non-causatives, in the sense that one can always recast
one in terms of the other'' (279). While 'liggen' and its causative
counterpart 'leggen' show clear correspondences, the situation is different
for 'staan', which is related to its apparent counterpart 'stellen' only in a
few metaphorical uses; its more frequent and regular causative is 'zetten',
which is also the causative corresponding to the posture verb 'zitten'. Causatives related
to 'zitten', in addition to 'zetten', include 'steken', 'stoppen' and
'doen'. Lemmens further analyses the distribution of postural and
locational uses of the causative verbs in those cases where the 'causee',
i.e. the 'entity' that is put somewhere, is human. Surprisingly, postural
readings of the causative verbs with human causee, i.e. 'bring a person in
a standing/sitting/lying position', are only rarely attested in the corpus
data. For example, fewer than 1% of all occurrences of 'leggen' and 'zetten'
involve postural usage, and such uses seem to be restricted to two cases: 1)
''situations where people no longer control their own posture'' (283), as is
the case with babies or ill people, or 2) contexts where people are
manipulated or put somewhere, e.g. being expelled from a country or from a
house. In addition, 'zetten' seems to have become highly productive. This,
in the view of the author, is due to the fact that '''zetten' has generalized
to the meaning 'put an entity in its canonical position''' (285), which
naturally makes it applicable to a large number of situations. 'Zetten'
thus seems to have become the default causative verb.

The final paper of this volume ''From conceptualization to linguistic
expression: Where languages diversify'' by Doris Schönefeld analyses
differences in conceptualizations of similar scenes in English, German and
Russian. The paper is informed by the idea that speakers usually have
choices in the way they conceptualize a particular scene and that these
conceptualizations leave traces in their verbalization. It follows that
''from habitual, i.e. typical and frequent, expressions of a language we can
infer a speech community's habitual ways of conceptualization'' (298). The
author tries to identify such 'patterns' of conceptualization through a
corpus-analysis of collocations found with the posture verbs 'sit', 'stand'
and 'lie'. The analysis, for instance, shows that the three languages may
use different prepositions with identical verbs to describe similar
situations: While English and German students 'sit over' books (Ger. 'über
den Büchern sitzen'), Russian students rather 'sit behind' books (Rus.
'Sidet' za knigami'). Similarly, books stand 'on' the shelf in English and
Russian, but 'in' the shelf in German (Ger. 'das Buch steht im
Regal'). These and similar examples show that in their construal of the
situation different languages activate different image-schemas. With regard
to the book example above, for instance, English and German construe the
relative position of landmark (book) and trajector (student) on the basis
of the UP-DOWN schema, while Russian employs the FRONT-BACK and NEAR-FAR
schemas. Further differences emerge when, in the description of similar
scenarios, one or two of the languages use different posture verbs or even
non-posture verbs. On the whole, the author finds that ''diversifications
between languages [...] may be the result of diverging construals by
drawing on different image-schema combinations in the conceptualizations of
the phenomena to be expressed. [...] image-schemas are centrally employed
[...] in the conceptualization and verbalization of identical/comparable
(posture) scene, and [...] different speech communities can construe these
scenes differently by highlighting particular image schemas at the expense
of others'' (330). Again, corpus-based observations may yield interesting
insights into areas of cognitive linguistic research.


Stefan Th. Gries and Anatol Stefanowitsch, in my view, have edited an
excellent selection of papers. The articles are generally of very high
quality and highly stimulating, and they show impressively how cognitive
linguistics may benefit from corpus-linguistic research and (advanced)
statistical methods. As the title already makes clear, the volume is aimed
first of all at researchers with a cognitive- and corpus-linguistic
background.
The former will find articles that represent three traditional areas of
cognitive linguistics, namely similarity and dissimilarity of senses and
ways of describing their organisation, cognitive approaches to grammar with
a special focus on aspects of transitivity, and, finally, studies on the
relevance of image schemas for human conceptualization and how this is
mirrored in language use. In addition to the insights presented in the
individual articles, the cognitive linguist is likely to benefit enormously
from seeing a vast range of corpus-linguistic and statistical methods at
work. The studies presented thus no doubt open up new methodological
perspectives for the field of cognitive linguistics.

The volume will also prove valuable for corpus linguists, as it shows a
number of 'new' ways to exploit authentic data. While notions like 'mutual
information', 'z-score', or 'chi-square' are by now part of received
corpus-linguistic wisdom, this volume confronts the corpus linguist driven
by the urge for objectivity with a large number of more advanced
statistical methods, like collocational overlap estimation,
collostructional analysis, or hierarchical cluster analysis, to name but a
few. These new ways of analysing large amounts of authentic data should be
welcome to any linguist working with corpora. To quote Jan Aarts
(although on another topic): ''If you want a challenge, there it is''.
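For comparison, the 'received' association measures mentioned above can all be computed from a single 2x2 contingency table of observed co-occurrence frequencies. The sketch below uses the textbook forms MI = log2(O11/E11), z = (O11 - E11)/sqrt(E11) and Pearson's chi-square; several variants of the z-score are in use, so this choice is an assumption, and the cell counts are invented.

```python
from math import log2, sqrt


def association_scores(o11, o12, o21, o22):
    """Association measures for a 2x2 table of observed frequencies.

    o11: word and collocate together; o12: word without collocate;
    o21: collocate without word;      o22: neither.
    Assumes o11 > 0 (MI is undefined otherwise).
    """
    n = o11 + o12 + o21 + o22
    r1, r2 = o11 + o12, o21 + o22  # row totals
    c1, c2 = o11 + o21, o12 + o22  # column totals
    observed = [o11, o12, o21, o22]
    # Expected frequencies under independence: row total * column total / n.
    expected = [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]
    e11 = expected[0]
    return {
        "MI": log2(o11 / e11),
        "z": (o11 - e11) / sqrt(e11),
        "chi2": sum((o - e) ** 2 / e for o, e in zip(observed, expected)),
    }


if __name__ == "__main__":
    # Invented counts: 10 co-occurrences in a sample of 100.
    print(association_scores(10, 10, 10, 70))
```

Here the expected co-occurrence frequency is 4, so the observed 10 gives z = 3.0 and chi-square = 14.0625; the more advanced methods discussed in the volume build on tables of exactly this kind.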

Still, although the volume clearly advocates corpus-linguistic and advanced
statistical methods, the reader never gets the feeling that these are
regarded as ends in themselves; rather, they serve ancillary purposes. In this
respect, the following quote by Gries, in my view, can be seen as
representative of the attitude common to all of the papers: ''I have tried
to emphasize the benefits of additional corpus-based evidence, but I should
like to point out, however, that I do not advocate using corpus evidence
alone. Corpus evidence can complement different research methodologies such
as (psycho-)linguistic experiments, but it should not replace them'' (87).

Another group of researchers that will certainly benefit from this volume
are lexicographers. The volume provides a number of case studies on
identifying meaning and, most importantly, shows how meaning is tied to
semantic and syntactic context. This book, like many others before it, thus
provides further evidence for the lack of strict boundaries between lexis
and grammar, and may contribute to more accurate descriptions of meanings
in dictionaries.

Finally, the proof-reading has been almost perfect. Only very
few errata remain, which is well within reasonable limits for a
book of roughly 350 pages.

On the whole, the volume makes for a highly stimulating and interesting
read and shows numerous ways in which corpus-linguistic methods may help to
complement cognitive approaches to linguistics. In my view, the
illustration of a vast range of statistical methods is particularly
appealing, and shows to what extent 'traditional' ways of analysis might
benefit from the objective exploitation of usage-based data. Whether (cognitive)
linguistics will really experience ''a major methodological paradigm shift
in the direction of corpus work'' (14), as Stefan Th. Gries hopes in his
introduction, cannot of course be answered now, but this volume no doubt
makes such a shift appear very attractive.


Geeraerts, Dirk (1989): ''Introduction: Prospects and problems of prototype
theory'', Linguistics 27: 587-612.

Goldberg, Adele (1995): Constructions. A Construction Grammar Approach to
Argument Structure. Chicago: The University of Chicago Press.

Hopper, Paul and Sandra A. Thompson (1980): ''Transitivity in grammar and
discourse'', Language 56: 251-299.

Lakoff, George (1987): Women, Fire, and Dangerous Things. What Categories
Reveal about the Mind. Chicago: The University of Chicago Press.

Langacker, Ronald W. (1991): Foundations of Cognitive Grammar. Vol. II.
Descriptive Applications. Stanford: Stanford University Press.

Rolf Kreyer is an Assistant Professor of Modern English Linguistics in the
Department of English, American and Celtic Studies of the University of
Bonn, Germany. His research interests include corpus linguistics, syntax,
and text linguistics. He is the author of "Inversion in Modern Written
English. Syntactic Complexity, Information Status and the Creative Writer",
which was published in 2006 by Gunter Narr. At present he is working on a
corpus-linguistic study that aims to analyse the interaction of language
use and grammar.

Format: Hardback
ISBN: 3110186055
ISBN-13: N/A
Pages: 354
Prices: U.S. $ 118.80