From: Rolf Kreyer <rkreyeruni-bonn.de>
Subject: Corpora in Cognitive Linguistics
Announced at http://linguistlist.org/issues/17/17-1922.html
EDITORS: Gries, Stefan Th.; Stefanowitsch, Anatol
TITLE: Corpora in Cognitive Linguistics
SUBTITLE: Corpus-based Approaches to Syntax and Lexis
PUBLISHER: Mouton de Gruyter
YEAR: 2006
Rolf Kreyer, Department of English, American and Celtic Studies, University of Bonn
The volume under review is a collection of nine papers on 344 pages, all of which aim to show how issues of cognitive linguistics can benefit from the extensive use of corpus data and from the application of objective statistical methods. Although the volume is not divided into separate 'parts', the papers can broadly be subsumed under three major groups, namely 1) four papers on the study of semantic similarity, 2) three papers on the linguistic manifestation of causation and transitivity, and 3) two papers on the role of image-schemas in cognitive linguistics and their analysis with corpus-based methods.
The following synopsis will give a summary of the key points of each of the articles. The review will conclude with a critical evaluation.
In her paper ''Ways of intending: Delineating and structuring near-synonyms'', Dagmar Divjak analyses meaning differences in ''five Russian near-synonymous verbs that, in combination with an infinitive, express the concept INTEND TO CARRY OUT AN ACTION'' (19). Her study falls into two parts, the first being based on elicitation, the second on corpus data. The aim of the former is to show that the degree of similarity between lexemes can be measured reliably with the help of ''precise syntactic and semantic data on the distribution of the potentially near-synonymous lexemes over constructions and of their collocates over the slots of those constructions'' (22). On the basis of an elicitation experiment that taps knowledge of realizations of the underlying pattern [Vfin Vinf], the author identifies five verbs that show constructional similarities, namely 'namerevat'sja', 'sobirat'sja', 'predpolagat'', 'dumat'' and 'xotet''. One verb which, on a purely semantic basis, is usually considered to fall into the same meaning group of intention, namely 'planirovat', is excluded from this list, since it does not show the same constructional properties. Interestingly, this difference in constructional realization reflects a semantic distinction between 'planirovat' and the other verbs, which apparently is difficult to recognize through purely meaning-based methodologies. Such methodologies ''seem to lack a precise enough measure to determine the degree of similarity; the proposed solutions may thus be influenced by the authors' opinion on what an intention should be as well as by prototype effects typical of human categorization'' (32). Accordingly, ''[a]n approach that builds on distribution parameters from argument- and event-structure offers a viable alternative'' (32).
In the second part of her study, Divjak sets out to explore the near-synonyms in more detail. More specifically, she analyses four of the above verbs, 'namerevat'sja', 'sobirat'sja', 'dumat'' and 'xotet'', with regard to 47 parameters which fall into two broad groups, the first being concerned with the formal instantiations of the slots provided by the pattern [Vfin Vinf], the second focussing on semantic paraphrases for the subject and the infinitive in the construction. A Hierarchical Agglomerative Clustering (HAC) analysis shows that with regard to the formal realizations 'dumat'' and 'xotet'' are most similar, while a focus on the semantic paraphrases reveals strong similarities between 'dumat'' and 'namerevat'sja'. Surprisingly, an HAC over all 47 parameters shows a pattern that is similar to the one that is revealed when merely taking the formal aspects into account. It follows that the overall impact of the semantic variables seems to be rather small. In contrast, ''the way constructional slots are formed can be decisive in determining the degree of closeness between near synonyms'' (46). Accordingly, the constructional approach to near-synonyms applied in this paper is advocated as ''a valid [sic!] verifiable and repeatable alternative to meaning-based, introspective methods'' (46).
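For readers unfamiliar with the technique, the clustering step summarized above can be sketched roughly as follows. This is a minimal illustration only, not Divjak's actual procedure: the verb labels, the three parameters and all values are invented, and average linkage over Euclidean distances is merely one common choice that the review does not attribute to the author.

```python
from itertools import combinations
from math import dist

# Hypothetical behavioural profiles: each verb is described by the relative
# frequencies of three invented parameters (not data from the volume).
profiles = {
    "verb_a": (0.60, 0.30, 0.10),
    "verb_b": (0.55, 0.35, 0.10),
    "verb_c": (0.20, 0.30, 0.50),
    "verb_d": (0.10, 0.20, 0.70),
}


def average_linkage(x, y, data):
    """Mean Euclidean distance between all members of clusters x and y."""
    total = sum(dist(data[p], data[q]) for p in x for q in y)
    return total / (len(x) * len(y))


def hac_average(data):
    """Return the merge history of average-linkage agglomerative clustering."""
    clusters = [(name,) for name in data]  # start with one cluster per item
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters that are currently closest to each other.
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], data),
        )
        merges.append((left, right, average_linkage(left, right, data)))
        # Replace the two clusters by their union.
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
    return merges


merges = hac_average(profiles)
for left, right, height in merges:
    print(left, "+", right, "merged at distance", round(height, 3))
```

Items that merge early (at a small distance) have similar profiles; this is the sense in which two near-synonyms can be called 'most similar' in such an analysis.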
In his article ''Corpus-based methods and cognitive semantics: The many senses of 'to run''', Stefan Th. Gries tries to bridge the gap between corpus linguistics and cognitive linguistics by ''demonstrating how cognitive linguistics can benefit from methodologies from corpus linguistics and computational linguistics'' (57-8). To this end, the author, by conducting a number of case studies, illustrates how a 'traditional' cognitive analysis of the individual meanings of 'to run' can be supplemented by corpus-based statistical methods. Underlying all analyses are the 815 instances of the verb in the British component of the International Corpus of English (ICE-GB) and the Brown corpus of American English. The cognitive investigations result in a radial network of a total of 56 senses of 'to run', with five central senses around which the others cluster, namely 'fast pedestrian motion', 'fast motion', 'motion', 'abstract motion', and 'to cause motion'. The author points out that traditional cognitive analysis would usually treat the sense 'motion' as prototypical since it ''is the sense from which most others can be (most economically) derived'' (75). However, general and also more sophisticated corpus-linguistic evidence indicates that 'fast pedestrian motion' is more central. As for the first kind of evidence, the CHILDES corpus shows that this sense is the most frequent in early stages of acquisition. Also, ICE-GB yields 60 instances of the noun 'run', about three quarters of which refer to the sense 'fast pedestrian motion' or closely related senses. More sophisticated corpus-based methods like analyses of the behavioural profile of the verb's occurrences (based, among others, on morphological, syntactic and semantic properties) provide further evidence for the centrality of the sense 'fast pedestrian motion'. For instance, this sense seems to be ''the formally least constrained sense and can, thus, be considered unmarked and prototypical'' (76).
Similarly, the fact that ''it exhibits most variation across all formal and semantic characteristics which were coded'' points in the same direction. This case study on the prototypical sense of 'to run', as well as other case studies (for instance, on the distinctiveness of senses and on the question of how and where senses in a network should be connected), thus shows how cognitive approaches can benefit from corpus-based methods, in particular approaches that are based on behavioural profiles: ''[A] behavioural profile is [...] the most rewarding starting point that will hopefully be utilized more fully in future work'' (90).
Stefanie Wulff, in her paper ''Go-V vs. go-and-V in English: A case of constructional synonymy?'', studies instances of the two superficially similar double verb patterns that are exemplified below.
(1) Go find the books and show me.
(2) Now, just keep polishing those glasses while I go and check the drinks. (101)
In contrast to generative-transformational approaches that treat the first pattern as merely being a truncated surface form of the second one, the author shows that both patterns should be regarded as constructions (in a construction-grammar sense) in their own right. This claim is substantiated by a number of smaller studies that analyse the behaviour of the constructions at issue with the help of ''statistical methods such as collocational overlap estimation, collostructional analysis, and distinctive collexeme analysis'' (102). For instance, the latter technique, which ''measures the dissimilarity of semantically similar constructions on the basis of their significant collexemes'' (119), shows that the 'go-and-V' pattern mostly occurs with stative verbs (although these sometimes may get a dynamic reading if they occur in this pattern, e.g. 'I might go and see Aunt Violet' (114)). The 'go-V' pattern, in contrast, usually chooses motion verbs or verbs which imply activity. This and similar results lead Wulff to conclude that ''while whatever action is denoted by the 'go-and-V' gains an event-like interpretation and is meant to embrace the whole sequence cascade of a typical event with a beginning and an end, the meaning of 'go-V' only denotes the initiation of an action and is inherently atelic, which invites process verbs to occupy the V2 slot'' (121).
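The logic behind a distinctive collexeme analysis can be sketched as follows. In the collostructional literature, the association between a verb and one of two competing constructions is usually assessed with a Fisher exact test on a 2x2 table and often reported as the negative base-10 logarithm of the p-value. The sketch below is an illustration under stated assumptions, not Wulff's actual computation: all counts are invented and the function names are hypothetical.

```python
from math import comb, log10


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Gives the probability of observing a count of at least `a` in the
    top-left cell when all row and column totals are held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / denominator


def distinctiveness(a, b, c, d):
    """Association strength expressed as -log10 of the one-sided p-value."""
    return -log10(fisher_one_sided(a, b, c, d))


# Invented counts for one verb in the V2 slot (NOT figures from the volume):
# a = verb in 'go-and-V', b = all other verbs in 'go-and-V',
# c = verb in 'go-V',     d = all other verbs in 'go-V'.
p_value = fisher_one_sided(30, 170, 5, 145)
print("p =", p_value, " strength =", distinctiveness(30, 170, 5, 145))
```

A verb with a high value on this scale is attracted to the first construction more strongly than chance would predict; swapping the two rows tests attraction to the second construction instead.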
In their article ''Syntactic leaps or lexical variation? - More on 'Creative Syntax''', Beate Hampe and Doris Schönefeld analyse the creative use of verbs in untypical complementation patterns, as exemplified in clauses like 'He supported them through the entrance door' or 'She bore them stupid'. Such cases of 'caused-motion' or 'resultative' constructions have been studied at length in construction-grammar approaches such as, for instance, Goldberg (1995), where a fusion model is advocated to account for the apparent change in the complementational behaviour of the verb: ''the verb 'inherits' a syntactic slot from an argument-structure construction (ASC) it is usually not associated with. [...T]he ASC provides both a very generic meaning and a syntactic template [...] which gets fused with the semantic and syntactic frame of the verb at issue [... and thus] licenses both the semantic change incurred and the appearance of additional syntactic slots'' (129-30). The authors agree with Goldberg in attributing a central role in creative verb use to the ASC, but they suggest that the ASC plays a different role, namely acting ''as a trigger to the activation of another verb [...] as input to a blending process'' (130). The creative use of 'fear' in resultative constructions (e.g. 'Hundreds of people are feared dead after a mining disaster'), for instance, could thus be explained by reference to lexical influence. 'Fear' in its creative use shows a very high collocational strength with 'dead' and only occurs in passive constructions, a situation that is similar to the use of main verb 'find', as in 'the bookseller was found dead'. In the case of 'fear', then, the creative use might have originated in ''lexically filled model collocations, such as 'X (be) found dead', from which a specific ''creative'' collocation like 'X (be) feared dead' may be formed by lexically manipulating the model pattern in only one slot'' (148).
Similarly, all other creatively used verbs studied by Hampe and Schönefeld also show strong collocational restrictions with regard to the newly acquired argument slot, a finding that is not predicted or explained by Goldberg's account. Accordingly, the authors conclude that ''what is treated as a merely syntactic [...] type of creativity in Goldberg's 'fusion model' may be governed to variable extents, by lexical processes'' (150).
Gaëtanelle Gilquin's paper ''The place of prototypicality in corpus linguistics: Causation in the hot seat'' investigates the relation of cognitive prototypes and corpus-linguistic frequencies. More specifically, she explores to what extent authentic periphrastic causative constructions (for instance, 'get your father to run us out') can be interpreted as realizations of one of three cognitive models of prototypical causation (namely the notion of iconic sequencing with the order 'causer - causee - patient', the billiard-ball model (both Langacker 1991), and Lakoff's (1987) direct manipulation model). Surprisingly, ''the models of prototypical causation described in the cognitive literature account for an astonishingly small proportion of the data'' (175), i.e. only 45%. Although the author adduces some qualifications that may (at least to some extent) ''reduce the distance between cognitive salience and frequency [...] this lack of overlap nonetheless questions our deepest intuitions and calls for explanation'' (181). One such explanation may lie in the fact that the cognitive ''models proposed in the literature are not valid descriptions of prototypical causation'' (178). In particular, all of the three models discussed seem to be merely based on the intuition of the originators, which, as corpus-linguistic studies have shown with regard to many other contexts, may be rather unreliable. Furthermore, Gilquin, following Geeraerts (1989), claims that the concept of prototypicality itself is prototypical and, hence, may be too fuzzy to be applied satisfactorily. What is needed is ''a refined and more detailed description of this concept, which might involve multi-faceted characterisation and/or additional adjustments, such as assigning particular weight to each parameter defining the prototype'' (181). In this respect, for instance, corpus-linguistic data on frequently occurring patterns of use may be helpful.
The paper ''Passivisability of English periphrastic causatives'' by Willem Hollmann is an attempt ''to account for the differences in passivisability of English periphrastic causatives'' (193), as exemplified in sentences like 'Recruits were made to hop on the spot' or 'People in their work roles are caused to respond from their unconscious world of internal objects.' Hollmann restricts himself to an empirical analysis of instances of 'to make', since semantically it is the most general causative, and, accordingly, ''results may be extended to other causatives'' (193). On the basis of Hopper and Thompson's (1980) work on transitivity, the author suggests several scales that are supposed to capture the transitivity of the constructions under scrutiny. For instance scales like 'full affectedness < partial affectedness (of the object)' or 'inducive
John Newman and Sally Rice explore the ''Transitivity schemas of English EAT and DRINK in the BNC''. In particular, the authors analyse how the two verbs are used transitively and intransitively within spoken and written English and what kinds of nouns occur as subjects and objects. Also, they aim to show how usage patterns of the two verbs depend on the form of the verb that is actually used. For instance, the lexeme EAT occurs more frequently in both the written and spoken material than DRINK and usually is the first in combinations ('ate and drank' instead of 'drank and ate'). In the view of the authors this might indicate ''experiential salience: when we eat and drink, the drinking is an accompaniment to the eating, rather than the other way round'' (236). Other findings include the nature of objects that usually occur in transitive uses of the two verbs: the most frequent object with EAT, for instance, is 'food'. In addition, among the 20 most frequent words many occurrences denote particular kinds of meals, such as 'breakfast', 'lunch', or 'dinner'. In this context the authors note that while it is good practice in dictionaries ''to recognize a 'food' and 'meal' kind of understood object on intransitive EAT [... their] results show that these two categories are a feature of the 'transitive' use of EAT as well'' (246-7). The authors report similar findings with regard to DRINK: as in the case of EAT, intransitive uses are usually described with reference to an understood object denoting some kind of alcohol. Again, their corpus study on the transitive use of the verb DRINK shows that ''[t]he occurrence of names for alcoholic beverages is striking'' (248). If lexicographers leave this fact unmentioned in their description of the uses of the verbs, this might be interpreted as mirroring a difference between transitive and intransitive uses that is not actually given in authentic usage data.
A fuller integration of corpus-linguistic findings could thus help to make apparent ''the full extent of inferences and collocational properties associated with a verb [...] and the ensuing description becomes more observationally adequate'' (248). Finally, the authors stress the importance of the word forms in studies of the kind they conduct, since syntactic and/or semantic properties of the usage of a word are usually tied to particular word forms and do not necessarily hold true for the complete lemma. Accordingly, the authors claim that ''the notion of a dictionary entry based on a lemma is still inadequate'' (255).
Maarten Lemmens' paper on ''Caused posture: Experiential patterns emerging from corpus research'' investigates the relation of the three Dutch cardinal posture verbs 'zitten' 'sit', 'liggen' 'lie', and 'staan' 'stand' and their causative counterparts 'zetten' 'set', 'leggen' 'lay', 'steken/stoppen' 'stick (into)' and 'doen' 'do'. On the basis of an analysis of 7550 tokens, the author finds that usually there is no ''direct link between the causatives and the non-causatives, in the sense that one can always recast one in terms of the other'' (279). While 'liggen' and its causative counterpart 'leggen' show clear correspondences, the situation is different for 'staan', which only in a few metaphorical uses is related to its apparent counterpart 'stellen' - more frequent and regular is 'zetten', the causative that corresponds to the posture verb 'zitten'. Causatives related to 'zitten', in addition to 'zetten', include 'steken', 'stoppen' and 'doen'. Lemmens further analyses the distribution of postural and locational uses of the causative verbs in those cases where the 'causee', i.e. the 'entity' that is put somewhere, is human. Surprisingly, postural readings of the causative verbs with a human causee, i.e. 'bring a person into a standing/sitting/lying position', are only rarely attested in the corpus data. For example, less than 1% of all occurrences of 'leggen' and 'zetten' involve postural usage, and these seem to be restricted to two cases: 1) ''situations where people no longer control their own posture'' (283), as is the case with babies or ill people, or 2) contexts where people are manipulated or put somewhere, e.g. being expelled from a country or from a house. In addition, 'zetten' seems to have become highly productive. This, in the view of the author, is due to the fact that '''zetten' has generalized to the meaning 'put an entity in its canonical position''' (285), which naturally makes it applicable to a large number of situations. 'Zetten' thus seems to have become the default causative verb.
The final paper of this volume, ''From conceptualization to linguistic expression: Where languages diversify'' by Doris Schönefeld, analyses differences in conceptualizations of similar scenes in English, German and Russian. The paper is informed by the idea that speakers usually have choices in the way they conceptualize a particular scene and that these conceptualizations leave traces in their verbalization. It follows that ''from habitual, i.e. typical and frequent, expressions of a language we can infer a speech community's habitual ways of conceptualization'' (298). The author tries to identify such 'patterns' of conceptualization through a corpus analysis of collocations found with the posture verbs 'sit', 'stand' and 'lie'. The analysis, for instance, shows that the three languages may use different prepositions with identical verbs to describe similar situations: While English and German students 'sit over' books (Ger. 'über den Büchern sitzen'), Russian students rather 'sit behind' books (Rus. 'Sidet' za knigami'). Similarly, in England and Russia books stand on the shelf while in Germany they stand 'in' the shelf (Ger. 'das Buch steht im Regal'). These and similar examples show that in their construal of the situation different languages activate different image-schemas. With regard to the book example above, for instance, English and German construe the relative position of landmark (book) and trajector (student) on the basis of the UP-DOWN schema while Russian employs the FRONT-BACK and NEAR-FAR schema. Further differences show when in the description of similar scenarios different posture verbs or even non-posture verbs are used in one or two languages. On the whole, the author finds that ''diversifications between languages [...] may be the result of diverging construals by drawing on different image-schema combinations in the conceptualizations of the phenomena to be expressed. [...] image-schemas are centrally employed [...]
in the conceptualization and verbalization of identical/comparable (posture) scene, and [...] different speech communities can construe these scenes differently by highlighting particular image schemas at the expense of others'' (330). Again, corpus-based observations may yield interesting insights into areas of cognitive linguistic research.
Stefan Th. Gries and Anatol Stefanowitsch, in my view, have edited an excellent selection of papers. The articles are generally of a very high quality and highly stimulating and show impressively how cognitive linguistics may benefit from corpus linguistic research and (advanced) statistical methods. As the title already makes clear, the volume first of all is aimed at researchers from a cognitive- and corpus-linguistic background.
The former will find articles that represent three traditional areas of cognitive linguistics, namely similarity and dissimilarity of senses and ways of describing their organisation, cognitive approaches to grammar with a special focus on aspects of transitivity, and, finally, studies on the relevance of image schemas for human conceptualization and how this is mirrored in language use. In addition to the insights presented in the individual articles, the cognitive linguist is likely to benefit enormously from seeing a vast range of corpus-linguistic and statistical methods at work. The studies presented, thus, no doubt open up new methodological perspectives for the field of cognitive linguistics.
The volume will also prove valuable for corpus linguists, as it shows a number of 'new' ways to exploit authentic data. While notions like 'mutual information', 'z-score', or 'chi-square' by now are part of received corpus-linguistic wisdom, this volume confronts the corpus linguist driven by the urge for objectivity with a large number of more advanced statistical methods, like collocational overlap estimation, collostructional analysis, or hierarchical cluster analysis, to name but a few. These new ways of analysing large amounts of authentic data should be welcome to any linguist working with corpora. To quote Jan Aarts (although on another topic): ''If you want a challenge, there it is''.
Still, while the contributors clearly advocate the use of corpus-linguistic and advanced statistical methods, the reader never gets the feeling that these are regarded as ends in themselves; rather, they serve ancillary purposes. In this respect, the following quote by Gries, in my view, can be seen as representative of the attitude common to all of the papers: ''I have tried to emphasize the benefits of additional corpus-based evidence, but I should like to point out, however, that I do not advocate using corpus evidence alone. Corpus evidence can complement different research methodologies such as (psycho-)linguistic experiments, but it should not replace them'' (87).
Lexicographers are another group of researchers that will certainly benefit from this volume. It provides a number of case studies on identifying meaning and, most importantly, shows how meaning is tied to semantic and syntactic context. This book, like many others before, thus provides further evidence for the lack of strict boundaries between lexis and grammar, and may contribute to more accurate descriptions of meanings in dictionaries.
Finally, the proof-reading turns out to have been almost perfect. Only a very few errata remain, which is well within reasonable limits for a book of roughly 350 pages.
On the whole, the volume makes for a highly stimulating and interesting read and shows numerous ways in which corpus-linguistic methods may help to complement cognitive approaches to linguistics. In my view, the illustration of a vast range of statistical methods is particularly appealing, and shows to what extent 'traditional' ways of analysis might benefit from the objective exploitation of usage-based data. Whether (cognitive) linguistics will really experience ''a major methodological paradigm shift in the direction of corpus work'' (14), as is the hope expressed in the introduction by Stefan Th. Gries, cannot of course be answered now, but this volume no doubt makes such a shift appear very attractive.
Geeraerts, Dirk (1989): ''Introduction: Prospects and problems of prototype theory'', Linguistics 27: 587-612.
Goldberg, Adele (1995): Constructions. A Construction-Grammar Approach to Argument Structure. Chicago: The University of Chicago Press.
Hopper, Paul and Sandra A. Thompson (1980): ''Transitivity in grammar and discourse'', Language 56: 251-299.
Lakoff, George (1987): Women, Fire, and Dangerous Things. What Categories Reveal about the Mind. Chicago: The University of Chicago Press.
Langacker, Ronald W. (1991): Foundations of Cognitive Grammar. Vol. II. Descriptive Applications. Stanford: Stanford University Press.
ABOUT THE REVIEWER
Rolf Kreyer is an Assistant Professor of Modern English Linguistics in the department of English, American and Celtic Studies of the University of Bonn, Germany. His research interests include corpus linguistics, syntax, and text linguistics. He is the author of "Inversion in Modern Written English. Syntactic Complexity, Information Status and the Creative Writer", which was published in 2006 by Gunter Narr. At present he is working on a corpus-linguistic study that aims to analyse the interaction of language use and grammar.