EDITORS: Ädel, Annelie; Reppen, Randi TITLE: Corpora and Discourse SUBTITLE: The challenges of different settings SERIES TITLE: Studies in Corpus Linguistics 31 PUBLISHER: John Benjamins YEAR: 2008
Michael Pace-Sigge, School of English, University of Liverpool, UK
SUMMARY This book brings together contributions from a diverse collection of scholars who explore different ways of combining corpus linguistics and discourse analysis, studying discourse at the prosodic, lexical, and textual levels. Both spoken and written discourse are investigated in a variety of settings, including academia, the workplace, news, and entertainment. Not only does the volume offer a rich sample of English language discourse from around the world, including international, learner, and non-standard varieties of English, but it also covers a range of topics and methods. This book will be of particular interest to researchers and students specializing in discourse studies, English linguistics, and corpus linguistics.
This book is a solid piece of work, a great resource and a valuable addition to the growing literature in the area of Corpus Linguistics (CL). How CL has gained in importance is described at the start of the book: ''Corpus linguistics has, over the past few decades, undergone a transformation from a 'little donkey cart' to a 'bandwagon' (Leech 1991) and is now (...) 'becoming part of mainstream linguistics' (Mukherjee 2004)'' (p.1).
Indeed, the editors point out that ''now'' computing power opens up areas of research that still remained closed at the turn of the century. Consequently, more and more linguistic fields make use of naturally occurring language data. One of these is Discourse Studies: ''Discourse phenomena, with their frequent dependence on and sensitivity to context, co-text and interpretation, require rather complex solutions'' (p. 1).
Ädel and Reppen divide the book into four sections: ''Exploring discourse in academic settings''; ''Exploring discourse in workplace settings''; ''Exploring discourse in news and entertainment''; and ''Exploring discourse through specific linguistic features''. This makes for coherent reading even if somebody decides to read the whole collection from beginning to end. The editors must be praised for a selection that is well written throughout and, with one exception, presents extremely well-founded research in a very clear, logical and accessible way.
Below, I shall give a brief review of each of the contributions. Very few readers will be interested in all the areas of discourse approached. I am sure, however, that many readers will find one or two articles of great interest to them.
''... and so on and so forth''. A comparative analysis of vague category markers in academic discourse. (Walsh, O'Keeffe & McCarthy) The authors start with the premise that the use of vague language is one of the most common features of spoken English. They consider how far everyday use differs from spoken academic discourse, comparing the Limerick Belfast Corpus of Academic Spoken English with the Limerick Corpus of Spoken English and CANCODE. This reveals, first of all, that the difference between UK English and Irish English is greater than the difference between casual spoken and academic spoken use in the Irish corpora. The authors see several modes: the managerial mode used to start an activity; the skills and systems mode, which delivers an open platform for participation; and the classroom context mode, which resembles casual conversation. Walsh, O'Keeffe & McCarthy show convincingly that academics use phrases like ''and so on'' when under time pressure and phrases like ''or anything like that'' to offer students options to respond: a ''softener'' to make conversation flow, particularly in tutorials.
All this is solidly researched and the background reading is well delivered without taking up too much space.
Emphatics in academic discourse (Bondi) Bondi looks at stance markers like _actually_, _definitely_, _apparently_, etc. in history and economics journal articles. Relying very much on Quirk et al. (1985), she describes how emphasizers may take scope over the whole predicate or the whole sentence while intensifiers do not. Comparing the keywords in a history journal corpus with the keywords in an economics journal corpus, she finds that _significantly_, _positively_, _substantially_, etc. are keywords in economics texts while _certainly_, _especially_, _particularly_, etc. are keywords in history texts. Bondi finds that the variety of adverbs is larger in history and ''that economics tends to place emphasis on a simplification of reality based on a process of abstraction ('typically') and on statistics ('significantly') whereas history places emphasis on frequency and accumulation of factual data ('usually, largely...')'' (p.39). Highlighting that _significantly_ is a significant modifier in economics texts, she elaborates that _invariably_ is used in ''interestingly different patterns across disciplines'' (p. 50).
Still, I wondered whether this kind of research has not been done before. The total numbers supporting Bondi's claims seem to be low. For a number of her assertions, the literature used appears mostly to be old, and while there is a longish introduction, many things seem to be claims that would need to be backed up, either by other research or by comparison with occurrence patterns in another corpus.
Interaction, identity and culture in academic writing (Sanderson) Sanderson starts with the premise that ''academic writing has traditionally been conceived as a register lacking personal involvement'' (p.57), a claim she rejects. In her paper she takes a multidimensional approach to look at evidence of personal identity within academic writing. She not only compares the age, gender and employment status of the writers, she also compares German with British and American academics.
She is looking at academic writing in the Arts, where a reader would certainly expect more personal involvement of the writer than in Science. While Sanderson uses a statistically valid corpus for her research, there remains a feeling that the paper very much confirms her world-view as outlined in the introduction: tenured, older, male academics are more casual in their use of language, while those who are in less secure positions conform closely in their use of person reference. The cultural differences are made clear in this study: German writers feel the ''Ich Verbot'' (I taboo) while English-speaking academics feel free to address the reader directly. Likewise, when they want to express personal opinion, German academics usually claim group membership. When Sanderson compares the differences by discipline, it first appears that Philosophy offers the highest degree of person reference in both languages, yet context-based comparison reveals that German writers adhere strongly to the ''I taboo'' in this discipline too.
Analysis of the role of humour in workplace meetings (Vaughan) At first sight, this seems a rather difficult task for a corpus linguist: analyzing humor on the basis of transcripts. However, Vaughan points out that a great many transcripts used in corpora include the extralinguistic feature of laughter. In the meetings recorded (school meetings of English-speaking teachers in Mexico and Ireland), laughter is not necessarily an expected feature, but it is revealing where it occurs. Seen as an integral part of spoken discourse, humor / laughter is shown to have a variety of functions. In her corpora, Vaughan finds, it can be subversive or reinforcing. It can be used by the general staff (where it reinforces solidarity) or by heads of departments (where it reinforces power as well as solidarity). This is made very accessible to readers in a table on page 105.
This is a solid piece of research, and Vaughan makes a very good case for including more than mere spoken words when transcribing: a corpus-led investigation may throw up results that were not foreseen. Though the author is sometimes speculative about speakers' perceived intent, she provides a clear, insightful argument.
Determining discourse-based moves in professional reports (Flowerdew) Flowerdew starts her article by quoting a negative claim: ''Corpus linguistic techniques have been criticized for encouraging a more bottom-up rather than top-down processing of text.''
While I agree that the 2000-word text samples used to create a corpus narrow the scope of what can be found, it was for reasons of copyright (still unresolved) and computing power (resolved) that such decisions were made.
Flowerdew looks here at Problem-Solution collocation behavior in a specialist (environmental recommendation reports) corpus.
However, what she does is mostly pointless: she insists on looking at the word PROBLEM/S even though, counting both forms, the word appears only 1.5 times on average per report in the 60 reports in her study. She then moves on to the high-frequency word IMPACT/S and finds that it appears more often in the Body than in the Introduction or Conclusion of the text. It would have been helpful if she had looked at the word total of each of these three parts and then compared the percentage of occurrence of IMPACT/S.
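To make the suggested comparison concrete, here is a minimal sketch of normalizing raw counts by the size of each report section. The function name `per_thousand` and all the figures are my own placeholders for illustration; none of them are taken from Flowerdew's study.

```python
def per_thousand(hits: int, section_tokens: int) -> float:
    """Return occurrences per 1,000 running words of a section."""
    if section_tokens <= 0:
        raise ValueError("section_tokens must be positive")
    return 1000 * hits / section_tokens


# Placeholder figures only (hypothetical, NOT from the reviewed chapter):
# section name -> (raw count of IMPACT/S, total words in that section)
sections = {
    "Introduction": (12, 6_000),
    "Body": (90, 60_000),
    "Conclusion": (10, 4_000),
}

# Normalize each raw count by the word total of its own section.
rates = {
    name: per_thousand(hits, tokens)
    for name, (hits, tokens) in sections.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} per 1,000 words")
```

With these invented numbers the Body has by far the most raw hits, yet the lowest rate per 1,000 words, simply because it is the longest part. This is why a raw count on its own says little about where a word is characteristically used.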
The discourse intonation patterns of word associations (Cheng & Warren) This most interesting article by Cheng and Warren looks at how far word associations and intonation patterns work in tandem. Starting with John Sinclair's (2004) premise that ''the word is not the best starting point for a description of meaning, because meaning arises from words in particular combinations'', the authors created a 1 million word Hong Kong Corpus of Spoken English, which is prosodically transcribed. This transcription is based on Brazil's (1985, 1997) discourse intonation system. While failing to say that J.R. Firth (1957) already mentioned the phenomenon of phonetic prosody, they base their work strongly on Brazil (1995, 1997). Brazil seeks ''an integration of phonological patterns, in particular tone unit boundaries and prominence, with grammar'' (p.138). This also links in with work done by Sinclair & Mauranen (2005).
One of the findings the authors present is that only three of the ten lexically-rich word associations have a 100% occurrence in a single tone unit (p.142). They also find that a tone-unit changes its intonation pattern between early and accepted use: ''...this early stage in the usage of _asia's world city_ is captured in the intonation pattern across two or three tone units. In this pattern, speakers isolate each word, and so better convey the target message (...) At a later stage in the usage of _asia's world city_ when _asia's world city_ is no longer a far-off goal but rather the stated reality, the pattern found is for it to be spoken in one tone unit'' (p.143).
Cheng and Warren describe how lexically-rich as well as grammatically-rich units show that the distribution of prominence makes it possible to identify a pattern of intonation. The authors conclude that ''this study represents a first attempt at examining the relationship between the phraseological characteristics of language [and] the role of discourse intonation'' (p.149).
The authors are always careful not to state anything without giving a caveat (for example that prominent patterns are never fixed) which is to their great credit. It must be hoped that this well-written and exciting paper sparks a whole lot of research into this area.
Evidentiality in US newspapers during the 2004 presidential campaign (Garretson & Ädel) Garretson and Ädel look at eleven US newspapers for ''hearsay evidentiality'' to see if corpus research can uncover evidence of the bias the media are alleged to have. To do so, they focused on ''reporting verbs (say, tell etc.); reporting nouns (i.e. criticism) or prepositional phrases (i.e. according to).'' In a very solid, carefully structured article, the authors achieve a neat flow in their argument. Garretson and Ädel highlight the different possible sources that can be encountered in newspaper reports and provide a scale from liberal to conservative (see page 175). Discussing how hearsay can be verbalized (and the authors describe languages where the source of information is more clearly specified than in English), they describe how English gives a clue via direct as opposed to indirect reported speech.
Across the papers, they find that the ''overall balance between direct and indirect speech... (is about) 40% vs. 60%'' (p.169). They point out that this, however, only describes the writing style usually found in US papers. When looking at the percentages of the sources (of the two opposing camps) quoted, the authors find that ''the results show no difference whatsoever (...) sources are treated exactly the same in terms of how often they are cited verbatim'' (p.172).
In a subcorpus, the _Boston Globe_ (Kerry's home paper) and the _Houston Chronicle_ (Bush's home paper) are compared with the _Cleveland Plain Dealer_ and _USA Today_. While the home papers give more space to their candidates, the _Plain Dealer_ (which supported Bush in 2000) gives slightly more space to Kerry, and only _USA Today_ gives totally equal space to both candidates (see pp. 174, 177). The authors reckon that the _Plain Dealer_ gives three times as much space to special interest groups because Ohio was seen as a major battleground.
While being careful not to overinterpret things, Garretson and Ädel point out that there may have been more subtle techniques at play to create a picture of each candidate in the respective readers' minds. These issues would be harder to find using corpus linguistic methods alone. Perhaps more importantly though, it is said that newspapers can no longer be seen as the major opinion formers. Criticism of bias may have neutered them (and Garretson / Ädel hint that this is the case). Yet, at the same time, the other sources available (online, on radio or on cable network news) are less well controlled and can give biased, misleading and not always truthful information. These now have to be seen as the major opinion formers.
Television dialogue and natural conversation (Quaglio) Paulo Quaglio's piece gives important evidence for every ESL teacher who wants to use naturally occurring speech to tutor their students in conversation skills. Spoken corpora are authentic and great, but hard to come by. So what about using TV shows that mimic conversational English? What about the fact that some nerds transcribed nine seasons of _Friends_ and made them available on the web, totally free of charge? Good news really for any corpus linguist, but is it truly useful when used to teach? The Friends corpus is compared to the American English conversation subcorpus of the Longman Grammar Corpus. Quaglio relies strongly on Biber's multidimensional methodology (Biber 1988) and his functional analysis tools (Biber et al. 1999). Given the space constraints, he focuses on vagueness, emotional language, emotional intensifiers (_so_, _really_, and _totally_) and the use of expletives.
Quaglio structures his text well, and the generous use of figures and tables relays the most important findings in a very clear way. Consequently, we can see the differences: of the 13 listed features associated with vague language, only three (some discourse and stance markers; copular verbs) appear in both corpora. Most appear, as can be expected, in the Longman corpus. This is different for emotional language features. While some intensifiers appear in both, far more features appear in the Friends transcript. The intensifiers used differ too, which perhaps makes the Friends language appear more emotional. Quaglio believes that restrictions on the language that can be broadcast probably lead to the discrepancies found in the use of expletives.
To sum up, Quaglio believes that most differences in the language of the respective corpora are down to situation-specific circumstances. As far as _Friends_ as a source to teach face-to-face conversation in ESL is concerned, however, ''it is a fairly accurate representation'' (p.209).
A corpus approach to discursive constructions of a hip-hop identity (Kristy Beers Fägersten - KBF)
Pointing out that ''in cyberspace you are what you type'', KBF looks at openings and closings, repeated use of slang and taboo terms and evidence of verbal art. All this is taken from named hip-hop message boards.
Slang use is part of identity building, and online there are few inhibitions about using it but many reasons to create a self: ''The use of slang in the (...) postings reflects a familiarity with both linguistic and non-linguistic or cultural hip-hop practices, helping to identify each contributor as an in-group (...) member'' (p.223).
Finding that taboo terms appear in every single posting, KBF considers them a salient feature. Postings appear to be, she points out, a written representation of spoken discourse: ''Although (...) composed of written English, it can be argued that the content reveals features of spoken, conversational English'' (p.225). In the sample corpus, we find a use of YOU that is very frequent for a written corpus. Writers also quite often start without an introduction and have a community-related way to sign off: ''peace''.
What KBF terms verbal art is seen as characteristic of these types of text. This can be simply the use of numerals and special characters to avoid the filter online providers use to keep out the ''wrong'' language, or ''U'' for ''you'' when a contribution is more aggressive. KBF describes this again as identity building. Yet I wonder why she does not make the obvious connection to a related written form: mobile SMS use.
KBF claims that, while many studies have looked at the content of hip-hop culture, she has focused on the form. Her use of keyword and word frequency analysis certainly reveals that corpus linguistic methods can be applied extremely well to stylistic analysis of this kind, and her well-rounded article is clearly an important contribution despite the omission of some obvious-seeming references.
Initially, I was wary of the subject discussed by KBF. 100,000 words is a small corpus, and looking at hip-hop websites seemed like trying too hard to look at the latest trend. Yet the strongest criticism I can raise is that KBF does not even mention William Labov, though her subject matter very much looks like a modern version of his research into the black urban vernacular. Indeed, some of Labov's approaches could have been used for this material (cf. Labov 1973).
The use of the it-cleft construction in 19th-century English (Johansson) Corpus Linguistics can also be used to look at shifts in language use, as Johansson's work on the it-cleft shows. She focuses on 19th-century texts (mainly court-room transcripts) and makes a comparison with present-day English use.
Giving a historical overview of the feature, the author explains that Early Modern English Trial texts were used because they are closest to ''spoken'' use, as they give a voice to those (usually witnesses) who are otherwise not represented (e.g. maids). In her findings, Johansson describes how it-clefts were used least in 19th-century fiction but seem to have been a feature of normal spoken use. She confirms that ''19th-century it-clefts seem to be more complex structurally and informationally than present English examples'' (p.264).
This article is less accessible than the others, probably because it is very densely packed with data. I also found that there are very many assumptions, yet too few caveats: after all, there is no real 19th century spoken corpus and the data available is tiny. I also missed any reference to Dawn Archer's research, though she has worked in this particular field for many years now (cf. Archer 2005).
Place and time adverbials in native and non-native English student writing (Crawford) William J. Crawford provides a solid, well-constructed and interesting argument in this article. Where previous research (and he gives an excellent literature overview) highlighted the spoken nature of the academic texts written by learner writers, Crawford investigates how far a difference exists between L1 and L2 learner writers. Crawford looks at Germanic, Romance and Slavic L2 learner writers to compare them with L1 learner writers and academic texts.
Looking at _here_ and _there_, and _now_ and _then_, the author picks markers of casual spoken language use. He does not say, however, why these particular ones have been chosen. He states ''an important issue that (previous) studies rarely address is the possibility that learners are using a high frequency of a given lexical item that is similar to conversation but are employing the functions associated with academic writing but not with conversation'' (p.271).
The first conclusion he comes to is that ''... this comparison illustrates no overall pattern of L1 – L2 difference'' (p.279). Indeed, the difference is between the learner of academic writing and the established academic writer. This conclusion is in tune with Michael Hoey's (2005) theory of lexical priming, which states that we need to be newly primed for each new situation (in this case, EAP writing rather than conversational English writing).
Crawford concludes with some important insights for all teachers: ''1) experience in writing will lead to decreased use of the features associated with spoken language; and 2) functional differences should be expressly taught.''
EVALUATION In the late nineties corpus-linguistic methods were marginal. This book shows that that is no longer the case. Yet, some criticism: the references and ''last accessed'' dates indicate that a number of contributions were written in 2004/05. For the keen researcher, earlier publication as single articles would have been welcome. At the same time, the volume gives a great overview of the current state of knowledge, made (mostly) easily accessible even to the non-specialist linguist.
As I pointed out at the beginning, this is a well-rounded work with outstanding contributions. Library budgets may be depressed, but I strongly recommend this book to be added to student resources. Even interested undergraduates will find it useful and everybody who is either interested in Discourse Analysis or Corpus Linguistics will find it a valuable resource. It can be made more valuable though: I sorely missed a brief description of the authors and their research interests.
REFERENCES Archer, Dawn. (2005) _Historical Sociopragmatics: Questions and Answers in the English Courtroom (1640-1760)_. Pragmatics and Beyond New Series. Amsterdam/Philadelphia: John Benjamins.
Biber, Douglas. (1988) _Variation across speech and writing_. Cambridge: CUP.
Biber, D; Johansson, S; Leech, G; Conrad, S; Finegan, E. (1999) _Longman Grammar of Spoken and Written English_. London: Longman.
Brazil, David. (1985) _The Communicative Value of Intonation_. Birmingham: English Language Research.
Brazil, David. (1995) _A Grammar of Speech_. Oxford: OUP.
Brazil, David. (1997) _The Communicative Value of Intonation in English_. Cambridge: CUP.
Firth, J.R. (1957) _Papers in Linguistics 1934-1951_. London: Oxford University Press.
Hoey, Michael. (2005) _Lexical Priming: A New Theory of Words and Language_. London: Routledge.
Labov, William. (1973) _Language in the Inner City: Studies in the Black English Vernacular_. Philadelphia: University of Pennsylvania Press.
Leech, Geoffrey. (1991) The state of the art in corpus linguistics. In _English Corpus Linguistics_. K. Aijmer & B. Altenberg (eds), 8-29. London: Longman.
Mukherjee, J. (2004) The state of the art in corpus linguistics: Three book-length perspectives. _English Language and Linguistics_ 8 (1): 103-119.
Quirk, R; Greenbaum, S; Leech, G; Svartvik, J. (1985) _A Comprehensive Grammar of the English Language_. London: Longman.
Sinclair, John. (2004) _Trust the Text_. London: Routledge.
ABOUT THE REVIEWER Michael TL Pace-Sigge is University Teacher in the School of English at the University of Liverpool. His research interests lie mainly in corpus linguistics and spoken language research. After completing his MA on lenition in Liverpool English stop consonants, using spectrography as sound representation, he moved on to do his PhD on the use of lexis in Liverpool English (due for completion in 2009). He is particularly interested in Michael Hoey's theory of Lexical Priming, and evidence of priming forms a central part of his thesis. His other main area of interest is phonology, and particularly how far David Brazil's work on the discourse intonation system can be applied in describing language-in-use.