Review of Corpora and Discourse

Reviewer: Michael Thomas Pace-Sigge
Book Title: Corpora and Discourse
Book Author: Annelie Ädel, Randi Reppen
Publisher: John Benjamins
Linguistic Field(s): Discourse Analysis
Text/Corpus Linguistics
Subject Language(s): English
EDITORS: Ädel, Annelie; Reppen, Randi
TITLE: Corpora and Discourse
SUBTITLE: The challenges of different settings
SERIES TITLE: Studies in Corpus Linguistics 31
PUBLISHER: John Benjamins
YEAR: 2008

Michael Pace-Sigge, School of English, University of Liverpool, UK

This book brings together contributions from a diverse collection of scholars
who explore different ways of combining corpus linguistics and discourse
analysis, studying discourse at the prosodic, lexical, and textual levels. Both
spoken and written discourse are investigated in a variety of settings,
including academia, the workplace, news, and entertainment. Not only does the
volume offer a rich sample of English language discourse from around the world,
including international, learner, and non-standard varieties of English, but it
also covers a range of topics and methods. This book will be of particular
interest to researchers and students specializing in discourse studies, English
linguistics, and corpus linguistics.

This book is a solid piece of work, a great resource and a valuable addition to
the growing literature in the area of Corpus Linguistics (CL). How CL has gained
in importance is described at the start of the book: ''Corpus linguistics has,
over the past few decades, undergone a transformation from a 'little donkey
cart' to a 'bandwagon' (Leech 1991) and is now (...) 'becoming part of
mainstream linguistics' (Mukherjee 2004)'' (p.1).

Indeed, the editors point out that ''now'' computing power opens up areas of
research that still remained closed at the turn of the century. Consequently,
more and more linguistic fields make use of naturally occurring language data. One of
these is Discourse Studies: ''Discourse phenomena, with their frequent dependence
on and sensitivity to context, co-text and interpretation, require rather
complex solutions'' (p. 1).

Ädel and Reppen divide the book into four sections: ''Exploring discourse in
academic settings''; ''Exploring discourse in workplace settings''; ''Exploring
discourse in news and entertainment''; and ''Exploring discourse through specific
linguistic features''. This makes for coherent reading, even for somebody who
decides to read the whole collection from beginning to end. The editors must be
praised for a selection that is well written throughout and, with one
exception, consists of extremely well-founded research presented in a very clear,
logical and accessible way.

Below, I shall give a brief review of each of the contributions. Very few
readers will be interested in all the areas of discourse approached. I am sure,
however, that many readers will find one or two articles of great interest to them.

''... and so on and so forth''. A comparative analysis of vague category markers
in academic discourse. (Walsh, O'Keeffe & McCarthy)
The authors start with the premise that the use of vague language is one of the
most common features of spoken English. They consider how far everyday use
differs from spoken academic discourse, making use of the Limerick
Belfast Corpus of Academic Spoken English in comparison to the Limerick Corpus
of Spoken English and CANCODE. This reveals, first of all, that the difference
between UK English and Irish English is greater than between casual spoken and
academic spoken use in the Irish corpora. The authors see several modes: the
managerial mode used to start an activity; the skills and systems mode, which
delivers an open platform for participation; and the classroom context mode,
which resembles casual conversation. Walsh, O'Keeffe & McCarthy show convincingly
that academics use phrases like ''and so on'' when under time pressure and phrases
like ''or anything like that'' to offer students options to respond – a ''softener''
to make conversation flow, particularly in tutorials.

All this is solidly researched and the background reading is well delivered
without taking up too much space.

Emphatics in academic discourse (Bondi)
Bondi looks at stance markers like _actually_, _definitely_, _apparently_, etc.
in history and economics journal articles. Relying very much on Quirk et al.
(1985), she describes how emphasizers may take scope over the whole predicate or
the whole sentence while intensifiers do not. Comparing the keywords in a
history journal corpus with the keywords in an economics journal corpus, she finds
that _significantly_, _positively_, _substantially_, etc. are keywords in
economics texts while _certainly_, _especially_, _particularly_, etc. are
keywords in history texts. Bondi finds that the variety of adverbs is larger in
history and ''that economics tends to place emphasis on a simplification of
reality based on a process of abstraction ('typically') and on statistics
('significantly') whereas history places emphasis on frequency and accumulation
of factual data ('usually, largely...')'' (p.39). Highlighting that
_significantly_ is a significant modifier in economics texts, she elaborates
that _invariably_ is used in ''interestingly different patterns across
disciplines'' (p. 50).

Still, I wondered whether this kind of research has not been done before. The
total numbers supporting Bondi's claims seem to be low. For a number of her
assertions, the literature used appears to be mostly old, and while there is a
longish introduction, many points seem to be claims that would need to be backed
up – either by other research or by comparison with occurrence patterns in
another corpus.
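
A cross-corpus comparison of this kind is often quantified with a keyness
statistic. The sketch below uses the log-likelihood measure, which is one common
choice in corpus linguistics; it is an assumption here, not necessarily the
statistic used in Bondi's chapter. The function name, the word counts and the
corpus sizes are all hypothetical and serve only to illustrate the calculation.

```python
import math


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Log-likelihood keyness of one word across two corpora.

    freq_a / freq_b: occurrences of the word in corpus A / corpus B
    size_a / size_b: total number of words in corpus A / corpus B
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the word were spread evenly across both corpora
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size

    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Hypothetical counts (NOT taken from the book): an adverb occurring
# 50 times in one 100,000-word corpus and 10 times in another of equal size.
score = log_likelihood(50, 100_000, 10, 100_000)
print(f"log-likelihood = {score:.2f}")
```

A score above 3.84 corresponds to p < 0.05 at one degree of freedom, so a check
like this would show whether a low raw count is still large enough to support a
claim of difference between two corpora.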

Interaction, identity and culture in academic writing (Sanderson)
Sanderson starts with the premise that ''academic writing has traditionally been
conceived as a register lacking personal involvement'' (p.57), a claim she
rejects. In her paper she takes a multidimensional approach to look at evidence
of personal identity within academic writing. She not only compares the age,
gender and employment status of the writers, she also compares German with
British and American academics.

She is looking at academic writing in the Arts, where a reader certainly would
expect more personal involvement of the writer than in Science. While Sanderson
looks at a statistically valid corpus for her research, there remains a feeling
that the paper confirms very much her world-view as outlined in the
introduction: tenured, older, male academics are more casual in their use of
language, while those who are in less secure positions very much conform in
their use of person reference. The cultural differences are made clear in this
study: German writers feel the ''Ich Verbot'' (I taboo) while English-speaking
academics feel free to address the reader directly. Likewise, when they want to
express personal opinion, German academics usually claim group membership. When
Sanderson compares the differences by discipline, it first appears that
Philosophy offers the highest degree of person reference in both languages, yet
context-based comparison reveals that German writers adhere strongly to the ''I
taboo'' in this discipline too.

Analysis of the role of humour in workplace meetings (Vaughan)
At first sight, this seems to be a rather difficult task for a corpus linguist:
analyzing humor on the basis of transcripts. However, Vaughan points out
that a great many transcripts used in corpora include the extralinguistic feature
of laughter. In the meetings recorded (school meetings of English-speaking
teachers in Mexico and Ireland), laughter is not necessarily an expected feature
either, but it is revealing where it occurs. Seen as an integral part of spoken
discourse, humor / laughter is shown to have a variety of functions. In her
corpora, Vaughan says that it can be subversive or reinforcing. It can be used
by the general staff (where it is reinforcing solidarity) or by heads of
departments (where it is reinforcing power as well as solidarity). This is made
very accessible to readers in a table on page 105.

This is a solid piece of research, and Vaughan makes a very good case for
including more than mere spoken words when transcribing: a corpus-led
investigation may throw up results that were not foreseen. Though it can be said
that the author is sometimes speculative about the speakers' perceived intent,
she provides a clear, insightful argument.

Determining discourse-based moves in professional reports (Flowerdew)
Flowerdew starts her article by quoting a negative claim: ''Corpus linguistic
techniques have been criticized for encouraging a more bottom-up rather than
top-down processing of text.''

I agree that 2000-word samples of text used to create a corpus will narrow the
scope of what can be found; however, it was for reasons of copyright (still
unresolved) and computing power (resolved) that such decisions were made.

Flowerdew looks here at Problem-Solution collocation behavior in a specialist
(environmental recommendation reports) corpus.

However, what she does is mostly pointless: she insists on looking at the word
PROBLEM/S even though, in both forms, the word appears only 1.5 times on average
per report in the 60 reports in her study. She then moves on to the
high-frequency word IMPACT/S and finds that it appears more often in
the Body than in the Introduction or Conclusion of the text. It would
have been helpful if she had looked at the word total of each of these
three parts and then compared the percentage of occurrence of IMPACT/S.
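
The normalization suggested here can be sketched in a few lines. This is a
minimal illustration only: the section word totals and the IMPACT/S counts are
invented placeholders, not figures from Flowerdew's chapter, and the function
name is hypothetical.

```python
def percentage_of_words(hits: int, section_words: int) -> float:
    """Return how many percent of a section's words are the search term."""
    if section_words <= 0:
        raise ValueError("section_words must be positive")
    return 100 * hits / section_words


# Hypothetical figures (NOT from the book):
# section name -> (occurrences of IMPACT/S, total words in that section)
sections = {
    "Introduction": (30, 10_000),
    "Body": (240, 120_000),
    "Conclusion": (45, 15_000),
}

rates = {
    name: percentage_of_words(hits, words)
    for name, (hits, words) in sections.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}% of words")
```

With these invented numbers the Body has by far the most raw hits but the lowest
relative frequency, which is the kind of distortion that raw counts alone can
hide when the sections differ greatly in length.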

The discourse intonation patterns of word associations (Cheng & Warren)
This most interesting article by Cheng and Warren examines how far word
associations and intonation patterns work in tandem. Starting with John
Sinclair's (2004) premise that ''the word is not the best starting point for a
description of meaning, because meaning arises from words in particular
combinations'', the authors created a 1-million-word Hong Kong Corpus of Spoken
English, which is prosodically transcribed. This transcription is based on
Brazil's (1985, 1997) discourse intonation system. While failing to say that
J.R. Firth (1957) already mentioned the phenomenon of phonetic prosody, they
base their work strongly on Brazil (1995, 1997). Brazil seeks ''an integration of
phonological patterns, in particular tone unit boundaries and prominence, with
grammar'' (p.138). This also links in with work done by Sinclair & Mauranen (2005).

One of the findings the authors present is that only three of the ten
lexically-rich word associations have a 100% occurrence in a single tone unit
(p.142). They also find that a tone-unit changes its intonation pattern between
early and accepted use: ''...this early stage in the usage of _asia's world city_
is captured in the intonation pattern across two or three tone units. In this
pattern, speakers isolate each word, and so better convey the target message
(...) At a later stage in the usage of _asia's world city_ when _asia's world
city_ is no longer a far-off goal but rather the stated reality, the pattern
found is for it to be spoken in one tone unit'' (p.143).

Cheng and Warren describe how lexically-rich as well as grammatically-rich units
show that the distribution of prominence makes it possible to identify a pattern
of intonation. The authors conclude that ''this study represents a first attempt
at examining the relationship between the phraseological characteristics of
language [and] the role of discourse intonation'' (p.149).

The authors are always careful not to state anything without giving a caveat
(for example that prominent patterns are never fixed) which is to their great
credit. It must be hoped that this well-written and exciting paper sparks a
whole lot of research into this area.

Evidentiality in US newspapers during the 2004 presidential campaign (Garretson
& Ädel)
Garretson and Ädel look at eleven US newspapers for ''hearsay evidentiality'' to
see if corpus research can uncover evidence of the bias the media are alleged to
have. To do so, they focused on ''reporting verbs (say, tell etc.); reporting
nouns (i.e. criticism) or prepositional phrases (i.e. according to).'' In a very
solid, carefully structured article, the authors achieve a neat flow in their
argument. Garretson and Ädel highlight the different possible sources that can
be encountered in newspaper reports and provide a scale from liberal to
conservative (see page 175). Discussing how hearsay can be verbalized (and the
authors describe languages where the source of information is more clearly
specified than in English), they describe how English gives a clue via direct as
opposed to indirect reported speech.

Across the papers, they find that the ''overall balance between direct and
indirect speech... (is about) 40% vs. 60%'' (p.169). They point out that this,
however, only describes the writing style usually found in US papers. When
looking at the percentages of the sources (of the two opposing camps) quoted,
the authors find that ''the results show no difference whatsoever (...) sources
are treated exactly the same in terms of how often they are cited verbatim'' (p.172).

In a subcorpus, the _Boston Globe_ (Kerry's home paper) and the _Houston
Chronicle_ (Bush's home) are compared with the _Cleveland Plain Dealer_ and _USA
Today_. While the home papers give more space to their candidates, the _Plain
Dealer_ (which supported Bush in 2000) gives slightly more space to Kerry and only
_USA Today_ gives totally equal space to both candidates (see pp. 174, 177). The
authors reckon that the _Plain Dealer_ gives three times as much space to
special interest groups because Ohio was seen as a major battleground.

While being careful not to overinterpret things, Garretson and Ädel point out
that there may have been more subtle techniques at play to create a picture of
each candidate in the respective readers' minds. These issues would be harder to
find using corpus linguistic methods alone. Perhaps more importantly though, it
is said that newspapers can no longer be seen as the major opinion formers.
Criticism of bias may have neutered them (and Garretson / Ädel hint that this is
the case). Yet, at the same time, the sources available – online, on radio or
cable network news – are less well controlled and can give biased, misleading
and not always truthful information. These now have to be seen as the major
opinion formers.

Television dialogue and natural conversation (Quaglio)
Paulo Quaglio's piece gives important evidence for every ESL teacher who wants
to use naturally occurring speech to tutor their students in conversation
skills. Spoken corpora are authentic and great – but hard to come by. So what
about using TV plays that mimic conversational English? What about the fact that
some nerds transcribed nine seasons of the show _Friends_ and made them – totally
free of charge – available on the www? Good news really for any corpus linguist,
but is it truly useful when used to teach? The Friends corpus is compared to the AE
conversation subcorpus of the Longman Grammar Corpus. Quaglio relies strongly on
Biber's multidimensional methodology (Biber 1988) and his functional analysis
tools (Biber et al. 1999). Given the space constraints, he focuses on vagueness,
emotional language, emotional intensifiers (_so_, _really_, and _totally_) and
the use of expletives.

Quaglio structures his text well, and the generous use of figures and tables
relays the most important findings in a very clear way. Consequently, we can see
the differences: of the 13 listed features associated with vague language, only
three (some discourse and stance markers; copular verbs) appear in both corpora.
Most appear, as can be expected, in the Longman corpus. This is different for
emotional language features. While some intensifiers appear in both, far more
features appear in the Friends transcript. Looking at intensifiers, there are
differences, making the Friends language appear maybe more emotional. Quaglio
believes that restrictions on the terminology that can be broadcast probably
lead to the discrepancies found in the use of expletives.

To sum up, Quaglio believes that most differences in the language of the
respective corpora are down to situation-specific circumstances. As far as
_Friends_ as a source to teach face-to-face conversation in ESL is concerned,
however, ''it is a fairly accurate representation'' (p.209).

A corpus approach to discursive constructions of a hip-hop identity (Kristy
Beers Faegersten - KBF)

Pointing out that ''in cyberspace you are what you type'', KBF looks at openings
and closings, repeated use of slang and taboo terms and evidence of verbal art.
All this is taken from named hip-hop message boards.

Slang use is part of identity building, and online there are few inhibitions
about using it but many reasons to create a self: ''The use of slang in the (...)
postings reflects a familiarity with both linguistic and non-linguistic or
cultural hip-hop practices, helping to identify each contributor as an in-group
(...) member'' (p.223).

Finding that the use of taboo terms appears in every single posting, KBF finds
it a salient feature. Postings appear to be, she points out, a written
representation of spoken discourse: ''Although (...) composed of written English,
it can be argued that the content reveals features of spoken, conversational
English'' (p.225). In the sample corpus, we find very frequent use of YOU for a
written corpus. Writers also quite often start without an introduction and have
a community-related way to sign off: ''peace''.

What KBF terms verbal art is seen as characteristic of these types of text.
This can be simply the use of numerals and special characters to avoid the
filter online providers use to keep out the ''wrong'' language, or ''U'' for ''you''
when a contribution is more aggressive. KBF describes this again as identity
building. Yet I wonder why she does not make the obvious connection to a related
written form – mobile SMS use.

KBF claims that while many studies have looked at the content of hip-hop
culture, she has focused on the form. Her use of keyword and word
frequency analysis certainly revealed that corpus linguistic methods can be
applied extremely well for stylistic analysis of this kind and her well-rounded
article is clearly an important contribution despite the omission of some
obvious-seeming references.

Initially, I was wary of the subject discussed by KBF. 100,000 words is a small
corpus and looking at hip-hop websites appeared like trying too hard to look at
the latest trend. Yet the strongest criticism I can raise is that KBF does not
even mention William Labov – though her subject matter very much looks like a
modern version of his research into the black urban vernacular. Indeed, some
of Labov's approaches could have been used for this material (cf. Labov 1973).

The use of the it-cleft construction in 19th-century English (Johansson)
Corpus Linguistics can also be used to look at shifts in language use, as
Johansson's work on it-clefts shows. She focuses on 19th-century texts (mainly
court-room transcripts) and makes a comparison with present-day English use.

Giving a historical overview of the feature, the author explains that Early
Modern English Trial texts were used because they are closest to ''spoken'' use,
as they allow a voice to those (usually witnesses) that are otherwise not
represented (e.g. maids). In her findings, Johansson describes how it-clefts
were used least in 19th-century fiction but seem to have been a feature of
normal spoken use. She confirms that ''19th-century it-clefts seem to be more
complex structurally and informationally than present English examples'' (p.264).

This article is less accessible than the others, probably because it is very
densely packed with data. I also found that there are very many assumptions, yet
too few caveats: after all, there is no real 19th-century spoken corpus and the
data available is tiny. I also missed any reference to Dawn Archer's research,
though she has worked in this particular field for many years now (cf. Archer 2005).

Place and time adverbials in native and non-native English student writing (Crawford)
William J. Crawford provides a solid, well-constructed and interesting argument
in this article. Where previous research (and he gives an excellent literature
overview) highlighted the spoken nature of the academic texts written by learner
writers, Crawford investigates how far a difference exists between L1 and L2
learner writers. Crawford looks at Germanic, Romance and Slavic L2 learner
writers to compare them with L1 learner writers and academic texts.

Looking at _here_ and _there_; and _now_ and _then_, the author picks markers of
casual spoken language use. He does not, however, say why these particular ones
have been chosen. He states that ''an important issue that (previous) studies rarely
address is the possibility that learners are using a high frequency of a given
lexical item that is similar to conversation but are employing the functions
associated with academic writing but not with conversation'' (p.271).

The first conclusion he comes to is that ''... this comparison illustrates no
overall pattern of L1 – L2 difference'' (p.279). Indeed, the difference is
between the learner of academic writing and the established academic writer, a
conclusion that is in tune with Michael Hoey's (2005) theory of lexical priming,
which states that we need to be newly primed for each new situation (in this
case, EAP writing rather than conversational English writing).

Crawford concludes with some important insights for all teachers: ''1) experience
in writing will lead to decreased use of the features associated with spoken
language; and 2) functional differences should be expressly taught.''

In the late nineties corpus-linguistic methods were marginal. This book shows
that that is no longer the case. Yet, some criticism: the references and ''last
accessed'' dates indicate that a number of contributions were written in 2004/05.
For the keen researcher, earlier publication as single articles would have been
welcome. At the same time, the volume gives a great overview of the current
state of knowledge, made (mostly) easily accessible even to the non-specialist.

As I pointed out at the beginning, this is a well-rounded work with outstanding
contributions. Library budgets may be depressed, but I strongly recommend that
this book be added to student resources. Even interested undergraduates will find
it useful and everybody who is interested in either Discourse Analysis or Corpus
Linguistics will find it a valuable resource. It can be made more valuable
though: I sorely missed a brief description of the authors and their research.

Michael TL Pace-Sigge is University Teacher in the School of English at the
University of Liverpool. His research interest mainly lies with corpus
linguistics and spoken language research. After completing his MA on the
lenition in Liverpool English stop consonants, using spectrography as sound
representation, he moved on to do his PhD on the use of lexis in Liverpool
English (due for completion in 2009). He is particularly interested in Michael
Hoey's theory of Lexical Priming, and evidence of priming forms a central part
of his thesis. His other main area of interest is phonology, particularly
how far David Brazil's work on the discourse intonation system can be applied in
describing language-in-use.