Review of Corpus and Context

Reviewer: Michael Thomas Pace-Sigge
Book Title: Corpus and Context
Book Author: Svenja Adolphs
Publisher: John Benjamins
Linguistic Field(s): Discourse Analysis
Text/Corpus Linguistics
Issue Number: 19.3846

AUTHOR: Adolphs, Svenja
TITLE: Corpus and Context
SUBTITLE: Investigating pragmatic functions in spoken discourse
SERIES TITLE: Studies in Corpus Linguistics 30
PUBLISHER: John Benjamins
YEAR: 2008

Michael Pace-Sigge, School of English, University of Liverpool, UK

This book explores the relationship between corpus linguistics and pragmatics by
discussing possible frameworks for analyzing utterance function on the basis of
spoken corpora. The book articulates the challenges and opportunities associated
with a change of focus in corpus research, from lexical to functional units,
from concordance lines to extended stretches of discourse, and from the purely
textual to multi-modal analysis of spoken corpus data. Drawing on a number of
spoken corpora including the five million word Cambridge and Nottingham Corpus
of Discourse in English (CANCODE, funded by CUP (c)), a specific speech act
function is explored using different approaches and different levels of
analysis. This involves a close analysis of contextual variables in relation to
lexico-grammatical and discoursal patterns that emerge from the corpus data, as
well as a wider discussion of the role of context in spoken corpus research.

Svenja Adolphs' book comes in two unequal parts. The first part, covering
chapters one to five and chapter seven, is based on her PhD thesis from 2001;
chapter six and the last paragraph of chapter seven are based on her most recent
research, which is concerned with linking audio-visual data with written data as
part of her investigation of pragmatic functions.

This is a slim volume (it has 150 pages in total) clearly aimed at the
postgraduate and academic reader with sufficient knowledge of both Speech Act
theory and Corpus Linguistics.

For the most part, Adolphs clearly shows that she has read widely on the issues
her PhD thesis is concerned with, and her bibliography up to the year 2000 is
certainly of value to anyone working in this area. This book has the
stated aim to ''explore further the relationship between corpus linguistics and
pragmatics by developing a new approach to the analysis of utterance function
that is based on corpus data'' (p.16). This is a valuable contribution. Speech Act
Theory, the main pillar for the study of pragmatics, was developed without
computer-based corpus research. Consequently, the theory remained largely based
on made-up examples. Adolphs highlights the limitations of studying
pragmatics without the use of attested language-in-use. Referring to the
work of Michael Stubbs (1996), she discusses the benefits close corpus analysis
would bring to the field.

Indeed, the use of spoken language in linguistic research remains
under-represented. In Corpus Linguistics (CL), analysis of written texts now
works with corpora that can contain billions of words. Adolphs describes in some
detail the difficulties researchers face when it comes to corpora of spoken
text. The main constraints are the funds available to collect and transcribe
data, the mode of transcription chosen, which already filters the original
data, and, consequently, the small size and limited frequencies of all
available spoken data in comparison to written corpora.

Chapters one and two serve as an introduction to the book: Chapter one is a
general overview and chapter two is concerned with (spoken) corpus construction,
in particular, the construction of the Cambridge and Nottingham Corpus of
Discourse in English – CANCODE.

The book comes into its own in chapter three, when some real research is
presented. Adolphs discusses at length one ''particular speech act, that of
making suggestions, to illustrate how a close analysis of corpus data can inform
pragmatic theories and methodologies'' (p. 43).

Adolphs firmly couches her research here in the work done on suggestions in both
Pragmatics and Discourse Analysis (DA), initially looking at the use of
multi-word expressions introduced with ''let's''. This leads her to assert that
CANCODE data suggests ''that the main function of the expression is a structuring
one, rather than a speech act function'' (p.46). She also makes clear that
expressions using the word ''suggest'' are not the most frequent way to make an
actual suggestion. The most frequent form of suggestion is given by ''why'' with a
negative. Indeed, CANCODE has 182 instances of the expression ''why don't you''
and ''the corpus shows that this expression is predominantly used to put forward
a suggestion'' (p.61). Subsequently, both the spoken occurrences of ''suggest'' and
''why don't you'' are discussed in conjunction with the collocates that have the
highest C-Score (i.e. the most frequent collocates).

Adolphs, in Chapter four, builds on these findings, pointing out that corpus
data can give a more specific and detailed description of use than intuition.
Chapter four is titled ''Pragmatic functions in context'' and Adolphs describes
which expressions can be seen as more or less polite in which context.
(''Suggest'', for example, occurs most frequently, by a wide margin, in a
''pedagogic'' environment.) The author finds that ''it seems that the corpus data
used for this analysis points to differences between the two speech act
expressions and their relation to genre (...). 'Why not', for example, is mostly
used to address a wider issue (...), with the aim of complaining or lamenting.
'How about', on the other hand, is used in suggestions towards an identified
problem(...)'' (pp.87f.).

Chapter five focuses on the same word clusters or expressions, ''considering the
scope of using a spoken corpus to develop a discourse-based description of
pragmatic functions'' (p.89). Again with strong reference to research in
Discourse Analysis, Adolphs discusses which expressions, based on the CANCODE
evidence, are employed for solving problems that are anchored in the past and
which for solving future problems. This brings about an important finding:

''As regards the status of suggestions being either 'solicited' or 'unsolicited',
it was argued that we need to establish a more fine-grained definition of these
seemingly polar categories in the light of the diversity of acts that precede
the suggestion'' (p.116). Her research indicates that speakers tend to try and
converge and seek agreement in most discourse examples.

As I said earlier, chapter six stands apart as it deals with the use of
non-transcribed text as part of corpus analysis. Adolphs compares here the
systemic-functional linguistic tradition (mainly focused on written texts) and
Conversation Analysis (mainly focused on transcripts) with Corpus Linguistic
approaches which, according to her, ''seek more quantitative insights, are more
concerned with regular, frequent and thus generalisable patterns of meaning''
(p.120). Looking at the work done on pauses in speech (with an audio stream in
parallel to the transcript), the author argues that head nods and backchannels
form an important part of communication patterns and that audio-visual streams
in concert with the transcript can offer important new insights: ''... the
accompanying head movement, as well as the intonation pattern, can change the
function of the backchannel realisation, which in turn affects the surrounding
discourse'' (p. 127).

Chapter seven serves as a summary.

There is one major criticism of this book. Neither the author nor the publisher
appears to have given the manuscript to a knowledgeable proof-reader / editor
before sending it off to the printers. This can be amusingly embarrassing when,
on page 49, the linguist author mistypes ''speech'' as ''speach''.

This can bore a reader when there is too much repetition in the beginning: by
page 23 I had read four times that corpus linguistics is well placed to address
a speech act theory issue like ''indirectness''.

This appears careless when an extract on page 84 is referred to as ''a situation
regarding working regulations at an electricity supplier'' when a longer version
of the same excerpt on page 94 clearly indicates that it is a discussion
regarding a water supplier's notice to cut off domestic supply. The claim in the
summary of chapter two appears similarly careless: ''it has been argued that a
description of speech act expressions in terms of their unit meaning which
includes patterns of collocation, colligation...'' (p.42). Chapter two did no
such thing: the first mention of collocation appears on page 53.

It is certainly highly distracting when almost all references are structured
like this: ''(see Buehler 1934; Jakobson 1960; Malinowski 1923; Firth 1957 and
Halliday 1973, 1978)'' (p.22). A reader, rather than following her argument, will
be sidetracked into working out the guiding principle behind this, as it is
clearly neither alphabetical nor chronological.

All of this could be seen as purely cosmetic flaws. However, they can sow the
seeds of doubt as to whether the author took enough care when analyzing her data.

Another major drawback is the reliance on the original PhD text (apart from the
''newer'' bit in chapter six), where later research brought into the argument
seems to be bolted on, for example, references to Michael Hoey's (2005) and
Alison Wray's (2002) work. These appear in the text, yet their insights, which
could have a profound impact on the book's argument, are not visible. At the same
time, for a book that discusses lexico-grammatical patterns in detail, the
omission of findings from Hunston & Francis (2000) and Partington (1998)
is unhelpful.

The book's stated aim is the corpus-based investigation of pragmatic functions.
In a short book like this, I would have expected more research-based results,
particularly as it appears that Adolphs was involved in designing the CANCODE
corpus. (The book does not say; indeed, there is very little information on
Svenja Adolphs beyond the fact that she teaches at Nottingham.) Instead, the
strong reliance on discussing earlier, mainly non-corpus based research takes
the focus off the main research results. This appears to have crucially
distracted the author to the point where the proverbial wood cannot be seen for
the trees. Adolphs, correctly, states that all her findings can only be the
basis for further research as they are based on a relatively small (5 million
words) spoken sample. This is true, but in her caution she undermines her own
argument. Adolphs could, for example, have checked the ''expressions'' she
discusses in this book in the concordances of other spoken corpora in order to
see whether the patterns she highlights are repeated there. She could also have
made more out of her existing data by not just sticking to her corpus-based
top-down approach but by using the corpus-led bottom-up approach as well. It is
astonishing to find that the exchanges in chapter five using ''why not'', ''why
don't you'' and ''why don't we'' are not employed for this purpose. Had they been,
the author could have described the pattern her examples make apparent, namely
that ''why not'' + infinitive occurs when a speaker brings in a new idea or issue;
that ''why don't you'' occurs as a suggestion for a problem rooted in the past;
and that ''why don't we'' occurs as a suggestion for something to be done next or
in the future.

This book presents an important contribution in showing how traditional
approaches to language research can be refined by corpus-based insights. Yet the
way it is written, and the lack of editing, appear to lessen the impact of
its main findings. Chapter six (the inclusion of audio-visual data on top of
written transcripts) therefore appears to take the second step before the
first, which would be a sufficient analysis of spoken language data.

Adolphs, S. 2001. _Linking Lexico-grammar and Speech Acts: A Corpus-based
Approach_. PhD dissertation, The University of Nottingham.

Hoey, M. 2005. _Lexical Priming. A new theory of words and language_. London:

Hunston, S. & Francis, G. 2000. _Pattern grammar_. Amsterdam: John Benjamins.

Partington, A. 1998. _Patterns and Meanings_. Amsterdam: John Benjamins.

Stubbs, M. 1996. _Text and Corpus Analysis: Computer-Assisted Studies of
Language and Culture_. Oxford: Blackwell.

Wray, A. 2002. _Formulaic Language and the Lexicon_. Cambridge: Cambridge
University Press.

Michael TL Pace-Sigge is University Teacher in the School of English at the
University of Liverpool. His research interests lie mainly in corpus
linguistics and spoken language research. After completing his MA on
lenition in Liverpool English stop consonants, using spectrography as sound
representation, he moved on to do his PhD on the use of lexis in Liverpool
English (due for completion in 2009). He is particularly interested in Michael
Hoey's theory of Lexical Priming, and evidence of priming forms a central part
of his thesis. His other main area of interest is phonology, particularly
how far David Brazil's work on the discourse intonation system can be applied in
describing language-in-use.