LINGUIST List 13.3277

Wed Dec 11 2002

Review: Socioling: Breivik and Hasselgren (2002)

Editor for this issue: Naomi Ogasawara

What follows is a review or discussion note contributed to our Book Discussion Forum. We expect discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in. If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for review." Then contact Simin Karimi at


  1. Yuancheng Tu, Breivik and Hasselgren (eds) (2002), From the Colt's Mouth...

Message 1: Breivik and Hasselgren (eds) (2002), From the Colt's Mouth...

Date: Wed, 11 Dec 2002 16:57:38 +0000
From: Yuancheng Tu
Subject: Breivik and Hasselgren (eds) (2002), From the Colt's Mouth...

Breivik, Leiv Egil and Angela Hasselgren (2002) From the Colt's Mouth
... And Others'. Rodopi, x+260pp, hardback ISBN 90-420-1479-2, $61.00,
Language and Computers: Studies in Practical Linguistics 40.


Yuancheng Tu, 
Department of Linguistics, University of Illinois at Urbana-Champaign

'From the COLT's mouth...and others' is a collection of fifteen
papers, each exploring a different problem in the study of language
corpora. The title makes more sense once the reader knows that COLT
(the Bergen Corpus of London Teenage Language) is an English corpus of
teenage speech and that Anna-Brita Stenstroem is the person behind its
compilation. Its connection with the subtitle, Language Corpora
Studies in Honour of Anna-Brita Stenstroem, then becomes clear. The
title also reflects the fact that the research in this collection is,
to a greater or lesser extent, related to spoken corpora such as COLT.

'Does corpus linguistics exist? Some old and new issues' by Jan Aarts
is the first paper. It deals with a number of methodological
questions in corpus linguistics. The 'old issues' here include the
types, nature and uses of corpus data. There are two 'new issues'. One
is the heightened interest in spoken language that has come with the
availability of new electronic resources such as COLT. The other is
the distinction between the corpus-based and the corpus-driven
approach (Tognini-Bonelli 2001). Aarts argues that the most prominent
difference between the two lies in their attitude towards the
annotation of corpus data. In the corpus-based approach annotation is
indispensable, whereas in the corpus-driven approach it is anathema.
The paper concludes that corpus linguistics does exist, and that what
distinguishes it from theoretical linguistics is its object of study.
Theoretical linguistics is concerned with competence, while corpus
linguistics is concerned with language in use, a distinction first
pointed out by Leech (1992).

In 'Zero translations and cross-linguistic equivalence: Evidence from
the English-Swedish Parallel Corpus', Karin Aijmer and Bengt Altenberg
report that cross-linguistic non-equivalence is not the only reason
for omission in translation. They use the English-Swedish Parallel
Corpus to demonstrate that the occurrence of zero translation is also
governed by other factors, such as the clarity of the context,
language-specific conventions and even cultural differences. Their
evidence comes from adverbial connectors in English and Swedish,
Swedish modal particles, English discourse particles and translations
of Swedish terms of endearment into English. By Aarts's criteria, the
research reported in this paper is typically corpus-based: it uses
corpus statistics as evidence for a theoretical view.

Gisle Andersen's 'Corpora and the double copula', however, is a
typical corpus-driven paper. Data from the Internet and the British
National Corpus (BNC) exhibit a new sentence structure involving a
double copula, as in _The best part is, is that you get to shoot your
opponent_. Instead of explaining the double copula as an arbitrary
hesitation feature, Andersen shows that it is actually a new
grammatical feature: the tendency to repeat the copula before a
nominal that-clause in the context of a focus construction. He argues
that the double copula construction is a conflation of two focusing
structures, the wh-cleft and clausal subject postponement of the type
'The point/issue/question is that'. Since it is not clear whether the
Internet data represent spoken or written language, one might suspect
that the double copula is merely a phenomenon of spoken language.
However, the author provides evidence that the structure is spreading
along several dimensions: from spoken to written language, from
American English to English more generally, and from informal to more
formal contexts.

'The non-nominal character of spoken English' by Pieter de Haan seeks
evidence from the British National Corpus Sampler CD-ROM (one million
words of spoken English and one million words of written English) to
confirm the claim that the written variety of English has a strongly
nominal character, whereas the spoken variety has a strongly verbal,
or clausal, character. It is therefore a typical piece of corpus-based
research. The paper also provides evidence for a cline running from
informal spoken language to informative writing, which has the most
strongly nominal character.

The next paper is about exactly what its title says: 'Teenage slang in
Norway'. Eli-Marie Drange summarizes some of the results of a survey
carried out in the research project on Nordic Teenage Language. The
survey reveals a new trend: apart from English, more and more words
come from other languages such as Arabic and Spanish, and many of
these words are in the process of being adapted to Norwegian spelling
and morphology.

'The semantics and pragmatics of the Norwegian concessive marker
likevel: Evidence from the English-Norwegian Parallel Corpus' by
Thorstein Fretheim and Stig Johansson recalls the second paper in this
book, 'Zero translations and cross-linguistic equivalence: Evidence
from the English-Swedish Parallel Corpus' by Karin Aijmer and Bengt
Altenberg. Both use a parallel corpus, compare two languages and deal
with translation strategies. Fretheim and Johansson claim that no
single English form parallels the Norwegian concessive marker
_likevel_. This lack of a formal counterpart in English leads to
omissions when _likevel_ is translated from Norwegian into English. In
addition, evidence from the English-Norwegian Parallel Corpus supports
the idea that the differences between Norwegian and English are most
striking with _likevel_ in medial and final position, where more
inferential processing is required. The two languages are more alike
with regard to local concessive linking, signaled by initial _likevel_
in Norwegian and by concessive links of the _even so_ type in English.

'Sound a bit foreign' by Angela Hasselgren compares the use of small
words such as _well_, _all right_ and _sort of_ by Norwegian learners
of English of varying fluency and by native English speakers. The
quality of small-word usage is evaluated functionally, in terms of the
ability to send the signals most essential to communication. The study
shows that as the speakers' fluency increases, they tend to use more
small words and send more of the basic signals. The real difference,
however, lies in the range of small words used by the more fluent
learners and by the native speakers. The fluent learners' limited
range deprives them of the pragmatic overtones that native speakers
give to their signals, and this makes them sound a little foreign.

'Congratulations, like: -Gratulerer, liksom! Pragmatic particles in
English and Norwegian' by Ingrid Kristine Hasund describes the
similarities between the pragmatic particles _like_ in English and
_liksom_ in Norwegian. Hasund suggests that the two particles are used
in similar ways to mark the speaker's epistemic stance towards the
content or form of an utterance. The English data come from the Bergen
Corpus of London Teenage Language (COLT), and the Norwegian data come
from a corpus of spoken teenage language from Oslo.

'Applications of the Stenstroem model of discourse structure' by John
M. Kirk simply applies the Stenstroem model to a variety of
transcribed spoken datasets. It focuses on question-and-response
exchanges, which Kirk numbers in each excerpt. The excerpts come from
the London-Lund Corpus, the Map Task Corpus and _Dynasty_, an American
television soap opera. All of them support the view that different
types of conversational data, as well as written dramatic dialogue,
can be identified and categorized with the Stenstroem model.

In 'The Britain: An unexpected case of article usage in present-day
English', Goran Kjellmer investigates the variation in article usage
among names of countries such as _the UK_, which influences the use of
the article with _Britain_. According to Quirk et al. (1985), names of
countries take no article, even with a premodifying adjective.
However, an advertisement for the British Council on the Internet uses
the article _the_ before _Britain_. Searching the BNC, Kjellmer found
that _the Britain_ actually occurs repeatedly. He attributes this to
analogy with usages such as _the UK_.

'What vocabulary tells us about genre differences: A study of lexis in
five newspaper genres' by Magnus Ljung is a corpus-based study of
lexical differences. Five newspaper genres were selected: hard news,
sports news, business news, arts articles and obituaries. The data
were taken from the same five weekdays in the 1997 CD-ROM editions of
_The Times_ and _The New York Times_. The results show that
differences in word use do signal genre differences along certain
textual parameters. Both newspapers tend to be most formal in general
news and least formal in sports.

'What is a grammatical rule?' by Dieter Mindt presents a new
perspective on the definition of grammatical rules. Instead of being
a description with exceptions, a grammatical rule here resembles a
mathematical function, namely the exponential decay function. The
evidence comes from probability distributions derived from corpus
statistics. Each grammatical rule is represented as a probability
distribution over classes, and classes with a probability below 5%
are what is traditionally called exceptions. This distributional
representation of grammatical rules can predict diachronic language
change, which cannot be achieved with the traditional definition of a
grammatical rule.

David Minugh investigates the distribution of the formal adposition
_notwithstanding_ in English in 'Her COLTISH energy notwithstanding:
An examination of the adposition notwithstanding'. The word is
interesting because it can occur either prepositionally or
postpositionally. Using statistics from 1845 million words of
present-day English corpora and newspaper CD-ROMs, he shows that
written American English is the variety most willing to use the
postpositional form, and that the NP governed by the postpositional
form is also longer than that governed by the prepositional form.

'_As_ and other relativizers after _same_ in present-day standard
English' by Gunnel Tottie and Hans Martin Lehmann examines the use of
_as_ as a relative marker in constructions where the antecedent
contains the word _same_. Data from BNC-S and _The Times_ show that
_same_-constructions occur much more frequently with relativizers that
have an adverbial function, predominantly of the manner type. A
pragmatic explanation is offered for this phenomenon, and etymology is
used to show why _as_ is used as a relativizer after _same_.

In 'Looking for attitudes in corpora', Anne Wichmann examines the
ways people say things, using ICE-GB, the British component of the
International Corpus of English. She chooses nine word tokens and two
sentence structures as seeds for exploring the corpus, and her
statistics reveal that people do not seem to talk about tone of voice
very much, although they intuitively recognize it and respond to it.
Wichmann also presents her categorization of the various kinds of
meaning that seem to be encoded in the attitudes with which people say
things.

This book is in honor of Anna-Brita Stenstroem. All fifteen papers
bear, directly or indirectly, the stamp of something she has done or
written on spoken corpora and discourse analysis. The research in
every paper except the first, which is about methodology, is more or
less related to spoken data. Yet even in that paper, Aarts explicitly
points out that the investigation of spoken data is a new trend in
corpus linguistics. The collection provides concrete evidence of the
contribution of corpus linguistics. In large corpora, researchers
observe new structures that lie beyond linguists' intuition and
introspection, such as the double copula structure reported by Gisle
Andersen. Representing a grammatical rule as a probability
distribution, as Dieter Mindt does, can signal diachronic language
change in a way that traditional description cannot. In summary, this
is a valuable collection for corpus-related studies, especially those
based on spoken corpora.


Leech, G. 1992. Corpora and theories of linguistic performance. In J.
Svartvik (ed.), _Directions in corpus linguistics: Proceedings of
Nobel Symposium 82, Stockholm, 4-8 August 1991_. Berlin: Mouton de
Gruyter.

Tognini-Bonelli, E. 2001. _Corpus linguistics at work_. Amsterdam: John
Benjamins.

Yuancheng Tu is currently a Ph.D. student in the Department of
Linguistics at the University of Illinois at Urbana-Champaign. Her
research areas are computational lexical semantics and corpus
linguistics. She is now working on her Ph.D. thesis, which builds a
semantic network called PhraseNet from large corpora. Functions are
being written to let PhraseNet interact with WordNet, expanding it to
generate semantic features for other natural language processing
applications, such as question answering and prepositional phrase
attachment.