Review of From the COLT’s mouth ... and others’.
Review:
Date: Tue, 10 Dec 2002 20:20:26 -0600 (CST)
From: Yuancheng Tu
Subject: Sociolinguistics: Review of Breivik and Hasselgren (eds) (2002)
Breivik, Leiv Egil and Angela Hasselgren (2002) From the Colt's Mouth ... And Others'. Rodopi, x+260pp, hardback ISBN 90-420-1479-2, $61.00, Language and Computers: Studies in Practical Linguistics 40.
Yuancheng Tu, Department of Linguistics, University of Illinois at Urbana-Champaign
'From the COLT's mouth...and others'' is a collection of fifteen papers, each exploring a different problem in corpus linguistics. The title makes more sense once the reader knows that COLT (The Bergen Corpus of London Teenage Language) is an English corpus of teenage speech and that Anna-Brita Stenström is the person who compiled it; its connection with the subtitle, 'Language Corpora Studies in Honour of Anna-Brita Stenström', then becomes clear. The title also reflects the fact that the research in this collection is, in one way or another, related to spoken corpora such as COLT.
The first paper is 'Does corpus linguistics exist? Some old and new issues' by Jan Aarts, which deals with a number of methodological questions in corpus linguistics. The 'old issues' include the types, nature and usage of corpus data. There are two 'new issues'. One is a heightened interest in spoken language, prompted by the availability of new electronic resources such as COLT. The other is the distinction between the corpus-based approach and the corpus-driven approach (Tognini-Bonelli 2001). Aarts argues that the most prominent difference between the two is their attitude towards the annotation of corpus data: in the corpus-based approach annotation is indispensable, while in the corpus-driven approach it is anathema. The paper concludes that corpus linguistics does exist and that it differs from theoretical linguistics in its object of study: theoretical linguistics is concerned with competence, corpus linguistics with language in use, a distinction first pointed out by Leech (1992).
In 'Zero translations and cross-linguistic equivalence: Evidence from the English-Swedish Parallel Corpus', Karin Aijmer and Bengt Altenberg report that cross-linguistic non-equivalence is not the only reason for omission in translation. They use the English-Swedish Parallel Corpus to demonstrate that the occurrence of zero translation is also governed by other factors, such as the clarity of the context, language-specific conventions and even cultural differences. The evidence comes from adverbial connectors in both English and Swedish, Swedish modal particles, English discourse particles and translations of terms of endearment from Swedish into English. By Aarts's criteria, the research reported in this paper is typically corpus-based: it uses statistics from a corpus as evidence for a theoretical view.
Gisle Andersen's 'Corpora and the double copula', however, is a typical corpus-driven paper. Data from the Internet and the British National Corpus exhibit a new sentence structure involving a double copula, as in _The best part is, is that you get to shoot your opponent_. Instead of dismissing the double copula as an arbitrary hesitation feature, Andersen shows that it is actually a new grammatical feature: the tendency to repeat the copula before a nominal that-clause in the context of a focus construction. He argues that the construction is a conflation of two focusing structures, the wh-cleft and clausal subject postponement of the type 'The point/issue/question is that'. Because it is not clear whether the Internet data represent spoken or written language, the double copula may be just a phenomenon of spoken language. However, the author provides evidence that the structure is spreading along several dimensions: from spoken to written language, from American English to English more generally, and from informal to more formal contexts.
'The non-nominal character of spoken English' by Pieter de Haan draws on the British National Corpus Sampler CD-ROM (one million words of spoken English and one million words of written English) to confirm the claim that written English has a strongly nominal character whereas spoken English has a strongly verbal, or clausal, character. It is therefore typical corpus-based research. The paper also provides evidence of a cline running from informal spoken language to informative writing, which has the strongest nominal character.
The main concern of the next paper is exactly what its title says: 'Teenage slang in Norway'. Eli-Marie Drange summarizes some of the results of a survey carried out by a research project on Nordic teenage language. The survey reveals a new trend: in addition to English, more and more words are being borrowed from other languages such as Arabic and Spanish, and many of these words are in the process of being adjusted to Norwegian spelling and morphology.
'The semantics and pragmatics of the Norwegian concessive marker likevel: Evidence from the English-Norwegian Parallel Corpus' by Thorstein Fretheim and Stig Johansson recalls the second paper in this book, 'Zero translations and cross-linguistic equivalence: Evidence from the English-Swedish Parallel Corpus' by Karin Aijmer and Bengt Altenberg. Both papers use a parallel corpus, examine language varieties and deal with translation strategies. Fretheim and Johansson claim that no single form in English parallels the Norwegian concessive marker _likevel_. This lack of a formal counterpart in English triggers omission in translation from Norwegian into English. In addition, evidence from the English-Norwegian Parallel Corpus supports the idea that the differences between Norwegian and English are most striking when _likevel_ occurs in medial or final position, where more inferential processing is required. The two languages are more alike with regard to local concessive linking, signaled by initial _likevel_ and by English concessive links of the _even so_ type.
'Sound a bit foreign', by Angela Hasselgren, compares the use of small words such as _well_, _all right_ and _sort of_ by more and less fluent Norwegian learners of English and by native English speakers. The quality of small-word usage is evaluated functionally, in terms of the ability to send the signals most essential to communication. The paper demonstrates that as speakers' fluency increases, they are likely to use more small words and to send more basic signals. The real difference, however, lies in the range of small words used by the more fluent learners as compared with the native speakers. The limited range of the fluent learners deprives them of the pragmatic overtones that native speakers give to their signals and therefore makes them sound a little foreign.
'Congratulations, like: -Gratulerer, liksom! Pragmatic particles in English and Norwegian' by Ingrid Kristine Hasund examines the similarity between the pragmatic particles _like_ in English and _liksom_ in Norwegian. Hasund suggests that the two particles are used in similar ways to mark the speaker's epistemic stance towards the content or form of an utterance. The Bergen Corpus of London Teenage Language (COLT) is the corpus for the English part of the study, and a corpus of spoken Oslo teenage language is used for the Norwegian part.
'Applications of the Stenström model of discourse structure' by John M. Kirk applies the Stenströmian model to a variety of transcribed spoken datasets and focuses on question and response exchanges, which are numbered in each excerpt. The excerpts Kirk uses are taken from the London-Lund Corpus, the Map Task Corpus, and Dynasty, an American television soap opera. All of them support the idea that different types of conversational data or written dramatic dialogue can be identified and categorized by the Stenströmian model.
In 'The Britain: An unexpected case of article usage in present-day English', Goran Kjellmer investigates the variation in article usage among names of countries such as _the UK_, which influences the use of the article with Britain. According to Quirk (1985), names of countries take no article, even with a premodifying adjective. However, one advertisement for the British Council on the Internet uses the article _the_ before Britain. By searching the BNC, Kjellmer found that 'the Britain' actually occurs repeatedly. He attributes this to analogy with usages such as _the UK_.
'What vocabulary tells us about genre differences: A study of lexis in five newspaper genres' by Magnus Ljung is a corpus-based study of lexical differences. Five newspaper genres were selected: hard news, sports news, business news, arts articles and obituaries. The data were taken from the same five weekdays in the CD-ROM-based 1997 issues of The Times and The New York Times. The results show that differences in word use do signal genre differences within certain textual parameters. Both newspapers tend to be most formal in general news and least formal in sports.
'What is a grammatical rule?' by Dieter Mindt presents a new perspective on the definition of grammatical rules. Instead of being a description with exceptions, a grammatical rule here resembles a mathematical function, namely the exponential function of decay. The evidence comes from probability distributions derived from corpus statistics. Each grammatical rule is represented by a probability distribution over a set of classes, and the classes whose share is lower than 5% are what are traditionally called exceptions. This distributional representation of grammatical rules can predict diachronic change in language, which cannot be achieved with the traditional definition of a grammatical rule.
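The idea of a rule as a distribution over classes, with a 5% cut-off for exceptions, can be illustrated with a minimal sketch. This is not Mindt's procedure: the function names, the class labels and the counts below are all invented for illustration, and the only element taken from the review is the 5% threshold.

```python
def class_distribution(counts):
    """Return (class, relative frequency) pairs, most frequent first."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations")
    pairs = [(label, n / total) for label, n in counts.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def split_exceptions(counts, threshold=0.05):
    """Split classes into those at or above the threshold and those below it.

    Classes below the threshold correspond to what a traditional
    description would call exceptions.
    """
    dist = class_distribution(counts)
    core = [(label, p) for label, p in dist if p >= threshold]
    exceptions = [(label, p) for label, p in dist if p < threshold]
    return core, exceptions


# Hypothetical corpus counts for the classes of one rule (invented numbers).
counts = {"class_a": 640, "class_b": 240, "class_c": 90, "class_d": 30}

core, exceptions = split_exceptions(counts)
for label, p in core:
    print(f"{label}: {p:.1%}")
for label, p in exceptions:
    print(f"{label}: {p:.1%} (exception)")
```

With these invented counts only class_d falls below the threshold. Checking whether the ranked frequencies actually follow an exponential decay curve would need a separate fitting step, which this sketch does not attempt.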
David Minugh investigates the distribution of the formal adposition _notwithstanding_ in English in 'Her COLTISH energy notwithstanding: An examination of the adposition notwithstanding'. The word is interesting because it can occur either prepositionally or postpositionally. Using statistics from 1845 million words of present-day English and newspaper CDs, he shows that written American English is the variety most willing to use the postpositional form, and that the governed NP is longer with the postpositional form than with the prepositional form.
'As and other relativizers after same in present-day standard English' by Gunnel Tottie and Hans Martin Lehmann examines the use of _as_ as a relative marker in constructions where the antecedent contains the word _same_. Data from BNC-S and The Times show that same-constructions occur much more frequently with relativizers that have adverbial function, predominantly of the manner type. A pragmatic explanation is offered for this phenomenon, and etymology is used to show why _as_ is used as a relativizer after _same_.
In 'Looking for attitudes in corpora', Anne Wichmann looks into the ways people describe how things are said, using ICE-GB, the British contribution to the International Corpus of English. She chooses nine word tokens and two sentence structures as seeds for exploring the corpus, and her statistics reveal that people do not seem to talk about tone of voice very much, even though they intuitively recognize it and respond to it. Wichmann also presents her categorization of the various kinds of meaning that seem to be encoded in the attitudes of people saying things.
This book is in honor of Anna-Brita Stenström. All fifteen papers bear, directly or indirectly, the stamp of something she has done or written on spoken corpora and discourse analysis. The research in every paper is related in some way to spoken data, except the first, which is about methodology. Even in that paper, however, Aarts points out that a new trend in corpus linguistics is the investigation of spoken data. The collection provides concrete evidence of the contribution of corpus linguistics. Researchers observe new structures in large corpora that lie beyond linguists' intuition and introspection, such as the double copula structure reported by Gisle Andersen. The probability distribution of a grammatical rule can signal diachronic change in language, something that traditional description cannot achieve. In summary, this is a valuable collection for corpus-related studies, especially studies of spoken corpora.
References:
Leech, G. 1992. Corpora and theories of linguistic performance. In J. Svartvik (ed.) _Directions in corpus linguistics: Proceedings of Nobel Symposium 82, Stockholm, 4-8 August 1991_. Berlin: Mouton de Gruyter. 105-122.
Tognini-Bonelli, E. 2001. _Corpus linguistics at work_. Amsterdam: John Benjamins.
ABOUT THE REVIEWER:
Yuancheng Tu is currently a Ph.D. student in the Department of Linguistics at the University of Illinois at Urbana-Champaign. His research areas are computational lexical semantics and corpus linguistics. He is now working on his Ph.D. thesis, which involves building a semantic network called PhraseNet from large corpora. Functions are written for PhraseNet to interact with WordNet and expand it, so as to generate semantic features for other Natural Language Processing applications such as Question-Answering and Prepositional Phrase Attachment.