LINGUIST List 14.1098

Mon Apr 14 2003

Review: CompLing/Lang Acquisition: Granger et al.(2002)

Editor for this issue: Naomi Ogasawara <naomilinguistlist.org>


What follows is a review or discussion note contributed to our Book Discussion Forum. We expect discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in. If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for review." Then contact Simin Karimi at siminlinguistlist.org.

Directory

  1. Viatcheslav Iatsko, Computer Learner Corpora

Message 1: Computer Learner Corpora

Date: Mon, 14 Apr 2003 09:55:45 +0000
From: Viatcheslav Iatsko <slavaykhsu.ru>
Subject: Computer Learner Corpora

Granger, Sylviane, Joseph Hung and Stephanie Petch-Tyson, eds. (2002)
Computer Learner Corpora, Second Language Acquisition and Foreign
Language Teaching. John Benjamins Publishing Company, Language
Learning and Language Teaching 6.

Announced at http://linguistlist.org/issues/14/14-146.html


Viatcheslav Iatsko, 
Department of English, Katanov State University of Khakasia

The book under review is a collection of articles which focus on the
interrelationships between computer learner corpora (CLC), second
language acquisition (SLA) and foreign language teaching (FLT). The
contributors are qualified experts in CLC from different
countries. Each contribution is followed by an extensive
''References'' section, and the book is supplied with useful name and
subject indexes. Since emphasis is placed on theoretical as well as
practical aspects of computer learner corpus analysis, this book may
be of interest to researchers, teachers and practitioners engaged in
CLC, SLA and FLT studies. The volume is divided into three sections.

The first section, entitled ''The role of computer learner corpora in
SLA research and FLT'', is an introductory chapter written by Sylviane
Granger (Belgium), which provides a general overview of learner corpus
research and situates learner corpora within SLA studies and FLT. This
chapter can be divided into two parts. The first deals with the
characteristics and typology of learner corpora, the methodology of
their linguistic analysis (contrastive and error analyses), and the
software tools applied in such analysis (text retrieval programs,
part-of-speech tagging, error tagging). This part contains valuable
observations about techniques of CLC analysis drawn from the author's
personal experience. The second part concentrates on the pedagogical
aspects of CLC research: curriculum and materials design.

I can't help mentioning a disputable and perhaps contradictory
statement formulated by Granger. While describing the field of corpus
linguistics, the author on the one hand states: ''It is neither a new
branch of linguistics nor a new theory of language...'' (p.4); on the
other hand, Granger agrees with the experts who characterize corpus
linguistics as a ''new research enterprise'' (p.4). This statement
seems strange, since for at least the last decade corpus linguistics
has been considered a linguistic discipline by the majority of
representatives of the linguistic community. As Granger correctly
writes on the same page, corpus linguistics has its own methodology,
primarily aimed at the quantitative analysis of corpora and at
describing the frequency features of linguistic phenomena. The author
should have added that corpus linguistics also has its own theory,
the foundations of which are Bradford's law of scattering (Bradford,
1953) and Zipf's law (Zipf, 1935). Finally, the existence of corpus
linguistics as a linguistic subfield is confirmed by the numerous
books and conferences regularly announced on the LINGUIST List.
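
Since the review names these two laws without stating them, here is a
standard textbook formulation of each, given for reference only. The
notation (f, r, C, alpha, N_i, n) is introduced here and does not come
from the book or the review.

```latex
% Zipf's law: the frequency f of the word of rank r in a frequency list
% is roughly inversely proportional to a power of its rank
% (alpha is close to 1 for natural-language text)
f(r) \approx \frac{C}{r^{\alpha}}, \qquad
r \cdot f(r) \approx C \quad \text{when } \alpha = 1

% Bradford's law of scattering: if journals are ranked by productivity
% and divided into zones that each yield about the same number of
% relevant articles, the numbers of journals in successive zones
% stand roughly in the ratio
N_1 : N_2 : N_3 \approx 1 : n : n^{2}
```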

The second section, ''Corpus-based approaches to interlanguage'',
illustrates a range of corpus-based approaches to interlanguage
analysis. It comprises three chapters written by Bengt Altenberg
(Sweden), Karin Aijmer (Sweden), and Alex Housen (Belgium). In the
opening chapter, ''Using bilingual corpus evidence in learner corpus
research'', B. Altenberg compares original and translated Swedish to
test the hypothesis that the overuse of causative ''make'' with
adjective complements by Swedish L2 writers is due to L1
transfer. Using an aligned Swedish-English corpus, the author finds
that the overuse is due to an overgeneralization of the
cross-linguistic similarity between ''make'' and its Swedish
counterpart. Altenberg's research is based on a sound methodology that
comprises a thorough contrastive analysis of a given language feature
in a bilingual corpus, followed by a check of the results against a
learner corpus to see whether the learners' output shows evidence of
transfer from their L1.

In the second chapter, ''Modality in advanced Swedish learners' written
interlanguage'', Aijmer uses computer learner corpora to compare the
range and frequency of some modal words in native English writing and
in the English L2 writing of advanced-level university
students. Although the primary focus of her investigation is Swedish
L2 writers, she regularly draws comparisons with French and German L2
writers in an attempt to ascertain whether features of Swedish L2
writing are likely to be L1-induced or more generally shared by L2
writers of different language backgrounds. The investigation compares
modal forms (modal verbs and adverbs) in compositions produced by
non-native and native speakers and reveals a considerable overuse of
these forms by the learners, a tendency which may be partly
developmental and partly interlingual.
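
The review does not say how overuse was measured. A common procedure
in learner corpus work is to normalise raw counts to a common corpus
size and then check whether a difference is significant, often with a
log-likelihood (G2) statistic. The Python sketch below illustrates
that general procedure only; the modal word list (MODAL_WORDS), the
crude tokenizer and the choice of G2 are assumptions, not Aijmer's
actual method or data.

```python
import math
import re
from collections import Counter

# Hypothetical item list for illustration; not Aijmer's actual inventory.
MODAL_WORDS = ["must", "should", "may", "might", "perhaps", "maybe", "certainly"]


def tokenize(text):
    """Lower-case word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, n1, n2):
    """G2 for a word seen a times in corpus 1 (n1 tokens)
    and b times in corpus 2 (n2 tokens)."""
    e1 = n1 * (a + b) / (n1 + n2)  # expected count in corpus 1
    e2 = n2 * (a + b) / (n1 + n2)  # expected count in corpus 2
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def compare(learner_text, native_text, words=MODAL_WORDS):
    """For each word return (learner freq per 10,000 tokens,
    native freq per 10,000 tokens, G2, label). Both texts must be non-empty."""
    learner, native = tokenize(learner_text), tokenize(native_text)
    lc, nc = Counter(learner), Counter(native)
    n1, n2 = len(learner), len(native)
    results = {}
    for w in words:
        f1 = lc[w] / n1 * 10_000
        f2 = nc[w] / n2 * 10_000
        if f1 > f2:
            label = "overuse"
        elif f1 < f2:
            label = "underuse"
        else:
            label = "same"
        g2 = log_likelihood(lc[w], nc[w], n1, n2)
        results[w] = (round(f1, 1), round(f2, 1), g2, label)
    return results
```

Normalising before comparing matters because learner and native
corpora rarely have the same size; the G2 value then indicates whether
an apparent overuse is larger than chance would suggest.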

In the third chapter Housen presents the results of a cross-sectional,
corpus-based study of the acquisition of the basic forms and
functions of the English verb system. Using rather sophisticated
techniques for processing annotated oral CLC data, the author was able
to identify developmental patterns in the acquisition of verbal
morphology by L2 learners grouped into four proficiency levels. In
addition, Housen investigated patterns of use of various verb form
categories and found that learners fluctuate between overuse and
underuse as they fine-tune form-meaning associations. It also turned
out that there may be significant individual variation in the route of
development, even between learners of the same proficiency level and
L1 background. Though Housen's study is based on the output of Dutch
and French L2 learners, the results of the investigation are sure to
be of interest to researchers and practitioners who work with L2
learners of other language backgrounds. These results may be
especially important for those who work with L2 learners whose L1 does
not have such a variety of verb forms as English. For example, the
acquisition of English verb tense forms presents many difficulties for
Russian-speaking students, since Russian has only three basic tense
forms, progressive and perfective meanings being expressed either
lexically or by verb affixes.

The third section of the book, ''Corpus-based approaches to foreign
language pedagogy'', comprises five chapters written by Fanny Meunier
(Belgium); Angela Hasselgren (Norway); Ulla Connor, Kristen Precht and
Thomas Upton (USA); Quentin Grant Allan (China); and Barbara
Seidlhofer (Austria). Meunier's contribution, ''The pedagogical value
of native and learner corpora in EFL grammar teaching'', is divided
into two parts. In part one the author examines the field of EFL
grammar teaching from an SLA perspective, considering current thinking
and current practice within the SLA community. Meunier points out that
native corpus research has contributed to a more adequate description
of English grammar: the frequency with which the same grammatical
features occur varies across text types, which is why English grammar
is no longer seen as a monolithic entity but rather as comprising
several specific grammars pertaining to different discourse types.
Meunier provides convincing evidence that the development of native
and learner corpus research has brought about profound changes in
curriculum design, reference tools, and classroom EFL grammar
teaching. For example, a frequency list of English irregular verb
forms obtained from native corpora has enabled teachers to sequence
the study of these verbs in order of frequency instead of presenting
them in alphabetical order; learner corpus research makes it possible
to identify forms that are problematic for L2 learners and to take the
learners' mother tongue into account; modern dictionaries provide
frequency and register information; and native corpora are a rich
source of the authentic examples included in modern textbooks.
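
Meunier's irregular verb example boils down to ranking verbs by corpus
frequency instead of alphabetically. A minimal Python sketch, assuming
a hypothetical and deliberately tiny form-to-lemma table
(IRREGULAR_FORMS) and plain string counting with no part-of-speech
tagging:

```python
import re
from collections import Counter

# Hypothetical, tiny lemma -> inflected forms table; a real list would be far longer.
IRREGULAR_FORMS = {
    "be": ["be", "am", "is", "are", "was", "were", "been", "being"],
    "go": ["go", "goes", "went", "gone", "going"],
    "begin": ["begin", "begins", "began", "begun", "beginning"],
    "take": ["take", "takes", "took", "taken", "taking"],
}


def lemma_frequencies(text, forms=IRREGULAR_FORMS):
    """Sum the counts of all listed forms of each verb in a (native) corpus text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {lemma: sum(counts[f] for f in fs) for lemma, fs in forms.items()}


def teaching_order(text, forms=IRREGULAR_FORMS):
    """Order verbs by descending corpus frequency, breaking ties alphabetically."""
    freqs = lemma_frequencies(text, forms)
    return sorted(freqs, key=lambda lemma: (-freqs[lemma], lemma))
```

With this toy table, a text containing ''is'' twice plus ''was'',
''went'', ''going'' and ''took'' yields the order be, go, take, begin,
whereas alphabetical order would give be, begin, go, take. Because the
sketch counts untagged strings, homographs and non-verbal uses are
miscounted; part-of-speech tagging, discussed in Granger's
introductory chapter, would be needed for reliable lists.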

In the second chapter, ''Learner corpora and language testing: small
words as markers of learner fluency'', Hasselgren analyzes spoken data
obtained from 14- to 15-year-old Norwegian L2 learners to demonstrate
how the use of small words, such as ''well'', can distinguish more
fluent speech from less fluent speech. Automatically retrieving a core
group of these words and phrases from the speech of groups
differentiated by mechanical fluency markers, the author provides
evidence that greater fluency is accompanied by a greater quantity and
variety of small words. Hasselgren also proposes a possible sequence
for the acquisition of small words and a set of fluency descriptors.
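
The two measures that Hasselgren relates to fluency, the quantity and
the variety of small words, can be illustrated with a short Python
sketch. The inventory SMALL_WORDS, the normalisation per 100 words and
the function name small_word_profile are hypothetical choices made for
illustration; the sketch does not reproduce Hasselgren's core group of
items or her grouping of speakers by mechanical fluency markers.

```python
import re
from collections import Counter

# Hypothetical inventory of single- and multi-word "small words".
SMALL_WORDS = ["you know", "not really", "i mean", "well", "right", "okay", "sort of"]


def small_word_profile(transcript, inventory=SMALL_WORDS):
    """Return (quantity of small words per 100 words,
    variety = number of distinct inventory items used)."""
    text = transcript.lower()
    n_words = len(re.findall(r"[a-z']+", text))
    counts = Counter()
    for item in inventory:
        # \b anchors so that e.g. "well" is not matched inside "farewell"
        pattern = r"\b" + re.escape(item) + r"\b"
        counts[item] = len(re.findall(pattern, text))
    quantity = sum(counts.values()) / n_words * 100 if n_words else 0.0
    variety = sum(1 for c in counts.values() if c > 0)
    return quantity, variety
```

Counting types as well as tokens matters here: a speaker who repeats
one item many times scores high on quantity but low on variety, and
the review reports that Hasselgren found both to increase with fluency.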

Though Hasselgren's research is innovative in nature, its main thesis
seems doubtful and not well substantiated. Small words (such as
''well'', ''right'', ''you know'', ''not really'') are treated by the
author as discourse markers, which make a crucial contribution to
coherence: ''The ability to create coherence in Shiffrin's terms is
compatible with the way fluency is identified in this article''
(p.149). In modern grammars (Downing & Locke, 2002; Brinton, 2000;
Iatsko, 2001a), the words and phrases indicated by Hasselgren are
considered to be modal words/phrases, modal adverbs, or modal
parentheses expressing such notions as possibility, probability,
volition, etc. For example, ''well'' expresses hesitation (Downing &
Locke, pp. 554-555), while ''really'' (in a negative context)
expresses doubt (Downing & Locke, p.384). It is rather unlikely that
words expressing doubt and hesitation contribute to speech fluency.
The author should have provided a more thorough analysis of the
semantic features of small words.

In the third chapter, ''Business English: learner data from Belgium,
Finland and the US'', Connor, Precht, and Upton demonstrate the value
of combining traditional textlinguistic tools of genre analysis, such
as the identification of rhetorical moves, with a genre-specific
corpus to make broader statements about how different writers approach
writing for a specific purpose. The learner corpus used in this study
is an intercultural collection of job application letters written by
native and non-native speakers of English. The investigation revealed
that while some rhetorical moves were used by all three groups, others
were more group-specific, suggesting that different cultural norms
might exist for the genre. Connor et al. highlight the sometimes
unexpected impact that such differences may have on people applying
for jobs across languages and cultures.

Though the results of Connor et al.'s research are well substantiated,
some of its theoretical assumptions seem superficial. For example, the
authors state that ''...the interweaving of discourse, syntax and
lexicon have been overlooked by most previous research'' (p.176). The
point is that such interweaving, i.e. the correlation between
different planes of discourse (semantic, communicative, modal,
relational), is the focus of the conception of integrational discourse
analysis, which I have been developing since 1996 (Iatsko, 2001b).
According to another statement, ''...a great deal of the
corpus-based, more applied work has focused on the lexico-grammatical
patterning of text, producing collocations and lists of fixed phrases;
much of this work has centered on the propositional level of texts
with less regard to functional and rhetorical aspects'' (p.177). It
might be of interest to the authors that a corpus-based methodology
for analyzing rhetorical aspects of discourse has been developed in
W. Mann's (1998) conception. Since both Iatsko's and Mann's
conceptions are available on the Internet, Connor et al. could have
taken the trouble to find and study them.

In the fourth chapter, Allan describes the Secondary Learner Corpus
(TSLC), a resource which uses corpus data in systematic ways to raise
the language awareness of secondary-level English teachers in Hong
Kong. TSLC, accessible via a computer network, is used in conjunction
with a number of modern English corpora. Together, these corpora are
an invaluable resource for answering teachers' questions about aspects
of grammar and usage through Language Corners, and for the systematic
linguistic analysis of areas of English in which Hong Kong students
experience difficulty.

To the best of my knowledge, there is nothing like TSLC in my country,
and the methods described by Allan could be adopted, fine-tuned to
local conditions and fruitfully used in teacher training here in
Russia, as well as in some other countries.

In the fifth chapter, ''Pedagogy and local learner corpora: working
with learning-driven data'', Seidlhofer suggests a methodologically
innovative corpus-analytic approach, which she calls ''learning-driven
data'', enabling students to be both participants in and analysts of
their own language. In this approach, computer tools are used for
compiling and collaboratively analyzing a written learner corpus
consisting of short complete texts (summaries and ''accounts'')
produced by students. Seidlhofer describes the success of the approach
in motivating students to adopt corpus analysis techniques for
research in linguistics and for work on language awareness.

It should be noted that, because the summaries for the corpus were
prepared manually, Seidlhofer missed a good opportunity to introduce
her students to techniques of automatic text summarization, such as
compiling a dictionary of specialty terms, determining summary size,
and editing the summary (Iatsko, 2001c).

An advantage of the publications in this book is a new type of
contrastive analysis, contrastive interlanguage analysis (Granger,
1998), which is aimed at providing data from L1 (the learners' mother
tongue), L2 (English), and interlanguage. To reinforce the
interpretative power of this analysis, the authors use the output of
different groups of L2 learners, thus obtaining more reliable
results. For example, Altenberg compares the output of French and
Swedish L2 learners; Aijmer uses the output of Swedish, French, and
German L2 writers. This book is a significant contribution to learner
corpus research, a new area of linguistic inquiry that has emerged as
an important link between the two previously disparate fields of
corpus linguistics and foreign/second language research.

REFERENCES 

Bradford, S.C. (1953) Documentation. London: Crosby & Lockwood.

Brinton, L. (2000) The structure of modern English. Amsterdam;
Philadelphia: John Benjamins.

Downing, A. & Locke, Ph. (2002) A university course in English
grammar. London; New York: Routledge.

Granger, S. (1998) The computer learner corpus: a versatile new source
of data for SLA research. In: S. Granger, ed. Learner English on
Computer. London; New York: Longman.

Iatsko, V. (2001a) English syntax for Russian speaking
students. Abakan: Katanov State University of Khakasia Press.

Iatsko, V. (2001b) Integrational discourse analysis. Abakan: Katanov
State University of Khakasia. http://www.khsu.ru/ida

Iatsko, V. (2001c) Linguistic aspects of summarization. Philologie
im Netz 18 (2001). www.fu-berlin.de/phin/phin18/p18i.htm

Mann, W. (1998) Rhetorical structure theory.
http://www.sil.org/linguistics/RST/index.htm

Zipf, G.K. (1935) The psycho-biology of language. Boston: Houghton
Mifflin.

ABOUT THE REVIEWER

V. Iatsko is a professor in the Department of English and Head of the
Computational Linguistics Laboratory at Katanov State University of
Khakasia in Abakan, Russia. His research interests include text
summarization, text grammar, TEFL, and contrastive analysis of
English and Russian syntax.