LINGUIST List 14.1098

Mon Apr 14 2003

Review: CompLing/Lang Acquisition: Granger et al. (2002)

Editor for this issue: Naomi Ogasawara

What follows is a review or discussion note contributed to our Book Discussion Forum. We expect discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in.

If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for review." Then contact Simin Karimi at


  • Viatcheslav Iatsko, Computer Learner Corpora

    Message 1: Computer Learner Corpora

    Date: Mon, 14 Apr 2003 09:55:45 +0000
    From: Viatcheslav Iatsko
    Subject: Computer Learner Corpora

    Granger, Sylviane, Joseph Hung and Stephanie Petch-Tyson, eds. (2002) Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching, John Benjamins Publishing Company, Language Learning and Language Teaching 6.

    Announced at

    Viatcheslav Iatsko, Department of English, Katanov State University of Khakasia

    The book under review is a collection of articles that focus on the interrelationships between computer learner corpora (CLC), second language acquisition (SLA) and foreign language teaching (FLT). The contributors are qualified experts in CLC from different countries. Each contribution is followed by an extensive ''References'' section, and the book is supplied with useful name and subject indexes. Since emphasis is placed on theoretical as well as practical aspects of computer learner corpora analysis, this book may be of interest to researchers, teachers and practitioners engaged in CLC, SLA and FLT studies. The volume is divided into three sections.

    The first section, entitled ''The role of computer learner corpora in SLA research and FLT'', is an introductory chapter written by Sylviane Granger (Belgium), which provides a general overview of learner corpus research and situates learner corpora within SLA studies and FLT. This chapter can be divided into two parts. The first deals with the characteristics and typology of learner corpora, the methodology of their linguistic analysis (contrastive and error analyses), and the software tools applied in such analysis (text retrieval programs, part-of-speech tagging, error tagging). This part contains valuable observations about techniques of CLC analysis drawn from the author's personal experience. The second part concentrates on pedagogical aspects of CLC research, curriculum and materials design.

    I can't help mentioning a disputable and perhaps contradictory statement formulated by Granger. While describing the field of corpus linguistics, the author on the one hand states: ''It is neither a new branch of linguistics nor a new theory of language...'' (p.4); on the other hand, Granger agrees with the experts who characterize corpus linguistics as a ''new research enterprise'' (p.4). This statement seems strange, since for at least the last decade corpus linguistics has been considered a linguistic discipline by the majority of representatives of the linguistic community. As Granger correctly writes on the same page, corpus linguistics has its own methodology, primarily aimed at quantitative analysis of corpora and at describing the frequency features of linguistic phenomena. The author should have added that corpus linguistics has its own theory, the foundations of which are Bradford's law of scattering (Bradford, 1953) and Zipf's law (Zipf, 1935). Finally, the existence of corpus linguistics as a linguistic subfield is confirmed by numerous books and conferences regularly announced on the Linguist List.

    The second section, ''Corpus-based approaches to interlanguage'', illustrates a range of corpus-based approaches to interlanguage analysis. It comprises three chapters written by Bengt Altenberg (Sweden), Karin Aijmer (Sweden), and Alex Housen (Belgium). In the opening chapter, ''Using bilingual corpus evidence in learner corpus research'', B. Altenberg compares original-version and translated Swedish to test the hypothesis that the overuse of causative ''make'' with adjective complements by Swedish L2 writers is due to L1 transfer. Using an aligned Swedish-English corpus, the author finds that the overuse is due to an overgeneralization of the cross-linguistic similarity between ''make'' and its Swedish counterpart. Altenberg's research is based on a sound methodology that comprises thorough contrastive analysis of a given language feature in a bilingual corpus, followed by checking the results against a learner corpus to see whether the learners' output shows evidence of transfer from their L1.

    In the second chapter, ''Modality in advanced Swedish learners' written interlanguage'', Aijmer uses computer learner corpora to compare the range and frequency of some modal words in native English writing and in the English L2 writing of advanced-level university students. Although the primary focus of her investigation is Swedish L2 writers, she regularly conducts comparisons with French and German L2 writers in an attempt to ascertain whether features of Swedish L2 writing are likely to be L1-induced or more generally shared by L2 writers of different language backgrounds. This investigation compares modal forms (modal verbs and adverbs) in compositions produced by non-native and native speakers and reveals a considerable overuse of these forms by the non-native writers, a tendency which may be partly developmental, partly interlingual.
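    The kind of native versus learner frequency comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Aijmer's actual procedure: the list of modal forms, the tokenizer and the function names are hypothetical, and a real study would work with tagged corpora and significance tests rather than raw ratios.

```python
import re
from collections import Counter

# Hypothetical inventory of modal forms; the chapter's actual list is not given in the review.
MODAL_FORMS = {"may", "might", "must", "should", "would", "could", "probably", "perhaps"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def per_10k(texts, forms=MODAL_FORMS):
    """Frequency of each form per 10,000 running words across the given texts."""
    tokens = [tok for text in texts for tok in tokenize(text)]
    if not tokens:
        return {form: 0.0 for form in forms}
    counts = Counter(tok for tok in tokens if tok in forms)
    return {form: 10_000 * counts[form] / len(tokens) for form in forms}


def overuse_ratios(learner_texts, native_texts, forms=MODAL_FORMS):
    """Learner/native ratio of normalized frequencies; values above 1 suggest overuse.

    Forms that never occur in the native texts are skipped to avoid division by zero.
    """
    learner = per_10k(learner_texts, forms)
    native = per_10k(native_texts, forms)
    return {f: learner[f] / native[f] for f in forms if native[f] > 0}
```

    Running the same comparison for learners with different mother tongues would be one way to see whether a ratio above 1 is shared across groups (suggesting a developmental effect) or specific to one L1.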

    In the third chapter Housen presents the results of a cross-sectional, corpus-based study of the acquisition of the basic forms and functions of the English verb system. Using rather sophisticated techniques for processing annotated oral CLC data, the author managed to single out developmental patterns in the acquisition of verbal morphology by L2 learners grouped into four levels of proficiency. Apart from that, Housen investigated patterns of use of various verb form categories and found that learners fluctuate between overuse and underuse as they fine-tune form-meaning associations. It also turned out that there may be significant individual variation in the route of development, even between learners of the same proficiency level and L1 background. Though Housen's study is based on the output of Dutch and French L2 learners, the results of the investigation are sure to be of interest to researchers and practitioners who work with L2 learners of different language backgrounds. These results may be especially important for those who work with L2 learners whose L1 doesn't have such a variety of verb forms as English. For example, the acquisition of English verb tense forms presents many difficulties for Russian-speaking students, since Russian has only three basic tense forms, progressive and perfective meanings being expressed either lexically or by verb affixes.

    The third section of the book, ''Corpus-based approaches to foreign language pedagogy'', comprises five chapters written by Fanny Meunier (Belgium); Angela Hasselgren (Norway); Ulla Connor, Kristen Precht, Thomas Upton (USA); Quentin Grant Allan (China); Barbara Seidlhofer (Austria). Meunier's contribution, ''The pedagogical value of native and learner corpora in EFL grammar teaching'', is divided into two parts. In part one the author examines the field of EFL grammar teaching from an SLA perspective, considering current thinking and current practice within the SLA community. Meunier points out that native corpus research has contributed to a more adequate description of English grammar: the frequency of the same grammatical features varies across text types, which is why English grammar is no longer seen as a monolithic entity but rather as comprising several specific grammars pertaining to different discourse types. Meunier provides convincing evidence that the development of native and learner corpus research has caused profound changes in curriculum design, reference tools, and classroom EFL grammar teaching. For example, a frequency list of English irregular verb forms obtained from native corpora has enabled teachers to sequence the study of these verbs in order of frequency instead of presenting them in alphabetical order; learner corpus research makes it possible to identify forms that are problematic for L2 learners and to take into account learners' mother tongue; modern dictionaries provide frequency and register information; native corpora are a rich source of authentic examples included in modern textbooks.
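    The idea of sequencing irregular verbs by corpus frequency rather than alphabetically can be illustrated with a small sketch. It assumes a hypothetical mini-lexicon mapping irregular forms to their base verbs and a plain-text native corpus; the names and the sample data are invented for illustration and are not taken from Meunier's chapter.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon of irregular forms and their base verbs (illustration only).
IRREGULAR_FORMS = {
    "went": "go", "gone": "go",
    "said": "say",
    "took": "take", "taken": "take",
    "arose": "arise", "arisen": "arise",
}


def teaching_order(corpus_text, lexicon=IRREGULAR_FORMS):
    """Return base verbs ordered by how often their irregular forms occur in the corpus.

    Ties, including verbs that never occur, fall back to alphabetical order.
    """
    tokens = re.findall(r"[a-z]+", corpus_text.lower())
    counts = Counter(lexicon[tok] for tok in tokens if tok in lexicon)
    verbs = sorted(set(lexicon.values()))  # alphabetical baseline
    return sorted(verbs, key=lambda verb: (-counts[verb], verb))


sample = "She said she went home. He said he took it and went. They said so."
print(teaching_order(sample))
```

    With this sample, ''say'' comes first and ''arise'' last, whereas an alphabetical list would start with ''arise''.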

    In the second chapter, ''Learner corpora and language testing: small words as markers of learner fluency'', Hasselgren analyzes spoken data obtained from 14-15-year-old Norwegian L2 learners to demonstrate how the use of small words, such as ''well'', can distinguish more fluent speech from less fluent speech. Automatically retrieving a core group of these words and phrases from the speech of groups differentiated by mechanical fluency markers, the author provides evidence that greater fluency is accompanied by a greater quantity and variety of small words. Hasselgren also proposes a possible sequence for the acquisition of small words and a set of fluency descriptors.
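    Automatic retrieval of small words from transcripts, measuring both their quantity and their variety, might look roughly like the sketch below. The word list is illustrative only (Hasselgren's full core group is not reproduced in this review), the function name is hypothetical, and plain string matching cannot tell the discourse use of ''well'' from the adverb in ''did well'', which a real study would have to handle.

```python
import re

# Illustrative small words and phrases; not Hasselgren's actual core group.
SMALL_WORDS = ["well", "right", "you know", "not really"]


def small_word_profile(transcript, small_words=SMALL_WORDS):
    """Count small words in a transcript and summarize their quantity and variety."""
    text = transcript.lower()
    n_tokens = len(re.findall(r"[a-z']+", text))
    counts = {}
    for phrase in small_words:
        # Match the phrase as whole words, allowing any whitespace between its parts.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in phrase.split()) + r"\b"
        counts[phrase] = len(re.findall(pattern, text))
    total = sum(counts.values())
    return {
        "counts": counts,
        "total": total,                                      # quantity
        "variety": sum(1 for c in counts.values() if c > 0), # distinct small words used
        "per_100_words": 100 * total / n_tokens if n_tokens else 0.0,
    }
```

    Comparing such profiles for groups of speakers already separated by mechanical fluency markers would show whether the more fluent group uses more, and more varied, small words.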

    Though Hasselgren's research is innovative in nature, its main thesis seems doubtful and not well substantiated. Small words (such as ''well'', ''right'', ''you know'', ''not really'') are treated by the author as discourse markers, which make a crucial contribution to coherence: ''The ability to create coherence in Shiffrin's terms is compatible with the way fluency is identified in this article'' (p.149). In modern grammars (Downing & Locke, 2002; Brinton, 2000; Iatsko, 2001a), the words and phrases indicated by Hasselgren are considered to be modal words/phrases, modal adverbs, or modal parentheses expressing such notions as possibility, probability, volition, etc. For example, ''well'' expresses hesitation (Downing & Locke, pp. 554-555), while ''really'' (in a negative context) expresses doubt (Downing & Locke, p.384). It's rather unlikely that words expressing doubt and hesitation contribute to speech fluency. The author should have provided a more profound analysis of the semantic features of small words.

    In the third chapter, ''Business English: learner data from Belgium, Finland and the US'', Connor, Precht, and Upton demonstrate the value of combining traditional textlinguistic tools of genre analysis, such as the identification of rhetorical moves, with a genre-specific corpus to make broader statements about how different writers approach writing for a specific purpose. The learner corpus used in this study is an intercultural collection of job application letters from native and non-native speakers of English. The investigation revealed that while some rhetorical moves were used by all three groups, others were more group-specific, suggesting that different cultural norms might exist for the genre. Connor et al. highlight the sometimes unexpected impact that such differences may have for people attempting to apply for jobs across languages and cultures.

    Though the results of Connor et al.'s research are well substantiated, some of its theoretical assumptions seem superficial. For example, the authors state that ''...the interweaving of discourse, syntax and lexicon have been overlooked by most previous research'' (p.176). The point is that such interweaving, that is, correlation between different planes of discourse (semantic, communicative, modal, relational), is the focus of the integrational discourse analysis conception, which I have been developing since 1996 (Iatsko, 2001b). According to another statement, ''...a great deal of the corpus-based, more applied work has focused on the lexico-grammatical patterning of text, producing collocations and lists of fixed phrases; much of this work has centered on the propositional level of texts with less regard to functional and rhetorical aspects'' (p.177). It might be of interest to the authors that a corpus-based methodology for analyzing rhetorical aspects of discourse has been developed in W. Mann's (1998) conception. Since both Iatsko's and Mann's conceptions are available on the Internet, Connor et al. could have taken the trouble to find and study them.

    In the fourth chapter Allan describes the Secondary Learner Corpus (TSLC), a resource which uses corpus data in systematic ways to raise the language awareness of secondary-level English teachers in Hong Kong. TSLC, accessible via a computer network, is used in conjunction with a number of modern English corpora. Together, these corpora are an invaluable resource for answering teachers' questions about aspects of grammar and usage through Language Corners, and for systematic linguistic analysis of areas of English in which Hong Kong students experience difficulty.

    To the best of my knowledge, there is nothing like TSLC in my country, and the methods described by Allan can be adopted, fine-tuned to local conditions and fruitfully used in teacher training here in Russia as well as in other countries.

    In the fifth chapter, ''Pedagogy and local learner corpora: working with learning-driven data'', Seidlhofer suggests a methodologically innovative corpus-analytic approach, which she calls ''learning-driven data'', enabling students to be both participants in and analysts of their own language. According to this approach, computer tools are used for compiling and collaboratively analyzing a written learner corpus consisting of short complete texts (summaries and ''accounts'') produced by students. Seidlhofer describes the success of the approach in motivating students to adopt corpus analysis techniques for research in linguistics and for work on language awareness.

    It should be noted that because summaries for the corpus were prepared manually, Seidlhofer missed a good opportunity to introduce her students to techniques of automatic text summarization, such as compiling a dictionary of speciality terms, determining summary size, and editing the summary (Iatsko, 2001c).

    An advantage of the publications in this book is a new type of contrastive analysis, contrastive interlanguage analysis (Granger, 1998), which is aimed at providing data from L1 (the learners' mother tongue), L2 (English), and interlanguage. To reinforce the interpretative power of this analysis, the authors use the output of different groups of L2 learners, thus obtaining more reliable results. For example, Altenberg compares the output of French and Swedish L2 learners; Aijmer uses the output of Swedish, French, and German L2 writers. This book is a significant contribution to learner corpus research, the new area of linguistic inquiry that has emerged as an important link between the two previously disparate fields of corpus linguistics and foreign/second language research.


    Bradford, S.C. (1953) Documentation. London: Crosby & Lockwood.

    Brinton, L. (2000) The structure of modern English. Amsterdam; Philadelphia: John Benjamins.

    Downing, A., Locke, Ph. (2002) A university course in English grammar. London; New York: Routledge.

    Granger, S. (1998) The computer learner corpus: a versatile new source of data for SLA research. In: S. Granger, ed. Learner English on Computer. London; New York: Longman.

    Iatsko, V. (2001a) English syntax for Russian speaking students. Abakan: Katanov State University of Khakasia Press.

    Iatsko, V. (2001b) Integrational discourse analysis. Abakan: Katanov State University of Khakasia.

    Iatsko, V. (2001c) Linguistic aspects of summarization. In: Philologie im Netz. 2001. N 18. phin/phin18/p18i.htm

    Mann, W. (1998) Rhetorical structure theory.

    Zipf, G.K. (1935) The Psycho-Biology of Language. Boston: Houghton Mifflin.


    V. Iatsko is a professor in the Department of English and Head of the Computational Linguistics Laboratory at Katanov State University of Khakasia, located in Abakan, Russia. His research interests include text summarization, text grammar, TEFL, and contrastive analysis of English and Russian syntax.