LINGUIST List 13.884

Sat Mar 30 2002

Review: Corpus/Text Ling: Ghadessy, Henry & Roseberry

Editor for this issue: Terence Langendoen <terrylinguistlist.org>


What follows is another discussion note contributed to our Book Discussion Forum. We expect these discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in.

If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for discussion." (This means that the publisher has sent us a review copy.) Then contact Simin Karimi at siminlinguistlist.org or Terry Langendoen at terrylinguistlist.org.


Directory

  • Raphael Salkie, Small Corpus Studies and ELT: Theory and Practice

    Message 1: Small Corpus Studies and ELT: Theory and Practice

    Date: Sat, 30 Mar 2002 21:36:43 -0000
    From: Raphael Salkie <R.M.Salkiebton.ac.uk>
    Subject: Small Corpus Studies and ELT: Theory and Practice


    Ghadessy, Mohsen, Alex Henry, and Robert L. Roseberry, eds. (2001) Small Corpus Studies and ELT: Theory and Practice. John Benjamins Publishing Company, xxiii+419pp, hardback ISBN 1-58811-035-4 (US & Canada), USD 114.00, 90-272-2275-4 (ROW), EUR 125.00, Studies in Corpus Linguistics 5. Announced in http://linguistlist.org/issues/13/13-239.html#1

    Raphael Salkie, University of Brighton, England.

    Corpus enthusiasts have often had an interest in language teaching. Traditional grammars, dictionaries and textbooks tended to offer learners artificial, decontextualised examples of language. A corpus of real language enables teachers to use data that is both more interesting and more natural. General-purpose corpora have to be large, however, in order for significant patterns to emerge. This has restricted their use to people with powerful computers and strong programming expertise.

    This book offers a way out of that problem. The fourteen papers collected here aim to show that "the analysis of small textual corpora ... has yielded discoveries about language that are no less remarkable or important than those derived from the study of huge corpora" (from the editors' introduction). The book is aimed at teachers and students of language teaching, as well as fellow researchers.

    SUMMARY OF CONTENTS

    In his preface, John Sinclair denies having ever believed that only large corpora can yield interesting results. He observes that with a large corpus a common method is to have the computer do a lot of the preliminary analysis; while with a smaller corpus it is normal for 'human intervention' to come at an earlier stage. He notes that comparison of different text-types or genres is a common investigative technique with small corpora. This is certainly true: All except three of the contributors to this volume work with collections of texts that are specialised in some way.

    Two of the three exceptions are chapters which describe computational tools for working with small corpora. Paul Nation's 'Using small corpora to investigate learner needs: two vocabulary research tools' presents a pair of Windows programs. 'Vocabprofile' compares the vocabulary of a text against a list of the most frequent words in English, or against a list that the user prepares. 'RANGE' compares the vocabulary of up to 32 texts at a time. The programs can be used to measure the richness of the vocabulary of a text presented to learners, or of a text produced by learners. They can be downloaded for free from <ww.vuw.ac.nz/lals>.
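    For readers who have not met this kind of tool, the two comparisons just described can be sketched in a few lines of Python. This is only an illustration of the general idea, not a reconstruction of Nation's programs: the function names, the tokenisation rule and the notion of 'coverage' used here are assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def vocabulary_profile(text, base_list):
    """Compare a text against a word list (hypothetical helper).

    Returns the proportion of running words found in base_list,
    together with a count of the words that are not on the list.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0, Counter()
    base = {word.lower() for word in base_list}
    off_list = Counter(t for t in tokens if t not in base)
    coverage = 1 - sum(off_list.values()) / len(tokens)
    return coverage, off_list

def word_range(texts):
    """Map each word to the number of texts in which it occurs."""
    counts = Counter()
    for text in texts:
        # A set is used so that each text contributes at most once per word.
        counts.update(set(tokenize(text)))
    return counts
```

    A teacher might call vocabulary_profile on a reading passage with a list of frequent words to see which words fall outside the list, and word_range on a set of learner essays to see which words recur across them.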

    The title of Mike Scott's paper is self-explanatory: 'Comparing corpora and identifying key words, collocations, and frequency distributions through the WordSmith Tools suite of computer programs'. Details of the program are available at: <www.liv.ac.uk/~ms2928/homepage.html>.
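    The idea of a 'key word' can be illustrated with a short sketch: a word is key if it is unusually frequent in a small study corpus compared with a reference corpus. The log-likelihood statistic used below is one common choice for this comparison. The code is an assumption-laden illustration (the function names and the cut-off rule are invented for the example) and does not describe how WordSmith Tools works internally.

```python
import math
from collections import Counter

def log_likelihood(a, b, n1, n2):
    """Log-likelihood score for a word seen a times in n1 tokens
    (study corpus) and b times in n2 tokens (reference corpus)."""
    expected1 = n1 * (a + b) / (n1 + n2)
    expected2 = n2 * (a + b) / (n1 + n2)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected1)
    if b > 0:
        score += b * math.log(b / expected2)
    return 2 * score

def key_words(study_tokens, reference_tokens, top=10):
    """Return up to `top` (word, score) pairs, highest score first."""
    if not study_tokens or not reference_tokens:
        return []
    study = Counter(study_tokens)
    reference = Counter(reference_tokens)
    n1, n2 = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = reference[word]
        # Keep only words that are relatively more frequent in the study corpus.
        if a / n1 > b / n2:
            scored.append((word, log_likelihood(a, b, n1, n2)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]
```

    Both arguments to key_words are plain lists of word tokens, so any tokeniser can be used to prepare them.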

    The third paper which is not about language for specialised purposes is Ann Lawson's 'Collecting, aligning and analysing parallel corpora'. Lawson describes the different types of multilingual corpus and the software for analysing them, and makes some suggestions about how they can be used in the language classroom. The paper is the best survey of this area that I have seen.

    The other eleven papers have essentially the same structure. The first part is always a theoretical discussion of an issue. The issues are often about text types: The notion of 'genre' gets a good going-over from Alex Henry and Robert L. Roseberry in 'Using a small corpus to obtain data for teaching a genre', and from Marina Bondi in 'Small corpora and language variation: reflexivity across genres'. Vincent Ooi discusses the differences between printed and written texts in his 'Investigating and teaching genres using the World Wide Web'.

    Several of the contributors are keen on Halliday's systemic-functional grammar. Peter H. Ragan revisits the concepts of 'field', 'tenor', 'mode', 'register' and 'thematic structure' in 'Classroom use of a systemic functional small learner corpus', while 'Small corpora and translation: comparing thematic organisation in two languages' by Mohsen Ghadessy and Yanjie Gao provides even more variations on the notion of theme. Geoff Barnbrook and John Sinclair begin their paper 'Specialised corpus, local and functional grammars' with a quote from Halliday and later describe their analytical framework as 'more of a functional grammar than [those] of Dik or Halliday' (249).

    Some of the authors discuss the theory of language teaching. Chris Tribble surveys theories about teaching L2 writing in his 'Small corpora and teaching writing', while Geoff Thompson expands on trends in English language teaching more generally in 'Corpus, comparison, culture: doing the same things differently in different cultures'. In a similar vein, Lynne Flowerdew discusses the theory and practice of learner corpora (collections of texts produced by language learners) in 'The exploitation of small learner corpora in EAP materials', and Peter H. Ragan journeys through similar territory.

    For two of the papers, the main theoretical issue is the big one: What is language? Robert de Beaugrande's paper 'Large corpora, small corpora and the learning of "language"' reflects on this matter, as do Barnbrook and Sinclair. The only paper of the eleven which almost resists the urge to theorise is John Flowerdew's 'Concordancing as a tool in course design', although even here the first part of the paper spends some time comparing vocabulary size in a specialised corpus, a traditional dictionary, and the brains of native speakers.

    The second part of each paper is a description of a specialised corpus. In the same order as hitherto, these are:

    - Job applications, and introductions by academics of guest lecturers (Henry & Roseberry).

    - Abstracts of economics articles and introductory chapters of economics textbooks (Bondi).

    - 'Personal advertisements' (known in the UK as 'lonely hearts') on the web by people from Singapore and the USA (Ooi).

    - Instructions by 50 learners of English about a task involving coloured wooden blocks (Ragan).

    - Political commentaries in English and Chinese (Ghadessy & Gao).

    - Definitions from the COBUILD Students' Dictionary (Barnbrook and Sinclair).

    - Adverts on the web for MA courses in Applied Linguistics (Tribble).

    - Tourist brochures in English and Chinese, and job adverts in English and German (Thompson).

    - Project reports written by University students who are non-native speakers (Lynne Flowerdew).

    - Novels by Jane Austen, and texts about the geography of deserts (de Beaugrande).

    - Texts about biology (John Flowerdew).

    The third and final part of these eleven papers contains suggestions about how the corpus or the information derived from it can be used in language teaching and learning. Sometimes the suggestions are very limited: Barnbrook & Sinclair, for instance, do little more than hint that 'new kinds of reference books' for learners of English could be developed on the basis of their analytical system. They also note that their semi-automatic analysis of the definitions in the COBUILD Students' Dictionary made possible a thorough review of the way the dictionary was constructed, and that such a review was used in a project to translate this dictionary into other languages. Similarly, the brief suggestion by Ghadessy and Gao about how to use their analysis of thematic structure in English and Chinese texts essentially says 'present this material to students', and Thompson's proposals about how to show students his comparison of tourist brochures and job adverts in different languages are heavily modalised ('the primary focus could be ... it would seem sensible to encourage the learners to ...' etc).

    Some of the contributors describe how they used their corpus material with a specific group of students. Ragan compared the instructions written by non-native speakers with those produced by native speakers for word frequency and various Hallidayan features; he then used this material with the non-native speakers to help them improve their English. John Flowerdew used his biology corpus as part of an activity in which his students studied how to use verbs such as ENCLOSE, SUSPEND, and SEPARATE which occurred frequently in the texts.

    A more thoughtful discussion of how the materials were organised for students is given by Henry & Roseberry. Aware that simply giving students a concordance with numerous examples of a word or phrase can be intimidating or uninteresting, they translated their analysis of job applications into a hierarchical structure which students accessed gradually to help them write their own sample applications. Bondi reproduces a four-page worksheet that she used with her students to help them study the language of abstracts of economics articles: It includes concordances of verbs like SHOW and ARGUE, and asks students to say, for example, whether the subjects of these verbs are discourse participants (WE, THE AUTHORS) or discourse units (THIS PAPER, THE NEXT SECTION).
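    For readers who have not seen one, a concordance of the kind used in such worksheets lists every occurrence of a word together with a little context on either side. A minimal sketch follows; the function name, the context width and the bracketed layout are choices made for this example and are not taken from any of the papers.

```python
import re

def concordance(text, keyword, width=30):
    """Return one key-word-in-context line per whole-word match of keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so that the key words line up.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Invented sample sentence, used only to show the call.
sample = "This paper shows that prices rise. We show that wages fall."
for line in concordance(sample, "show"):
    print(line)
```

    Because the match is on whole words, a search for SHOW in this sketch finds 'show' but not 'shows'; a worksheet covering all forms of a verb would need one search per form, or a pattern that allows for the endings.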

    EVALUATION

    The contributors to this book are clearly corpus devotees who get a buzz from compiling and analysing electronic texts. There is nothing wrong with this: Lively presentations of research and teaching experiences are surely to be welcomed. The book also demonstrates repeatedly that a small collection of texts can reveal significant linguistic patterns, which is encouraging for people without the time, expertise or computing resources to handle large corpora.

    Two problems make me doubt whether language teachers will be convinced. Firstly, several of the papers compare linguistic features in their specialised corpus with the same features in a large reference corpus such as the British National Corpus or the Bank of English. As a research method this is impeccable, but of course it requires access to a large corpus, which is precisely what many language teachers do not have.

    Secondly, none of the contributors attempts to show that their teaching method actually works. This is a serious weakness: Someone should take two similar groups of students, teach one of them using corpus-derived material and the other group in some other way, and measure the outcomes. Until this type of study has been conducted, using corpora in the classroom is an act of faith. Such a study would have to factor in the many hours that teachers need to spend compiling and analysing the corpus and preparing teaching materials, when they could be devoting the time to other useful activities such as resting, or resisting globalisation. The study would also have to take account of teacher and student enthusiasm, which is precious but hard to measure. To use the current medical buzzword, this is 'evidence-based teaching' - but only in the sense that it uses authentic evidence in the classroom, not because it is based on evidence about the effectiveness of the teaching method.

    Will this volume persuade more linguists to use computer corpora in their research? I think that the title doesn't help in this respect: An alternative would have been 'Specialised Corpora: Design, Research, Teaching Methods', which would have been more accurate and might have attracted linguists who are less interested in L2 pedagogy.

    Furthermore, the research presented in the book is entirely about specialised texts, which is likely to interest people working on language variation rather than grammarians. On the other hand, the relationship between descriptions of sublanguages and descriptions of the general language system is an issue which many linguists can usefully think about. Barnbrook & Sinclair borrow the term 'local grammar' from Gross (1993) for an analysis of a restricted domain of texts, and it is clear that most of the contributors to this book are constructing partial local grammars of this kind. Barnbrook & Sinclair speculate that a battery of local grammars will be able in a few years' time to 'analyse satisfactorily the bulk of open text', with 'general grammar' having only a residual role. An initial reaction might be that it is hard to see how young people could acquire a set of local grammars plus a general grammar, but there is certainly food for thought here.

    BIBLIOGRAPHY

    Gross, Maurice. 1993. 'Local grammars and their representation by finite automata'. In Michael Hoey (ed.), Data, Description, Discourse: Papers on the English Language in Honour of John McH. Sinclair (London, Harper Collins), 26-38.

    ABOUT THE REVIEWER

    Raphael Salkie is Principal Lecturer in Language Studies at the University of Brighton. His interests include modality, translation and contrastive linguistics, and he is the author of several papers about parallel corpora.