EDITORS: Tony Harris & María Moreno Jaén
TITLE: Corpus Linguistics in Language Teaching
SERIES TITLE: Linguistic Insights, Volume 128
PUBLISHER: Peter Lang
YEAR: 2010
Papaioannou Vasiliki, School of English, Aristotle University of Thessaloniki, Greece
The book under review is a selection of papers presented at the international seminar ''New Trends in Corpus Linguistics for Language Teaching and Translation Studies, in honor of John Sinclair'', held at the University of Granada in September 2008.
The book is divided into two parts, each examining one area of recent Corpus Linguistics research with applications to language teaching: ''Applying corpora to Language Learning and Teaching'' (data-driven learning and other direct applications of corpora to language learning, four papers), and ''Corpus-based Research for Language Teaching'' (methods and tools for corpus research indirectly applicable to language learning, three papers). A section containing contributors' bionotes (pp. 211-214) completes the volume.
Alex Boulton introduces chapter 1, ''Data-driven learning: On paper, in practice'', with the remark that, despite the merits claimed for it in a large number of research papers, Data-Driven Learning (DDL) remains a rare practice in real language learning classrooms. The paper offers several reasons for this: for the average language learner, current DDL tools and methods lack appropriateness, relevance and efficiency, and they are also very difficult to use.
According to the author, these shortcomings can be addressed by integrating a DDL approach into coursebooks and other printed material. More specifically, he argues that printed handouts and textbooks containing DDL elements can be both appealing and effective in the average language learning classroom, since they help teachers and learners overcome the bottleneck of 'radical computational methods' and turn DDL into common practice.
The next section of Boulton's article is an extensive critical review of existing DDL material, both print and online. It attempts to explain why so little DDL material is commercially available despite the plethora of relevant research projects. The main reasons identified are the reluctance of well-known publishers to invest in DDL, the earlier focus of DDL on the rather restricted market of higher education, and the tendency to do new research rather than apply existing methods to secondary education contexts.
The author then presents eight printed courses that are certainly corpus-informed, though many of them can only marginally be called pure DDL, and briefly mentions various online DDL-based language learning resources. His critical assessment of these resources leads him to conclude that DDL can be made more widely available by integrating DDL activities into coursebooks and other existing material, both print and online, and by sharing resources via the internet. This may entail adopting a much softer version of DDL, mediated by the researcher or teacher, in which the innovative DDL methodology and traditional teaching and learning practices 'meet half-way'.
In chapter 2, ''Corpus Linguistics and Language Education in Perspective: Appropriation and the Possibilities Scenario'', Pascual Pérez-Paredes begins by briefly reviewing research on Corpus Linguistics (CL) applications to language teaching. He questions the general assumption that ''transfer of research methods to language education should be done in a straightforward way'' (p. 54), and in particular the appropriateness of using methods, corpora and software created by researchers and initially intended for research purposes only. Instead, he claims that a more pedagogically oriented approach should be chosen for the non-tertiary language classroom, to bridge the existing gap between research and application.
His choice of this particular language learning setting is supported by surveys showing that the vast majority of language learning takes place outside tertiary education. He reinforces his claim by citing researchers who argue for corpora and tools purpose-built with language learning in mind.
He proposes a methodology, which he calls a 'feasibility scenario', to exploit and enrich methods developed by CL research for use in the language learning classroom, in three ways:
By producing a tagset that the average teacher can manipulate and customize with his/her learners' needs in mind;
By creating corpora and corpus exploitation tools for language learning settings;
By integrating corpora into language learning rather than adapting them.
This methodology provides the rationale for the SACODEYL (System Aided Compilation and Open Distribution of European Youth Language) initiative co-developed by the author. SACODEYL is a multipurpose tool, open to customization by teachers, for compiling, annotating, comparing and producing concordances from speech corpora in seven European languages. The rest of the chapter describes this tool's design and application.
In chapter 3, ''Getting Past 'Groundhog Day': Spoken Multimedia Corpora for Student-centred Corpus Exploitation'', Sabine Braun shares two of the concerns raised by Pérez-Paredes in chapter 2: firstly, that despite their growing appeal to the teaching community, CL methods have a limited presence in the language learning classroom, and secondly, that a conventional DDL methodology may not be entirely appropriate for teaching a foreign language ''at least to some learner populations'' (p. 22).
Braun's article begins with a brief literature review that reflects the impact of CL research on language teaching. She then observes that actual teaching practices involving corpora are sparse and presents the reasons she sees behind this reality. She also draws attention to the DDL methodology as proposed by Tim Johns and expresses her concern that a pure DDL approach, which involves looking through KWIC (Key Word In Context) concordances and word lists, may be too difficult a task for the average language learner, despite the positive results reported for it. She especially stresses the difficulty of compiling and exploiting spoken corpora for language learning purposes. However, she claims that ''recent developments …in multimedia formats demonstrate that there are other plausible ways of mediating the content of a corpus to a learner'' (p. 81).
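For readers unfamiliar with the format, a KWIC concordance can be sketched in a few lines of Python. This is only a minimal illustration of the display that learners are asked to read in a pure DDL task; it is not taken from the volume or from any of the tools it discusses, and the function name `kwic`, the sample sentences and the context width are all assumptions made for the example.

```python
import re


def kwic(text, keyword, width=30):
    """Return one concordance line per occurrence of `keyword`.

    Each line shows up to `width` characters of left context, the
    keyword in square brackets, and up to `width` characters of right
    context, so that the keyword lines up in a single column.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


# Hypothetical mini-corpus, invented for this illustration.
sample = "You should make a decision soon. They make progress every day."

for line in kwic(sample, "make"):
    print(line)
```

Even in this toy case the learner has to infer a pattern (here, the nouns that follow ''make'') from truncated, decontextualized lines, which is the kind of task Braun suspects may be too demanding for some learner populations.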
She proceeds to present the ELISA (English Language Interview Corpus as a Second language Application) project, which exploits spoken corpora for pedagogical applications. More specifically, it is a spoken language corpus that includes transcripts, instant access to video clips, and a concordancer. Furthermore, it is pedagogically annotated: it includes not only POS (part-of-speech) tagging but also annotation of topics and communicative functions, which allows learners to retrieve instances of language in categories more familiar to them. ELISA receives an extensive 12-page description in which its design and uses are discussed, together with some useful examples that provide a better understanding of the project.
In chapter 4, ''Contrastive Language Data: From Translation Studies to Language Learning and Teaching'', Angela Chambers discusses some possible applications of parallel corpora to language learning. After a brief review of research on the use of monolingual and bilingual corpora in both translator education and language learning, she claims that the recent dominance of the communicative approach in language learning and the rejection of the grammar-translation method have inevitably led to a similar rejection of parallel corpora in language learning, even though some voices point to the potential benefits of bilingual concordances for learning purposes.
More specifically, the author discusses the findings of an experiment conducted in a real language learning classroom. These show that worksheets containing concordances drawn from a parallel corpus can inform language learners about differences in the use of certain words and expressions in their native language and in the target language, thus reducing reliance on the students' or their teachers' intuitions.
Stefan Th. Gries introduces chapter 5, ''Methodological Skills in Corpus Linguistics: A Polemic and Some Pointers Towards Quantitative Methods'', with the remark that even though CL is a discipline that relies heavily on corpus material in the form of distributional, frequency-based, probabilistic data, the majority of researchers are not familiar enough with statistical methods to obtain all the information a corpus could give. Instead, they are content to rely on tools that are limited in terms of availability, functionality and user control.
The main part of the chapter describes five case studies of corpus research on various language phenomena, published by various authors in the past. The first two case studies involve only one or two variables; the other three investigate more multifactorial datasets. The author assesses the data processing methods used in all five studies and illustrates how such research could yield more accurate results with more elaborate statistical techniques.
In chapter 6, ''Comparing Parts of Speech and Semantic Domains in the BNC and a Micro-corpus of Movies: Is Film Language the 'Real Thing'?'', María Elena Rodríguez Martín discusses the suitability of film language for teaching speaking skills in EFL classrooms by examining how closely the language used in films resembles everyday conversational language. A case study conducted by the author involves the creation of a corpus of film language comprising the manuscripts of ten English films, and the comparison of certain linguistic features against the spoken component of the BNC (British National Corpus).
The introductory paragraph briefly discusses two earlier studies upon which the present case study is based. In the earlier one, the 50 most frequent items derived from the two corpora were compared so as to reveal similarities and differences. The later study used WordSmith Tools 4.0 software to check both key conversational features and collocations in the above-mentioned corpora. Both studies indicated that there are no significant differences in the conversational features of the corpora.
The present research takes the earlier studies a step further so as to highlight similarities and differences in parts of speech and semantic domains in the two corpora. For the POS comparison, the film language corpus was tagged using the CLAWS (Constituent Likelihood Automatic Word-tagging System) tagger, and the frequencies of various POS tags in the two corpora were compared using a log-likelihood measure so as to show overused and underused tags.
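The review does not reproduce the chapter's calculation, so the following is only a sketch of one widely used formulation of the log-likelihood statistic for comparing the frequency of an item in two corpora, in which observed frequencies are set against the frequencies expected if the item were equally common in both. The function names and the tag counts are hypothetical and are not taken from the chapter.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) for one item observed in two corpora.

    freq_a, freq_b: observed frequencies of the item (e.g. a POS tag).
    size_a, size_b: total number of tokens in each corpus.
    Expected frequencies assume the item is equally common in both.
    """
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def direction(freq_a, size_a, freq_b, size_b):
    """Say whether the item is over- or underused in corpus A relative to B."""
    rel_a, rel_b = freq_a / size_a, freq_b / size_b
    if rel_a > rel_b:
        return "overused"
    if rel_a < rel_b:
        return "underused"
    return "equal"


# Hypothetical counts for a single tag: 50 hits in a 1,000-token film
# corpus against 10 hits in a 1,000-token reference sample.
score = log_likelihood(50, 1000, 10, 1000)
print(round(score, 2), direction(50, 1000, 10, 1000))
```

A score above 3.84 is conventionally taken to indicate a difference significant at p < 0.05 (one degree of freedom); the direction of the difference has to be read off separately from the relative frequencies, since the statistic itself is never negative.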
Chapter 7, ''Research into the Annotation of a Multimodal Corpus of University Websites: An Illustration of Multimodal Corpus Linguistics'', by Anthony Baldry and Kay L. O'Halloran, discusses current practices in web search tools. The authors observe that general-purpose searches are largely noun-oriented and keyword-based, and do not rely on visual content or meaning. They also suggest that if web searching were based on more complex annotation techniques, involving templates that highlight higher-level structures such as genres and patterns, it could locate features that cannot be detected with present keyword-based searches.
The main part of the chapter presents the authors' research from 2002 to 2004 on constructing a manual and semi-automatic annotation system for the description of generic features in university websites. They also discuss previous research that led to the current case study. The MWB (MCA (Multimodal Corpus Authoring) Web Browser) is a web annotation and corpus construction tool which defines the hierarchical relationships between text and subtexts on a web page in a multimodal manner, integrating visual, linguistic and spatial resources. The creation of such a tool reflects the emerging use of the Internet as a means of exchanging opinions and beliefs and of social networking, and thus its growing personalization.
The first four chapters present existing printed Data-Driven Learning material, tools such as SACODEYL and ELISA, and specific parallel corpora, all of them direct applications of Corpus Linguistics methodology to the language teaching classroom. The last three chapters deal with corpus-based research that could indirectly be of use for language learning purposes, discussing a) the use of more refined statistical corpus analysis, b) a comparison of the conversational language used in a micro-corpus of movies and in the BNC, and c) the annotation of a multimodal corpus of university websites.
Each paper thoroughly treats a different approach to corpus research targeted at language learning, with examples and references to previous work in the field. Some chapters discuss a well-known methodology or tool (for example Tim Johns' (1991) kibitzers in chapters one and three), while others give insights into a rather unexplored part of applied corpus linguistics (for example the use of parallel corpora discussed in chapter four).
The editors clearly intend the book to be useful to the language teacher who is willing to try new methods and innovative approaches to language teaching using corpus-based tools. However, the small number of papers included can be regarded as a shortcoming; a larger selection would give language teachers more examples of, and references to, corpus-based and corpus-driven language learning and teaching.
Another shortcoming is the absence of an explicit link between the methodology presented in Gries' article and the actual practice of language learning. Nowhere in the chapter is reference made to language teaching. What is more, the average language teacher could be intimidated by terms and techniques that do not belong to their field of expertise.
The editing of the volume is generally careful, and the very few slips mostly concern punctuation. One oversight can be found in the references section of Boulton's paper, where the reference to P. Thompson's (2006) article, included in the Kantaridou et al. volume, wrongly gives the place of publication as 'Macedonia' instead of 'Thessaloniki, Macedonia, Greece'.
Despite the minor shortcomings mentioned above, the present volume makes a strong contribution to the field of corpus linguistics applications in language teaching. It takes the notion of corpus-based and corpus-driven learning a much-needed step further, answering the demands of the language teaching and learning community (Bernardini 2005, Roemer 2006, Sinclair 2004 among others) for concrete practical suggestions on how we could use corpora in language learning rather than why we should do so.
In answer to Gabrielatos' (2005) question, 'Corpora and Language Teaching: just a fling or wedding bells?', the publication of this book, of the volumes that preceded it (e.g. the published papers of the various TaLC conferences) and of the books that we hope will follow it points to a long-awaited matrimony.
REFERENCES
Bernardini, Silvia. 2005. ''Corpora with language learners''. Abstracts, Corpus Linguistics 2005. Centre for Corpus Research, University of Birmingham, UK.
Gabrielatos, Costas. 2005. ''Corpora and language teaching: just a fling or wedding bells?''. TESL-EJ 8(4). http://www-writing.berkeley.edu/TESL-EJ/ej32/a1.html. [Accessed 2011-04-25].
Johns, Tim F. 1991. ''Should you be persuaded – two samples of data-driven learning materials''. In Tim F. Johns and Philip King, eds. Classroom Concordancing (ELR Journal 4), 1-16.
Roemer, Ute. 2006. ''Pedagogical Applications of Corpora: Some Reflections on the Current Scope and a Wish List for Future Developments''. ZAA 54.2: 121-134.
Sinclair, John (ed.). 2004. How to Use Corpora in Language Teaching. Amsterdam: John Benjamins.
ABOUT THE REVIEWER
Vasiliki Papaioannou (MSc in Machine Translation, UMIST UK 2000, MA in
Language and Communication Sciences AUTh 2009) is a PhD candidate at the
School of English, Aristotle University of Thessaloniki, Greece, working on
data-driven learning applications to language learning and teaching in
secondary level education. She is also a State school EFL teacher and for
the last four years she has taught ESP as a visiting tutor at the
Democritus University of Thrace, Greece. Her research interests lie in
theoretical and applied Corpus Linguistics (for Language Learning and
Machine Translation), data-driven learning methodology with applications
for language learning, computational tools for language learning (ESP and
EFL) and Case-based Reasoning. She is also interested in web-based language
learning environments and in autonomous learning and self-assessment.