Review of Corpus Linguistics in Language Teaching

Reviewer: Vasiliki Papaioannou
Book Title: Corpus Linguistics in Language Teaching
Book Author: Tony Harris, María Moreno Jaén
Publisher: Peter Lang AG
Linguistic Field(s): Applied Linguistics
Text/Corpus Linguistics
Language Acquisition
Issue Number: 22.3387

EDITORS: Tony Harris & María Moreno Jaén
TITLE: Corpus Linguistics in Language Teaching
SERIES TITLE: Linguistic Insights, Volume 128
YEAR: 2010

Papaioannou Vasiliki, School of English, Aristotle University of Thessaloniki,

The book under review is a volume featuring a selection of papers presented at
the international seminar ''New Trends in Corpus Linguistics for Language
Teaching and Translation Studies, in honor of John Sinclair'', held at the
University of Granada in September 2008.

The book is divided into two parts, each examining one area of recent research
in Corpus Linguistics with applications to language teaching:
''Applying corpora to Language Learning and Teaching'' (data-driven learning and
other direct applications of corpora to language learning, four papers), and
''Corpus-based Research for Language Teaching'' (methods and tools for corpus
research indirectly applicable to language learning, three papers). A section
containing contributors' bionotes (pp. 211-214) completes the volume.


Alex Boulton introduces chapter 1, ''Data-driven learning: On paper, in
practice'', with the remark that, despite the merits claimed for it in a large
number of research papers, Data-Driven Learning (DDL) is to date a rare practice
in real language learning classrooms. He gives a number of reasons that may
account for this: current DDL tools and methods lack appropriateness, relevance
and efficiency on the one hand, and are too difficult for the average language
learner on the other.

According to the author, these shortcomings can be addressed by integrating a
DDL approach into coursebooks and other printed material. More specifically, he
argues that printed handouts and textbooks that contain DDL elements can be both
appealing and effective in the average language learning classroom, since they
help teachers and learners overcome the bottleneck of 'radical computational
methods' and so turn DDL into common practice.

In the next section of his article, Boulton offers an extensive critical review
of existing DDL material, both print and online, in an attempt to explain why so
few DDL materials are commercially available despite the plethora of relevant
research projects. He identifies three main reasons: the reluctance of
well-known publishers to invest in DDL, the earlier focus of DDL on the rather
restricted market of higher education, and the tendency to do new research
rather than apply existing methods to secondary education contexts.

The author then presents eight printed courses that are certainly
corpus-informed, but many of which can only marginally be called pure DDL, and
briefly mentions various online DDL-based language learning resources. His
critical assessment of these resources leads him to conclude that DDL can be
made more widely available by integrating DDL activities into coursebooks and
other existing material, both print and online, and by sharing resources via the
internet. This may entail adopting a much softer version of DDL, mediated by the
researcher or teacher, in which the innovative DDL methodology and traditional
teaching/learning practices 'meet half-way'.

In chapter 2, ''Corpus Linguistics and Language Education in Perspective:
Appropriation and the Possibilities Scenario'', Pascual Pérez-Paredes begins by
briefly reviewing research on Corpus Linguistics (CL) applications to language
teaching. He questions the general assumption that ''transfer of research
methods to language education should be done in a straightforward way'' (p. 54),
and more specifically the appropriateness of using methods, corpora and software
created by researchers and initially intended for research purposes only.
Instead, he claims that a more pedagogically oriented approach should be chosen
for the non-tertiary language classroom, in order to bridge the existing gap
between research and application.

His choice of the particular language learning setting is supported by surveys
showing that the vast majority of language learning takes place outside tertiary
education. His claim is reinforced by various citations of researchers
supporting the need for purpose-built corpora and tools with language learning
in mind.

He proposes a methodology, which he calls a 'feasibility scenario', to exploit
and enrich methods developed by CL research for use in the language learning
classroom, in three ways:

By producing a tagset that can be manipulated/customized by the average teacher
with his/her learners' needs in mind;

By creating corpora and corpus exploitation tools for language learning settings;

By integrating corpora into language learning rather than adapting them.

The above methodology underpins the SACODEYL (System Aided Compilation and Open
Distribution of European Youth Language) initiative co-developed by the author.
SACODEYL is a multipurpose tool, open to customization by teachers, for
compiling, annotating, comparing and producing concordances from speech corpora
in seven European languages. The rest of the chapter describes this tool's
design and application.

In chapter 3, ''Getting Past 'Groundhog Day': Spoken Multimedia Corpora for
Student-centred Corpus Exploitation'', Sabine Braun shares two of the concerns
raised by Pérez-Paredes in chapter 2: firstly, that despite their growing appeal
to the teaching community, CL methods have a limited presence in the language
learning classroom, and secondly, that a conventional DDL methodology may not be
entirely appropriate for teaching a foreign language ''at least to some learner
populations'' (p. 22).

Braun's article begins with a brief literature review that reflects the impact
of CL research on language teaching, immediately followed by the statement that
actual teaching practices involving corpora are sparse, and she presents the
reasons she sees behind this reality. Moreover, she draws attention to the DDL
methodology as proposed by Tim Johns and expresses her concern that a pure DDL
approach, which involves looking through KWIC (Key Word In Context) concordances
and word lists, may, despite its reported positive results, be too difficult a
task for the average language learner. She especially stresses the difficulty of
compiling and exploiting spoken corpora for language learning purposes. However,
she claims that ''recent developments … in multimedia formats demonstrate that
there are other plausible ways of mediating the content of a corpus to a
learner'' (p. 81).
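
For readers unfamiliar with the format, a KWIC concordance lists every
occurrence of a search word centred in a fixed window of surrounding text. The
following is a minimal Python sketch of that idea only; it is not the
concordancer used by Johns or by any tool discussed in the volume, and the
function name, the width parameter and the sample text are illustrative
assumptions.

```python
import re


def kwic(text, keyword, width=30):
    """Return one KWIC line per whole-word match of `keyword` in `text`.

    Each line shows up to `width` characters of left context, the matched
    word in square brackets, and up to `width` characters of right context.
    """
    # Collapse line breaks and repeated spaces so contexts stay on one line.
    text = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


# Hypothetical sample text, invented for illustration.
sample = ("A corpus is a collection of texts. "
          "Learners can query the corpus directly.")

for line in kwic(sample, "corpus", width=20):
    print(line)
```

Reading down the aligned keyword column is what allows learners to spot
recurring patterns to the left and right of the search word, which is also the
step Braun suggests may be demanding for less advanced learners.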

She proceeds to present the ELISA (English Language Interview Corpus as a Second
language Application) project, which exploits spoken corpora for pedagogical
applications. More specifically, it is a spoken language corpus including both
transcripts and instant access to video clips, as well as a concordancer.
Furthermore, it is pedagogically annotated: it includes not only POS
(part-of-speech) tagging but also annotation of topics and communicative
functions, which allows learners to retrieve instances of language that are more
familiar to them. ELISA receives an extensive 12-page description in which its
design and uses are discussed, together with some useful examples that give a
better understanding of the project.

In chapter 4, ''Contrastive Language Data: From Translation Studies to Language
Learning and Teaching'', Angela Chambers discusses some possible applications of
parallel corpora to language learning. After a brief review of the research
conducted so far on the use of monolingual and bilingual corpora in both
translator education and language learning, she claims that the recent dominance
of the communicative approach in language learning and the rejection of the
grammar-translation method have inevitably led to a similar rejection of
parallel corpora in language learning, even though some voices point to the
potential benefits of using bilingual concordances for learning purposes.

More specifically, the author discusses the findings of an experiment conducted
in a real language learning classroom. These show that worksheets containing
concordances drawn from a parallel corpus can inform language learners about
differences in the use of certain words and expressions in their native language
and in the target language, thus reducing the reliance on the students' or their
teachers' intuitions.

Stefan Th. Gries introduces chapter 5, ''Methodological Skills in Corpus
Linguistics: A Polemic and Some Pointers Towards Quantitative Methods'', with the
remark that even though CL is a discipline that relies heavily on material taken
from a corpus in the form of distributional, frequency-based, probabilistic
data, the majority of researchers are not familiar enough with statistical
methods to obtain all the information a corpus could give. Instead, they are
rather happy to rely on tools that are limited in terms of availability,
functionality and user control.

The main part of the chapter describes five case studies of corpus research on
various language phenomena published by various authors in the past. The first
two case studies examine linguistic phenomena involving only one or two
variables; the other three investigate more multifactorial datasets. The author
assesses the data processing methods used in all five studies and illustrates
how such research could yield more accurate results with the use of more
elaborate statistical techniques.

In chapter 6, ''Comparing Parts of Speech and Semantic Domains in the BNC and a
Micro-corpus of Movies: Is Film Language the 'Real Thing'?'', María Elena
Rodríguez Martín discusses the suitability of film language for teaching
speaking skills in EFL classrooms, by examining how similar the language used in
films is to everyday conversational language. A case study conducted by the
author involves the creation of a corpus of film language comprising the
manuscripts of ten English films, and the comparison of certain linguistic
features against the spoken component of the BNC (British National Corpus).

The introductory paragraph briefly discusses two earlier studies upon which the
present case study is based. In the earlier one, the 50 most frequent items
derived from the two corpora were compared so as to reveal similarities and
differences. The later study used WordSmith Tools 4.0 software to check both key
conversational features and collocations in the above-mentioned corpora. Both
studies indicated that there are no significant differences in the
conversational features of the corpora.

The present research takes the earlier studies a step further so as to highlight
similarities and differences in parts of speech and semantic domains in the two
corpora. For the POS comparison, the film language corpus was tagged using the
CLAWS (Constituent Likelihood Automatic Word-tagging System) tagger, and the
frequencies of various POS tags in the two corpora were compared using a
log-likelihood measure so as to show overused and underused tags.
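
To make the comparison step concrete, the sketch below computes the
log-likelihood (G2) statistic in the form commonly used for comparing one item's
frequency across two corpora. It is offered under assumptions: the review does
not state which exact formulation the chapter applies, and the function names
and all counts are invented for illustration rather than taken from the study.

```python
import math


def log_likelihood(freq_a, total_a, freq_b, total_b):
    """G2 for one item observed freq_a times in corpus A (total_a tokens)
    and freq_b times in corpus B (total_b tokens)."""
    combined_rate = (freq_a + freq_b) / (total_a + total_b)
    expected_a = total_a * combined_rate
    expected_b = total_b * combined_rate
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def direction(freq_a, total_a, freq_b, total_b):
    """Say whether the item is over- or underused in A relative to B."""
    rate_a, rate_b = freq_a / total_a, freq_b / total_b
    if rate_a > rate_b:
        return "overused"
    if rate_a < rate_b:
        return "underused"
    return "equal"


# Hypothetical counts for a single POS tag in a small film corpus (A)
# and a larger reference corpus (B).
film_count, film_size = 1_200, 100_000
ref_count, ref_size = 90_000, 10_000_000

score = log_likelihood(film_count, film_size, ref_count, ref_size)
label = direction(film_count, film_size, ref_count, ref_size)
print(f"G2 = {score:.2f} ({label} in the film corpus)")
```

A G2 value above 3.84 corresponds to p < 0.05 with one degree of freedom. The
statistic itself is unsigned, so the direction of the difference has to be read
from the relative frequencies, as the second function does.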

Chapter 7, ''Research into the Annotation of a Multimodal Corpus of University
Websites: An Illustration of Multimodal Corpus Linguistics'', by Anthony Baldry
and Kay L. O'Halloran, discusses current practices in web search tools. The
authors observe that general-purpose searches are keyword-based and rather
noun-oriented, and do not rely on visual content or meaning. They also suggest
that if web searching were based on more complex annotation techniques,
involving templates that highlight higher-level structures such as genres and
patterns, it could locate features that cannot be detected with present
keyword-based searches.

The main part of the chapter presents the authors' research from 2002 to 2004
on constructing a manual, semi-automatic annotation system for the description
of generic features in university websites. They also discuss previous research
that led to the current case study. The MWB (MCA (multimodal corpus linguistics)
Web Browser) is a web annotation and corpus construction tool which defines the
hierarchical relationships between text and subtexts on a web page in a
multimodal manner, integrating visual, linguistic and spatial resources. The
creation of such a tool reflects the emerging use of the Internet as a means of
exchanging opinions and beliefs and of social networking, and thus of
personalization.


The first four chapters present existing printed Data-Driven Learning material,
tools such as SACODEYL, ELISA and specific parallel corpora, all of them being
direct applications of Corpus Linguistics Methodology to the language teaching
classroom. The last three chapters deal with corpus-based research discussing a)
the usage of more refined statistical corpus analysis, b) comparison of
conversational language used in a micro-corpus of movies and in the BNC, and c)
annotation of a multimodal corpus of university websites, that could indirectly
be of use for language learning purposes.

Each paper treats a different approach to language learning-targeted corpus
research thoroughly, together with examples and references to previous work
conducted in the field. In some chapters a well known methodology or tool is
discussed (for example Tim Johns' (1991) kibitzers in chapters one and three)
while others give insights into a rather unexplored part of applied corpus
linguistics (for example the use of parallel corpora discussed in chapter four).

It is obvious that the intention of the editors is to provide a book that could
be handy for the language teacher who is willing to try new methods and
innovative approaches to language teaching using corpus-based tools. However,
the small number of papers included can be regarded as a shortcoming; a greater
selection of papers could provide any language teacher with more examples and
references to corpus-based/driven language learning and teaching.

Another shortcoming is the absence of an explicit link between the methodology
presented in Gries' article and the actual practice of language learning.
Throughout the whole chapter no reference is made to language teaching. What is
more, the average language teacher could be intimidated by the use of terms and
techniques that do not belong to their field of expertise.

The editing of the volume is generally careful, with the very few slips mostly
concerning punctuation. One oversight can be traced in the references section of
Boulton's paper, where the reference to P. Thompson's (2006) article, included
in the Kantaridou et al. volume, wrongly gives the place of publication as
'Macedonia' instead of 'Thessaloniki, Macedonia, Greece'.

Despite the minor shortcomings mentioned above, the present volume makes a
strong contribution to the application of corpus linguistics in language
teaching. It takes the notion of corpus-based/driven informed learning a
much-needed step further, answering the demands of the language teaching and
learning community (Bernardini 2005, Roemer 2006, Sinclair 2004 among others)
for concrete practical suggestions on how we could use corpora in language
learning rather than why we should do so.

In answer to Gabrielatos' (2005) question, 'Corpora and Language Teaching: just
a fling or wedding bells?', the publication of this book, of other volumes that
have preceded it (e.g. the publications of papers presented at various TaLC
conferences) and of other books that we hope will follow it signals the coming
of a long-expected matrimony.


Bernardini, Silvia. 2005. ''Corpora with language learners.'' Abstracts Corpus
Linguistics 2005. Centre for Corpus Research, University of Birmingham, UK

Johns, Tim F. 1991. ''Should you be persuaded – two samples of data-driven
learning materials.'' Tim F. Johns and Philip King, eds. Classroom Concordancing
(ELR Journal 4), 1-16.

Gabrielatos, Costas. 2005. ''Corpora and language teaching: just a fling or
wedding bells?''. TESL-EJ 8, 4.
[Accessed 2011-04-25].

Roemer, Ute. 2006. ''Pedagogical Applications of Corpora: Some Reflections on the
Current Scope and a Wish List for Future Developments''. ZAA 54.2: 121-134.

Sinclair, John. 2004. ''How to Use Corpora in Language Teaching''. Amsterdam: John

Vasiliki Papaioannou (MSc in Machine Translation, UMIST UK 2000, MA in Language and Communication Sciences AUTh 2009) is a PhD candidate at the School of English Aristotle University of Thessaloniki, Greece, working on data-driven learning applications to language learning and teaching in secondary level education. She is also a State school EFL teacher and for the last four years she has taught ESP as a visiting tutor at the Democritus University of Thrace, Greece. Her research interests lie in theoretical and applied Corpus Linguistics (for Language Learning and Machine Translation), data-driven learning methodology with applications for language learning, computational tools for language learning (ESP and EFL) and Case-based Reasoning. She is also interested in web-based language learning environments and in autonomous learning and self assessment.

Format: Paperback
ISBN-13: 9783034305242
Pages: 214
Prices: U.S. $ 58.95
U.K. £ 34.20