
LINGUIST List 24.2625

Thu Jun 27 2013

Review: Applied Linguistics; Computational Linguistics: Frankenberg-Garcia et al. (2012)

Editor for this issue: Joseph Salmons <jsalmons@linguistlist.org>

Date: 24-Mar-2013
From: Robert Poole <repoole@email.arizona.edu>
Subject: New Trends in Corpora and Language Learning

Book announced at http://linguistlist.org/issues/23/23-5028.html

EDITOR: Ana Frankenberg-Garcia
EDITOR: Lynne Flowerdew
EDITOR: Guy Aston
TITLE: New Trends in Corpora and Language Learning
SERIES TITLE: Corpus and Discourse
PUBLISHER: Bloomsbury Publishing (formerly The Continuum International Publishing Group)
YEAR: 2012

REVIEWER: Robert E Poole, University of Arizona

“New Trends in Corpora and Language Learning” provides a comprehensive look at
recent developments in corpus approaches for teaching and learning language.
The 15 chapters were developed from presentations delivered at the 2008
Teaching and Language Corpora Conference (TaLC). Part 1 features chapters
detailing current approaches to using corpora and corpus tools by language
learners from contexts around the world. These chapters explain approaches
that place data in the hands of the learner and report the benefits of such
instruction and learners’ responses to the methods and tools. Part 2 discusses
tools that learners can exploit for their language learning, from multimodal
concordancing software to a collocation feedback program, and includes
chapters on recent developments in machine translation and parallel corpora.
Finally, Part 3 discusses insights made possible through analyses of learner
corpora and the pedagogical implications of the findings.

PART I: Corpora with language learners: use
Opening the text, Yukio Tono’s chapter “TaLC in action: recent innovations in
corpus-based English language instruction in Japan” details several novel
corpus-based applications while reporting the popularity of corpora in Japan
and the potential for similar success elsewhere. Of particular interest was the
description of a corpus-based TV English program that more than 1 million
people watch per year. It spawned a popular children’s character, “Mr.
Corpus”, and won an award for best TV program from Japan's public broadcasting
center. The show enjoyed such great popularity that similarly themed iPhone
applications have been produced and a corpus-based Wii game application is
being developed. These success stories, Tono asserts, display the potential
for and viability of corpus-based approaches in Japan and elsewhere.

The second chapter, “Using hands-on concordancing to teach rhetorical
functions: evaluation and implications for EAP writing classes” by Maggie
Charles, presents a discourse-analytic approach for the teaching of writing.
While a common critique of corpus pedagogy has been its focus on bottom-up
approaches to language learning, Charles presents a model that integrates
top-down and bottom-up processing and moves learners beyond the lexicogrammar
of individual sentences to rhetorical features of the discourse. The 49
international graduate students responded quite positively to the approach and
noted the affordances the corpus approach provides for the teaching and
learning of academic writing. In closing, Charles presents a three-stage
process that she believes will transition students from corpus awareness to
corpus literacy and finally, corpus proficiency.

Another chapter detailing a corpus-based pedagogical approach is presented by
Bernhard Kettemann in Chapter 3. “Tracing the emo side of life: using a corpus
of an alternative youth culture discourse to teach culture studies” presents
an approach for the teaching of a particular alternative discourse in a
university-level Cultural Studies course. Students displayed motivational and
engagement gains and the corpus-based approach was claimed to advance
student-centered learning while also providing a valued alternative to
traditional theory-based frameworks and texts. Through a combination of
deductive and inductive learning, students displayed an increase in awareness
of the connection between language and culture. Kettemann asserts the value
of integrating corpus work into mainstream pedagogy but also acknowledges
challenges, e.g. text and text type selection, that must be overcome for
corpus study to succeed.

Pedagogical approaches continue with Natalie Kübler in Chapter 4 on
applications of corpora for translation and the teaching of translators.
“Working with corpora for translation teaching in a French-speaking setting”
explains limitations facing more complete integration of corpus approaches,
e.g. limited availability of parallel corpora, but asserts the potential of
corpus translation of specialized texts and the need for translators in
training to receive instruction in the basic concepts of corpus linguistics.
Kübler also writes that translators need to have the ability to construct
their own specialized corpora for particular translation tasks. The chapter
presents several classroom-tested activities for raising awareness of corpus
tools for translation and of their potential learning gains.

The final chapter of Part I, “IFAConc: a pedagogic tool for online
concordancing with EFL/EAP learners” by Przemyslaw Kaszubski, presents and
assesses an online concordancing program for the teaching and learning of
academic writing by university students in Poland. The IFAConc concordance
package, designed to meet the pedagogical needs of students in an EAP writing
class, was created with the learner in mind; search parameters, annotation
features, and search history interfaces were made as intuitive and
user-friendly as possible. However, the pedagogical aims of the concordance
package do not limit its versatility, as Kaszubski’s design enables many types
of inquiries into linguistic features while also making sharing, saving, and
annotating findings possible. Piloted in two classrooms and receiving
generally favorable responses, the package, Kaszubski notes, is constantly
evolving as updates and improvements are periodically implemented in the
system. The practicality of the tool and its potential for more complete
integration into an EAP writing curriculum are indeed promising.

PART II: Corpora for language learners: tools
Part II begins with a chapter from Anne Li-E Liu, David Wible, and Nai-Lung
Tsao titled “A corpus-based approach to automatic feedback for learners’
miscollocations” that details a method for identifying miscollocations in L2
learner writing and a means for providing immediate suggestions of proper
collocations to the user. Applying the notions of intercollocability and
substitutability, the software identifies collocation clusters that enable
identification of miscollocations and makes recommendations for corrections.
The collocation cluster and intercollocability information are shown to be
valid means of correcting miscollocations. With issues of detection and
correction seemingly resolved, the authors explain how the tool could be
integrated into an online language learner platform to be used by second
language writers.

One of the more intriguing chapters is Francesca Coccetta’s “Multimodal
functional-notional concordancing”. She notes that corpus approaches have
traditionally been employed for the analysis of written texts; however,
Coccetta’s rather novel approach shows how a spoken corpus of audio and video
texts can be organized, annotated, and exploited for language learning and
teaching. The program provides insights into the various semiotic resources
at play in the creation of meaning. Beyond detailing the multimodal
concordancer and a scalar method for annotating oral discourse, Coccetta
presents two data-driven activities for language learning. The chapter raises
interesting questions for corpus techniques and their application to oral
discourse while asserting the need for greater use of corpus approaches for
the teaching of speaking and listening.

Chapter 8 by Alejandro Curado Fuentes, “Academic corpus consultation in MT and
application to LSP teaching”, presents a sophisticated content-based machine
translation (CBMT) approach that aims to produce translations of written
English into Spanish. The n-gram-based approach, when applied to a corpus of
written academic discourse, demonstrated the ability to identify a variety of
linguistic data. The system, as Curado Fuentes asserts, improves the quality
of machine translation of specialized texts and can significantly decrease the
time and cost required for translation. Curado Fuentes further states that
the approach may be exploited by teachers of English for Specific Purposes to
teach particular specialized discourses through a contrastive, corpus-based,
data-driven learning approach.

Martin Warren follows in Chapter 9, “Using corpora in the learning and
teaching of phraseological variation”. Warren explains ConcGram (Greaves,
2009) and its ability to identify and display output in a manner quite
different from the more traditional keyword in context (KWIC) format. He
states that while a KWIC display features a centered node word, ConcGram
instead highlights the node as well as co-occurring words in a layout that
draws learner attention away from the node item to its surrounding
co-occurring features. The ConcGram approach is lauded for its ability to
identify three types of phraseological variation: meaning shift units
(Sinclair, 2007), collocational frameworks (Renouf and Sinclair, 1991) and
organizational frameworks. The author states that traditional n-gram-focused
approaches offer only a limited view of variation in phraseology. Warren
suggests concgramming can serve as a tool for textual analysis, an approach
for raising learner awareness of the idiom principle, and a means for
revealing field- and genre-specific discourse features.
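
The contrast between the two displays can be made concrete with a minimal
Python sketch. It is an illustration only and does not reproduce ConcGram's
actual algorithm or interface: the function names `kwic` and `cooccurrences`,
the `width` and `span` parameters, and the sample sentence are all assumptions
introduced here. The first function centers a single node word, as in a KWIC
display; the second looks for two words co-occurring within a small window, in
either order and with intervening words allowed, which is roughly the kind of
variation a fixed n-gram search would miss.

```python
import re


def kwic(text, node, width=30):
    """Return one line per match, with the node word centered (KWIC style)."""
    lines = []
    pattern = r"\b%s\b" % re.escape(node)
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so every node lines up in one column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


def cooccurrences(tokens, a, b, span=4):
    """Return token stretches where a and b co-occur within `span` tokens.

    Order and intervening words are free, so "hard work" and "work is hard"
    are both found. This is a simplification, not ConcGram's own method.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok != a:
            continue
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] == b:
                start, end = min(i, j), max(i, j)
                hits.append(" ".join(tokens[start:end + 1]))
    return hits


if __name__ == "__main__":
    sample = "hard work pays and work is hard"
    for line in kwic(sample, "work", width=12):
        print(line)
    print(cooccurrences(sample.split(), "hard", "work", span=2))
```

The KWIC lines keep the eye on the node, while the co-occurrence output treats
the pair as the unit of interest regardless of word order.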

In Chapter 10, “The SACODEYL search tool: exploiting corpora for language
learning purposes”, Johannes Widmann, Kurt Kohn, and Ramon Ziai report on a
pedagogically motivated spoken language corpus of video interviews with
secondary school students representing seven European languages.
Each language corpus has 25 interviews, annotated and aligned with their
transcripts. The corpora require little training, are user-friendly, and are
designed with a language learner in a secondary school context in mind.
Reflecting its focus on younger learners, the corpus is divided by topics such
as hobbies and plans for the future. The authors comment that this
topic-oriented construction differs from many traditional concordancing
programs as it allows students to focus on areas of particular interest. In
addition, the package comes with pedagogical materials to aid the teacher in
making lesson plans.

PART III: Corpora by language learners: learner language
Part III opens with a chapter from John Osborne, “Oral learner corpora and the
assessment of fluency in the Common European Framework”. The chapter details
how findings from learner corpora may be applied to the assessment of foreign
language oral production. In the project, interviews were independently rated
using the Common European Framework (CEF) standards and then analyzed for a
variety of quantitative and qualitative features such as pauses, length of
utterance, syntactic units, and information units, among several others. The
author shows how such benchmarking opens the possibility of automatic rating of
oral productions. While this study indexes the interviews using CEF standards,
application of other frameworks is also possible. The author does mention
several limitations but notes that automatic measurements can quickly provide
‘rough’ and useful profiles of a learner’s fluency.

Chapter 12, “Preferred patterns of use of positive and negative evaluative
adjectives in native and learner speech: an ELT perspective”, is a
contribution from Sylvia De Cock on the patterns of negative and positive
attitudinal stance markers in native and learner speech; it offers several
implications of the findings for English language teaching (ELT). Through a
contrastive analysis approach, the study identifies variation in syntactic and
collocational patterns of attitudinal markers and finds several items that
could be treated in the classroom. For example, De Cock finds that native
speakers frequently use evaluative adjectives in relative clauses
beginning with “which”, whereas this syntactic pattern occurs with much
lower frequency in the learner corpora. The author suggests this feature and
several others explained in the chapter could be included in ELT materials,
and activities based on the native and learner data could be successfully
integrated into the classroom.

Hilary Nesi in Chapter 13, “BAWE: an introduction to a new resource”,
introduces the British Academic Written English (BAWE) corpus and discusses
its construction and design. The corpus consists of approximately 3,000
written university assignments, compiled in response to the concern that too
little was known about the types of academic writing students
complete. The author details the 4x4 design matrix that was used for
systematic collection and organization of the assignments across four levels
and four broad disciplinary groups. The author notes the unique construction
of various levels and disciplines of the corpus that distinguishes the
collection from other similar corpora, e.g. the Michigan Corpus of Upper-Level
Student Papers (MICUSP) (Römer and Wulff, 2010) and the Portland State
University Corpus of Student Academic Writing (Conrad and Albers, 2008). The
corpus was annotated along several dimensions such as functional features and
genre family. The author closes with a review of several publications
reporting findings from BAWE and suggests further research that may be
conducted with the corpus.

Continuing with findings from learner corpora is Anna-Maria Hatzitheodorou and
Marina Mattheoudakis’s chapter, “The impact of culture on the use of stance
exponents as persuasive devices: the case of GRICLE and English native speaker
corpora”, which compares stance and persuasive devices in a Greek learner
corpus and an English native speaker corpus. The study investigates
differences in how the two groups deploy rhetorical strategies to persuade
their reader. The research is informed by Hofstede’s (1980) model of cultural
dimensions with differences in stance markers interpreted using the framework.
One example the authors report is that Greek writers use persuasive boosters
(e.g. of course, undoubtedly) more frequently than hedges and attitude markers
while many fewer instances of boosters were found in the native writer corpus.
They also report that native writers are more likely to use hedges and
typically refrain from using boosters in their writing. Applied to the
Hofstede model, the authors suggest the difference can be explained through
Anglo-American rhetorical conventions that discourage bold statements and
instead leave space for alternative opinions. The authors detail several other
differences in the use of stance markers while offering interpretations of the
variation through the Hofstede model. The authors correctly caution against
explicit and prescriptive instruction but do suggest that L2 learners could
benefit from consciousness-raising activities that illuminate connections
between culture and writing practices.

The text closes with a chapter, “Polishing papers for publication:
palimpsests or procrustean beds?”, from John McKenny and Karen Bennett that
compares articles written by Portuguese academics and submitted to journals
with a corpus of native speaker journal articles published in the same field. The
study investigates variation in syntactic, lexical, phraseological, and
discourse features that may impact the ‘naturalness’ (p. 247) of the texts and
that may function as an obstacle to publication. The authors reveal
differences in a variety of features such as use of nominalization, overuse of
the genitive, and collocational patterns. While the authors do not advocate
stylistic norming and acquiescence to perceived native speaker norms, they do
call attention to the real repercussions possibly experienced by L2 writers
seeking to publish in international journals. Like the authors of other
chapters, they recommend awareness-raising activities while also affirming the
value corpus studies can have in revealing cultural differences in academic
writing.

As evident in the chapter summaries, this recent publication on trends in
corpora and language learning covers a variety of issues, presents compelling
advances in corpora for numerous contexts and purposes, and raises important
questions for further research. From a corpus-based television program to
rhetorical discourse annotation and on to multimodal concordancing, the
possibilities for continued development of corpus tools and the potential for
greater integration of corpus approaches into the classroom are clearly on
display. However, several chapters lack the type of empirical evidence needed
if corpus approaches are to gain greater access to mainstream classrooms.

While the insights into learner attitudes are indeed valuable, further
research into learning gains is needed. This need for continued research is
noted in many chapters as authors consistently pose questions and present
challenges for future research to address. Also, no chapter directly speaks to
the need to train future language teachers in corpus linguistics and corpus
pedagogy; the one chapter on training dealt with translators. Nonetheless, the
book makes a valuable contribution and many of the ideas here will inspire
those seeking increased integration of corpus approaches in language learning
environments. These authors indeed push the field in interesting directions as
they move corpus approaches beyond the bottom-up approaches that characterized
earlier work in the field to more dynamic strategies.

From pedagogy to corpus tools and learner corpora analysis, this volume
coherently surveys the latest developments in corpora while also consistently
raising questions and encouraging continued research. Whether a reader’s
interest is classroom pedagogy or software developments, this comprehensive
text on new trends in the field will certainly be of value. Importantly, this
volume will appeal to a wide audience as it offers plenty to interest those
familiar with corpus approaches while remaining accessible to those new to the
field.

Conrad, S. & Albers, S. (2008). A new corpus of student academic writing.
Paper presented at the American Association of Corpus Linguistics Conference,
Brigham Young University, Utah. http://corpus.byu.edu/aacl2008/ppt/29.ppt.

Hofstede, G. (1980). Culture’s consequences: international differences in
work-related values. London: Sage.

Johns, T. (1994). From printout to handout: Grammar and vocabulary teaching in
the context of data-driven learning. In T. Odlin (Ed.), Perspectives on
pedagogical grammar. New York: Cambridge University Press, 293-314.

Renouf, A.J. & Sinclair, J. (1991). Collocational frameworks in English. In K.
Aijmer and B. Altenberg (eds.), English corpus linguistics. London: Longman,

Römer, U. & Wulff, S. (2010). Applying corpus methods to writing research:
exploration of MICUSP. Journal of Writing Research, 2(2), 99-127.

Sinclair, J. (2007). Collocation reviewed. Manuscript. Tuscan Word Centre,

Robert Poole is a Ph.D. student in the Second Language Acquisition and
Teaching program at the University of Arizona. His research interests include
corpus linguistics, corpus pedagogy, and discourse analysis.