Review of New Trends in Corpora and Language Learning

Reviewer: Robert E Poole
Book Title: New Trends in Corpora and Language Learning
Book Author: Ana Frankenberg-Garcia, Lynne Flowerdew, Guy Aston
Publisher: Bloomsbury Publishing (formerly The Continuum International Publishing Group)
Linguistic Field(s): Applied Linguistics; Computational Linguistics
Issue Number: 24.2625

“New Trends in Corpora and Language Learning” provides a comprehensive look at recent developments in corpus approaches to language teaching and learning. The 15 chapters were developed from presentations delivered at the 2008 Teaching and Language Corpora Conference (TaLC). Part I features chapters detailing how language learners in contexts around the world currently use corpora and corpus tools. These chapters explain approaches that place data in the hands of the learner and report the benefits of such instruction and learners’ responses to the methods and tools. Part II discusses tools that learners can exploit for their language learning, ranging from multimodal concordancing software to a collocation feedback program, and also includes chapters detailing recent developments in machine translation and parallel corpora. Finally, Part III includes chapters discussing insights made possible through analyses of learner corpora and the pedagogical implications of the findings.

PART I: Corpora with language learners: use
Opening the text, Yukio Tono’s chapter “TaLC in action: recent innovations in corpus-based English language instruction in Japan” details several novel corpus-based applications while reporting the popularity of corpus-based approaches in Japan and their potential for success elsewhere. Of particular interest is the description of a corpus-based TV English program that more than 1 million people watch per year. It spawned a popular children’s character, “Mr. Corpus”, and won an award for best TV program from Japan’s public broadcasting center. The show enjoyed such great popularity that similarly themed iPhone applications have been produced and a corpus-based Wii game application is being developed. These success stories, Tono asserts, display the potential for and viability of corpus-based approaches in Japan and elsewhere.

The second chapter, “Using hands-on concordancing to teach rhetorical functions: evaluation and implications for EAP writing classes” by Maggie Charles, presents a discourse-analytic approach for the teaching of writing. While a common critique of corpus pedagogy has been its focus on bottom-up approaches to language learning, Charles presents a model that integrates top-down and bottom-up processing and moves learners beyond the lexicogrammar of individual sentences to rhetorical features of the discourse. The 49 international graduate students in the study responded quite positively to the approach and noted the affordances it provides for the teaching and learning of academic writing. In closing, Charles presents a three-stage process that she believes will transition students from corpus awareness to corpus literacy and, finally, corpus proficiency.

Another chapter detailing a corpus-based pedagogical approach is presented by Bernhard Kettemann in Chapter 3. “Tracing the emo side of life: using a corpus of an alternative youth culture discourse to teach culture studies” presents an approach for the teaching of a particular alternative discourse in a university-level Cultural Studies course. Students displayed motivational and engagement gains and the corpus-based approach was claimed to advance student-centered learning while also providing a valued alternative to traditional theory-based frameworks and texts. Through a combination of deductive and inductive learning, students displayed an increase in awareness of the connection between language and culture. Kettemann asserts the value of integrating corpus work into mainstream pedagogy but also acknowledges challenges, e.g. text and text type selection, that must be overcome for corpus study to succeed.

The focus on pedagogy continues in Chapter 4, in which Natalie Kübler addresses applications of corpora for translation and the training of translators. “Working with corpora for translation teaching in a French-speaking setting” explains limitations hindering fuller integration of corpus approaches, e.g. limited availability of parallel corpora, but asserts the potential of corpus-based translation of specialized texts and the need for trainee translators to receive instruction in the basic concepts of corpus linguistics. Kübler also writes that translators need to be able to construct their own specialized corpora for particular translation tasks. The chapter presents several classroom-tested activities for raising awareness of corpus tools for translation and of their potential learning gains.

The final chapter of Part I, “IFAConc: a pedagogic tool for online concordancing with EFL/EAP learners” by Przemyslaw Kaszubski, presents and assesses an online concordancing program for the teaching and learning of academic writing by university students in Poland. The IFAConc concordance package, designed to meet the pedagogical needs of students in an EAP writing class, was created with the learner in mind; search parameters, annotation features, and search history interfaces were made as intuitive and user-friendly as possible. However, the pedagogical aims of the concordance package do not limit its versatility, as Kaszubski’s design enables many types of inquiries into linguistic features while also making it possible to share, save, and annotate findings. The package was piloted in two classrooms and received generally favorable responses; Kaszubski notes that it is constantly evolving as updates and improvements are periodically implemented. The practicality of the tool and its potential for more complete integration into an EAP writing curriculum are indeed promising.

PART II: Corpora for language learners: tools
Part II begins with a chapter from Anne Li-E Liu, David Wible, and Nai-Lung Tsao titled “A corpus-based approach to automatic feedback for learners’ miscollocations”. The chapter details a method for identifying miscollocations in L2 learner writing and a means of providing the user with immediate suggestions of proper collocations. Applying the notions of intercollocability and substitutability, the software identifies collocation clusters that enable identification of miscollocations and makes recommendations for corrections. The collocation cluster and intercollocability information are shown to be valid means of correcting miscollocations. With issues of detection and correction seemingly resolved, the authors explain how the tool could be integrated into an online language learning platform for use by second language writers.

One of the more intriguing chapters is Francesca Coccetta’s “Multimodal functional-notional concordancing”. She notes that corpus approaches have traditionally been employed for the analysis of written texts; however, Coccetta’s rather novel approach shows how a spoken corpus of audio and video texts can be organized, annotated, and exploited for language learning and teaching. The program provides insights into the various semiotic resources at play in the creation of meaning. Beyond detailing the multimodal concordancer and a scalar method for annotating oral discourse, Coccetta presents two data-driven activities for language learning. The chapter raises interesting questions for corpus techniques and their application to oral discourse while asserting the need for greater use of corpus approaches for the teaching of speaking and listening.

Chapter 8 by Alejandro Curado Fuentes, “Academic corpus consultation in MT and application to LSP teaching”, presents a sophisticated content-based machine translation (CBMT) approach that aims to translate written English into Spanish. The n-gram based approach, when applied to a corpus of written academic discourse, demonstrated the ability to identify a variety of linguistic data. The system, as Fuentes asserts, improves the quality of machine translation of specialized texts and can significantly decrease the amount of time and cost required for translation. Fuentes further states that the approach may be exploited by teachers of English for Specific Purposes to teach particular specialized discourses through a contrastive corpus-based data-driven learning approach.

Martin Warren follows in Chapter 9, “Using corpora in the learning and teaching of phraseological variation”. Warren explains ConcGram (Greaves, 2009) and its ability to identify and display output in a manner quite different from the more traditional keyword in context (KWIC) format. He states that while a KWIC display features a centered node word, ConcGram instead highlights the node as well as co-occurring words in a layout that draws learner attention away from the node item to its surrounding co-occurring features. The ConcGram approach is lauded for its ability to identify three types of phraseological variation: meaning shift units (Sinclair, 2007), collocational frameworks (Renouf and Sinclair, 1991) and organizational frameworks. The author states that traditional n-gram focused approaches offer only a limited view of variation in phraseology. Warren suggests concgramming can serve as a tool for textual analysis, an approach for raising learner awareness of the idiom principle, and a means for revealing field- and genre-specific discourse features.
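
For readers unfamiliar with the KWIC format that Warren contrasts with ConcGram, a minimal Python sketch of a keyword-in-context display follows. It illustrates only the conventional centered-node layout and is not a reimplementation of ConcGram; the function names `kwic` and `format_kwic`, the `width` parameter, and the sample sentences are hypothetical choices made for illustration.

```python
import re


def kwic(text, node, width=3):
    """Return (left, node, right) tuples with `width` tokens of context per side."""
    # Naive tokenization: lowercase the text and keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    target = node.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def format_kwic(lines):
    """Right-align the left context so every node word sits in the same column."""
    if not lines:
        return []
    pad = max(len(left) for left, _, _ in lines)
    return [f"{left:>{pad}}  [{node}]  {right}" for left, node, right in lines]


# Hypothetical sample text, invented for this illustration.
sample = "Learners play a role in class. The role of corpora is growing."
for line in format_kwic(kwic(sample, "role")):
    print(line)
```

Because every hit is aligned on the node, the eye is drawn to the centered word; this is the property of KWIC output that, according to Warren, ConcGram's layout is designed to counterbalance.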

In Chapter 10, “The SACODEYL search tool: exploiting corpora for language learning purposes”, Johannes Widmann, Kurt Kohn, and Ramon Ziai report on a pedagogically motivated spoken language corpus of video interviews with secondary school students representing seven European languages. Each language corpus has 25 interviews, annotated and aligned with their transcripts. The corpora require little training to use and are designed with language learners in a secondary school context in mind. Reflecting its focus on younger learners, the corpus is divided by topics such as hobbies and plans for the future. The authors comment that this topic-oriented construction differs from many traditional concordancing programs as it allows students to focus on areas of particular interest. In addition, the package comes with pedagogical materials to aid the teacher in making lesson plans.

PART III: Corpora by language learners: learner language
Part III opens with a chapter from John Osborne, “Oral learner corpora and the assessment of fluency in the Common European Framework”. The chapter details how findings from learner corpora may be applied to the assessment of foreign language oral production. In the project, interviews were independently rated using the Common European Framework (CEF) standards and then analyzed for a variety of quantitative and qualitative features, including pauses, length of utterance, syntactic units, and information units. The author shows how such benchmarking could support automatic rating of oral productions. While this study indexes the interviews using CEF standards, application of other frameworks is also possible. The author does mention several limitations but notes that automatic measurements can quickly provide ‘rough’ and useful profiles of a learner’s fluency.

Chapter 12, “Preferred patterns of use of positive and negative evaluative adjectives in native and learner speech: an ELT perspective”, is a contribution from Sylvie De Cock that examines the patterns of negative and positive attitudinal stance markers in native and learner speech and discusses several implications of the findings for English language teaching (ELT). Through a contrastive analysis approach, the study identifies variation in syntactic and collocational patterns of attitudinal markers and finds several items that could be treated in the classroom. For example, De Cock finds that in native speech evaluative adjectives occur frequently in relative clauses beginning with “which”, whereas this syntactic pattern occurs with much lower frequency in the learner corpora. The author suggests this feature and several others explained in the chapter could be included in ELT materials, and that activities based on the native and learner data could be successfully integrated into the classroom.

Hilary Nesi in Chapter 13, “BAWE: an introduction to a new resource”, introduces the British Academic Written English (BAWE) corpus and discusses its construction and design. The corpus consists of approximately 3,000 written university assignments and was compiled in response to the concern that too little was known about the types of academic writing students complete. The author details the 4x4 design matrix that was used for systematic collection and organization of the assignments across four levels and four broad disciplinary groups. Nesi notes that this structuring by level and discipline distinguishes the collection from similar corpora, e.g. the Michigan Corpus of Upper-Level Student Papers (MICUSP) (Römer and Wulff, 2010) and the Portland State University Corpus of Student Academic Writing (Conrad and Albers, 2008). The corpus was annotated along several dimensions such as functional features and genre family. The author closes with a review of several publications reporting findings from BAWE and suggests further research that may be conducted with the corpus.

The focus on learner corpora continues with Anna-Maria Hatzitheodorou and Marina Mattheoudakis’s chapter, “The impact of culture on the use of stance exponents as persuasive devices: the case of GRICLE and English native speaker corpora”, which compares stance and persuasive devices in a Greek learner corpus and an English native speaker corpus. The study investigates differences in how the two groups deploy rhetorical strategies to persuade their reader. The research is informed by Hofstede’s (1980) model of cultural dimensions, with differences in stance markers interpreted using the framework. One example the authors report is that Greek writers use persuasive boosters (e.g. of course, undoubtedly) more frequently than hedges and attitude markers, whereas far fewer instances of boosters were found in the native writer corpus. They also report that native writers are more likely to use hedges and typically refrain from using boosters in their writing. Drawing on the Hofstede model, the authors suggest the difference can be explained through Anglo-American rhetorical conventions that discourage bold statements and instead leave space for alternative opinions. The authors detail several other differences in the use of stance markers while offering interpretations of the variation through the Hofstede model. The authors rightly caution against explicit and prescriptive instruction but do suggest that L2 learners could benefit from consciousness-raising activities that illuminate connections between culture and writing practices.
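
Comparisons of this kind between corpora of different sizes usually rest on normalized frequencies rather than raw counts. The Python sketch below shows one common way to compute a rate per 1,000 words; it is an illustration only and not the procedure used in the chapter. The function name `per_thousand`, the hedge list, and the sample sentence are hypothetical; only the two boosters are taken from the examples cited above.

```python
import re

BOOSTERS = ["of course", "undoubtedly"]  # the two examples cited in the review
HEDGES = ["perhaps", "possibly"]         # hypothetical list, for illustration only


def per_thousand(text, phrases):
    """Occurrences of any listed phrase per 1,000 word tokens of `text`."""
    lowered = text.lower()
    n_tokens = len(re.findall(r"\w+", lowered))
    if n_tokens == 0:
        return 0.0
    # Whole-word matching, so multiword items such as "of course" are counted once.
    hits = sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        for phrase in phrases
    )
    return 1000 * hits / n_tokens


# Invented sample sentence; a real study would pass in whole corpora.
sample = "Of course it is true"
print(per_thousand(sample, BOOSTERS))
print(per_thousand(sample, HEDGES))
```

Normalizing in this way allows a rate for a small learner corpus to be set beside a rate for a much larger native speaker corpus without the difference in size distorting the comparison.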

The text closes with a chapter from John McKenny and Karen Bennett, “Polishing papers for publication: palimpsests or procrustean beds?”, that compares journal submissions written by Portuguese academics with a corpus of native speaker journal articles published in the same field. The study investigates variation in syntactic, lexical, phraseological, and discourse features that may affect the ‘naturalness’ (p. 247) of the texts and that may function as an obstacle to publication. The authors reveal differences in a variety of features such as use of nominalization, overuse of the genitive, and collocational patterns. While the authors do not advocate stylistic norming and acquiescence to perceived native speaker norms, they do call attention to the real repercussions that L2 writers seeking to publish in international journals may experience. As in other chapters, they recommend awareness-raising activities while also affirming the value corpus studies can have in revealing cultural differences in academic writing.

As is evident from the chapter summaries, this recent publication on trends in corpora and language learning covers a variety of issues, presents compelling advances in corpora for numerous contexts and purposes, and raises important questions for further research. From a corpus-based television program to rhetorical discourse annotation and on to multimodal concordancing, the possibilities for continued development of corpus tools and the potential for greater integration of corpus approaches into the classroom are clearly on display. However, several chapters lack the type of empirical evidence needed if corpus approaches are to gain greater access to mainstream classrooms.

While the insights into learner attitudes are indeed valuable, further research into learning gains is needed. This need for continued research is noted in many chapters as authors consistently pose questions and present challenges for future research to address. Also, no chapter directly speaks to the need to train future language teachers in corpus linguistics and corpus pedagogy; the one chapter on training dealt with translators. Nonetheless, the book makes a valuable contribution and many of the ideas here will inspire those seeking increased integration of corpus approaches in language learning environments. These authors indeed push the field in interesting directions as they move corpus approaches beyond the bottom-up approaches that characterized earlier work in the field to more dynamic strategies.

From pedagogy to corpus tools and learner corpora analysis, this volume coherently surveys the latest developments in corpora while also consistently raising questions and encouraging continued research. Whether a reader’s interest is classroom pedagogy or software developments, this comprehensive text on new trends in the field will certainly be of value. Importantly, this volume will appeal to a wide audience as it offers plenty to interest those familiar with corpus approaches while remaining accessible to those new to the area.

Robert Poole is a Ph.D. student in the Second Language Acquisition and Teaching program at the University of Arizona. His research interests include corpus linguistics, corpus pedagogy, and discourse analysis.

