Review of Learner Corpus Research

Reviewer: Jungmin Lim
Book Title: Learner Corpus Research
Book Author: Vaclav Brezina, Lynne Flowerdew
Publisher: Bloomsbury Publishing (formerly The Continuum International Publishing Group)
Linguistic Field(s): Discourse Analysis
Text/Corpus Linguistics
Language Acquisition
Corpus-based research in second language acquisition has contributed to our understanding of language development, especially from the perspective of usage-based approaches. In many cases, however, such research has limited its scope to describing language development, leaving pedagogical implications undiscussed and open to readers’ own interpretation. This edited volume, “Learner Corpus Research: New Perspectives and Applications” by Vaclav Brezina and Lynne Flowerdew, is a timely and pedagogically oriented contribution because it provides dedicated sections for practitioners who have been eager to see how learner corpus research can inform language teaching. The book consists of eight chapters divided into two parts. In the first part, three chapters discuss the roles of learner and task variables in learner corpora. In the second part, invited authors report empirical studies covering a wide range of topics in spoken and written learner corpora.


Part 1. Task and Learner Variables

In Chapter 1, “The effect of task and topic on opportunity of use in learner corpora”, Caines and Buttery open the discussion of learner corpus design by reporting their analysis of the Cambridge Learner Corpus, which examines the extent to which topics affect lexical features, part-of-speech frequencies and subcategorization frames. They sample 408 B2-level essays on six topics (i.e., 68 essays × 6 topics) that represent distinct tasks (i.e., transactional, society, professional, narrative, autobiographical and administrative). They report differences in the distribution of lexical and syntactic features across topics. For example, the ‘society’ topic elicits more nouns, whereas the ‘transactional’ topic involves more verbs. ‘Narrative’ and ‘autobiographical’ essays include more adverbs than essays written for other prompts. At the syntactic level, they find that one group of topics (autobiographical, narrative, and society) elicits a greater range of subcategorization frames than the other group (administrative, professional, and transactional). Based on these findings, Caines and Buttery suggest that teachers use specific topics to induce learners to practice target forms. For example, if students need more practice with prepositional arguments, a teacher could deliberately choose a narrative prompt, the prompt type that the authors found most closely associated with that construction.

The second chapter, “Phrasal verbs in spoken L2 English: the effect of L2 proficiency and L1 background” by Cervantes and Gablasova, uses a subset of the Trinity Lancaster Corpus that contains English learners’ responses to two interactive speaking tasks. As the title indicates, they examine how learners of English at three proficiency levels (B1, B2, and C1/C2) and from four linguistic backgrounds (Spanish, Chinese, Italian, and Russian) differ in their production of phrasal verbs (i.e., combinations of a verb and an adverbial particle). They find that the effects of proficiency and language background are statistically significant, but with small effect sizes. In addition, they observe that learners tend to use a few types of phrasal verbs repeatedly, which indicates a need to build knowledge of a wider variety of phrasal verbs. Based on their findings, Cervantes and Gablasova recommend that teachers introduce more academic and subject-specific phrasal verbs as students’ proficiency develops. They also point out that the depth of learners’ phraseological knowledge needs to be developed, because learners often deploy only a limited range of the meanings of polysemous phrasal verbs.

Chapter 3, entitled “Investigating the effect of the study abroad variable on learner output: A pseudo-longitudinal study on spoken German learner English” by Götz and Mukherjee, is a cross-sectional study that investigates how study-abroad experience influences fluency, accuracy and vocabulary development. They use the error-tagged LINDSEI-GE corpus and draw on the metadata about the speakers’ study-abroad experience to formulate an independent variable with three levels (none, 1-12 months, and over 13 months). For the dependent variables, they compute fluency measures that represent temporal fluency (e.g., unfilled pauses, mean length of run, and speech rate) and strategies relevant to fluency (e.g., frequency of discourse markers and small words). In terms of accuracy, they use the error tags to count verb-tense errors and article errors. Lastly, they examine lexical competence (e.g., number of words per interview and frequency of lexical errors). Using a series of general linear models, they show that students with longer study-abroad experience produce more fluent speech; however, the findings for accuracy and vocabulary are inconclusive. They interpret these findings as suggesting that some benefits of study abroad may not be captured by linguistic measures. This chapter, however, does not explicitly discuss teaching implications.

The first part of the volume thus considers the effects of task and learner variables. While Chapter 1 controls learner variables and focuses on task variables, the other two chapters shed light on learner factors, either by treating them as an independent variable (Chapter 2) or by focusing on a particular learner group (Chapter 3). Chapters 2 and 3 control task variables in order to isolate the learner variables under study. Because they use different datasets and explore different task and learner variables, these chapters are excellent examples for researchers interested in how such variables shape learner language.

Part 2. Analysis of Learner Language

Chapter 4, “Disagreement in L2 spoken English: From learner corpus research to corpus-based teaching materials” by Gablasova and Brezina, reports a study that uses the same subcorpus of the Trinity Lancaster Corpus (TLC) as Chapter 2. Problematizing the lack of research on socio-pragmatic skills, especially dispreferred speech acts, they investigate agreement-plus-disagreement constructions (e.g., yes but, see your point but, true but and okay but). This cross-sectional study describes the frequency of the disagreement speech act and the strategies learners use to perform it at different proficiency levels. The researchers report that advanced learners produce more disagreement constructions than the intermediate and low-intermediate groups. In addition, advanced learners use a variety of strategies to communicate disagreement, while lower-intermediate learners show a more limited range of strategies. Based on the finding that the quantity and range of disagreement constructions change as proficiency develops, Gablasova and Brezina provide two corpus-driven activities that help learners broaden their linguistic repertoire for expressing disagreement. These exemplary activities may inspire readers to apply similar approaches to teaching the linguistic resources associated with other speech acts.

Molenda, Pęzik and Osborne, the authors of Chapter 5, “Self-repetitions in learners’ spoken language: A corpus-based study”, report a study of repetition in spoken data. They distinguish two functions of repeating material in interaction: gaining time (i.e., repeats) and reinforcing information (i.e., repetitions). Focusing on self-repetition, which can include both repeats and repetitions, they compare native and nonnative speakers’ use of repetition and examine the role that proficiency plays in it. They analyze the spoken component of the PELCRA Learner English Corpus, which contains spoken output by Polish learners of English whose proficiency ranges from low intermediate to high (CEFR A2 to C2), and compare it with the British National Corpus. They report a clear overuse of repetition in the nonnative corpus. In terms of the role of proficiency, advanced learners of English use dramatically more repetition than lower-level learners. Their pedagogical suggestions include that teachers focus on reducing repetitions within lexical bundles.

Chapter 6, “Corpus-driven study of information systems project reports” by Miller and Pessoa, shifts the focus from structures to functions. This study investigates the functional categories in native and nonnative speakers’ project reports compared with a reference corpus. For the study, they build a small corpus of authentic project reports written by 12 native speakers and 35 advanced nonnative speakers, and use the Freiburg-Brown corpus as a reference corpus. The nonnative speaker data are further divided into two groups according to grade level (17 junior and 18 senior students). They find that the project reports display a different distribution of rhetorical functions from the reference corpus, indicating genre-specific language. Nonnative speakers’ reports show patterns of rhetorical functions similar to those in native speakers’ reports, except that senior students use significantly fewer reporting functions than native speakers. Given the finding that nonnative speakers are able to follow the genre conventions, Miller and Pessoa reiterate the importance of providing model texts when setting assignments.

Chen’s study, Chapter 7, “Beyond frequencies: Investigating the semantic and stylistic features of phrasal verbs in a three-year longitudinal study corpus by Chinese university students”, approaches phrasal verbs, which Cervantes and Gablasova discussed in Chapter 2, from a different angle. Chen builds a longitudinal corpus consisting of six essays composed by 130 Chinese learners of English over three years. Although the amount of phrasal verb use does not increase linearly over the three years, learners produce new phrasal verbs that they had not used in previous years. In addition, learners become more sensitive in selecting phrasal verbs appropriate to the register. For polysemous phrasal verbs, Chen finds a positive change from Year 1 to Year 3, although there is a drop in Year 2. The chapter offers teaching implications, including the need to focus on register-dependent phrasal verbs and the potential of corpus-driven activities in which learners conduct simple, guided searches for phrasal verbs.

The last chapter, “Figurative language in intermediate-level second language writing” by Paris, compares metaphors in written texts at two proficiency levels. Manually reading and coding the metaphors in a small corpus of written texts by 52 low-intermediate and intermediate French learners of English, Paris identifies five categories that account for learners’ use of metaphors: overextension, L1 transfer, personification, idiomatic expression, and creative metaphors. In the low-intermediate students’ essays, the most frequent category of figurative language is L1 transfer (45%), and one-third is acceptable figurative language (e.g., idiomatic expressions and creative metaphors). In the intermediate students’ essays, idiomatic expressions rank highest (44%), followed by L1 transfer (39%). The author emphasizes explicit instruction on cross-linguistic influence, which may be particularly applicable when students share a first language.

The second part of the edited volume provides empirical studies that examine a wide range of linguistic aspects of learner language. Interestingly, these corpus studies reveal that learners’ language development may not progress linearly. For example, the findings from Chapters 5 and 7 show dynamic development over time, including both progression and regression.


As the title promises, this edited volume offers readers fresh perspectives on, and applications of, learner corpus research. In terms of new perspectives, I would highlight the book’s range and its reporting practices. The editors, Brezina and Flowerdew, balance the chapters so that readers can explore the full range of L2 corpus research. For example, four chapters discuss spoken language (Chapters 2-5), while the other four report studies of written language (Chapters 1, 6-8). Some studies use large existing corpora (e.g., the Cambridge Learner Corpus in Chapter 1 and the Trinity Lancaster Corpus in Chapters 2 and 4), while others collect learner language specifically to investigate their research questions (Chapters 6-8).

Another strength of this volume is its excellent reporting practice. All contributors provide detailed descriptions in their methods sections, which allow readers to replicate the research, as scholars in applied linguistics and second language acquisition have advocated (e.g., Norris, Plonsky, Ross, & Schoonen, 2015). For example, the studies consistently report effect sizes when using inferential statistics. Because even very small effects can reach statistical significance with large samples, reporting effect sizes helps readers interpret the findings accurately. It is also noteworthy that the results sections present graphs that clearly visualize and summarize the findings.

Beyond its overall quality, the most valuable contribution of this volume, from a teacher’s perspective, is the designated pedagogical implications section in each chapter. While some corpus-based research does not aim to provide teaching implications, readers whose primary interest is in applying research findings to improve instruction will appreciate the authors’ explicit statements about how practitioners can benefit from their studies. Among the many useful suggestions throughout the volume, the most explicit pedagogical implications appear in Chapter 4, where the authors provide sample corpus-based activities ready for teaching aspects of English pragmatics.

In spite of its extensive coverage and high quality, there are a few areas in which the volume could have been made even more accessible. First, it would have been even more informative if some chapters had discussed relevant studies in instructed second language acquisition. For example, Part 1 of this volume could have been connected to recent work in second language writing that has found effects of genre on linguistic features related to complexity, accuracy and fluency (e.g., Yang, Lu, & Weigle, 2015; Yoon & Polio, 2017), and to cognitive task-based research that has investigated the effects of manipulating cognitive task complexity within a genre on spoken and written language (e.g., Kormos, 2011; Révész, Kourtali, & Mazgutova, 2017). In addition, the findings of Chapter 3 could have been discussed in light of recent work showing that pseudo-longitudinal and longitudinal analyses of identical data can yield different developmental patterns of phraseological competence, including phrasal verb knowledge (e.g., Bestgen & Granger, 2014). In short, there are several places where recent SLA research could have been discussed to provide a more synthesized view. A second suggestion would be to provide a comprehensive overview of the book that explains how the different studies connect to one another. While these changes could improve the accessibility of the volume, it should be noted that adding more content would come at the expense of the brevity the current version achieves.

This volume is an invaluable resource for a wide range of readers, including applied linguists and teachers, who are interested in learner language and corpus-based research. Researchers can use this book to find exemplary studies with strong reporting practices. The readers most likely to appreciate it, however, are practitioners who have long wanted to know whether their intuitions about learner language can be studied empirically and, if so, what they can do with the resulting interlanguage patterns.


Jungmin Lim is a PhD candidate in the Second Language Studies program at Michigan State University. Her research interests are in the areas of second language writing, language assessment, and computer-assisted language learning.

