Fri Sep 07 2018

Review: Applied Linguistics; Lexicography; Text/Corpus Linguistics: Szudarski (2017)

Editor for this issue: Jeremy Coburn

Date: 30-Mar-2018
From: Tyler Anderson
Subject: Corpus Linguistics for Vocabulary
Book announced at

AUTHOR: Pawel Szudarski
TITLE: Corpus Linguistics for Vocabulary
SUBTITLE: A Guide for Research
SERIES TITLE: Routledge Corpus Linguistics Guides
PUBLISHER: Routledge (Taylor and Francis)
YEAR: 2017

REVIEWER: Tyler Kimball Anderson, Colorado Mesa University


Paweł Szudarski’s manuscript serves as an introduction to the field of corpus linguistics, with a particular focus on vocabulary. The author writes with the language learner in mind, providing corpus linguistic research techniques that can be implemented in the classroom and beyond. The book likewise takes into consideration the language teacher, in the hope of offering tools to answer central questions, such as which words the language student should be exposed to. While the focus of the book is on English language corpora, the techniques that are explained can easily be transferred to corpora of other languages that utilize some of the same interfaces mentioned in the book, such as Davies’ corpora for Spanish and Portuguese.

The book begins with a brief introduction, followed by Chapter 1, “What is corpus linguistics?”. As would be expected, here the reader is presented with a working definition of ‘corpus’, followed by a discussion of the benefits and limitations of corpus analysis. The chapter ends with a brief evaluation of various types of corpora. Chapter 2, “Corpus analysis: Tools and statistics”, then turns to the types of tools used in corpus studies, including frequency analysis, wordlists, and keywords; the chapter continues with a discussion of the statistical tests included in most corpus interfaces. The author concludes the chapter by discussing the need to combine these quantitative analyses with qualitative approaches. The second focus, vocabulary, takes center stage in Chapter 3, where the reader is provided with terminology and conceptualizations of “What is Vocabulary?”. Here the author defines ‘word families’, ‘lexemes’ and ‘lemmas’, among other relevant terms. As part of this chapter the author further discusses the utility of corpus techniques for research related to vocabulary.

In Chapter 4 the reader is presented with details regarding the importance of “Frequency and vocabulary” in corpus analysis. Several concepts (e.g. Zipf’s law and the distinction between core and advanced vocabulary) are discussed, along with how these concepts can be applied in the language classroom. Chapter 5 continues the discussion of frequency, this time with an eye on multiword units; here the reader is introduced to phraseology (the study of word combinations) and formulaic language. Several more terms are defined and exemplified, including lexical priming, colligations and collocations. Throughout the chapter, the author presents concrete practices for applying these concepts using the British National Corpus as well as the Corpus of Contemporary American English.

The author then switches focus from corpus analysis to data-driven learning, or the direct use of corpora in the language teaching process. In Chapter 6, “Corpora and teaching vocabulary”, Szudarski demonstrates how corpora can facilitate the presentation and acquisition of vocabulary through both direct and indirect means. One of the direct benefits of corpus use in language teaching is the ability to increase exposure to the target language. Indirectly, corpora have become instrumental in the development of dictionaries and language textbooks, which have taken advantage of the natural language found in corpora as a substitute for the common use of vocabulary in contrived settings. This chapter also includes a list of websites that are useful for expanding students’ vocabulary.

Chapter 7, “Corpora and learner vocabulary”, continues with the theme of language learners and corpora, but now with a focus on what the learners themselves produce. The creation of such corpora and how to compare target language use with that of native users are discussed. In particular, Szudarski discusses the notion of studying target language lexical growth via longitudinal comparisons by means of learner corpora. Also included in the chapter is a discussion of how corpus linguistics can help in language assessment, with specific focus on the English Profile Project. The author shifts from learner corpora to other “Specialized corpora and vocabulary” in Chapter 8. As the author notes, one of the criticisms of corpus linguistics is that the content is very general. Specialized corpora represent a particular type of discourse, and thus allow for better analyses regarding specific domains. Such investigations can focus on genre (e.g. fiction vs. academic), register (e.g. speech vs. newspapers), language for specific purposes (e.g. court trials), specialized vocabulary and academic English.

In Chapter 9 “Discourse, pragmatics and vocabulary”, the notion of discourse is defined, and the author then demonstrates how the fields of discourse analysis and corpus linguistics can be united to mutually inform investigations. As this tome’s focus is on vocabulary, the author particularly emphasizes the lexical features of discourse. The chapter concludes with a discussion of pragmatics and corpus linguistics, including how the latter can inform such areas as speech acts and semantic prosody. The book concludes with Chapter 10 “Summary and research projects”, where the author presents ideas for future research in corpus linguistics; several projects for implementation in the classroom are also included.


Paweł Szudarski’s manuscript serves as a welcome addition to the field of corpus linguistics. The book is very accessible and is written in such a way that those with minimal contact with the topic will be able to take advantage of the wealth of expertise included in this volume. Experts in the field will also find useful insights into how to apply corpus linguistics to the study of vocabulary and in the classroom. Only rarely does the author seem to lose sight of his target audience; for example, in the initial chapters he uses specialized terms that have yet to be introduced to the reader. These instances aside, the book assuredly reaches its target audience.

The title of the book is somewhat misleading, as the main purpose of the book is to provide a practical guide and introduction to those with little or no experience in corpus analysis. The title, however, makes it appear that one will find ample information on how to carry out academic research. Perhaps the use of the word ‘introduction’ in the title would have been merited. The author does, however, include ample examples of actual investigations that have used corpora as an instrument in their studies; these serve as models for future research.

Overall, there is a great flow to the book, with only minimal errata. The chapter contents are well thought out, well organized and include great section headings that aid the reader in understanding the material; each chapter likewise concludes with a brief summary of the content. Throughout the book, the author is adept at keeping the entire purpose of the manuscript in mind. A great example of this is the conclusion of Chapter 3, where the author presents examples of research questions, and then indicates that future chapters will answer each of the questions. The author also skillfully recycles information from previous chapters without making the information redundant.

Included in this book are all sections that one would expect to find in an introductory tome on corpus linguistics and vocabulary. One of the deliberate limitations of this tome is the lack of focus on languages other than English. This book can greatly inform future tomes focusing on the creation and use of corpora from other languages, which are in constant growth.

Throughout the book the author has included brief tasks which provide the reader with opportunities to put into practice the concepts being presented. These tasks serve as transitions from the presentation of theoretical information to how this information can be applied. Practical in nature, these tasks utilize some of the more recognized corpora (e.g. British National Corpus) as well as less familiar resources (e.g. Asian Corpus of English). The tasks are well thought out, practical and easy to accomplish. This being said, much more instruction on how to carry out the tasks is required. The novice who is trying to manipulate the differing interfaces of the selected corpora will simply flounder due to the lack of guidance that is provided. An additional critique of these tasks is their presentation. While each is presented in a grayed-out box, some include instructions for the task in the boxes themselves, others place the instructions in paragraphs either preceding or following the box, and yet others provide no instructions at all; greater consistency is needed in their presentation.

In order to facilitate comprehension, the author provides screenshots of the corpora discussed in the chapters and used in the tasks. Some images contain text that is so small that only by the use of a magnifying glass would one be able to read the contents (e.g. Figure 2.6). In another instance, the task that is being referenced in the screenshot focuses on verbs, while the image itself shows how to look for nouns (Figure 8.1). These criticisms aside, overall the images are beneficial and help the reader understand the concepts being presented.

As mentioned previously, in the introductory chapters the author tends to present some concepts without providing enough detail to make the presentation useful. For example, in Chapter 2 the author briefly introduces ‘type-token ratios’, without adequately exemplifying the concept or how to calculate it. Of particular concern was the presentation of terminology in Chapter 3, which was at times confusing, lacking examples, and perhaps out of order. For example, the author uses the term ‘word family’ throughout the initial part of the chapter without defining it until some three pages later. While the definitions were concrete and accurate, the failure to point out to the reader that the concept will be treated in greater detail in future chapters hindered comprehension.
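
As a brief illustration of the kind of example that would help here, the type-token ratio is the number of distinct word forms (types) in a text divided by the total number of running words (tokens). The following is a minimal sketch and is not drawn from the book; the function names, the sample sentence and the simple lowercase tokenization are assumptions made for illustration only.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption for illustration: a token is any run of letters or apostrophes.
    Real corpus tools apply more careful tokenization rules.
    """
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Return the number of distinct word forms (types) divided by running words (tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    return len(set(tokens)) / len(tokens)


# Hypothetical sample: 11 tokens, 8 of them distinct
sample = "the cat sat on the mat and the dog sat too"
print(type_token_ratio(sample))
```

Because the ratio tends to fall as a text grows longer, it is generally compared only across samples of similar size.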

While most of the sections presented information using an adequate balance of both succinctness and detail, some areas appeared to have missed the mark. For example, at one point the author states that the concept of ‘lexico-grammar’ occupies a central position in corpus linguistics, but then only dedicates one paragraph to the topic (p. 75). At another point, the author discusses the potential for interaction between corpus linguistics and sociolinguistics, but subsequently only dedicates a little over a page to the topic of age- and gender-differentiated corpora; a deeper treatment would have been beneficial to the reader.

These omissions aside, Szudarski’s book achieves the goal of presenting corpus linguistics to those inexperienced in the field and indicates how corpus linguistics can greatly enhance lexical studies and lexical awareness. Of particular benefit was the discussion of lexical frequency, first presented in Chapter 2 and then again in greater detail in Chapter 4. The author adeptly recycles information from one chapter to another, as witnessed in the presentation of ‘multiword units’ in Chapter 5. Here terms such as ‘collocation’ and ‘colligation’ are contrasted and exemplified. Similarly, the presentation throughout the book of tasks, academic articles and research projects greatly benefits the future scholar in corpus linguistics. While the book will most likely not spur future investigations by experts in the field, the ideas that are presented will enable budding scholars to develop their own lines of research that implement corpora.

The appearance of “Corpus linguistics for vocabulary: A guide for research” is a much-needed addition to the field of corpus linguistics. It will serve as a reference guide to students, teachers and future researchers who have an interest in vocabulary use and development. Szudarski inspires future research using corpora and provides an accessible resource to those interested in how to use them for researching vocabulary.


Davies, Mark. 2017. El corpus del español. (Accessed 28 March 2018)

Davies, Mark. 2015. O corpus do português. (Accessed 28 March 2018)


Tyler K. Anderson is Associate Professor of Spanish at Colorado Mesa University, where he teaches courses in language, linguistics and second language acquisition. His research interests include language attitudes toward manifestations of contact linguistics, including the acceptability of lexical borrowing and code-switching in Spanish and English contact situations. He is currently researching the perceptions of phonetic interference in second language acquisition.

