Review of Corpus Linguistics for Vocabulary

Reviewer: Thi Ngoc Yen Dang
Book Title: Corpus Linguistics for Vocabulary
Book Author: Pawel Szudarski
Publisher: Routledge (Taylor and Francis)
Linguistic Field(s): Applied Linguistics
Text/Corpus Linguistics
Issue Number: 29.3257


Szudarski’s (2018) book, ‘Corpus Linguistics for Vocabulary: A Guide for Research’, aims to introduce a range of corpus-based tools and procedures to students, teachers, and language practitioners who have little or no experience in corpus analysis but would like to use corpora to explore different aspects of vocabulary use and research.

The book is divided into ten chapters. Chapter 1 introduces the definition of a corpus, key factors in designing a corpus, and annotation. It also points out the strengths and limitations of corpus linguistics as a research tool and presents various types of corpora.

Chapter 2 describes different kinds of corpus analysis (frequency analysis, concordancing, word lists, cluster (N-gram) analysis, and keyword analysis) and the statistical measures used in corpus linguistics (log-likelihood, t-score, mutual information, and type-token ratio). The chapter also emphasizes the importance of combining quantitative and qualitative types of corpus analysis.
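For readers unfamiliar with these measures, three of them (a frequency list, N-gram counts, and the type-token ratio) can be illustrated with a minimal Python sketch. It is not taken from the book: the function names, the crude regular-expression tokenizer, and the sample sentence are all assumptions made for illustration, and real corpus tools handle tokenization far more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens):
    """Return (word, count) pairs sorted from most to least frequent."""
    return Counter(tokens).most_common()


def ngrams(tokens, n):
    """Count every contiguous sequence of n tokens (clusters / N-grams)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by the number of running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Hypothetical sample text, used only to exercise the functions above.
sample = "the cat sat on the mat and the dog sat on the rug"
tokens = tokenize(sample)

print(frequency_list(tokens)[:3])      # the most frequent word forms
print(ngrams(tokens, 2).most_common(2))  # the most frequent two-word clusters
print(round(type_token_ratio(tokens), 3))
```

Because the type-token ratio falls as texts get longer, it is usually compared only across samples of similar size.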

Chapter 3 introduces the area of vocabulary research by pointing out the significant role of vocabulary in language use and explains key terms in vocabulary research (e.g., word, vocabulary, lexis, word form, lexeme, lemma, word family, lexical items, and vocabulary knowledge). It also presents examples of issues in vocabulary research which can be studied with corpus-based analysis (e.g., polysemy of words, synonymy as a lexical relation, metaphoricity and idiomaticity of words, and register variation).

Chapter 4 focuses on frequency by explaining how analysis of word occurrences in corpora reveals the differences between spoken and written vocabulary and the proportions of function words and content words in a language. The chapter also discusses the usefulness of frequency in classifying vocabulary and creating lexical profiles of individual texts, and suggests how frequency-based information can be effectively employed by language teachers and materials developers.

Chapter 5 deals with phraseology and formulaic language. It shows how corpus-based analyses can be employed to identify and categorize different kinds of phraseological units. The chapter also presents how corpora can be used to examine phraseological patterns and meanings.

Chapter 6 presents direct and indirect applications of corpus linguistics in the process of teaching vocabulary. In terms of direct applications, corpora are used to guide the design and development of syllabuses, textbooks, reference books, dictionaries, pedagogical wordlists, online resources, and teaching resources. In terms of indirect applications, corpus data are used for data-driven learning, in which learners themselves engage in corpus analysis.

Chapter 7 focuses on learner corpora. It describes the characteristics of learner corpora and types of methodologies that can be applied to analyze learner data. The chapter also presents different ways in which learner corpora can be used to examine L2 learners’ lexical competence and develop teaching materials and assessment tools.

Chapter 8 highlights the importance of specialized corpora in vocabulary research. It presents characteristics of specialized corpora and the way to use these corpora to investigate specific uses of vocabulary and lexical variation in a range of areas (English for Academic Purposes, English for Specific Purposes, translation studies, literary linguistics, stylistics, and sociolinguistics).

Chapter 9 examines the use of corpora to investigate vocabulary at the level of discourse. The chapter presents the benefits of combining corpus and discourse approaches in the analysis of the uses of words and phrases. It also discusses the importance of discourse-oriented corpus research for investigating the role of lexical features in discourse construction and demonstrates how to use corpus techniques to analyze the pragmatic functions of vocabulary use.

Chapter 10 ends the book with a summary of corpus-based vocabulary research and directions for future projects. It points out the areas in vocabulary research that are worth investigating with the assistance of corpus techniques. Then, it presents a selection of ideas for corpus-based projects that readers can undertake to examine different aspects of vocabulary.


Corpus techniques have been widely used to explore different aspects of vocabulary research. However, the widespread use of corpus linguistics in vocabulary research does not mean that its terms and techniques are fully or widely understood by language teachers and researchers, given the highly technical nature of corpus linguistics (O’Keeffe, McCarthy, & Carter, 2007). Szudarski’s (2018) book, ‘Corpus Linguistics for Vocabulary: A Guide for Research’, provides its target audience with an excellent introduction to a range of corpus-based tools and procedures for investigating different aspects of vocabulary use and research.

What distinguishes this book from other publications in corpus linguistics is that, to the best of my knowledge, it is the first to focus specifically on vocabulary. Throughout the ten chapters of the book, Szudarski covers the most important issues in vocabulary research and skillfully links them with corpus linguistics. The book begins with a brief overview of corpus linguistics (Chapters 1 and 2) and vocabulary research (Chapter 3). Then, it presents in detail how corpus-based tools and techniques are used to investigate various aspects of vocabulary research: frequency (Chapter 4), phraseology and formulaic language (Chapter 5), vocabulary teaching (Chapter 6), learners’ vocabulary (Chapter 7), specialized vocabulary (Chapter 8), and discourse (Chapter 9). The book ends by proposing some potential vocabulary research projects which can make use of the corpus-based tools presented in the previous chapters (Chapter 10). The wide range of topics and the comprehensive structure of the book allow readers to see clearly the link between corpus linguistics and vocabulary research and to appreciate the value of using corpora to explore multiple aspects of vocabulary.

Considering that the target audience of his book does not have much knowledge of corpus linguistics, Szudarski has managed to explain corpus-based tools and techniques in a clear and accessible way. Numerous practical activities and tasks with step-by-step instructions and answer keys are designed to familiarize readers with different aspects of corpus analysis. The author also presents exemplary studies to show how the tools and techniques have been applied in actual research. Importantly, Szudarski does not stop at presenting the tools but also provides readers with ideas for future research projects. This is particularly useful for those who are new to corpus-based vocabulary research.

Another strength of this book is the currency of the information presented in the chapters. Throughout the book, Szudarski has made a great effort to provide readers with the most up-to-date information in the field of vocabulary and corpus linguistics. This is useful not only for novice researchers but also for researchers with experience in corpus-based vocabulary research. Moreover, when presenting the tools, Szudarski intentionally introduces freely available web-based resources. These resources are very useful for teachers and researchers, especially those who are under financial constraints.

However, the book would be more useful if the following issues were addressed in future editions. To begin with, instead of presenting only the chapter titles, the table of contents should provide more information about the main sections of each chapter. This would give readers a better overview of the book’s structure, allowing them to see the links among chapters easily and to track down the desired information quickly. Similarly, a short introduction at the beginning of each chapter that maps out the chapter would make the book more reader-friendly. Additionally, at the end of each chapter, apart from the list of references cited in the chapter, it may be helpful to add a section listing three to five key further readings, each with a brief note on what readers may find useful in it. This would give beginning researchers better orientation in selecting their reading materials.

In spite of these minor drawbacks, ‘Corpus Linguistics for Vocabulary: A Guide for Research’ is an excellent book which has nicely brought together corpus linguistics and vocabulary, two areas that have received growing interest from researchers in the field of Applied Linguistics. The book is definitely a valuable resource for students, teachers, and researchers who would like to conduct corpus-based research in vocabulary studies but have little or no experience in corpus analysis. The book may also be useful for researchers who have done some work on a certain aspect of corpus-based vocabulary research and would like to expand their scope to other aspects in this area. Together with Read’s (2000) ‘Assessing vocabulary’, Schmitt’s (2010) ‘Researching vocabulary: A vocabulary research manual’, Nation and Webb’s (2011) ‘Researching and analyzing vocabulary’, and Meara and Miralpeix’s (2016) ‘Tools for Researching Vocabulary’, Szudarski’s (2018) book is a must-read for researchers who are interested in the field of vocabulary.


Meara, P., & Miralpeix, I. (2016). Tools for researching vocabulary. Bristol: Multilingual Matters.

Nation, I. S. P., & Webb, S. (2011). Researching and analyzing vocabulary. Boston: Heinle, Cengage Learning.

O’Keeffe, A., McCarthy, M., & Carter, R. (2007). From corpus to classroom: Language use and language teaching. Cambridge: Cambridge University Press.

Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University Press.

Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual. New York: Palgrave Macmillan.

Thi Ngoc Yen Dang is a Lecturer at Vietnam National University, Hanoi. She obtained her PhD from Victoria University of Wellington. Her research interests include vocabulary studies and corpus linguistics. Her articles have been published in Language Learning, English for Specific Purposes, Journal of English for Academic Purposes, and ITL-International Journal of Applied Linguistics.

Format: Paperback
ISBN: 9781138187
ISBN-13: N/A
Pages: 238
Prices: U.K. £ 29.99