Review of The Oxford Handbook of Corpus Phonology

Reviewer: Sara Diaz
Book Title: The Oxford Handbook of Corpus Phonology
Book Author: Jacques Durand, Ulrike Gut, Gjert Kristoffersen
Publisher: Oxford University Press
Linguistic Field(s): Phonetics; Text/Corpus Linguistics; Language Acquisition
Issue Number: 30.280

The Oxford Handbook of Corpus Phonology, edited by Jacques Durand, Ulrike Gut, and Gjert Kristoffersen, is a comprehensive and diverse collection of essays on corpus phonology, “a new interdisciplinary field of research that has only begun to emerge during the last few years” (1). Its aim, as clearly stated in the introduction, is to discuss possible ways to standardise corpus compilation, annotation and exploitation. The issue of international standardisation is present throughout the four parts into which the book is divided. The first part, “Phonological Corpora: Design, Compilation, and Exploitation”, contains eight essays that deal, as its title suggests, with essential concepts that need to be taken into account when building, annotating or using a phonological corpus. The first essay (Chapter 2) is introductory: it begins by defining what a phonological corpus is and then addresses basic questions that anyone building a corpus will need to answer, such as how representative and how large the corpus should be and how it is going to be preserved. The three following chapters develop, respectively, the processes of data collection, annotation and automatic phonological transcription that are touched upon in Chapter 2. Chapter 6 focuses on cluster analysis, a method for the exploitation of speech corpora that involves statistics and linear algebra, and provides useful resources including websites, books and software. Chapter 7 examines corpus archives and data preservation and dissemination, with an emphasis on how these notions have changed due to technological innovation and on the need for new types of data centres. Finally, the last two chapters of Part I are concerned with metadata and data formats. In Chapter 8, the authors give many recommendations and internet sources and, in Chapter 9, they concentrate on formats, especially standardised formats, of annotated spoken corpora. Part I thus follows a chronological structure that covers the entire process of corpus compilation and use.

Part II comprises five essays that show how the use of speech corpora can contribute to research in phonology and related subfields. While the first paper (Chapter 10) is more theoretical in that it reviews key terms such as ‘phonology’, ‘phonetics’, ‘corpus’ and ‘corpus-based approach’, the other four describe practical applications of corpora in different areas of phonological research, namely segmental phonology (Chapter 11), post-lexical phonology (Chapter 12), child phonological development (Chapter 13) and second language acquisition (Chapter 14).

The third section of the Handbook is devoted to the presentation of some of the most prominent tools and methods in the field. On the one hand, two of the eight chapters that make up Part III (15 and 21) each provide an overview of a stand-alone tool. In Chapter 15, Han Sloetjes presents ELAN, a multimedia annotation tool which accepts not only audio but also video and can create ‘annotation tree structures’. The other stand-alone tool, ANVIL, is presented in Chapter 21. ANVIL allows for the annotation of audio, video and 3D motion-capture data and, most importantly, for interoperability with other tools. On the other hand, two other chapters describe EMU (Chapter 16) and EXMARaLDA (Chapter 20), two systems for the analysis and management of speech databases that integrate a collection of tools. The computer program Praat is covered in Chapters 17 and 18, where the authors focus, first, on how the program can be used in phonological corpus research and, second, on how the Praat scripting language can make corpus building and analysis easier. Part III also includes two essays on methods for the study of phonology. In Chapter 19, Yvan Rose and Brian MacWhinney deal with methodological issues involved in spoken data compilation and analysis, using the PhonBank project as an illustrative example. Chapter 22 advocates a web-based approach to the archiving and sharing of speech corpora.

The last section, Part IV, ‘Corpora’, presents a varied selection of corpora. These cover many different languages, such as English (IViE corpus in Ch. 23), French (the PFC programme in Ch. 24 and VALIBEL in Ch. 30), Norwegian (NoTa-Oslo and TAUS in Ch. 25), German (LeaP corpus in Ch. 26), Danish (LANCHART corpus in Ch. 28), Dutch (Ch. 29) and Taiwanese (Ch. 32), as well as different dialects (Tyneside and Australian English in Chs. 27 and 31, respectively). Furthermore, Part IV covers both segmental and suprasegmental phonology and different areas and subareas of linguistics, for instance, sociolinguistics, dialectology, first and second language acquisition and historical linguistics.

As for the intended audience, the editors aim to reach a wide readership that includes researchers from many different fields, ranging from the most obvious, phonetics and phonology, to others that may not be as evident, such as language variation, second language acquisition, sociolinguistics and dialectology.


The Oxford Handbook of Corpus Phonology is the first of its kind, since no other handbook has yet been published that specifically deals with speech corpora and, in this respect, it deserves our congratulations. This may be due to the fact that it was only in the 1980s and 1990s that corpora started to develop as “tools for the linguist or applied linguist” (O’Keeffe & McCarthy, 2010, p. 5). Since then, several books have been published on corpus linguistics, for instance, The Routledge Handbook of Corpus Linguistics and Corpus Linguistics: An International Handbook. Nonetheless, most of these publications serve as general introductions to corpus linguistics and, even though they usually discuss a wide range of topics, few, if any, of those topics are analysed in depth. By contrast, The Oxford Handbook of Corpus Phonology exhaustively addresses the application of corpus-based methods to the field of phonology.

As mentioned earlier, the main goal that the editors of this volume set out to achieve is to fuel the development of “international standards for the compilation, annotation, and analysis of phonological corpora” (1). As to whether they have been successful in their task, it seems that they have been, in part. While some chapters address standardisation explicitly (e.g. Ch. 9), most of them do not comment upon the issue. However, many researchers in this book are concerned with compatibility, interoperability, sharing, open access and sustainability, and all of these concepts are closely related to standardisation and contribute to its advancement. In fact, this connection is observed by Gut and Voormann (2017) when they say that “[t]he issues of standardization and documentation also apply to the sharing and reuse of phonological corpora” (p. 18). Moreover, when standardisation is dealt with, the focus is on formats. An example of this is found in Chapter 9, where Romary and Witt make an excellent point: “In this chapter we hope to have conveyed the message that, within what could appear as an intricate jungle of standards, it is possible to identify some baseline formats, allowing one to start putting together a corpus project within some stable normative environments such as the TEI” (p. 189). In addition, in the chapter on ELAN, the author calls attention to the fact that “[t]here are ongoing efforts to establish a widely accepted interchange format for multimodal annotation” (p. 319).

All in all, even though there should probably have been more discussion of the topic of standardisation, the book succeeds in conveying the powerful message that the sharing of data is indispensable for corpus phonology to move forward. Another reason for supporting data sharing is that, as Yvan Rose beautifully emphasises, “[s]cientific competition should be about ideas, not data” (p. 274).

With regard to the structure of the handbook, it could have been better organised. On the one hand, the overall structure could be improved by placing Part II ‘Applications’ at the end. The resulting structure, i.e., Part I ‘Phonological Corpora: Design, Compilation, and Exploitation’, Part II ‘Tools and Methods’, Part III ‘Corpora’ and Part IV ‘Applications’, seems more coherent since it follows the very process of corpus compilation and use: first, the corpus is designed; second, some tools and methods are chosen; third, the corpus is built; and fourth, the corpus is used for a specific purpose. On the other hand, there is a lack of consistency in the structure of the chapters. While some of them have an introduction, a conclusion, references and acknowledgements, many chapters lack one or more of these sections. Nevertheless, one can say in the editors’ favour that consistency is not easily attained when many contributors are involved in the writing of a book.

Despite the need to improve consistency, the chapters are interconnected and all contribute to a comprehensive whole. They fit within the frame of the handbook, and some contributors state this explicitly, as in Chapter 11, “[o]ur aim within the framework of this handbook is to explore…” (p. 214), and Chapter 20, “[f]ollowing the focus of this book, this chapter foregrounds the use of EXMARaLDA for corpus phonology…” (p. 402). In addition, the handbook is full of cross-references, since some tools, methods or corpora are mentioned in several chapters. Cross-referencing adds to the cohesion of the book but also makes it somewhat repetitive at times.

Finally, it is necessary to comment on another positive aspect of The Oxford Handbook of Corpus Phonology. It encourages future research and opens up new horizons for the field of corpus phonology. In fact, some chapters include a final section with titles such as ‘Future outlook’ or ‘Future development’. In one of these sections, Yvan Rose points out “the need to develop our research methodologies and tools supporting them in a collegial way” in order to make open scientific standards reliable (p. 285).

A future edition of this handbook should take into consideration the suggestions made in this review since they can be very beneficial not only for the editors but also, and especially, for potential readers.


Lüdeling, A., & Kytö, M. (Eds.). (2008). Corpus Linguistics: An International Handbook. Walter de Gruyter.

O'Keeffe, A., & McCarthy, M. (Eds.). (2010). The Routledge Handbook of Corpus Linguistics. Routledge.

Sara Díaz Sierra has a BA in English Studies from the University of Extremadura (Spain) and a Master's degree in Advanced English Studies from the University of Salamanca (Spain). She is currently pursuing a PhD on the Northern Irish English accent under the supervision of Professor Carolina Amador Moreno, lecturer at the University of Extremadura. Her main interests are in sociolinguistics, phonetics and phonology, corpus linguistics and dialectology.