Review of Phonetic Analysis of Speech Corpora

Reviewer: Olga Dmitrieva
Book Title: Phonetic Analysis of Speech Corpora
Book Author: Jonathan Harrington
Publisher: Wiley-Blackwell
Linguistic Field(s): Phonetics
Issue Number: 22.497

AUTHOR: Jonathan Harrington
TITLE: Phonetic Analysis of Speech Corpora
PUBLISHER: Wiley-Blackwell
YEAR: 2010

Olga Dmitrieva, Department of Linguistics, Stanford University


The author defines the potential audience for this book as scholars of phonetics
embarking on their first large-scale project, such as a master's or honors
thesis, as well as a general research audience. The stated goal is to supplement
readers' knowledge of acoustic phonetics and basic statistical techniques with a
practical guide to testing research hypotheses using speech corpora. The book is
essentially a practical introduction to phonetic data analysis using the Emu
speech database system (a set of software tools for the creation, manipulation,
and analysis of speech databases) and the Emu-R interface, a collection of R
functions written for handling data imported from Emu (R is a software
environment for statistical computing and graphics). The book consists of nine
chapters, a preface followed by a list of simple instructions for downloading
the necessary software, a bibliography, and an index. Each chapter, with the
exception of the
introductory first chapter, is followed by a set of ''questions'' or exercises
devised to further familiarize the readers with the concepts/tools/functions
discussed in the chapter. The solutions for the exercises are supplied. A brief
overview of each of the nine chapters is presented below.

Chapter 1 addresses the importance of corpora in phonetic research and the
challenges of creating your own corpus. The chapter begins with a discussion of
the advantages of speech corpora over other kinds of material that phonetic
research may rely upon, such as impressionistic transcription, and then offers a
brief review of the issues related to designing your own corpus. While this
introduction is brief and fairly basic, it provides helpful references which
readers may consult should they require more information on a particular issue.

The chapter ends with a summary and an overview of the book’s structure, along
with a list of corpora available for phonetic research with examples of phonetic
studies which analyzed them.

Like the rest of the book, Chapter 2 is dedicated to hands-on exploration of the
Emu system and the Emu-R interface. It begins by walking the reader through the
process of starting up the Emu database tool, downloading a database, and
opening an annotated utterance from a database. It shows with detailed
illustrations how the annotation in Emu is structured and what kind of
information is available to view when the utterance is brought up. A basic
introduction to R follows. This develops into a discussion of the Emu-R
interface, in particular, how to read (query) annotation labels and associated
time-stamps into R from the Emu annotation files and how to perform simple
operations with them: saving them as objects in R (''segment lists'') and
calculating the duration of the labeled segments.

Chapter 3 continues to familiarize the readers with the functionalities of the
Emu database tool and its interface with R and advances to a discussion of the
basic signal processing capabilities of the Emu system. The body of the chapter
works through the procedures for calculating, displaying, and manually
correcting vowel formants in Emu, as well as importing them into R as
''trackdata'' objects and creating formant plots. The exercises at the end of the
chapter build on the information presented in the chapter and extend the focus
to calculating intensity, zero-crossing-rate, and fundamental frequency.

Chapter 4 pursues the intricacies of annotation structures in Emu and the types
of queries that can be applied to them in Emu-R. After a brief overview of the
basic types of operators that can be used to build more complex queries within
a single annotation tier, it moves on to discuss how annotation tiers can be
linked in Emu so that queries can span more than one tier at a time. The rest of
the chapter describes the types of connections between annotation tiers, shows
how the tiers can be linked in Emu manually and semi-automatically, and how the
linked tier structure can be translated to Praat TextGrids.

Chapter 5 aims at deepening the readers' understanding of the way Emu data are
treated in R as segment lists and trackdata objects, the differences between
the two, and the available functionalities. It is illustrated with articulatory
movement data obtained with the electromagnetic midsagittal articulograph (EMA).

The chapter begins with an overview of the EMA recording technique and the
process of constructing the articulatory movement database. It also describes
the way the movement database is annotated and structured in Emu. The movement
data are then used to show how the basic objects necessary for further analysis
are created. The practical demonstration is supplemented by a discussion of the
types of objects in R and the ways they differ, especially in terms of the
functions that can be applied to them. The use of comparison operators and
logical vectors is demonstrated while computing mean VOT, introducing along the
way some basic descriptive statistics and tabulation functions in R. The author
also points out the need to supplement the analysis of the mean with an analysis
of the distribution and provides the procedure for creating boxplots displaying
the median, the interquartile range, and the range of the VOT data. The analysis
of intergestural coordination is introduced with synchronized
tongue-body/tongue-tip movement plots for individual segments as well as
ensemble plots for categories of segments. The analysis of intragestural
coordination, in particular of velocity, relies on the differencing operation
discussed in the chapter. The chapter also considers an approach to articulatory
movement as a critically damped mass-spring system and uses it to test a
particular hypothesis related to the practice dataset.
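
To make the differencing step concrete, here is a minimal sketch in Python
(not the book's R code, and not taken from the book). It estimates velocity as
the first difference of position samples scaled by the sampling rate. The
function name, the sample values, and the 200 Hz rate are all hypothetical.

```python
def velocity(positions, sample_rate_hz):
    """First-difference velocity estimate, in position units per second.

    Returns len(positions) - 1 values, one per pair of adjacent samples.
    """
    return [(b - a) * sample_rate_hz for a, b in zip(positions, positions[1:])]


# Hypothetical tongue-tip height samples (mm) at an assumed 200 Hz rate.
tongue_tip_mm = [0.0, 0.5, 1.5, 3.0, 4.0, 4.5, 4.5, 4.0]
vel = velocity(tongue_tip_mm, 200)

# Index of the sample pair with the largest absolute velocity.
peak_index = max(range(len(vel)), key=lambda i: abs(vel[i]))
```

The sign of each value gives the direction of movement, and the peak marks
the fastest part of the gesture.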

Chapter 6 returns to the subject of vowel formants and formant transitions, this
time in more detail and with more complex analyses. It covers issues of
contextual influence on vowels, vowel targets, normalization, vowel reduction
and undershoot, and coarticulatory influences of vowels on consonants. It
introduces the technique of k-means used to assess the influence of the
immediate phonetic context on vowel acoustics.

The chapter discusses the idea of the vowel target as the first formant (F1)
maximum and shows how to find the point of F1 maximum using Emu-R functions and
how to export the established target times as annotations into Emu. It also
demonstrates two extrinsic vowel normalization techniques, transformation to
z-scores and subtraction of a speaker-dependent constant, as well as
transformation to the Bark scale as an intrinsic normalization technique. It shows
how Euclidean distance is used to assess the degree of vowel space
expansion/centralization and to compare the relative distance between two vowel
categories in a formant space. Plotting a histogram is also introduced.
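
As an illustration of two of these ideas, the following Python sketch (an
assumption-laden example, not the book's Emu-R code) converts F1 and F2 to
z-scores within each speaker and measures Euclidean distance in the resulting
space. All formant values, speaker labels, and function names are
hypothetical.

```python
import math
import statistics


def zscore_by_speaker(tokens):
    """tokens: list of (speaker, vowel, f1, f2) tuples.

    Returns the same tuples with F1 and F2 replaced by z-scores computed
    from that speaker's own mean and standard deviation.
    """
    by_speaker = {}
    for speaker, _, f1, f2 in tokens:
        f1s, f2s = by_speaker.setdefault(speaker, ([], []))
        f1s.append(f1)
        f2s.append(f2)

    normalized = []
    for speaker, vowel, f1, f2 in tokens:
        f1s, f2s = by_speaker[speaker]
        z1 = (f1 - statistics.mean(f1s)) / statistics.stdev(f1s)
        z2 = (f2 - statistics.mean(f2s)) / statistics.stdev(f2s)
        normalized.append((speaker, vowel, z1, z2))
    return normalized


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical formant values in Hz for two speakers.
tokens = [
    ("spk1", "i", 300, 2300), ("spk1", "a", 750, 1300), ("spk1", "u", 320, 800),
    ("spk2", "i", 350, 2700), ("spk2", "a", 900, 1500), ("spk2", "u", 380, 950),
]
normalized = zscore_by_speaker(tokens)

# Distance between /i/ and /a/ for spk1 in the normalized F1 x F2 space.
dist_i_a = euclidean(normalized[0][2:], normalized[1][2:])
```

Distances of each vowel from the speaker's centroid, computed the same way,
give one simple measure of vowel space expansion or centralization.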

A method of estimating vowel undershoot by fitting a parabola to the (second)
formant's trajectory and measuring its curvature is discussed as well. Another
useful outcome of this analysis is formant smoothing resulting from the
reconstruction of the formant trajectories from the parabola coefficients. The
author also offers an alternative and superior method of formant smoothing, the
discrete cosine transformation, which allows control over the degree of
smoothing and provides a better fit to the contour of the formant trajectory.
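
The parabola-fitting idea can be sketched as follows. This is a Python
illustration under stated assumptions, not the book's implementation: time is
normalized to evenly spaced points on [-1, 1], the trajectory is a synthetic
F2 track, and the function names are hypothetical. The coefficient c2
measures curvature, and evaluating the fitted parabola gives a smoothed
trajectory.

```python
def fit_parabola(values):
    """Least-squares fit of y = c0 + c1*t + c2*t**2.

    Assumes the samples are evenly spaced on normalized time t in [-1, 1].
    Because t is symmetric about zero, the odd power sums vanish and the
    normal equations split into a 1x1 and a 2x2 system.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three samples")
    t = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    s2 = sum(x ** 2 for x in t)
    s4 = sum(x ** 4 for x in t)
    sy = sum(values)
    sty = sum(x * y for x, y in zip(t, values))
    st2y = sum(x * x * y for x, y in zip(t, values))

    c1 = sty / s2
    det = n * s4 - s2 * s2
    c0 = (sy * s4 - s2 * st2y) / det
    c2 = (n * st2y - s2 * sy) / det
    return c0, c1, c2


def parabola_values(coeffs, n):
    """Evaluate the fitted parabola at n evenly spaced points on [-1, 1]."""
    c0, c1, c2 = coeffs
    t = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return [c0 + c1 * x + c2 * x * x for x in t]


# Synthetic (hypothetical) F2 trajectory in Hz: an exact parabola.
times = [-1.0 + 2.0 * i / 10 for i in range(11)]
f2_track = [1500.0 + 100.0 * x - 300.0 * x * x for x in times]

c0, c1, c2 = fit_parabola(f2_track)
smoothed = parabola_values((c0, c1, c2), len(f2_track))
```

On real data the track would be noisy, and `smoothed` would differ from the
input. Here the input is an exact parabola so the recovered coefficients can
be checked directly.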

The rest of the chapter is devoted to quantifying and comparing the
coarticulatory influence of vowels on neighboring consonants using locus equations.
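
A locus equation is usually described as a straight-line regression of F2 at
the vowel onset on F2 at the vowel target. The Python sketch below follows
that general formulation and is not claimed to match the book's procedure.
The formant values are hypothetical and the function name is illustrative.

```python
def locus_equation(f2_target, f2_onset):
    """Ordinary least-squares fit of F2_onset = intercept + slope * F2_target."""
    n = len(f2_target)
    mean_x = sum(f2_target) / n
    mean_y = sum(f2_onset) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_target)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(f2_target, f2_onset))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical F2 values (Hz) for one consonant before four different vowels.
f2_target = [2200.0, 1800.0, 1400.0, 1000.0]
f2_onset = [2000.0, 1800.0, 1600.0, 1400.0]

slope, intercept = locus_equation(f2_target, f2_onset)

# Frequency at which onset and target coincide (defined only when slope != 1).
locus_hz = intercept / (1.0 - slope)
```

A slope near 1 means the onset tracks the vowel closely (strong vowel-on-consonant
coarticulation), while a slope near 0 means the onset stays close to a fixed
locus regardless of the vowel.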

Chapter 7 looks at electropalatography (EPG) and shows how the data obtained
with this technique can be processed and analyzed in Emu-R. At the start, a
general overview of palatography and electropalatography is provided, along with
a discussion of the advantages and limitations of these techniques. It is also
demonstrated how EPG data are accessed, represented, and manipulated in Emu-R.
The types of plots available for EPG data are also considered.

Most of the chapter concentrates on data reduction techniques available for
EPG data, which allow a more convenient way of evaluating the shape and position
of the contact. These techniques include ''contact profiles'', where contacts are
summed by column and/or by row, and ''contact distribution indices'', such as the
anteriority index, centrality index, dorsopalatal index, and center of gravity
index. The author shows how Emu-R functions can be applied to calculate and plot
these data-reduced objects for the EPG data and used to answer simple research
questions, such as comparing the amount of overlap in consonant clusters and the
amount of vowel-on-consonant coarticulatory influence.
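
The simplest of these reductions, the contact profile, can be sketched in
Python as below. This is an illustration only: it treats a single EPG frame
as a plain 8 x 8 grid of 0s and 1s with row 1 at the front of the palate,
which is a simplifying assumption, and the frame itself is invented. The
index formulas used in the book are not reproduced here.

```python
def row_profile(frame):
    """Number of contacted electrodes in each row (front to back)."""
    return [sum(row) for row in frame]


def column_profile(frame):
    """Number of contacted electrodes in each column (one side to the other)."""
    return [sum(column) for column in zip(*frame)]


# Hypothetical frame: full contact near the front plus contact along both edges.
frame = [
    [0, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
]

rows = row_profile(frame)
columns = column_profile(frame)
total_contacts = sum(rows)
```

Computing the same profiles frame by frame gives a time series that is much
easier to plot and compare than the full sequence of palatograms.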

Chapter 8 addresses spectral analysis, starting with a review of the
fundamentals of spectra. This covers digital sinusoids and their components;
some basic mathematics behind the Fourier transform; sampling windows and the
importance of applying Hamming or Hanning windows for the discrete Fourier
transform (DFT) to reduce the effects of spectral leakage; the trade-off between
time and frequency resolution in the results of Fourier analysis and its
dependence on the length of the window; interpolation and smoothing provided by
the zero padding technique; and pre-emphasis and its uses, for instance, for
distinguishing between two sounds that are differentiated mostly by energy in
high frequencies. The theory is supplemented by example calculations and plots
in R. At the end of the section the author shows how the spectral data derived
from the speech signal using the Emu-tkassp toolkit can be handled in Emu-R. He
discusses the way trackdata objects containing spectral data are represented in
Emu-R and demonstrates basic operations that can be applied to spectral objects
and their components, for example to limit the frequency range.
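
The windowing and zero padding steps can be illustrated with a small Python
sketch. It uses a direct (slow) DFT written out in full so that it is
self-contained, and it is not the book's R code. The signal, its length, and
the function names are assumptions made for the example.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitude(signal, n_fft=None):
    """Magnitude spectrum for bins 0..n_fft/2 via a direct DFT.

    If n_fft exceeds len(signal), the signal is zero padded, which samples
    the same underlying spectrum on a finer frequency grid.
    """
    if n_fft is None:
        n_fft = len(signal)
    padded = list(signal) + [0.0] * (n_fft - len(signal))
    magnitudes = []
    for k in range(n_fft // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * i / n_fft)
            for i, x in enumerate(padded)
        )
        magnitudes.append(abs(total))
    return magnitudes


# Hypothetical test signal: a sinusoid with exactly 8 cycles in 64 samples.
n = 64
signal = [math.sin(2 * math.pi * 8 * i / n) for i in range(n)]
windowed = [s * w for s, w in zip(signal, hamming(n))]

spectrum = dft_magnitude(windowed)
padded_spectrum = dft_magnitude(windowed, n_fft=128)

peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_bin_padded = max(range(len(padded_spectrum)), key=padded_spectrum.__getitem__)
```

Doubling the transform length doubles the number of frequency bins, so the
same peak appears at twice the bin index. The frequency resolution is still
set by the 64-sample window; zero padding only interpolates between the
original bins.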

The chapter also introduces a number of data-reduction techniques which allow
for a more effective comparison between different phonetic categories and shows
how they can be applied using Emu-R functions. These include computing the
spectral average, the spectral sum, and the ratio of the spectral average or
spectral sum in a certain frequency range to the total spectral energy, which
is also shown to be useful in normalizing for possible variation in speakers'
loudness. The author demonstrates that a difference spectrum, produced by
subtracting one spectrum from another, can also be used for normalizing, as well
as for distinguishing between certain phonetic categories. The spectral slope
technique, which fits a line of best fit to the spectrum and reduces it to the
intercept and slope coefficients, is introduced to show its uses for
differentiating among places of articulation in oral stops. The author
highlights that all of these techniques can be applied at a single point in time
as well as across spectral slices, allowing the evaluation of changes in a
particular parameter of the spectrum through time.

The chapter also introduces the method of calculating spectral moments, which
encode some basic properties related to the shape of the spectrum such as its
mean, variance, skew, and kurtosis. The final part of the chapter deals with yet
another way of assessing the shape of the spectrum with the help of the Discrete
Cosine Transformation (DCT). It is also shown how the DCT method can be applied
to signal smoothing.
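
Spectral moments treat the spectrum as if it were a probability distribution
over frequency. The Python sketch below shows one common formulation and may
differ in detail from the Emu-R function used in the book. It assumes
non-negative linear power values (not dB), reports kurtosis as excess
kurtosis, and uses invented spectra.

```python
def spectral_moments(freqs_hz, power):
    """Return (mean, variance, skew, excess kurtosis) of a spectrum.

    power must hold non-negative values on a linear scale; they are
    normalized to sum to one and used as weights over freqs_hz.
    """
    total = sum(power)
    weights = [p / total for p in power]
    mean = sum(f * w for f, w in zip(freqs_hz, weights))
    variance = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, weights))
    third = sum((f - mean) ** 3 * w for f, w in zip(freqs_hz, weights))
    fourth = sum((f - mean) ** 4 * w for f, w in zip(freqs_hz, weights))
    skew = third / variance ** 1.5
    kurtosis = fourth / variance ** 2 - 3.0
    return mean, variance, skew, kurtosis


freqs_hz = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]

# Hypothetical spectrum that is symmetric about 3000 Hz.
symmetric = spectral_moments(freqs_hz, [1.0, 2.0, 4.0, 2.0, 1.0])

# Hypothetical spectrum with most energy at low frequencies.
low_heavy = spectral_moments(freqs_hz, [4.0, 2.0, 1.0, 1.0, 1.0])
```

The symmetric spectrum has its mean at the center and zero skew, while the
low-heavy spectrum has a lower mean and positive skew because its tail
extends toward high frequencies.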

Chapter 9, the final chapter, is dedicated to methods of classifying speech
sounds. The issues discussed appear most immediately relevant to the field of
speech recognition, although, as the author points out, probabilistic methods
such as the ones used for classification are becoming more and more important in
experimental phonetics and linguistics in general. The final goal of the
techniques described here is to separate the phonetic categories most
effectively using the smallest number of contributing parameters. As the basis
of most probabilistic classification analyses, Bayes' theorem and the Gaussian
distribution are introduced up front. The concepts of training and testing
stages, closed and open tests, and supervised and unsupervised learning are
explained along the way. Data classification in a one-parameter
(one-dimensional) space is demonstrated and followed by increasingly complex
examples of classification in two-dimensional and multidimensional spaces. The
author addresses the issues of over-fitting the training model and correlation
between parameters. It is also demonstrated how principal component analysis
(PCA) can be used to reduce the redundancy of the model. The author acknowledges
that time is often crucial in phonetic research since so much of the data
extracted from speech is dynamic in nature. Here he presents a method for
compressing dynamic spectral data where the DCT is applied to reduce each
spectral slice to a small number of coefficients and a polynomial is fitted to
each coefficient as a function of time, resulting in three values representing
the mean, the slope, and the curvature of this coefficient's trajectory in time.
Thus a multitude of DFT slices and their components can be reduced to a single
point in an n-dimensional space which serves as a basis for classification. The
rest of the chapter discusses the advantages of classification using a
''support vector machine'' (SVM) for data that are not normally distributed and
compares its performance to that of the Gaussian model in the classification of
oral stops in the two-dimensional space of the dynamic DCT parameters.
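
The one-dimensional case can be sketched in Python as follows. This is a
minimal illustration of Gaussian classification with Bayes' theorem, not the
book's Emu-R code. The training values (VOT in ms for two stop categories),
the labels, and the function names are hypothetical.

```python
import math
import statistics


def gaussian_pdf(x, mean, sd):
    """Probability density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def train(samples):
    """samples: dict mapping label -> list of values.

    Returns label -> (mean, sd, prior), with priors taken from class sizes.
    """
    total = sum(len(values) for values in samples.values())
    return {
        label: (statistics.mean(values), statistics.stdev(values), len(values) / total)
        for label, values in samples.items()
    }


def posteriors(x, model):
    """Bayes' theorem: posterior = prior * likelihood / evidence."""
    joint = {
        label: prior * gaussian_pdf(x, mean, sd)
        for label, (mean, sd, prior) in model.items()
    }
    evidence = sum(joint.values())
    return {label: value / evidence for label, value in joint.items()}


def classify(x, model):
    """Return the label with the highest posterior probability."""
    post = posteriors(x, model)
    return max(post, key=post.get)


# Hypothetical training data: VOT in ms for two stop categories.
training = {
    "voiced": [8.0, 12.0, 10.0, 15.0, 11.0],
    "voiceless": [55.0, 62.0, 48.0, 70.0, 58.0],
}
model = train(training)

label_short = classify(14.0, model)
label_long = classify(50.0, model)
```

Classifying the training tokens themselves corresponds to a closed test;
classifying tokens held out from training corresponds to an open test, which
gives a fairer picture of how the model generalizes.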


The book undoubtedly succeeds in its goal of providing an accessible and
effective practical introduction to using the Emu speech database system and
Emu-R functions to analyze phonetic data. It is written in clear and accessible
language, and the topics are introduced in a coherent and easy-to-follow manner,
with the complexity of the material gradually increasing from the beginning
towards the end of the book. Even rather complicated concepts are made easy to
understand with an exceptional use of analogy and a commendable restraint from
going into too many mathematical and technical details. What I particularly
appreciated about the organization of the book is that it is structured not
around the features of the Emu system but rather around the types of phonetic
analysis that most students/researchers are likely to get involved in: vowel
acoustics, formants, and formant transitions; normalization; articulatory data
analysis; and spectral analysis. I also found it very helpful that the functions
and commands introduced in the previous chapters were often repeated in the
following chapters. The use of graphic devices is superb throughout.

However, the title of the book seems somewhat misleading: it suggests a certain
breadth of the scope and implies that the discussion will concentrate around
using already available phonetically annotated corpora to answer research
questions, while this is only briefly touched upon in the text. Since the book
is very clearly focused on instructing the reader in the uses of one particular
system for phonetic analysis of their own speech recordings it appears that
something along the lines of ''A practical introduction to phonetic data analysis
using the Emu speech database system'' would be more appropriate.

It should also be mentioned that the book limits itself to a very well-defined
area, mostly ways of extracting data from already annotated corpora. (This is by
no means to underestimate the subject: it covers an impressive range of methods
and techniques and will without a doubt be a great resource for phoneticians.)
The issues preceding data extraction -- such as development of the
hypothesis, experimental methods, and construction of the corpus, including
annotations -- are given only cursory attention, as are the statistical analysis
and the interpretation of the results. Overall, I do not take this to be a
drawback, since obviously it is not a book on linguistic research methods,
although in places it would benefit from a slightly more detailed exploration of
the linguistic background of the analyzed data. A brief statement about the
implications of the patterns uncovered in the data could also make the exercises
more exciting.

A few minor issues: on a couple of occasions the practical exercises were
difficult to complete due to bugs in the Emu system. I understand this to be a
developing program and I am sure these problems will soon be resolved. There are
also a couple of places in the book where the text refers to elements of a
table or a graph in ''bold'' but this highlighting is actually absent.

To sum up, this is a well-written, well-structured, easy-to-follow workbook
which boasts an excellent set of practical exercises and demonstrations and
covers a wide range of techniques. Readers who have a basic background in
phonetics and statistics and are prepared to work their way carefully through
this book will find it highly informative and effective. While it may be of
particular interest to researchers and students looking for an alternative to
Praat and Praat scripting in phonetic data processing, the book will be a
valuable addition to the list of readings in any class on research methods in
linguistics, as well as an excellent main reading for a more specialized
workshop or seminar.


Boersma, Paul & Weenink, David (2010). Praat: doing phonetics by computer
[Computer program]. URL

The Emu speech database system (Version 2.3), URL

R Development Core Team (2007). R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN
3-900051-07-0, URL

Olga Dmitrieva is a PhD candidate in the Department of Linguistics at Stanford University. Her main research interests are in phonetics and phonology. Her work addresses issues of phonetics-phonology interface, functional considerations in language typology, sound change, and language interference. She is currently working on a crosslinguistic study of the perception and production of consonant length in relation to the typological distribution of geminate consonants.

Format: Hardback
ISBN: 1405141697
ISBN-13: 9781405141697
Pages: 424
Prices: U.S. $ 124.95

Format: Paperback
ISBN-13: 9781405199575
Pages: 424
Prices: U.S. $ 69.95