LINGUIST List 16.2246

Sun Jul 24 2005

Review: Textbooks/Phonetics/Comp Ling: Coleman (2005)

Editor for this issue: Megan Zdrojkowski


        1.    David Deterding, Introducing Speech and Language Processing

Message 1: Introducing Speech and Language Processing
Date: 24-Jul-2005
From: David Deterding
Subject: Introducing Speech and Language Processing

AUTHOR: Coleman, John
TITLE: Introducing Speech and Language Processing
SERIES: Cambridge Introductions to Language and Linguistics
YEAR: 2005
PUBLISHER: Cambridge University Press
Announced at

David Deterding, NIE/NTU, Singapore


This book is an introduction to two separate but related areas: speech
analysis and language processing. It aims to provide a straightforward
introduction to these two topics, suitable for readers with some knowledge
of phonetics and grammar but little or no background in the computer
analysis or manipulation of speech and language, and it provides an
introduction to such techniques as digital filtering, linear predictive
coding, deterministic and non-deterministic parsing, and Markov modelling
of speech.

Most of the computer programs that are discussed in the text are provided
in an accompanying CD-ROM, including C programs for signal processing and
Prolog programs for parsing, and the reader is encouraged not just to run
these programs but to modify them so as to become fully familiar with
their structure and operation.


After an introductory chapter outlining the contents and aims of the book,
Chapters 2, 3 and 4 introduce some signal processing techniques with
illustrative programs all written in C. Chapter 2 deals with the
generation of a simple cosine wave, Chapter 3 presents basic digital
filters, and Chapter 4 covers linear predictive coding for modelling the
spectral characteristics of speech. In all these areas, the presentation
introduces the techniques step-by-step, making a commendable effort to
explain all aspects of the programs in a style that is accessible to
readers with no background in signal processing or computer programming.

In Chapter 5 the focus shifts to the use of Prolog programs to demonstrate
the implementation of finite-state machines, in order to parse and also
generate phonologically well-formed strings of phonemes in English. Once
more, the reader is taken through the example programs line by line, to
ensure that even those with no previous knowledge of Prolog can easily
understand the code and modify it if they choose.

Chapter 6 covers speech recognition techniques, including dynamic time
warping and vector quantization. And Chapter 7 deals with the importance
of incorporating probability estimates in finite-state models, including a
substantial discussion of the need for probabilistic parsing despite the
theoretical objections of many linguists such as Chomsky. Neither Chapter
6 nor Chapter 7 includes illustrative programs, presumably because some of the
techniques discussed, such as Hidden Markov Models, would be just too long
and complicated for an introductory book, though it is not so obvious that
a simple implementation of dynamic time warping would not have been
feasible.

Chapter 8 introduces syntactic parsing, with some basic programs written
in Prolog for parsing of a very limited set of English sentences. And
Chapter 9 discusses the practical issues of incorporating probability into
the parsing algorithm, clearly demonstrating that there is no reason why
sentences that have never been uttered before should pose a problem for
probabilistic parsers, as was once claimed by Chomsky. Finally, at the end
of Chapter 9, the implementation of a simple probabilistic context-free
grammar is illustrated in Prolog.


One issue with regard to this book can be illustrated by the effort to
clarify a single line of code in the first C program that is presented:

x = (short int *) calloc(length,sizeof(short int));

Over half a page (pp. 37-38) is spent carefully explaining that this
allocates memory for an array of short integers, but it is unfortunately
probably true that many potential readers, even some with a substantial
interest in the analysis and manipulation of speech, will find some of
this explanation impenetrable.

In fact, for the line of code listed above, the text never fully explains
what the first part of the line does: (short int *) is a cast that converts
the generic pointer returned by calloc into a pointer to a short integer.
Presumably it is assumed that going into too much detail about the use of
pointers in C is not appropriate for an introductory book on speech
processing. But this means that those readers who do not have any problems
with the technical aspects of the text might end up frustrated when the
whole of the code is not explained.

So, has Coleman got it right, in attempting to explain as much as possible
about how the code works but not necessarily going into all the details? I
think he has, and the level of detail is about right. One probably needs
to accept that it is necessary for readers to run the programs and also
manipulate them if they are to gain a reasonable understanding of the
material covered in this kind of practical textbook, and if some readers
find they cannot cope with the analysis and compilation of the code, well,
so be it.

Another example of technical details that some readers may find a bit
daunting is the discussion of big endian and little endian computers (p.
32). Most of us really do not care how our computers store integers so
long as they work fine. So is it really necessary to go into these details
about how integers are stored? Well, yes it probably is. If readers are to
be able to load speech data into programs and then manipulate the data in
various ways, then they probably do need to find out if they are working
on a big endian machine (Motorola) or a little endian machine (Intel). So,
once more, distasteful as this discussion might be to some readers,
Coleman probably has got it right. Indeed, throughout the book, he always
makes an admirable effort to present the material in a style and format
that is accessible even to those with no background in computer
programming, and by and large these efforts are probably highly
successful, even if it may be necessary to acknowledge that some readers
will not be able to grasp all the concepts.

Coleman makes no claims to expertise in syntax. In fact he admits (p. 223)
that he probably knows rather less about syntax than many readers. And,
indeed, a few aspects of the syntactic models that he presents are a bit
suspect. For example, he adopts a rather traditional generative model of
English, with rules such as np --> det, adj, n (p. 232), eschewing the use
of determiner phrases that are proposed in many more recent models. But
then the first rule is ip --> np, vp, and this use of ip to represent a
sentence makes no sense when the sentence includes no inflectional
component, i, that can act as the head of the ip. It would have been
better here to stick to the traditional use of s to represent the top node
of a sentence (as indeed is done in Chapter 9, with no explanation for the
switch). But such minor quibbles miss the point: this is not a textbook on
syntax. It is an introductory text on signal processing and language
parsing, and it presents these topics exceptionally well and very clearly.

Occasionally, gaps remain in the implementation of some techniques. For
example, the use of a finite state transducer is described (pp. 144-149)
for matching simple sequences of vowels and consonants against stored
arrays of linear prediction coefficients, but many readers will wonder how
the closest match is computed between a new set of LPC values and the
stored data. Although this is (partially) resolved when vector
quantization is introduced (p. 179), thirty pages is a bit long to leave
readers pondering over this rather central issue. Furthermore, with regard
to the implementation of the finite state transducer, the simple matching
algorithm only mentions vowels and fricatives, and this fails to deal with
the obvious issue that plosives are characterised by silence so that the
only way /b, d, g/ can be differentiated from each other is by means of
their transitions from and to the neighbouring sounds, something which
cannot be handled by means of single targets for each phoneme. But once
more, maybe this is missing the point: the aim of the book is to introduce
a wide range of speech processing techniques in a practical and
straightforward manner, not to go into all the details of their
implementation. And this it does extremely well, so we should not quibble
too much about some minor flaws in the simple implementations, or worry if
all of the details are not fleshed out.

Overall, Coleman is to be congratulated on this handsomely produced,
easily accessible, fascinating book which many, many students of speech
and language will undoubtedly find exceptionally valuable.


David Deterding is an Associate Professor at NIE/NTU, Singapore, where he
teaches phonetics, phonology, syntax, and Chinese-English translation.