Date: Wed, 15 Feb 2006 20:41:43 +0800
From: David Deterding
Subject: Forensic Speaker Identification
AUTHOR: Alderman, Tony Brian
TITLE: Forensic Speaker Identification
SUBTITLE: A Likelihood Ratio-based Approach Using Vowel Formants
PUBLISHER: Lincom GmbH
YEAR: 2005
David Deterding, NIE/NTU, Singapore
The ability to identify speakers from samples of speech is becoming increasingly important in forensics, for example to determine who made a telephoned bomb threat or whether recorded kidnapping demands were made by a particular suspect. This book considers the effectiveness and reliability of various methods of identifying and discriminating between speakers based on measurements of the second and third formants (F2 and F3) of the five long monophthongs of eleven male Australians recorded on two separate occasions. It does so particularly by considering the Likelihood Ratio (the probability of the observed evidence if the two samples come from the same speaker divided by its probability if they come from different speakers).
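To make the idea of a Likelihood Ratio concrete, here is a minimal sketch in Python. It is not the formula used in the book: it assumes a single formant measurement, normal distributions for both hypotheses, and invented values for the means and standard deviations. The function names and all the numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio(x, suspect_mean, within_sd, population_mean, population_sd):
    """Simplified univariate Likelihood Ratio for one measurement x.

    Numerator: density of x if it came from the suspect
               (normal around the suspect's mean, within-speaker sd).
    Denominator: density of x if it came from some other speaker
                 (normal around the population mean, population sd).
    """
    numerator = normal_pdf(x, suspect_mean, within_sd)
    denominator = normal_pdf(x, population_mean, population_sd)
    return numerator / denominator


# Hypothetical F2 values in Hz, invented purely for illustration.
lr_close = likelihood_ratio(1500.0, suspect_mean=1500.0, within_sd=50.0,
                            population_mean=1700.0, population_sd=150.0)
lr_far = likelihood_ratio(1700.0, suspect_mean=1500.0, within_sd=50.0,
                          population_mean=1700.0, population_sd=150.0)
print(f"measurement at suspect mean: LR = {lr_close:.2f}")
print(f"measurement at population mean: LR = {lr_far:.5f}")
```

A value above 1 supports the same-speaker hypothesis and a value below 1 supports the different-speaker hypothesis. In this toy example, a measurement that is typical of the suspect but unusual in the population gives a ratio above 1, and the reverse gives a ratio well below 1.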
In the following discussion, /i/ refers to the vowel in 'heed', /u/ to the vowel in 'who'd' (in the book, it is shown as a centralised vowel, a ''barred-u''), /o/ to the vowel in 'hoard', /a/ to that in 'hard' (shown as an open central vowel, a ''turned-a''), and /3/ to the vowel in 'herd' (a mid central vowel).
Chapter 1 introduces the issues of forensic phonetics and then provides an overview of the book. In chapter 2, the basic principles of probabilistic forensic speaker identification are discussed and the use of formants for representing vowels is considered. Chapter 3 covers issues concerned with probability theory including the Bayesian approach to evaluating evidence and also the assessment of normality. The characteristics of the normal distribution are discussed in more detail in chapter 4, including statistical measures of deviation from normality such as skew and kurtosis (the degree to which data is clustered about the mean). In chapter 5, the vowels of Australian English are presented with particular reference to the Bernard data set of measurements of the vowels of 170 male speakers. The acoustic representation of the five long monophthongs of Australian English from the Bernard data set is evaluated in chapter 6, and then the recordings on which this study is based are described in chapter 7. Chapter 8 then presents the results of the current study, comparing the relative success of F2 and F3 of each of the five long vowels in separating out the eleven speakers, and then chapter 9 considers the implications of the study and discusses the way forward.
This is a short book packed with data. In fact, nearly half of the 143 pages consist of the full tabulated measurements of F2 and F3 of the eleven speakers and also the results for the effectiveness of each of the parameters in discriminating between the speakers. While it is highly commendable that so much detailed information is provided, and indeed the lengthy tables do allow the reader to get a real feel of the data (and also to check all the results, should one choose to), it does sometimes get a bit overwhelming, particularly in chapter 8 when a comparison of the effectiveness of each of the parameters is presented first in isolation and then in various combinations.
Sometimes one wishes that more interpretation were provided. For example, on page 44 we find that four out of five of the vowels have a positively skewed distribution for F2. But why is this so? And what is it about /u/ that makes it different from the others? Then we are told that the F2 of /o/ is bimodal (pp. 45-6) for the Bernard data. Does this mean there are two different realisations of the vowel in Australia, one fully back and one less so? Or maybe there is some kind of instability in the measurement? We learn on page 60 that the F-ratio for the distribution of F2 for /o/ is low for the eleven speakers in this study, which indicates that the between-speaker variation is relatively small but the within-speaker variation is high for the F2 of this vowel. But why? Is it perhaps related to the bimodality of the F2 of /o/? In chapter 8 (p. 66) we are shown that, when using the Aitken formula for estimating the Likelihood Ratio, a smoothing factor of 0.05 is best for the F2 of all the vowels except /u/ and a smoothing factor of 0.4 is best for the F3 of all the vowels except /u/. But what is it about /u/ that results in a need for the distribution of its F2 to be smoothed more than that of the other vowels while its F3 needs to be smoothed less? All these questions seem to be crying out for further interpretation.
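For readers unfamiliar with the F-ratio mentioned above, the following Python sketch shows the standard one-way analysis of variance version: between-speaker mean square divided by within-speaker mean square. Whether the book computes its F-ratio in exactly this way is an assumption, and the formant values below are invented for illustration only.

```python
from statistics import mean


def f_ratio(groups):
    """One-way ANOVA F-ratio; each group holds one speaker's measurements."""
    k = len(groups)                                # number of speakers
    n_total = sum(len(g) for g in groups)          # total number of tokens
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the speaker means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each token around its own speaker's mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical F2 values in Hz for three speakers, three tokens each.
well_separated = [[1000, 1010, 990], [1500, 1510, 1490], [2000, 1990, 2010]]
overlapping = [[1000, 1500, 2000], [1010, 1490, 1990], [990, 1510, 2010]]

print(f_ratio(well_separated))  # large: speakers differ, tokens are consistent
print(f_ratio(overlapping))     # small: speakers are alike, tokens vary widely
```

A low value, as reported for the F2 of /o/, means that a parameter varies almost as much within one speaker as it does across speakers, which makes it a poor discriminator.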
We might consider one aspect of the representation of the five vowels a bit further. On page 43, we are shown a scatter plot of F1 against F2 for the five vowels, and it appears that the range of F2 for /u/ is about the same as for /i/. But this is partly an artifact of the scales used: in percentage terms, a range of 1200 to 1800 Hz (for /u/) is in fact substantially larger than a range of about 2000 to 2600 Hz (for /i/). If, instead of linear Hertz scales, the plots were shown on auditory Bark scales (as is common in acoustic representations nowadays), the range of F2 for /u/ would be shown as larger than that for /i/, and this might more accurately reflect the fact that there is indeed substantial variation in the degree of fronting for /u/ in many varieties of English, including Australian.
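The point about scales can be checked with a few lines of Python. This sketch uses Traunmüller's (1990) approximation for converting Hertz to Bark, which is one common choice and is an assumption here, since no particular conversion is specified above. The frequency ranges are the approximate ones read from the plot as quoted in the previous paragraph.

```python
def hz_to_bark(freq_hz):
    """Traunmüller's (1990) approximation of the Bark scale."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def bark_span(low_hz, high_hz):
    """Width of a frequency range expressed in Bark."""
    return hz_to_bark(high_hz) - hz_to_bark(low_hz)


# Approximate F2 ranges quoted above; both are 600 Hz wide.
u_span = bark_span(1200.0, 1800.0)
i_span = bark_span(2000.0, 2600.0)
print(f"/u/: {u_span:.2f} Bark, /i/: {i_span:.2f} Bark")
```

Under this conversion, the two ranges, which are equally wide in Hertz, come out at about 2.7 Bark for /u/ and about 1.7 Bark for /i/.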
One further issue arises with regard to the data. A substantial quantity of speech was recorded: 2 recordings on different occasions of 4 repetitions of 24 sentences, so the research is based on an impressively large set of vowel measurements. However, it is not clear why for two of the vowels the words were kept consistent, with a fixed hVd frame, but for each of the other three vowels, another phonological frame was included: for /i/, 'deed' was used in addition to three instances of 'heed'; for /a/, 'card' occurs in addition to three instances of 'hard'; and for /o/, 'board' was recorded in addition to three instances of 'hoard'. Does this not mean that the influence of the initial consonant might have increased the variation for /i, a, o/ compared to the other two vowels?
In fact, one might also question whether the general reliance on a fixed 'hVd' word shape might not substantially underestimate the degree of variation that exists in the vowels that occur in real speech data, for the degree of coarticulation from initial and final consonants may actually be quite significant.
Nevertheless, this book does present some fascinating and exceptionally valuable results in an important area of research. The data is carefully presented even if the interpretation might have been elaborated a little, the foundations for the research are well-grounded even if there are one or two questions one might ask about the data, and many people working in this field of trying to establish the theoretical and practical foundations of forensic speaker identification will find the thoughtful consideration of so many statistical issues very useful. This book undoubtedly makes a significant and important contribution to the growing body of work on forensic phonetics, and indeed many linguists who are interested in how vowels should be represented will also find it informative and interesting.
ABOUT THE REVIEWER:
David Deterding is an Associate Professor at NIE/NTU, Singapore, where he teaches phonetics, phonology, syntax, and Chinese-English translation.