LINGUIST List 13.1146

Wed Apr 24 2002

Sum: Vowel Normalization Procedures

Editor for this issue: Marie Klopfenstein <marie@linguistlist.org>


Directory

  • Dom Watt, Vowel Normalisation Procedures

    Message 1: Vowel Normalisation Procedures

    Date: Tue, 23 Apr 2002 10:51:53 +0100
    From: Dom Watt <djlw1@york.ac.uk>
    Subject: Vowel Normalisation Procedures


    Dear linguists,

    Some weeks ago I posted a query about vowel normalisation procedures, to which I had a good number of excellent responses. I've pasted these in below (NB: some are responses to an identical query I put out on the 'phonet' mailing list). Many thanks to all who contributed!

    Dom

    Here's my original query in full:

    There is a bewildering choice of vowel formant normalisation algorithms available which aim to eliminate, among other things, vocal tract length-related effects between speakers that result from age and gender differences. Some are based on warpings of F1/F2 space using frequencies of higher formants, relationships between formants and F0, or on logarithmic transforms of values in linear Hz. Others (Barks, mels, critical bandwidths, etc.) use psychoperceptual criteria deriving from the non-linear response characteristics of the auditory system.

    What, in the opinion of anyone who's had to choose one (or has developed one themselves), would be the criterion/criteria by which the successfulness, utility and reliability of a normalisation routine for formant frequency measurements could be estimated? Do Disner's (1980) conclusions about the value of a procedure lying in the degree to which it can trade off scatter reduction against 'linguistic realism' still hold?

    Reference: Disner, S.F. (1980) Evaluation of vowel normalization procedures. Journal of the Acoustical Society of America 67(1): 253-261.
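    To make the scale transforms mentioned in the query concrete, here is a minimal Python sketch of Hz-to-log, Hz-to-mel and Hz-to-Bark conversions; the mel and Bark formulae are the commonly cited O'Shaughnessy (1987) and Traunmüller (1990) approximations, chosen purely for illustration, and are not necessarily the versions any respondent below had in mind.

    import math

    def hz_to_log(f_hz):
        """Simple logarithmic transform of a formant frequency in Hz."""
        return math.log(f_hz)

    def hz_to_mel(f_hz):
        """Hz to mel, using the common O'Shaughnessy (1987) approximation."""
        return 2595.0 * math.log10(1.0 + f_hz / 700.0)

    def hz_to_bark(f_hz):
        """Hz to Bark, using Traunmueller's (1990) approximation."""
        return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

    # Invented example values for F1 and F2 of a mid front vowel
    for f in (450.0, 1900.0):
        print(f, hz_to_log(f), hz_to_mel(f), hz_to_bark(f))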

    And here are the responses, in no particular order - apologies if I've left anyone out.

    *************************************

    From Sylvia Moosmüller <sylvia.moosmueller@oeaw.ac.at>:

    A paper on vowel normalization strategies was presented at Eurospeech 2001 in Aalborg by Patti Adank, Roeland van Hout, and Roel Smits: "A comparison between human vowel normalization strategies and acoustic vowel transformation techniques." In: Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech 2001), Aalborg, Vol. 1, 481-484.

    Abstract: Perceptual and acoustic representations of vowel data were compared directly to evaluate the perceptual relevance of several speaker normalization transformations. The acoustic representations consisted of raw F0 and formant data. The perceptual representations were obtained through an experimental procedure, with phonetically trained listeners as subjects. The raw acoustic data were transformed according to several normalization schemes. The perceptual and the acoustic representations were compared using regression techniques. A z-score transformation of the raw data appeared to resemble the perceptual data.

    Hope this will be of help for you, best wishes, Sylvia Moosmüller

    *************************************

    From Patti Adank <P.Adank@let.kun.nl>:

    My guess would be that your criterion should depend on what you want to do with the transformation's results. Some people are interested in vowel classification (automatically or by human listeners), while others are more interested in describing within-vowel-category variation (allophonic variation). For the first task you need a procedure that maximizes between-category variance, while in the latter case the within-category variance should primarily be maintained. I think it might be the case that you need different procedures for the two tasks. I work in sociophonetics and am therefore more interested in maintaining the variation within vowel categories. However, most studies that evaluate vowel normalization procedures focus only on the classification performance of the procedures (e.g. Nearey 1978, Syrdal 1984, Deterding 1990); there are only two studies that evaluate how well within-category variation is maintained (Hindle 1978 and Disner 1980), but these do not provide conclusive answers.

    I am writing my PhD thesis on vowel normalization, comparing several 'formant-based' procedures (i.e. some of the ones mentioned by Disner 1980 and by Terry Nearey in his 1989 JASA article), such as Lobanov's z-transformation, Gerstman's range normalization, and Syrdal & Gopal's Bark-difference model. I am evaluating how well the 13 procedures I selected perform on both vowel classification and maintaining within-category variation. I have not finished all of my research, but I can give some preliminary indications if you like.
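    By way of illustration of two of the procedure families named above, the following Python sketch implements a Gerstman-style range normalization (rescaling each speaker's formant values onto a fixed 0-999 range) and a Syrdal & Gopal-style Bark-difference representation (height as B1 - B0, frontness as B3 - B2). The function names, the 0-999 range and the Traunmüller Bark formula are assumptions made for the sketch, not details taken from the original papers.

    import math

    def hz_to_bark(f_hz):
        # Traunmueller (1990) approximation (an assumption; other Bark formulae exist)
        return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

    def gerstman_range(formant_values):
        """Range normalization of one formant for one speaker: rescale the
        speaker's values (in Hz) onto a 0-999 scale using their min and max."""
        f_min, f_max = min(formant_values), max(formant_values)
        return [999.0 * (f - f_min) / (f_max - f_min) for f in formant_values]

    def bark_difference(f0, f1, f2, f3):
        """Bark-difference dimensions for one token: vowel height as B1 - B0,
        frontness/backness as B3 - B2 (all frequencies in Hz)."""
        b0, b1, b2, b3 = (hz_to_bark(f) for f in (f0, f1, f2, f3))
        return b1 - b0, b3 - b2

    # Invented example: one speaker's F1 values, and one token's F0/F1/F2/F3
    print(gerstman_range([320.0, 450.0, 600.0, 780.0]))
    print(bark_difference(120.0, 450.0, 1900.0, 2600.0))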

    Overall, I would say that Lobanov's z-score transformation works best for both classification and maintaining variation, followed by Nearey's log-mean (CLIH4) transformation. It might be the case that Nearey's is better at maintaining variance, but I will have to find that out in the next few months. Again, these are preliminary results. I still have to deal with the fact that, usually, not all vowel categories will be available for a given speaker, while Lobanov's procedure requires values for all of them. Nearey's might be the best option, since it needs a minimum of two categories per speaker. So, we're not there yet.
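    For concreteness, here is a minimal Python sketch of the two front-runners mentioned above: Lobanov's z-score normalization (per formant, per speaker) and a Nearey-style log-mean normalization. The log-mean version below subtracts a single grand mean of the speaker's log F1 and F2 values; it is a simplified reconstruction for illustration, not the CLIH4 routine itself.

    import math
    import statistics

    def lobanov(values):
        """Lobanov z-score normalization of one formant for one speaker:
        subtract the speaker's mean and divide by their standard deviation."""
        mu = statistics.mean(values)
        sd = statistics.stdev(values)
        return [(v - mu) / sd for v in values]

    def log_mean_normalise(f1_values, f2_values):
        """Nearey-style (shared log-mean) normalization: subtract the speaker's
        grand mean of log-Hz values from the log of each formant measurement."""
        grand_mean = statistics.mean(math.log(v) for v in f1_values + f2_values)
        return ([math.log(v) - grand_mean for v in f1_values],
                [math.log(v) - grand_mean for v in f2_values])

    # Invented example: a handful of F1/F2 measurements for one speaker
    f1 = [310.0, 450.0, 620.0, 760.0]
    f2 = [900.0, 1400.0, 1900.0, 2300.0]
    print(lobanov(f1))
    print(log_mean_normalise(f1, f2))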

    Regarding Disner's study: her conclusions still seem to be valid; it is still not advisable to compare markedly different phonological systems directly with each other, especially with procedures that put a lot of emphasis on means and standard deviations (like Nearey's and Lobanov's). It might be a better idea to compare communities only if you have enough values to calculate the means; differences between communities in vowel targets might be interpretable even without normalization.

    I have presented a paper at the last Eurospeech conference on this issue. Would you be interested in reading this paper?

    I hope this answers your question,

    Patti Adank
    Dept. of Linguistics
    University of Nijmegen
    The Netherlands

    References:
    Deterding, D. (1990) Speaker normalisation for automatic speech recognition. PhD thesis, University of Cambridge.
    Disner, S.F. (1980) Evaluation of vowel normalization procedures. Journal of the Acoustical Society of America 67(1): 253-261.
    Hindle, D. (1978) Approaches to vowel normalization in the study of natural speech. In: D. Sankoff (ed.), Linguistic Variation: Models and Methods. New York: Academic Press.
    Lobanov, B.M. (1971) Classification of Russian vowels spoken by different speakers. Journal of the Acoustical Society of America 49(2): 606-608.
    Nearey, T.M. (1978) Phonetic Feature Systems for Vowels. PhD thesis, reproduced by the Indiana University Linguistics Club.
    Nearey, T.M. (1992) Applications of generalized linear modeling to vowel data. Proceedings of ICSLP 92, 583-586.
    Syrdal, A. (1984) Aspects of a model of the auditory representation of American English vowels. Speech Communication 4: 121-135.
    Syrdal, A. & Gopal, H.S. (1986) A perceptual model of vowel recognition based on the auditory representation of American English vowels. Journal of the Acoustical Society of America 79(4): 1086-1100.

    *************************************

    From Rob Hagiwara <robh@cc.umanitoba.ca>:

    Re your normalization question, I've been working on expanding the autonormalization procedure I developed in my dissertation, where you take a full suite of vowels and calculate average F1, F2, F3 etc. frequencies (and cross-correlate them back to a particular resonating length, i.e. they should stand in a 1:3:5 relationship to one another, or close to it), and then express the deviation (in a token or a class of tokens) from these averages in either %Hz or Bark distance. The idea has caught on in a few circles, but it's difficult to operationalize in anything but a formal experimental context.
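    As a rough reconstruction of the kind of procedure described here (per-speaker formant averages, with token deviations expressed in %Hz or as a Bark distance), a short Python sketch follows; the function names and the Traunmüller Bark approximation are assumptions made for the sketch, not details of Hagiwara's implementation.

    import statistics

    def hz_to_bark(f_hz):
        # Traunmueller (1990) approximation (an assumption for this sketch)
        return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

    def speaker_formant_means(f1s, f2s, f3s):
        """Average F1, F2 and F3 over a speaker's full suite of vowels.
        For a uniform tube these averages would sit near a 1:3:5 ratio,
        which gives a rough check on the speaker's resonating length."""
        return statistics.mean(f1s), statistics.mean(f2s), statistics.mean(f3s)

    def deviation_percent_hz(token_hz, mean_hz):
        """A token's deviation from the speaker's formant mean, in %Hz."""
        return 100.0 * (token_hz - mean_hz) / mean_hz

    def deviation_bark(token_hz, mean_hz):
        """A token's deviation from the speaker's formant mean, in Bark."""
        return hz_to_bark(token_hz) - hz_to_bark(mean_hz)

    # Invented example: one F1 token against a speaker's mean F1 of 500 Hz
    print(deviation_percent_hz(430.0, 500.0), deviation_bark(430.0, 500.0))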

    *************************************

    From David Deterding <dhdeter@nie.edu.sg>:

    My PhD thesis was:

    David Deterding, Speaker Normalisation for Automatic Speech Recognition, Unpublished PhD Thesis, Cambridge University, 1990.

    and it concentrated largely on normalisation procedures for vowels. Although I tried out all kinds of methods, including formant mappings and whole-spectrum shifts, I really can't offer an answer. Perhaps the only contribution I can make is some help in understanding the problem -- but maybe you already understand the problem well enough!

    If you think it might be useful, you could obtain a copy from Cambridge University Library. But maybe someone will provide you with a clear-cut answer, so you won't need to access my work.

    Regards,

    David Deterding
    NIE, Singapore
    http://www.arts.nie.edu.sg/ell/davidd/Personal/david.htm

    *************************************

    From H.M. Hubey <hubeyh@mail.montclair.edu>:

    There are very good reasons why topics dealing with the complexity of fluid dynamics should use the one very successful method. That method is "dimensional analysis" and is used in fluid dynamics extensively. It is the one thing that allowed Prandtl to connect the theoretical results of fluid dynamics (the Navier-Stokes equations) with experimental results. And since then many other dimensionless groups have been developed and experimentally fitted to data.

    I am the only one to have used this method to derive results in this field. It can be found in my book, Mathematical and Computational Linguistics (Lincom Europa), and in my paper in the Journal of Quantitative Linguistics, "Vector Phase Space for Speech Analysis via Dimensional Analysis", Volume 6, Number 2, August 1999.

    *************************************

    From Dylan Herrick <herrick@ling.ucsc.edu>:

    I'm afraid that I don't have any answers for your question about vowel normalization procedures. However, I wanted to let you know that I am deeply interested in the responses you might get. You see, I am working on a phonetic study of the Catalan vowel system at the moment, and I am wondering how best to combine data from various speakers.

    The one (relatively) recent paper that I have seen which takes a fairly opinionated view of vowel normalization was:

    Yang, Byunggon. 1996. A comparative study of American English and Korean vowels produced by male and female speakers. Journal of Phonetics 24: 245-261.

    Judging from your message, you have already seen this paper. As I recall, the author mentions that he argued for the value of a psychoperceptual approach to vowel normalization in an earlier paper (which I have been unable to locate). If nothing else, his paper offers a model for how vowel normalization could be done - normalizing for vocal tract length and using the mel scale (or Bark... I forget) instead of Hz. I have no idea how this relates to Disner's paper (which I have not read).

    *************************************

    From Bill Labov <labov@earthlink.net> ('Plotnik' is Labov's dedicated vowel formant frequency plotting program, available for download at http://www.ling.upenn.edu/~labov/Plotnik.html. I had asked him earlier whether the normalization algorithm he had programmed into Plotnik was Nearey's (1977) routine, since this is the method favoured in Labov's recent 'Principles of Linguistic Change, vol. II: Social Factors' (2001, Blackwell); see pp. 157-164):

    Yes, the Nearey log mean normalization is available in Plotnik. The documentation gives general information about it as well as the instructions. In the second volume of Principles of Linguistic Change, Chapter 5, there is an account of the empirical justification for the use of that algorithm. It's worked out very well in the Atlas of North American English, where I've superimposed 440 speakers in a single view. At the same time, it's not an answer to the question of how speakers actually do normalize, which operates efficiently with one or two utterances and doesn't need hundreds.

    *************************************

    From Bill Idsardi <idsardi@UDel.Edu>:

    The best discussion I know of for this is Rosner and Pickering, _Vowel Perception and Production_ Oxford UP 1994, chapter 5.

    Bill

    *************************************

    From Mark Huckvale <M.Huckvale@ucl.ac.uk>:

    I think Rosner & Pickering's book "Vowel Perception and Production" has more recent data on the evaluation of normalisation metrics. You may have looked there already.

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    Dominic Watt
    Department of Language & Linguistic Science
    University of York
    Heslington
    York YO10 5DD
    UK
    Tel 01904 432665
    Fax 01904 432673
    http://www.york.ac.uk/depts/lang/webstuff/people/dw.html
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%