Tue Feb 14 2006

Sum: Measuring Vowel Duration from Spectrogrammes

Message 1: Measuring Vowel Duration from Spectrogrammes
Date: 11-Feb-2006
From: Roy Becker
Subject: Measuring Vowel Duration from Spectrogrammes

Regarding query:

A couple of weeks ago I posted a query concerning standards for measuring vowel duration from spectrogrammes in phonetic and phonological studies, highlighting some of the problems this task poses. I received four answers, which I try to summarise below:

1. Kimary Shahin (Effat College) and Zinny Bond (Ohio University) both referred me to the classic study of English vowel duration by Peterson & Lehiste (1960): ''Duration of syllable nuclei in English'', JASA 32:693-703.

My own feeling about this important publication is that, given the time at which it was written and the knowledge and experience in acoustic-phonetic experimentation then available, it highlights problems more than it solves them. In addition, most of today's advanced spectrogramme-manipulation technology was unavailable, and many possible refinements of decision-making procedures regarding temporal segmentation could not have been foreseen at the time of writing. Furthermore, while this study is undoubtedly a landmark in the development of acoustic phonetic research, its applicability and relevance to more abstract aspects of linguistic (phonological) theory are somewhat limited, since many of these, like syllabic weight and the non-linear organisation of the utterance, had not yet been formulated. Such aspects, rather than being conditioned and controlled, were simply neutralised in that study, so it is hard to infer standards that would suit measurements relevant to less obvious phonological conditions (and purposes). Finally, the publication itself aims to report a particular, albeit paradigmatically comprehensive, piece of research, rather than to create a standard. Therefore, a researcher who uses it as a standard (in particular a less experienced one) is likely to encounter many pitfalls that this publication neither covers nor warns against, and may arrive at inadequate solutions, for example by inaccurate analogy to the authors' general procedures.

I have commented at length on this reference because, owing to its rightful inclusion in the canonical reader ''Readings in Acoustic Phonetics'' (MIT Press, 1967), it is probably the most accessible paper on measuring vowel durations for those who are not involved with acoustic phonetics as their primary research domain. While this paper should be considered MUST reading for similar or related experimental research, it should be treated as an older reference point to be set against a more recent standard. This enables the reader to 'interpolate' the gradual development of methodological standards, and to better evaluate the methodological adequacy of earlier studies both at their time of writing and at present.

2. Correspondence initiated by Nora Wiedenmann led to some detailed answers and a text of segmentation standards from Florian Schiel (both are from the Institute of Phonetics and Speech Communication at the Ludwig-Maximilians-University, Munich). The standard, which was designed for and implemented in the annotation of the PhonDat database of spoken German, includes various brief instructions regarding problematic segment sequences. Unfortunately, my very limited knowledge of German makes it impossible for me to evaluate the description of this standard. Nevertheless, since it was developed strictly for the annotation of a connected-speech corpus, it seems that it does address many segmentation difficulties, but cannot cover the specific contextual variables that have to be controlled in elicitation-based experimental studies with theoretical implications in phonetics and phonology.

This is probably true of most other annotation and segmentation standards oriented towards connected-speech corpora, which, like the previous reference, may be easily accessible. They are accessible because, unlike in particular, theory-oriented experimental studies, the amount of annotation work in corpora requires a large team of annotators, who must be given a written standard to keep their work consistent. Public availability of such corpora and/or their commercial use in the speech technology industry contributes to their attractiveness and 'fame' (e.g. the TIMIT corpus of American English), but the applicability of their annotation standards to other research depends on, and varies with, the task and purpose in each case.

3. Alice Turk (University of Edinburgh) sent me a copy of a paper authored by her, Satsuki Nakai, and Mariko Sugahara, titled ''Acoustic Segment Durations in Prosodic Research: A Practical Guide'', to appear soon in: Sudhoff, Stefan, Denisa Lenertová, Roland Meyer, Sandra Pappert, Petra Augurzky, Ina Mleinek, Nicole Richter & Johannes Schließer (eds): Methods in Empirical Prosody Research. Berlin, New York: De Gruyter.

This paper happens to address precisely the topic of my query. It not only provides principles for acoustic segmentation, distinguishing between various segmental and prosodic contexts and highlighting potential pitfalls in detail, but also critically evaluates the reliability and 'relative segmentability' of these contexts. It thus provides criteria for context design, warning against evaluating incompatible contexts against one another, or at least requiring explicit justification of theoretical and methodological concepts when such evaluations are carried out.

In addition, it provides a concise description of key aspects of the experimental and methodological setting, such as control for speech rate, syntactic structure, orthographic bias, and the direct influence on results of explicit instructions given to the participant, among others. Many of these recommendations seem to result from the authors' experience both in actively carrying out such experiments and in reviewing many other experimental reports in the history of the acoustic phonetic literature, which is partially demonstrated by the list of references. Not surprisingly, the earliest study cited is Peterson and Lehiste's paper mentioned above. The paper also refers to and evaluates work on annotated speech corpora.

While most of the descriptions, recommendations and warnings in this paper may seem natural or obvious to the most experienced and knowledgeable acoustic phoneticians, they are invaluable to less experienced researchers, both younger ones and those from other disciplinary backgrounds. One will not find solutions to all methodological questions in this 20-page paper, including some of those mentioned in my query, but one is definitely compelled to take them into consideration in experimental design, and to explicitly defend one's own methodological treatment of the difficulties and incompatibilities foreseen by this paper. When reporting a seemingly successful experiment, implementing the lessons learned from this paper can make the difference between ''preliminary encouraging tentative results'' and ''substantial reinforcement of the hypothesis''.

I would like to conclude by thanking all the contributors for their answers to this query. All mistakes are mine.

Roy Becker, Graduate Student, UCLA.

