AUTHOR: Peter Ladefoged and Sandra Ferrari Disner
TITLE: Vowels and Consonants
EDITION: Third
PUBLISHER: Wiley-Blackwell
YEAR: 2012
Seetha Jayaraman, Dhofar University, Sultanate of Oman
This book, written by Peter Ladefoged and revised by Sandra Ferrari Disner, contains sixteen chapters on topics ranging from the basics of speech sounds to an advanced description of acoustic features and the role of computers in studying acoustic components of speech. The chapters cover perspectives on speech production and perception and give an overview of phonatory and articulatory processes involved in the production of different categories of speech sounds, viz., vowels and consonants. The last three chapters deal with articulatory differences found in different languages around the world.
The volume provides an extensive set of illustrations of the sounds discussed in each chapter, and audio recordings, photographs and videos of vocal tract configurations are available on the website www.linguistics.ucla.edu./faciliti/sales/software.htm. A table lists the audio recordings on the website that support the volume.
A chapter-wise summary follows:
Chapter 1, “Sounds and Languages”, begins with the definition of ‘sound’ and the distinction between ‘sound’ and ‘language’. It discusses how languages evolve and disappear constantly with changes in the socioeconomic conditions of people and their cultural practices. It reflects on the importance of individual sounds, different aspects of language and speech, and the role they play in our life. The chapter also describes speech sounds and sound symbols (i.e. International Phonetic Alphabet) vis à vis their orthographic representations, with an introduction to the basic components of speech, viz., pitch and loudness and their representations in a waveform.
Chapter 2, “Pitch and Loudness”, discusses ‘tones’ in terms of pitch and meaning change associated with pitch, drawing upon examples from tone languages like Chinese (Mandarin) and Cantonese. The fundamental concepts in understanding pitch levels, pitch curves and intonation, with reference to the speaker, are explained. The last section of this chapter outlines the importance of vocal folds, their position in sound production and the influence of vocal fold vibration on loudness, in general, and on English intonation, in particular.
The next three chapters (3, 4 and 5) present a description of vowel features, the vowel chart, the vowel space, and the acoustic characteristics that help identify vowels in spectrograms with respect to the structure of the first three formants.
Chapter 3, “Vowel Contrasts”, compares vowels across languages like Spanish, Hawaiian, Swahili and Japanese in order to bring out differences in their usage. Some examples are ‘masa’ (dough) and ‘mesa’ (table) in Spanish; ‘kaka’ (to rinse) and ‘keka’ (turnstone) in Hawaiian; ‘pata’ (hinge) and ‘peta’ (bend) in Swahili; and ‘ma’ (interval) and ‘me’ (eye) in Japanese. This chapter also highlights the differences between General American English and Standard British English in their use of vowels. General American English has only 14 or 15 vowels, while British English has as many as 20 vowel sounds.
Chapter 4, “The Sounds of Vowels”, gives an account of both the acoustic characteristics of vowel quality and formant patterns in spectrograms as evidence for vowels. There is a detailed explanation of the interplay between the first two formant values and the vowel space. When vowels are plotted on a graph of F1 against F2 (the formant frequencies, as we hear them in different languages), the resultant figure is a triangle. Given that the auditory space is defined by the three vowels /i/, /a/ and /u/, the vowel space in the graph has a triangular shape. For languages with 5 to 7 vowels, it is possible to obtain an equally symmetrical triangular shape when we plot F1 vs. F2; the recurrence of this shape across languages provides evidence of a relationship between vowel quality and formant frequencies.
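The F1/F2 plot described above can be made concrete with a minimal Python sketch that computes the area of the triangle spanned by /i/, /a/ and /u/. The formant values are rough illustrative figures assumed for this example, not data from the book, and the names `CORNER_VOWELS` and `triangle_area` are hypothetical.

```python
# Illustrative (F1, F2) values in Hz for the three corner vowels.
# These numbers are assumptions for this sketch, not data from the book.
CORNER_VOWELS = {
    "i": (280.0, 2250.0),
    "a": (710.0, 1100.0),
    "u": (310.0, 870.0),
}


def triangle_area(p, q, r):
    """Area of the triangle with vertices p, q, r (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = p, q, r
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0


# Size of the vowel space enclosed by the three corner vowels, in Hz squared.
area = triangle_area(
    CORNER_VOWELS["i"], CORNER_VOWELS["a"], CORNER_VOWELS["u"]
)
print(f"Vowel triangle area: {area:.0f} Hz^2")
```

A larger area would correspond to corner vowels that lie further apart in the F1/F2 plane; comparing areas across speakers or languages would of course require measured formant values.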
Chapter 5, “Charting Vowels”, continues the discussion on formant analysis and charting of vowels through the first two formants, comparing the five vowels of Spanish with those occurring in different accents in English. The relative vowel space plotted for the Spanish vowels /i, e, a, o, u/ is compared with that of English. There is a tendency to replace diphthongs with their corresponding monophthongs in some North American accents. With the exception of the vowel in words like ‘bird’, the third formant is not significant in the description of vowels in General American English.
The next chapter, “The Sounds of Consonants”, provides an introduction to consonants and suggests that there is no significant difference in consonant articulation between British and American varieties of English. The phonetic symbols used and the articulatory and acoustic features of consonants are described. This chapter provides background information on different classes of consonants, viz., stops, approximants, nasals, fricatives and affricates. Interpretation of spectrograms with respect to both voiceless and voiced consonants is explained as well. The first three formant frequency values, their levels, formant transitions for stops, nasals and approximants, and additional spectral cues which help in the identification of individual consonants are illustrated with examples from General American English and BBC English.
Chapter 7, “Acoustic Components of Speech”, analyzes formant frequency, amplitude and pitch, combining and varying them as auditory correlates of voicing and voicelessness. Speech synthesis is also discussed, as well as the relationship between acoustic variables in the waveform, which is illustrated for the English word ‘bird’.
Chapter 8, “Talking Computers”, continues with the topic of synthesizing speech sounds, with phonetic transcription being the focus of the last part of this chapter. Two approaches to speech synthesis are described: parametric synthesis, in which a computer calculates acoustic parameters like formant frequencies from the waveform or joins sound segments to make new sentences, and the concatenative approach, in which large sections of speech are stored and subsequently joined together. The problem with the first approach is that we do not know the rules for joining one sound to another. The second approach is useful for synthesizing recordings of telephone numbers and reproducing them to provide pre-recorded information. The computer uses a mathematical technique called Linear Prediction Coefficient (LPC) analysis, which uses LPCs, a set of numbers that represent everything about voice quality except its fundamental frequency or pitch. A detailed account of LPC analysis is also given in Ladefoged (1996). Another system, called Pitch Synchronous Overlap Add (PSOLA), is employed to lower or raise the pitch of the original recording or to vary its duration. The last section of the chapter deals with the study of segmental errors and intonation in Text-to-Speech (TTS) systems. Spelling out all abbreviations and numbers using IPA symbols is a prerequisite in TTS.
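As a rough illustration of what LPC analysis computes, the following Python sketch fits predictor coefficients to a toy signal. It uses the autocorrelation method with the Levinson-Durbin recursion, a standard textbook technique chosen here as an assumption; it is not presented as the procedure described in the book, and the function names `autocorrelation` and `lpc` are hypothetical.

```python
def autocorrelation(signal, max_lag):
    """r[k] = sum over n of signal[n] * signal[n + k], for k = 0..max_lag."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + k] for i in range(n - k))
        for k in range(max_lag + 1)
    ]


def lpc(signal, order):
    """Return [1, a1, ..., a_order] computed by the Levinson-Durbin recursion.

    The model predicts signal[n] as
    -(a1 * signal[n - 1] + ... + a_order * signal[n - order]).
    Assumes the signal is not all zeros (r[0] must be positive).
    """
    r = autocorrelation(signal, order)
    a = [1.0] + [0.0] * order
    error = r[0]  # prediction error energy of the order-0 model
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        error *= 1.0 - k * k
    return a


# Toy signal (an assumption for this sketch): each sample is 0.9 times
# the previous one, so one predictor coefficient should describe it.
toy = [0.9 ** n for n in range(200)]
coeffs = lpc(toy, 2)
print(coeffs)
```

For real speech, the analysis would be applied to short windowed frames with a higher order, and the resulting coefficients describe the spectral envelope rather than the pitch, which matches the reviewer's description of LPCs above.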
Chapter 9, “Listening Computers”, contains an account of the way sounds are recognized and displayed on a computer. The chapter illustrates the spectral representation of the first three formants in the word ‘August’. Identifying individual sounds with spectral cues is another dimension examined in this chapter. The author acknowledges Fred Jelinek’s contribution to speech recognition and lists the stages involved in a speech recognition system. He also considers the term ‘cepstral coefficient’, which refers to measures of spectral slices stored as numbers, reflecting the rise and fall in the amplitude of F1, F2 and F3 in a spectrum or spectral curves. Computers also use the Hidden Markov Model (HMM), which is a representation of a sequence of speech events.
Chapter 10, “How We Listen to Speech”, deals with different ways of listening for phonetically confusable sounds that impede intelligibility. A confusion matrix for syllables with different initial consonants and noise levels is shown in a table. The table is based on the way these sounds are heard by a set of listeners. The confusion matrices tell us the level of confusion and the degree of similarity between the sounds, using the syllables ‘pa’, ‘ta’, ‘ka’, and so on. The higher the number of correctly heard syllables, the less confusion there is. Perceptual differences are calculated using 16 sets of syllables. This chapter also reports the results of an experiment conducted with the words ‘bad’ and ‘bat’ (voicing contrast) to study variation in perception. This is the only chapter that provides sources for further reading on the topics discussed.
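The way a confusion matrix is read can be sketched in a few lines of Python. The counts below are invented for illustration and are not taken from the book's table; `percent_correct` and `most_confused_pair` are hypothetical helper names.

```python
# Hypothetical response counts (not from the book): rows are the syllables
# played to listeners, columns are the syllables they reported hearing.
SYLLABLES = ["pa", "ta", "ka"]
CONFUSIONS = [
    [80, 12, 8],   # "pa" played
    [15, 70, 15],  # "ta" played
    [10, 20, 70],  # "ka" played
]


def percent_correct(matrix):
    """Percentage of presentations heard correctly, one value per row."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def most_confused_pair(matrix, labels):
    """Pair of distinct syllables with the most confusions, both directions combined."""
    best, best_count = None, -1
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            count = matrix[i][j] + matrix[j][i]
            if count > best_count:
                best, best_count = (labels[i], labels[j]), count
    return best


print(percent_correct(CONFUSIONS))
print(most_confused_pair(CONFUSIONS, SYLLABLES))
```

The diagonal cells give the correctly heard syllables, and large off-diagonal cells point to pairs of sounds that listeners treat as perceptually similar.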
Chapter 11, “Making English Consonants”, deals with the physiology of the vocal apparatus and the articulatory terms associated with the description of place and manner of articulation of consonants in general. The table of IPA symbols of English consonants is presented with a brief description of each class of consonants.
Chapter 12, “Making English Vowels”, describes the anatomy and physiology of vocal organs and the muscles controlling the movements of the tongue in the production of vowels. There is an interesting account of Melville Bell’s symbols, as given in his Visible Speech (1867), representing vowels in English. The position and shape of the tongue and palate in the production of vowels relating to the vowel diagram are analyzed in detail.
Chapter 13, “Actions of the Larynx”, talks about the important role played by the larynx, pharynx, vocal folds and cartilages, and the changes they bring to the quality of sounds (viz., voiced and voiceless sounds). Voicing and aspiration are two important features in the production of stop consonants, and both aspiration and Voice Onset Time (VOT) vary amongst languages. VOT is the interval between the release of a stop and the beginning of the following vowel. In English, VOT is 50-60 milliseconds (ms) for /k/ and slightly less for /t/ and /p/, while in Spanish the VOT for /k/ is about 20 ms and even less for /p/. It is interesting to note that Germanic languages like English, German and Danish have comparatively longer VOTs. In Romance languages like French and Spanish, voiceless stops have little or no VOT, while English and other Germanic languages have voiced stops, which contrast with voiceless stops. In terms of vocal fold vibration, glottal stop consonants like /h/ are found to be replaced by /k/ or /p/ in some dialects of British English, as well as in Hawaiian. Examples from Hindi also show the occurrence of four breathy voiced stop consonants, while Gujarati has breathy voiced vowels. The effect of creaky voice and breathy voice on Zapotec vowels is discussed briefly. Other classes of sounds discussed are ‘ejectives’, common in a few American Indian and a few African languages, and ‘implosives’, which are produced with air sucked in and are found in some languages spoken in Nigeria (e.g. Owerri Igbo). The mechanism involved in producing implosives is illustrated through differences in airflow and air stream in the larynx and the vocal tract.
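The VOT measurement described above amounts to a simple subtraction of two time points, as in the following minimal Python sketch. The time stamps are hypothetical values chosen to match the approximate figures quoted in this review, not measurements from the book, and `vot_ms` is a hypothetical helper name.

```python
def vot_ms(release_time_s, voicing_onset_s):
    """Voice Onset Time in milliseconds: voicing onset minus stop release."""
    return (voicing_onset_s - release_time_s) * 1000.0


# Hypothetical time stamps in seconds, as might be read off a waveform.
# They are assumptions for this sketch, not measurements from the book.
tokens = {
    "English /k/": (0.100, 0.155),
    "Spanish /k/": (0.100, 0.120),
}

for label, (release, onset) in tokens.items():
    print(f"{label}: VOT = {vot_ms(release, onset):.0f} ms")
```

With this definition, a negative result would indicate that voicing begins before the stop is released.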
Chapter 14, “Consonants Around the World”, is a summary of consonants in languages. A general survey shows that there are about 7,000 languages in the world and over half of them are spoken by fewer than 10,000 people. In all, there are about 600 consonants. The chapter lists the 10 most widely spoken languages, which together have 100 consonants (of which 22 occur in English). A few languages like Ewe, spoken in Ghana, use two unique bilabial fricatives. Subtle differences which exist in the production of /t/ in Wabuy (a language spoken in Australia), palatals in Hungarian, stops and six nasals in Malayalam, voiceless stops in Aleut, and bilabial and alveolar trills in Kele and Titan, respectively, are detailed. Likewise, F1-F2 transitions (palatals) in palatograms and linguagrams of the retroflex /ţ/, Polish sibilants and the four sibilants of Toda (and their corresponding IPA symbols) are also discussed exhaustively. Laterals in Melpa are noted for their manner of articulation, as they are complex in symbols, viz., voiced alveolar /l/ and voiceless velar /ł/ (dark /l/, represented by a small uppercase L in IPA). In Zulu, laterals occur as voiced and voiceless consonants and clicks occur contrastively. Nama, a language spoken in Namibia, has 20 clicks, each represented by an IPA symbol and with different meanings.
Chapter 15, “Vowels Around the World”, demonstrates the relation between vowel space and the graphic display of F1-F2. Contrasts are made between languages like Hawaiian, which has 5 vowels and only 8 consonants, and those such as Zulu, which has 5 vowels and 44 consonants. Every language is said to use at least 3 distinct vowels, viz., /i, a, u/ or /i, a, o/. About 20% of the world’s languages have 5 contrasting vowels. An interesting fact is that most languages with 5 vowels follow the same order as the Latin alphabet (a, e, i, o, u). Californian English has 15 vowels and BBC English has 20 vowels (12 long, 10 short and 6 diphthongs) with varying tongue roots. Lip rounding also plays an important part in the articulation of vowels. French has rounded vowels like /y/, as in ‘lu’ (a front, high, rounded vowel). The other rounded vowels which occur in French are /œ/ as in ‘leur’ (their), /ø/ as in ‘le’ (the), /o/ as in ‘lot’ (prize), /ɔ/ as in ‘lors’ and /ɑ/ as in ‘las’ (tired). Swedish, Danish, Norwegian and German also have rounded/unrounded vowel contrasts. Nasalized vowels and nasal vowels are observed in English and French, respectively; French examples are the vowels in the words ‘lin’ (flax), ‘lundi’ (Monday), ‘lent’ (slow) and ‘long’ (long). Phonetic differences in vowels are observed with distinctions in voice quality, as in !Xóō vowels (a Bushman language spoken in the Kalahari desert) or tense-voiced vowels in Mpi (spoken in Northern Thailand).
The last chapter in this volume, “Putting Vowels and Consonants Together”, summarizes vowels and consonants, puts them together as ‘utterances’, and talks about the speech continuum in terms of duration and intelligibility. It is a common observation that slips of the tongue, which interchange the sounds of syllables, occur in speech. The other aspects discussed are writing systems and sounds, and tones in languages like Chinese (Mandarin) and Cantonese. The role of the IPA in representing /r/ and its variants in languages other than English, contrasting sounds, and so on, is emphasized. In all, 106 distinct symbols for segments (78 consonants and 28 vowels), excluding sounds like ejectives and diacritics, are represented in the IPA chart provided. Sounds are also transcribed using symbols like )( (not an IPA symbol) for ‘hiss’ or ‘sing’. The totality of features required to describe a language at a glance is shown in a single table (Table 16.2 on page 196).
The book is an excellent introduction to the basics of speech sounds. Innumerable books on phonetics are available, but “Vowels and Consonants” is undoubtedly one of the best books on the basics. It is a good example of how complex topics like acoustic phonetics, speech synthesis, speech recognition, the physiology of speech production and sound-spelling correlation can be simplified to be accessible for beginners in phonetic studies. It requires and assumes no prior knowledge, either of phonetics or of the process of speech production, on the part of the reader. Each chapter is introductory in nature, and technical terminology has been used sparingly while explaining the basics of both articulatory and acoustic phonetics. The topics cover a wide range, from traditional definitions of phonetic terms and an IPA chart to the latest trends in TTS systems used in speech technology. The last three chapters are dense and rich in content, and consonant and vowel sounds across different languages of the world (the most widely spoken) have been discussed extensively, clearly and concisely.
Chapter 6 is especially effective because it equips the reader with all the details of consonant features with remarkable clarity and precision. Chapters 14 and 15 of the volume also merit special mention due to their coverage of examples from all the distinctive sounds of a few lesser known, yet widely spoken languages. The detail in these two chapters aptly justifies the title of the book.
The volume is a valuable contribution for researchers and scholars working on consonants and vowels across different languages. It serves as a good introductory textbook for a course on phonetics. The highlight of the third edition of “Vowels and Consonants” is the web-based material, which includes demos of some Text-to-Speech systems, videos of vibrating vocal cords, audio recordings of articulations of vowels and illustrations of IPA symbols. As stated in the Preface to the Third Edition, “The CD that had accompanied the previous edition has been replaced with a more readily accessible web-based collection of language files” (p. xv). The volume serves as a ready reference for advanced users of phonetics, as well as professionals and research scholars of language and speech. The book is of interest to teachers and would help to develop readers’ perception of speech production and their competence in spoken English. It is a ‘must have’ book that adds richness and knowledge to individuals and libraries.
REFERENCES
Ladefoged, P. (1996). A Course in Phonetics (2nd ed.). Chicago: University of Chicago Press.
ABOUT THE REVIEWER
Dr. Seetha Jayaraman is a Lecturer at Dhofar University, Sultanate of
Oman, where she teaches English language to undergraduates. Her
research interests include sociolinguistics, musicology, comparative
linguistics, and phonetics.