LINGUIST List 8.954

Sat Jun 28 1997

Review: Liberman: Speech: A special code

Editor for this issue: Andrew Carnie <>

What follows is another discussion note contributed to our Book Discussion Forum. We expect these discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in. If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for discussion." (This means that the publisher has sent us a review copy.) Then contact Andrew Carnie at

Message 1: Review of Liberman, Speech: A special code

Date: Fri, 27 Jun 1997 09:52:26 -0500 (EST)
From: Stefan A Frisch <>
Subject: Review of Liberman, Speech: A special code

Liberman, Alvin M. (1996). Speech: A special code. Learning, Development, and Conceptual Change series. Cambridge, MA: MIT Press. 458 pages. ISBN 0-262-12192-1.

Reviewed by Stefan A Frisch <>

This book is a collection of articles representing 50 years of speech research at Haskins Laboratories undertaken by Alvin Liberman and colleagues. It is divided into ten sections, which cover a range of issues that have guided Liberman's research and theorizing on the process of speech perception.


Chapter 1.

The first chapter is a new article which reviews the historical progress of Liberman's research and the development of the Motor Theory of speech perception. This chapter serves as both introduction and conclusion to the compilation of articles in the remainder of the book. In it, Liberman sets forth quite clearly two different views of speech perception: the "horizontal" view and the "vertical" view. The horizontal view, which Liberman equates with an auditory view, is that speech perception engages no special mechanisms unique to speech. The vertical view, which he supports, is that there is a speech perception module which operates independently from the process of general auditory perception. He also associates the vertical view with the hypothesis that speech perception involves the hearer perceiving the articulatory gestures made by the speaker directly, and not the acoustic result of those gestures. This chapter is extremely valuable both to the specialist and the novice, as a unifying work that brings together an entire research program in a single place.

Part 1. On the Spectrogram as a Visible Display of Speech

This section contains one article that is a brief description of an early attempt to convert between visual and auditory stimuli using the spectrograph and pattern playback on speech and simple geometric shapes. It reveals quite clearly Liberman's early horizontal view that there are modality-independent properties of pattern perception that would apply equally well to speech as to any other pattern, no matter how artificial.

Part 2. Finding the Cues

There are seven articles in this section, detailing experimental work on the auditory cues for phoneme identification in English. One of the major findings of these experiments is that there is no unitary acoustic invariant for a phoneme which corresponds to the unitary perceptual experience of the listener. In addition, there are discontinuities in the acoustic categories for different phonemes. These chapters are also quite useful in that they detail the basic acoustic properties of a variety of English phonemes. The topics covered include the consonant release burst of word-initial stops as a place cue, the direction and duration of CV formant transitions as cues for place and manner, the abrupt first formant onset as a cue for voiceless stops, and a summary article which contains an early description of the rules necessary to synthesize phonemically contrastive English speech.

Part 3. Categorical Perception

One of the articles in this section, "The Discrimination of Speech Sounds within and across Phoneme Boundaries", should be read by every student of speech. The authors compared the ability of subjects to label synthesized syllables (containing onsets on an acoustic continuum of stop place of articulation from "bay" to "day" to "gay") with their ability to discriminate these same syllables. They found acute discriminability between phoneme categories and poor discriminability within categories, a pattern which has come to be called "categorical perception". The analytical techniques and conclusions of this article gave rise to what is now a large literature on the categorical perception of speech and non-speech by both humans and animals. The second article in this section details an attempt to compare speech with an equivalent non-speech control on the categorical perception of intervocalic stop closure duration, which can be used to distinguish "rapid" from "rabid".
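The analytical link between labeling and discrimination can be sketched in a few lines of code. The sketch below assumes the simple two-category version of the labeling-based ("Haskins") prediction, in which listeners discriminate a pair of stimuli only to the extent that they label them differently; the labeling probabilities along the continuum are invented for illustration, not taken from the experiments.

```python
def predicted_abx(p1, p2):
    """Predicted ABX proportion correct for two stimuli, given the
    probability that each is labeled as category "b" (two-category
    version of the labeling-based prediction)."""
    return 0.5 * (1.0 + (p1 - p2) ** 2)

# Hypothetical labeling function along a "bay"-to-"day" continuum:
# probability of a "b" label at each of seven continuum steps.
labels = [1.0, 0.98, 0.9, 0.5, 0.1, 0.02, 0.0]

# Predicted discrimination for each adjacent pair of stimuli.
disc = [predicted_abx(a, b) for a, b in zip(labels, labels[1:])]

# Discrimination is predicted to be near chance (0.5) within a
# category and to peak for pairs straddling the category boundary.
peak = max(range(len(disc)), key=disc.__getitem__)
print([round(d, 4) for d in disc])
print(peak)
```

Run on the invented labeling function above, the predicted discrimination hovers near chance at the continuum ends and peaks at the pairs that straddle the boundary, which is exactly the categorical-perception pattern described in the article.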

Part 4. An Early Attempt to Put It All Together

This section contains the article "Some Results of Research on Speech Perception" which presents what Liberman now calls the Early Motor Theory. In this model, the objects of speech perception are the articulations that create the acoustic patterns, which Liberman assumes make more coherent categories than the acoustic patterns.

Part 5. A Mid-Course Correction

The article in this section, "Perception of the Speech Code", is a review from ten years after the previous section. This article contains two major changes from the Early Motor Theory. The revised theory proposes the direct perception of articulatory events, without an intermediate auditory stage of processing. It also argues for a special speech mode in which this perception occurs, to account for differences between speech and non-speech perception. In particular, experiments on duplex perception show that dichotically presented parts of a syllable (e.g. an ambiguous "base syllable" and a crucial formant transition that differentiates [da] from [ga]) are unconsciously and uncontrollably fused into a complete percept. Also, conflicting auditory and visual information are integrated to produce a single perception (e.g. seeing a face produce [ba] while hearing [ga] results in a percept of [da]).

Part 6. The Revised Motor Theory

This section contains the 1985 article "The Motor Theory of Speech Perception Revised". This is another article which everyone with an interest in speech should read. This article is valuable in setting the Motor Theory apart from the more general theories of ecological psychology. This article also places the Motor Theory in the context of Fodor's writings on modularity. When compared to the previous two sections, it is most fascinating to see how the rest of the field had developed and "caught up", giving Liberman something more concrete against which to compare the Motor Theory. In this incarnation, the percepts of speech are the intended articulatory gestures of the speaker, which are perceived by a biologically specialized module.

Part 7. Some Properties of the Phonetic Module

The article in this section places the speech perception module of the Motor Theory in the context of other modules of perception and communication. Like other perceptual modules, the phonetic module preemptively processes stimuli, so that speech is not ordinarily perceived both as speech and as a collection of non-speech noises. Also, the objects of speech perception (the gestures) are radically different from the stimulus (the signal). This is much like the perception of three-dimensional depth, for example, from the integration of two two-dimensional retinal images.

Part 8. More about the Function and Properties of the Phonetic Module

The article in this section further discusses the modularity of speech perception, and also contributes to theories of modularity in general by proposing a difference between "open" and "closed" modules, and properties particular to each. Perception of linguistic contrasts utilizes a closed module with a discrete set of percepts. Perception of depth by stereoscopic vision utilizes an open module, which can perceive a continuous range of depth. The authors claim that when two modules compete for the same stimulus, processing by the closed module occurs before processing by the open module.

Part 9. Auditory vs. Phonetic Modes

This section contains seven articles which delve more deeply into the difference between speech and general auditory perception, from a variety of perspectives. The topics include the perception of linguistic categories from another language, trading relations between cues for a phonemic contrast, and acoustically appropriate non-speech controls.

Part 10. Reading/Writing Are Hard Just Because Speaking/Listening Are Easy

The final article in the book argues that the horizontal view predicts that reading and writing should be easier than speaking and listening. The Motor Theory and the vertical view predict that speech is primary due to the biological specialization of the speech perception module.

Critical evaluation:

There are three points highlighted in the more theoretical chapters of Liberman's book that I would like to address. First, he proposes there is a speech perception module, biologically specialized to process speech. Second, he proposes that the percepts of speech are not auditory, but rather that they are articulatory. Third, he argues that were these not the case, we would expect reading and writing to be easier than speaking and listening, when, in fact, speaking and listening are easier. I consider each point in turn.

A variety of experiments showing that speech is processed differently from non-speech provide evidence for a specialized speech perception module. However, it is uncertain whether these experiments use appropriate non-speech controls to compare to speech. While a number of ways of creating complex signals that are more or less acoustically equivalent to speech are considered, these experiments do not explore whether there are controls which are communicatively or informationally equivalent to speech. Fowler & Rosenblum (1990) found that a natural sound, the sound of a door slamming, patterned more like speech than like laboratory-generated non-speech controls (which are artificial sound patterns). A door slam is ecologically relevant, as it gives the hearer information about an action which occurred in the world. Speech has tremendous social significance and is probably the most highly practiced complex perceptual task performed by humans. These factors have not been adequately considered when explaining differences between speech and non-speech perception. While it may be the case that speech is processed by a special mechanism, we cannot exclude the possibility that this mechanism also processes some types of non-speech sounds.

A second claim of the Motor Theory of speech perception is that the percepts of speech are not the acoustic signals which impinge directly upon the ear, but rather the distal articulations made by the speaker. One of Liberman's first findings was that there is no acoustic invariant which corresponds to the perceptual invariant of the phoneme or segment. It is now well known, and admitted by Liberman, that the articulatory gestures and even their motor commands are not invariant either. In the revised theory, the articulatory percepts are assumed to be the speaker's intended gestures, before contextual adjustments. However, abstracting the percept to this degree undermines the claim that the percepts are articulatory. The percepts might as well be entirely abstract phonemic categories.

Another, more striking, finding from Liberman's early experiments is that there are discontinuities in the acoustic-to-phonemic mapping for onset consonants. These discontinuities were taken as additional evidence against an acoustic basis for phoneme categories. Other researchers have found that for some phonemic categories the acoustic mapping is simple while the articulatory mapping is complex. For example, American English /r/ can be produced with one or more of three distinct gestures, and there is intraspeaker variation in which gestures are used (Delattre & Freeman 1968; Hagiwara 1995; see also Johnson, Ladefoged, & Lindau 1993). With neither acoustic nor articulatory categories providing simple dimensions upon which to base the perceptual category, once again we are led to more abstract invariant percepts. The coherence of these abstract categories can be based on either articulatory or acoustic properties, or both. This conclusion accords well with linguistic theory, where abstract segments or phonemes are generally accepted in some form, and where phonological processes exist which need to be described both by articulatory features (Chomsky & Halle 1968, Clements 1985) and by acoustic features (Jakobson, Fant, & Halle 1952; Flemming 1995).

Finally, Liberman claims that, since the perceptive and productive mechanisms for reading and writing, the eye and hand, are much more sensitive and agile than those for speech, reading and writing should be simpler than speech. He rightly points out that reading and writing must be taught, and are learned only with difficulty by many, which suggests that there is something special about speech. Indeed, speech is special, but it has cognitive and evolutionary advantages over reading and writing which more than offset the other advantages of reading and writing. For example, the unfolding of a linguistic message in speech is naturally determined by the flow of time, whereas writing is arbitrarily directional, so the direction of reading can be determined only by convention. Reading and writing also require the use of an additional medium, such as paper or a patch of dirt, and so from an evolutionary point of view reading and writing are at a disadvantage. Rather than reading and writing, we should consider sign language when looking for a visual equivalent to speech. In fact, sign is learned by deaf children of signing parents just as easily and automatically as speech is learned by hearing children of speaking parents, and some researchers believe sign language has an acquisition advantage (see Newport & Meier 1985, Meier & Newport 1990, Volterra & Iverson 1995 for discussion). Sign languages are the equals of oral languages in linguistic complexity and arbitrariness, and their existence shows that much of what is special about speech does not depend specifically on the ear and vocal tract.

In summary, Liberman's articles provide strong evidence that speech is special, and that it is processed differently and preemptively by a mechanism that has many of the properties of a modular system. However, much of what is special in speech is also found in sign language and in other ecologically relevant sounds. Arguments that speech perception is biologically specialized and articulatory are based on an overly restricted range of evidence. In the broader perspective, speech is special because it is an integral part of natural language. This book is an informative and provocative study of that very important facet of language.

References:

Chomsky, N. & M. Halle (1968). The sound pattern of English. Cambridge, MA: MIT Press.

Clements, G. N. (1985). The geometry of phonological features. Phonology Yearbook 2: 225-252.

Delattre, P. & D. Freeman (1968). A dialect study of American r's by X-ray motion picture. Linguistics 44: 29-68.

Flemming, E. (1995). Auditory representations in phonology. Unpublished Ph.D. Thesis, UCLA.

Hagiwara, R. (1995). Acoustic realizations of American /r/ as produced by women and men. Ph.D. Thesis, UCLA, published as UCLA Working Papers in Phonetics 90.

Jakobson, R., G. Fant, & M. Halle (1952). Preliminaries to speech analysis. Cambridge, MA: MIT Press.

Johnson, K., P. Ladefoged, & M. Lindau (1993). Individual differences in vowel production. Journal of the Acoustical Society of America 94(2): 701-714.

Meier, R. & E. Newport (1990). Out of the hands of babes: On a possible sign advantage. Language 66(1): 1-23.

Newport, E. & R. Meier (1985). The acquisition of American Sign Language. In D. Slobin (ed.), The crosslinguistic study of language acquisition, volume 1: The data. Hillsdale, NJ: Lawrence Erlbaum. 881-938.

Volterra, V. & J. Iverson (1995). When do modality factors affect the course of language acquisition? In K. Emmorey & J. Reilly (eds.), Language, gesture, and space. Hillsdale, NJ: Lawrence Erlbaum. 371-390.

Reviewer: Stefan Frisch, NIH Post-Doctoral Research Fellow, Speech Research Laboratory, Indiana University. Ph.D. in Linguistics. Research interests include phonetics, phonology, and psycholinguistics (a.k.a. laboratory phonology) and the language/cognition interface.

Acknowledgment: Thanks to David Pisoni, Sonya Sheffert, and Richard Wright for comments and discussion of this work.

Reviewer's address: Stefan Frisch Speech Research Laboratory Psychology Department Indiana University Bloomington, IN 47405