LINGUIST List 8.954

Sat Jun 28 1997

Review: Liberman, Speech: A Special Code

Editor for this issue: Andrew Carnie <carnie@linguistlist.org>


What follows is another discussion note contributed to our Book Discussion Forum. We expect these discussions to be informal and interactive, and the author of the book discussed is cordially invited to join in. If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for discussion." (This means that the publisher has sent us a review copy.) Then contact Andrew Carnie at carnie@linguistlist.org

Message 1: Review of Liberman, Speech: A special code

Date: Fri, 27 Jun 1997 09:52:26 -0500 (EST)
From: Stefan A Frisch <safrisch@indiana.edu>
Subject: Review of Liberman, Speech: A special code




Liberman, Alvin M. (1996). Speech: A special code. Learning, Development, 
and Conceptual Change series. Cambridge, MA: MIT Press. 458 pages. ISBN 
0-262-12192-1.

Reviewed by Stefan A Frisch <safrisch@indiana.edu>


This book is a collection of articles representing 50 years of speech 
research at Haskins Laboratories undertaken by Alvin Liberman and 
colleagues. It is divided into ten sections, which cover a range of 
issues that have guided Liberman's research and theorizing on the 
process of speech perception.


Synopsis:

Chapter 1.

The first chapter is a new article which reviews the historical progress 
of Liberman's research and the development of the Motor Theory of speech 
perception. This chapter is both introduction and conclusion to the 
compilation of articles in the remainder of the book. In it, Liberman 
sets forth quite clearly two different views of speech perception: the 
"horizontal" view and the "vertical" view. The horizontal view, which 
Liberman equates with an auditory view, is that speech perception engages 
no special mechanisms unique to speech. The vertical view, which he 
supports, is that there is a speech perception module which operates 
independently from the process of general auditory perception. He also 
associates the vertical view with the hypothesis that speech perception 
involves the hearer directly perceiving the articulatory gestures made by 
the speaker, rather than the acoustic result of those gestures. This 
chapter is extremely valuable both to the specialist and the novice, as a 
unifying work that brings together an entire research program in a single 
place.

Part 1. On the Spectrogram as a Visible Display of Speech

This section contains a single article briefly describing an early 
attempt to convert between visual and auditory stimuli using the 
spectrograph and pattern playback, applied both to speech and to simple 
geometric shapes. It reveals quite clearly Liberman's early horizontal 
view that there are modality-independent properties of pattern perception 
that would apply equally well to speech as to any other pattern, no 
matter how artificial.

Part 2. Finding the Cues

There are seven articles in this section, detailing experimental work on 
the auditory cues for phoneme identification in English. One of the 
major findings of these experiments is that there is no unitary acoustic 
invariant for a phoneme which corresponds to the unitary perceptual 
experience of the listener. In addition, there are discontinuities in 
the acoustic categories for different phonemes. These chapters are also 
quite useful in that they detail the basic acoustic properties of a 
variety of English phonemes. The topics covered include the consonant 
release bursts of word-initial stops as place cues, the direction and 
duration of CV formant transitions as cues for place and manner, the 
abrupt first formant onset as a cue for voiceless stops, and a summary 
article which contains an early description of the rules necessary to 
synthesize phonemically contrastive English.

Part 3. Categorical Perception

One of the articles in this section, "The Discrimination of Speech Sounds 
within and across Phoneme Boundaries", should be read by every student of 
speech. The authors compared the ability of subjects to label 
synthesized syllables (containing onsets on an acoustic continuum of stop 
place of articulation from "bay" to "day" to "gay") with their ability to 
discriminate these same syllables. They found acute discriminability 
between phoneme categories and poor discriminability within categories, a 
pattern which has come to be called "categorical perception". The 
analytical techniques and conclusions of this article gave rise to 
what is now a large literature on the categorical perception of speech 
and non-speech by both humans and animals. The second article in this 
section details an attempt to compare speech with an equivalent 
non-speech control on the categorical perception of intervocalic stop 
closure duration, which can be used to distinguish "rapid" from "rabid".

Part 4. An Early Attempt to Put It All Together

This section contains the article "Some Results of Research on Speech 
Perception" which presents what Liberman now calls the Early Motor 
Theory. In this model, the objects of speech perception are the 
articulations that create the acoustic patterns, which Liberman assumes 
make more coherent categories than the acoustic patterns.

Part 5. A Mid-Course Correction

The article in this section, "Perception of the Speech Code", is a review 
written ten years after the article in the previous section. This article 
contains two 
major changes from the Early Motor Theory. The revised theory proposes 
the direct perception of articulatory events, without an intermediate 
auditory stage of processing. It also argues for a special speech mode 
in which this perception occurs, to account for differences between 
speech and non-speech perception. In particular, experiments on duplex 
perception show that dichotically presented parts of a syllable (e.g. an 
ambiguous "base syllable" and a crucial formant transition that 
differentiates [da] from [ga]) are unconsciously and uncontrollably fused 
into a complete percept. Also, conflicting auditory and visual 
information is integrated to produce a single percept (e.g. seeing a 
face produce [ba] while hearing [ga] results in a percept of [da]).

Part 6. The Revised Motor Theory

This section contains the 1985 article "The Motor Theory of Speech 
Perception Revised". This is another article which everyone with an 
interest in speech should read. This article is valuable in setting the 
Motor Theory apart from the more general theories of ecological 
psychology. This article also places the Motor Theory in the context of 
Fodor's writings on modularity. Compared with the articles in the 
previous two sections, it is fascinating to see how the rest of the field 
had developed and "caught up", giving Liberman something more concrete 
against which to compare the Motor Theory. In this incarnation, the percepts of 
speech are the intended articulatory gestures of the speaker, which are 
perceived by a biologically specialized module.

Part 7. Some Properties of the Phonetic Module

The article in this section places the speech perception module of the 
Motor Theory in the context of other modules of perception and 
communication. Like other perceptual modules, the phonetic module 
preemptively processes stimuli, so that speech is not ordinarily 
perceived both as speech and as a collection of non-speech noises. Also, 
the objects of speech perception (the gestures) are radically different 
from the stimulus (the signal). This is much like the perception of 
three-dimensional depth, for example, from the integration of two 
two-dimensional retinal images.

Part 8. More about the Function and Properties of the Phonetic Module

The article in this section further discusses the modularity of speech 
perception, and also contributes to theories of modularity in general by 
proposing a difference between "open" and "closed" modules, and 
properties particular to each. Perception of linguistic contrasts 
utilizes a closed module with a discrete set of percepts. Perception of 
depth by stereoscopic vision utilizes an open module that can perceive a 
continuous range of depth. The authors claim that when two modules 
compete for the same stimulus, processing by the closed module occurs 
before processing by the open module.

Part 9. Auditory vs. Phonetic Modes

This section contains seven articles which delve more deeply into the 
difference between speech and general auditory perception, from a variety 
of perspectives. The topics include the perception of linguistic 
categories from another language, trading relations between cues for a 
phonemic contrast, and acoustically appropriate non-speech controls.

Part 10. Reading/Writing Are Hard Just Because Speaking/Listening Are Easy

The final article in the book argues that the horizontal view predicts 
that reading and writing should be easier than speaking and listening. 
The Motor Theory and the vertical view predict that speech is primary due 
to the biological specialization of the speech perception module.


Critical evaluation:

There are three points highlighted in the more theoretical chapters of 
Liberman's book that I would like to address. First, he proposes there 
is a speech perception module, biologically specialized to process 
speech. Second, he proposes that the percepts of speech are not 
auditory, but rather that they are articulatory. Third, he argues that 
were these not the case, we would expect reading and writing to be easier 
than speaking and listening, when in fact, speaking and listening are 
easier. I consider each point in turn.

A variety of experiments showing that speech is processed differently 
from non-speech provide evidence for a specialized speech perception 
module. However, it is uncertain whether these experiments consider 
appropriate non-speech controls to compare to speech. While a number of 
ways of creating complex signals which are more or less acoustically 
equivalent to speech are considered, these experiments do not explore 
whether there are controls which are communicatively or informationally 
equivalent to speech. Fowler & Rosenblum (1990) found that a natural 
sound, the sound of a door slamming, patterned more like speech, and 
differently from laboratory-generated non-speech controls (which are 
artificial sound patterns). A door slam is ecologically relevant, as it 
gives the hearer information about an action which occurred in the 
world. Speech has tremendous social significance and is probably the 
most highly practiced complex perceptual task performed by humans. These 
factors have not been adequately considered when explaining differences 
between speech and non-speech perception. While it may be the case that 
speech is processed by a special mechanism, we cannot exclude the 
possibility that this mechanism also processes some types of non-speech 
sounds.

A second claim of the Motor Theory of speech perception is that the 
percepts of speech are not the acoustic signals which impinge directly 
upon the ear, but rather that the percepts are the distal articulations 
made by the speaker. One of Liberman's first findings was that there 
is no acoustic invariant which corresponds to the perceptual invariant of 
the phoneme or segment. It is now well known, and admitted by Liberman, 
that the articulatory gestures and even their motor commands are not 
invariant either. In the revised theory, the articulatory percepts are 
assumed to be the speaker's intended gestures, before contextual 
adjustments. However, abstracting the percept to this degree undermines 
the claim that the percepts are articulatory. The percepts might as well 
be entirely abstract phonemic categories. 

Another, more striking finding from Liberman's early experiments is that 
there are discontinuities in the acoustic to phonemic mapping for onset 
consonants. These discontinuities were taken as additional evidence 
against an acoustic basis for phoneme categories. Other researchers have 
found that for some phonemic categories the acoustic mapping is simple 
while the articulatory mapping is complex. For example, American English 
/r/ can be produced with one or more of three distinct gestures, and 
there is intraspeaker variation in which gestures are used (Delattre & 
Freeman 1968; Hagiwara 1995; see also Johnson, Ladefoged, & Lindau 
1993). With neither acoustic nor articulatory categories providing 
simple dimensions upon which to base the perceptual category, once again 
we are led to more abstract invariant percepts. The coherence as 
categories of these abstractions can be based on either articulatory or 
acoustic properties, or both. This conclusion accords well with 
linguistic theory, where abstract segments or phonemes are generally 
accepted in some form, and where phonological processes exist which need 
to be described both by articulatory features (Chomsky & Halle 1968, 
Clements 1985) and by acoustic features (Jakobson, Fant, & Halle 1952; 
Flemming 1995).

Finally, Liberman claims that, since the perceptual and productive 
mechanisms for reading and writing, the eyes and hand, are much more 
sensitive and agile than those for speech, reading and writing should be 
simpler than speech. He rightly points out that reading and writing must 
be taught, and are learned only with difficulty by many, which suggests 
that there is something special about speech. Indeed, speech is special, 
but it has cognitive and evolutionary advantages over reading and writing 
which more than offset the other advantages of reading and writing. For 
example, the unfolding of a linguistic message in speech is naturally 
determined by the flow of time, whereas writing is arbitrarily 
directional, so the direction of reading can be determined only by 
convention. Reading and writing also require the use of an additional 
medium, such as paper or a patch of dirt, and so from an evolutionary 
point of view reading and writing are at a disadvantage. Rather than 
reading and writing, we should consider sign language when looking for a 
visual equivalent to speech. In fact, sign is learned by deaf children 
of signing parents just as easily and automatically as speech is learned 
by hearing children of speaking parents, and some researchers believe 
sign language does have an acquisition advantage (see Newport & Meier 
1985, Meier & Newport 1990, Volterra & Iverson 1995 for discussion). 
Sign languages are the equals of oral languages in linguistic complexity 
and arbitrariness, and their existence shows that much of what is special 
about speech does not depend specifically on the ear and vocal tract.

In summary, Liberman's articles provide strong evidence that speech is 
special, and processed differently and preemptively by a mechanism that 
has many of the properties of a modular system. However, much of what is 
special in speech is also found in sign language and in other 
ecologically relevant sounds. Arguments for a biological specialization 
for speech perception as articulatory are based on an overly restricted 
range of evidence. In the broader perspective, speech is special because 
it is an integral part of natural language. This book is an informative 
and provocative study of that very important facet of language.


References:
Chomsky, N. & M. Halle (1968). The sound pattern of English. Cambridge, 
MA: MIT Press.

Clements, N. (1985). The geometry of phonological features. Phonology 
Yearbook 2: 225-252.

Delattre, P. & D. Freeman (1968). A dialect study of American r's by 
X-ray motion picture. Linguistics 44: 29-68.

Flemming, E. (1995). Auditory representations in phonology. Unpublished 
Ph.D. Thesis, UCLA.

Hagiwara, R. (1995). Acoustic realizations of American /r/ as produced by 
women and men. Ph.D. Thesis, UCLA, published as UCLA Working Papers in 
Phonetics 90.

Jakobson, R., G. Fant, & M. Halle (1952). Preliminaries to speech 
analysis. Cambridge, MA: MIT Press.

Johnson, K., P. Ladefoged, & M. Lindau (1993). Individual differences in 
vowel production. Journal of the Acoustical Society of America 94(2): 
701-714.

Meier, R. & E. Newport (1990). Out of the hands of babes: On a possible 
sign advantage. Language 66(1): 1-23.

Newport, E. & R. Meier (1985). The acquisition of American Sign Language. 
In D. Slobin (ed.), The crosslinguistic study of language acquisition, 
volume 1: The data. Hillsdale, NJ: Lawrence Erlbaum. 881-938.

Volterra, V. & J. Iverson (1995). When do modality factors affect the 
course of language acquisition?. In K. Emmorey & J. Reilly (eds.), 
Language, gesture, and space. Hillsdale, NJ: Lawrence Erlbaum. 371-390.


Reviewer:
Stefan Frisch, NIH Post-Doctoral Research Fellow, Speech Research 
Laboratory, Indiana University. Ph.D. in Linguistics. Research interests 
include phonetics, phonology, and psycholinguistics (a.k.a. laboratory 
phonology) and the language/cognition interface.


Acknowledgment:
Thanks to David Pisoni, Sonya Sheffert, and Richard Wright for comments 
and discussion of this work.


Reviewer's address:
Stefan Frisch
Speech Research Laboratory
Psychology Department
Indiana University
Bloomington, IN 47405
safrisch@indiana.edu
http://www.indiana.edu/~srlweb/staff/frisch.html
