LINGUIST List 11.1317

Wed Jun 14 2000

Review: Oller: The Emergence of the speech capacity

Editor for this issue: Andrew Carnie <>

What follows is another discussion note contributed to our Book Discussion Forum. We expect these discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in. If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for discussion." (This means that the publisher has sent us a review copy.) Then contact Andrew Carnie at


  1. Madalena Cruz-Ferreira, Book review

Message 1: Book review

Date: Thu, 1 Jun 2000 13:53:08 +0800
From: Madalena Cruz-Ferreira <>
Subject: Book review

Oller, D. Kimbrough (2000), The emergence of the speech capacity,
Lawrence Erlbaum Associates, Mahwah, NJ.
xvii + 428 pages, paperback edition US$39.95.

Reviewed by Madalena Cruz-Ferreira, National University of Singapore.


 Kimbrough Oller's book presents a model accounting for the emergence
of the speech capacity in human infants and in the human species. The model
consists of a hierarchical system of properties that characterise potential
communication systems at a deep level of abstraction, the infrastructural

 Chapter 1 introduces the theoretical framework, as well as the key
concerns and goals of the book, with preliminary discussion. The framework
of infraphonology, expanded from Oller's earlier research, involves the
assessment of "the infrastructure of human speech sounds" (p.10), specifying
the properties accounting for well-formed elements of phonology. The
syllable is taken as the crucial unit in the model, and is defined as the
minimal rhythmic unit of natural phonological systems.

 Chapter 2 gives the historical background to the model adopted in the
book, arguing that infant vocal development cannot be understood from the
perspective of a mature linguistic system. Oller views the shoe-horning (his
term) of precanonical infant vocalisations into the "operational-level
categories" (p.29) of mature speech as deeply misleading and as the prime
factor behind much confusion in the study of speech development. Crucially,
transcription systems devised for adult systems are inadequate in grasping
stages of development, in that phonetic transcription presumes
well-formedness in early infant vocalisations where, Oller argues, none is
to be found.
 The chapter reviews Jakobson's, Lenneberg's and Irwin's approaches to
speech development, pointing out what stands out as erroneous claims due to
inappropriate methodology and research tools.

 Chapter 3 argues for an infrastructural approach to infant speech,
through which the stages of vocal development in the first few months of
life become apparent, in that each stage is typified by the production of
particular protophones. These are "untranscribable categories" (p.50) that
constitute the key bridge to full-blown speech, and include "all the
utterance types that appear to be precursors to speech" (p.10).

 Chapter 4 lays out the basic tenets of infraphonology. Four basic stages
in vocal development are set up, each defined by typical protophone
production and each evidencing progressive infraphonological achievement, as
follows.
 In Stage 1, the Phonation stage (0-2 months), the infant produces
quasivowels, achieving normal phonation. Quasivowels are sounds that "lack
full vocalic status" because they are produced "with the vocal tract at
rest" (p.63).
 In Stage 2, Primitive Articulation (1-4 months), the infant produces
gooing, achieving limited articulation.
 In Stage 3, Expansion (3-8 months), the infant produces full vowels as
well as marginal babbling, achieving full resonance/articulation.
 In Stage 4, the Canonical stage (5-10 months), the infant produces
canonical babbling, achieving well-timed articulation.
 Precanonical and canonical babbling constitute two distinct stages, the
onset of the latter established when infants "produce closure and opening
sequences with normal phonation in well-timed, often repetitive patterns"
(p.65), including reduplicated sequences. Canonical syllable formation obeys
four basic principles:
 Normal phonation: smooth voicing
 Articulation: movement of the vocal tract during voicing
 Full resonance: opening and posturing of the tract during vowel-like
 Rapid transitions: well-timed movements from closed to open postures.

 Chapter 5 starts by assessing the heuristic power of infrastructural
models in other research domains, namely, chemistry and biology. Oller
argues that infrastructural models may well be universal and, as "conditions
of the universe", they are of course "independent of humanity" (p. 105).
Vocal development, organised by infraphonological theory, follows a natural
logic on the path leading from primitive capabilities to the structured
patterns present in mature speech.
 There follows a characterisation of speech sounds, from the range of all
human articulatory possibilities, in order to establish the conditions that
define "possible speech event types" (p.86). Instrumental analysis is called
for in this quest, but Oller rightly emphasises the crucial role played by
human listeners, the "real perceivers" (p.89), in their judgement of
relevant speech principles and categories, by which instrumental
quantification should be guided. A universal characterisation of the
syllable is proposed, with the proviso that infraphonology must ultimately
"specify the contrastive sounds at each of the tiers of human phonological
function" (p.81), like, for example, phonological features, segments, feet,
phrases and higher order rhythmic units.
 The chapter closes with the statement of infraphonology as a part of
Universal Grammar, independent from emergentist or nativist claims on
grammatical principles.

 Chapter 6 discusses vocal and gestural development within the general
context of motor development, highlighting the role of motor practice in
vocal development. The concept of canalisation is analysed, concerning
developmental events that are difficult to "deflect from a biologically
preordained course" (p.113). Vocal and signed babbling constitute examples
of canalised behaviour in hearing and deaf infants, respectively, as do,
for example, hand banging and motor engagement in other rhythmical patterns
in all infants. The role of auditory experience appears as non-negligible
for the emergence of vocal babbling, as differences in its timing onset are
discernible between hearing and deaf infants.

 Chapter 7 investigates key stages in vocal development, in particular
the babbling stage, in the light of several studies targeting infants in
conditions of socio-economic deprivation, premature birth or bilingualism,
or a combination of these. Results confirm the robustness of these stages,
suggesting that the logical growth of the infraphonological capabilities is
immutably embedded in our biological makeup.

 In the light of this evidence, Chapter 8 examines the limits on the
disruption of the canalised pattern of babbling, from case studies on
profound deafness and mental retardation. These studies show that protophone
development may be delayed or otherwise disrupted, but not prevented, and
that disruption in the timing onset of canonical babbling occurs in infants
who later show deviant development of language-related abilities. This clear
correlation encourages the generalisation of diagnostic procedures whereby
10-month old infants may be screened for hearing impairment, or other
impairment, on the basis of an evaluation of canonical babbling onset.

 Chapter 9 turns to the social function of infant vocal communication,
ranging from vegetative sounds to protophones. It describes the ways in
which infants seek mastery of the powerful human vocal tool through
exploration of its flexibility along several dimensions that infants isolate
for the purpose of practice. For example, the alternations of whispers and
yells explore the parameter of intensity. Body posture and facial
expressions, of the child as well as of interfacing adults, gradually
acquire social significance too.

 Chapter 10 sets out to investigate the specificity of protophones, as
opposed to fixed signals, like crying and laughter, and vegetative
vocalisations, drawing comparisons between humans and other mammals.
 Vegetative sounds, resulting from bodily functions associated with
respiration, swallowing and digestion, may nevertheless have social
significance in many species: for example, sneezing may indicate the
presence of a potentially dangerous substance, and the species is thus able
to interpret sounds or gestures that are not specialised for communication.
Fixed signals, with origins in bodily responses unrelated to communication,
may have been moulded to the purposes of communication through natural
selection. Their form is quite stable: developmental changes in the form of
human fixed signals are notable with, e.g., age, but do not prevent the
recognition of a cry or a laugh at any age. Vegetative sounds are relatively
identifiable across species, whereas fixed vocal signals are more
 Protophones, in contrast, have no biologically specified values as
signals, nor are they associated with specific emotional states. Protophone
play needs no intention to communicate either. These sounds do not have to
be produced, nor do they have to be produced within a narrow range of
acoustic parameters. They have the potential to be moulded to situations,
thereby showing a primary feature of Conventionality (discussed in Chapter
12), typical of language in its embryonic state. Often, the first
associations between protophones and intention tend to be made up by the
child, i.e., with no intervention from adult models. Protophones, in this
way, "cut a path on the way to a lexicon" and emerge as the "specific
precursors to speech" (p.193). They can be produced independently of any
signalling value and have therefore the potential for association with new

 Chapter 11 discusses the interest in the origins of human language
throughout the ages, to strike a position of compromise: the differences
between human and nonhuman communication systems may provide insight about
our origins, whether the differences lie in our mental or our vocal attire.
 Oller acknowledges his debt to 17th and 18th century thought in laying
the foundations for the systematic development of a general theory of
properties attempted in this book. In the same sense that infant speech
cannot be assessed through mature language models, it is "in the context of
a species-independent theory of infraphonological properties [that] it is
possible to make insightful comparisons among different species" (p. 209).
From these comparisons, the special features of human action and capability,
that are present from the first months of life, appear as clearly unique,
presaging the elaborate ability to communicate through speech.

 Chapter 12 specifies the dimensions along which communication systems
may vary, in the form of 18 properties that highlight the many routes
through which species differentiation may take place. The properties, simply
listed in this review along with the briefest explanation for each, are:
 1. Contextual Freedom, the intentional control over vocalisations.
 2. Free Expressivity, the ability to express oneself through
vocalisations of any sort.
 3. Directivity, the display of vocalisations for the production of
social effects.
 4. Interactivity, social connection through vocal turn-taking.
 5. Imitability, the adaptation to community-determined signals.
 6. Designation, the ability to share reference to entities.
 7. Conventionality, the assignment of values of any kind to signals.
 8. Arbitrarity, where there is no discernible similarity of signal and
 9. Semanticity, analytical reference to a class of entities.
 10. Displaceability, reference to absent entities.
 11. Propositionality, the use of multiword utterances.
 12. Signal Analysis, the manipulation of acoustic parameters of speech.
 13. Categorical Adaptation, the ability to decompose syllables into
their feature components.
 14. Syllabification, the use of well-formed syllables.
 15. Recombinability, the combination of syllables into novel patterns.
 16. Rhythmic Hierarchy, the organisation of syllables within breath
 17. Segmentation, the use of segmental recombination.
 18. Hot-Cool Synthesis, the production of graded vs. discrete meanings.
 Each property is analysed in turn, and their availability in both human
and other primates is gauged through an authoritative review of research on
nonhuman (young) primates. The discussion also points to where parallels may
obtain between ontogenetic and phylogenetic language development, on the
basis of the implicational hierarchy naturally evident in these properties.

 Chapter 13 offers a speculative scenario of the phylogeny of language,
one of many possibilities, as Oller points out, that may be consistent with
the facts of vocal development as embodied in the properties hierarchy. The
scenario is presented in 7 successive scenes corresponding to as many
evolutionary stages, in accordance with Oller's view that language may have
evolved gradually. These scenes involve "features that appear consistent
with other aspects of empirical paleontology" (p.337), and encompass
developmental stages parallel to those found in human infants up to the 2-
and 3-word stage.
 Through this scenario, Oller distances himself from the view that
speech ontogeny recapitulates phylogeny by virtue of genetic programming,
underscoring instead the claim that "to the extent there is similarity of
ontogeny and phylogeny in speech, it may be the result of the fact that both
modern infants and ancient hominids have been subject to similar constraints
of a natural, logical, infrastructural sort. There are inherently natural
ways that a vocal communication system might develop." (p.317).

 Chapter 14 returns to the intrinsic difference between speech and fixed
signals, to give a comparison of fixed signals across human and nonhuman
systems. Guided by infrastructural modelling, the comparison specifically
purports to highlight substantial similarities among these. Universal
properties appear to underlie animal communication capabilities in general.

 Finally, Chapter 15 summarises the book and recapitulates the main
points. By leaving operational-level units of mature speech out of
descriptions of infant vocalisations, "we found stages of development that
had eluded prior generations of scholars" (p.356), which in turn opened the
ground for a universal comparison of communicative systems. Furthermore,
"the key to sensible application of developmental research in evolution is
the use of infrastructural modeling [where] both ontogeny and phylogeny are
presumed to be governed by a common set of properties and principles of
potential infrastructure." (p.364).

Critical Evaluation

 Oller's infraphonological model provides ground-breaking insight into
language ontogeny and phylogeny, and stands out as intuitively satisfying.
The goal of the model is best expressed in the author's own words: "If
properly formulated, these properties may constitute a stable set of logical
possibilities from which systems of communication choose, and may constitute
a species-independent, universal system of limits on the ways that vocal
communication can be developed." (p.210).
 It makes good sense to treat communication systems from a common
underlying perspective, avoiding the easy trap of analysing one system
through the communicative lenses of another. Oller perceptively directs his
analysis to the very beginning of the potential differentiation of the human
and nonhuman lines, far before the time frames adopted in other literature
on the evolution of language, and therefore takes nothing for granted.
Rudimentary as some properties may seem when compared to full-blown human
language and to current accounts of it, particularly in the area of
syntactic theory, Oller contends that
such properties represent "critical departures from the vocal capabilities
of other primates" (p.319). The universality of their application is
highlighted throughout the book in several references to diverse animal
communication systems, including mammals, cetaceans and birds. Mutatis
mutandis, the reader is left with few doubts about the effective power of
this model to identify, for example, so-called 'intelligent' electronic
communication devices.
 Oller's claims emerge as all the more convincing through the way in
which he gradually presents his model, indicating its potential ambiguities
and pointing out issues to which the present state of our knowledge fails to
provide an answer. This not only results in a lucid picture of what a
scientific proposal should look like, it also entices the reader to look for
answers to the issues left open.
 His claims are also based on an impressive amount of empirical research.
This is true of the speculative scenario presented in chapter 13 too, whose
solid grounding in paleontological findings makes for fascinating reading
about what could well have been our infra-articulate past. In reviews of
literature, Oller's stance is equally cautious, making plentiful use of
footnotes, excursions and forward references within the book itself, that
tone down any sweeping, reductionist generalisations that the reader might
be tempted to indulge in.

 In view of the depth and breadth of the issues covered in this book, a
few words of disappointment are due on Oller's treatment of intonation.
Although intonation is mentioned several times throughout the book, the
proposed model offers no linguistically pertinent place for intonation
either in the protophone inventory or the infrastructural properties.
Intonation appears as an incidental addition to the fundamental protophone
system, mostly to make it "more powerful" (p.224) through the
superimposition of illocutionary force or the expression of emotion. This is
particularly clear, for example, in the discussion of the Hot-Cool Synthesis
property in chapter 12 and passim, where the gradient, emotional, hot, side
of the dichotomy is manifested by uses of pitch, whereas the discrete,
informational, cool, side is coded in syllables. The remarks that this
synthesis is a "magical" property of human language (p.304) and "a means of
coding a fundamental distinction between meaning and other communicative
values" (p.306) seem to vouch for the stance that no linguistic meaning is
coded by uses of pitch. The picture that emerges is that intonation, being
the domain of emotions, is in fact equivalent to one of Oller's fixed
signals, with predictable cross-language and cross-species meanings, which
is synonymous with lack of linguistic meaning. The statement that "Human
infants begin to use arbitrary vocal signals by 12 months of age, with the
first words of the emergent lexicon" (p. 275) reinforces the view that
intonation carries no arbitrary meaning.
 Protophones and words are taken as the building blocks of emergent
speech and, therefore, of full-blown language. It is not clear how the rich
intonational system of a language comes to be acquired later than, or
independently from, the segmental units of the same language. It is even
less clear, indeed, how these units can be articulated without intoning,
although Oller does sporadically refer to the imitation of intonation as
occurring earlier than the imitation of syllables (pp.31, 179) and to richly
intoned babbling (pp.184, 305). The problems raised by this view become
clear, for example, in the discussion of a putative example (pp.277-278) of
a child using the syllable [ba] in many ways, including pointing at a ball
and saying "[ba] with rising intonation, waiting for the adult to confirm
that indeed the object is called ball". Oller concludes that "pairing a
single meaning and sound with multiple forces", intonation apparently being
a "force", provides evidence that "the word [ba] has achieved semantic
status", that is, [ba] refers "analytically to the class of objects, balls".
My point is, the same child could as well have been using the same rising
intonation with several other syllables of its repertoire, in order to fix
the semantic status of the tone. The infrastructural model seems to me to
allow for an account of intonational protophones too.
 As it stands, the treatment of intonation in this book is, at best,
ambiguous. At worst, it may foster the ingrained and erroneous popular
belief that intonation is extralinguistic and plays a marginal role in vocal
communication. Pitch modulation can convey emotional states, but so can the
use of specific lexical units, in all languages. Depriving a universal model
of communication systems of its intonational component unduly impoverishes,
in my view, our understanding of the emergence of the speech capacity.

[About the reviewer: Madalena Cruz-Ferreira teaches phonetics, phonology,
morphology and general linguistics at the National University of Singapore.
Her research interests include prosody, bilingual child language acquisition
and Portuguese linguistics.]

Madalena Cruz-Ferreira
