LINGUIST List 13.541

Tue Feb 26 2002

Review: de Boer, The Origins of Vowel Systems

Editor for this issue: Terence Langendoen

What follows is another discussion note contributed to our Book Discussion Forum. We expect these discussions to be informal and interactive, and the author of the book discussed is cordially invited to join in. If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for discussion." (This means that the publisher has sent us a review copy.)


  1. Jansen W., Review of de Boer (2001) The Origins of Vowel Systems

Message 1: Review of de Boer (2001) The Origins of Vowel Systems

Date: Tue, 26 Feb 2002 16:59:56 +0100 (MET)
From: Jansen W.
Subject: Review of de Boer (2001) The Origins of Vowel Systems

De Boer, Bart (2001) The Origins of Vowel Systems. Oxford University
Press, xii+168pp, hardback ISBN 0-19-829965-6, Studies in the Evolution
of Language 1.

Wouter Jansen -- Department of Linguistics, University of Groningen, and
Department of Linguistics, University of Tuebingen

Liljencrants and Lindblom's (1972) Dispersion Theory claims that a number
of typological trends in the phonetic composition of vowel inventories can
be derived from the assumption that the phonetic implementation of vowel
categories is maximally dispersed in the available auditory space. For
instance, if a language has three contrastive vowels they are highly
likely to be implemented as [i], [a], and [u], which represent the
extremities of the F1-F2 triangle (roughly corresponding to height and
backness in articulatory terms) in which vowels can be realised. The
underlying idea is that dispersion leads to less confusion of contrastive
vowels and thereby to more effective communication.
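
By way of illustration, the 'energy' measure that Liljencrants and
Lindblom use to quantify dispersion (the sum of inverse squared distances
over all vowel pairs) can be sketched as follows. The coordinates are
hypothetical (F1, F2') values in Bark chosen only for the example, not
data from the book or from any language.

```python
import itertools
import math


def dispersion_energy(vowels):
    """Sum of inverse squared Euclidean distances over all vowel pairs.

    Lower energy means the vowels lie further apart on the whole.
    """
    return sum(
        1.0 / math.dist(a, b) ** 2
        for a, b in itertools.combinations(vowels, 2)
    )


# Hypothetical coordinates (assumption): rough corner vowels vs. a crowded set.
corner_system = [(2.5, 14.5), (7.5, 11.0), (3.0, 6.0)]    # roughly [i], [a], [u]
crowded_system = [(2.5, 14.5), (3.0, 13.5), (3.5, 12.5)]  # three close front vowels

energy_corner = dispersion_energy(corner_system)
energy_crowded = dispersion_energy(crowded_system)
```

Under these assumed coordinates the corner system has the lower energy,
which is the sense in which [i], [a], [u] count as well dispersed.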

The main claim of the book under review is that, on a number of
assumptions, vowel dispersion can emerge in a language without the need
for the assessment by individual speakers of the global properties of
their inventories. The computational model on which this claim is based
derives a number of additional predictions with varying degrees of
accuracy about the typology of vowel inventories.

The book is based on the author's (1999) computer science PhD thesis but
has been adapted to suit an interdisciplinary readership. It consists of
three main parts. Chapters 1, 2 and 3 set the scene by outlining the
conceptual basis of the approach (self-organisation) and introducing some
of the core facts it is intended to capture (statistical generalisations
about the typology of vowel inventories). The core of the book is formed
by chapters 4 and 5 which describe the architecture of the computational
model and the results of the simulation experiments carried out with it.
Chapters 6 and 7 form the third part of the book, which tries to position
the work reported in the previous two chapters in the broader field of
computational language evolution studies.

Chapter 1, 'Introduction', provides a brief synopsis of the book and
defines its goals and approach. The goals are "first to investigate what
mechanisms might be necessary for explaining the universals of human vowel
systems and secondly to investigate what the role of the population might
be in the explanation of linguistic universals". The method used is the
simulation of interaction among language users by means of a computer
model. Unlike studies using a broadly similar methodology (e.g. Kirby
1999, Briscoe 2000), De Boer sets out to investigate the development of
language in a population of fully developed language users rather than the
evolution of the language user in response to the (learnability)
challenges posed by language structure, or the coevolution of language and
language user.

Chapter 2, 'Universal Tendencies of Human Sound Systems', introduces some
well-established generalisations about the typology of 'primary' vowel
systems (systems or the parts of larger systems that are implemented in
terms of height, backness and rounding, rather than e.g. duration or
nasalisation), such as the fact that the most common number of vowels in
an inventory is 5. It then sketches a number of paradigms that have tried
to explain typological patterns, including classical binary feature theory
and Dispersion Theory, and assesses two attempts to embed the insights of
(later versions of) Dispersion Theory in computer simulations of
interacting language users. The chapter concludes with a review of what is
known about the acquisition of phonology in (young) children.

Chapter 3, 'Self-Organization', describes the conceptual basis of the model
defended by the book and explains why it is appropriate to apply this
framework to the study of vowel inventories. Self-organisation broadly
refers to the emergence of regular patterns caused by mechanisms that are
not directed in any sense by or towards such patterns. This notion can be
illustrated by the hexagonal grid pattern of a honeycomb, which is not the
product of bees aiming to construct a regular pattern but a by-product of
the simultaneous building activity of large numbers of bees of roughly the
same size and physical ability. With regard to the modelling of vowel
inventories, two forms of self-organisation can be distinguished: first,
the sharing of a vowel inventory among the members of a speech community,
and second, the dispersion of vowels in the inventories of individual
speakers. De Boer argues that it is appropriate to model the typology of
vowel systems as the product of a self-organising dynamic system because
it seems implausible that speakers have dedicated inventory grammars that
perform global assessments of the amount of auditory dispersion in their
vowel inventories. Given the author's stance that generalisations about
vowel systems should not be attributed to innate (phonological) feature
systems, such generalisations should therefore derive from the use
(production and perception) of individual vowels by individual speakers.

Chapter 4, 'The Simulation', provides a fairly non-technical description of
the computational model developed by the author. The architecture consists
of a population of 20 agents representing human language users. Every
agent is endowed with a(n initially empty) inventory of paired
articulatory and auditory vowel targets, a vowel synthesizer (articulation
model) and a vowel recogniser (perception model). Articulatory targets are
represented in terms of height, position and rounding; the auditory target
is represented as a set of coordinates in F1-F2 space expressed on the
Bark scale. The second formant is calculated as the perceptual F2 or F2'
(F2-prime), which takes on board the contribution of higher formants in
the acoustic spectrum to the perceived frequency of the second resonance
peak (cf. Chistovich and Lublinskaya 1979).
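
To give a feel for what expressing formants on the Bark scale involves,
here is a minimal sketch using Traunmueller's (1990) closed-form
approximation. This particular formula is an assumption on my part: the
review does not say which Hz-to-Bark conversion the book uses, and the
calculation of F2' from the higher formants is not attempted here.

```python
def hz_to_bark(frequency_hz):
    """Convert Hz to Bark with Traunmueller's (1990) approximation.

    Assumption: the book may use a different conversion formula.
    """
    return 26.81 * frequency_hz / (1960.0 + frequency_hz) - 0.53


# The same 200 Hz step is perceptually larger at low than at high frequencies.
low_step = hz_to_bark(500.0) - hz_to_bark(300.0)
high_step = hz_to_bark(2500.0) - hz_to_bark(2300.0)
```

The point of the scale is visible in the two steps: equal distances in Hz
shrink in Bark as frequency rises, so F1 differences weigh relatively
heavily in an auditory distance measure.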

The agents engage in one-on-one imitation games which start with an
initiator transmitting a vowel sound generated from a randomly selected
articulatory target in its inventory. The receiver, or 'imitator',
classifies this signal in terms of the perceptually nearest vowel in its
own system, synthesises the corresponding articulatory target and sends it
back to the initiator. An imitation game is labelled as successful if the
response signal is classified as identical to the stimulus by the
initiator, and the success or failure is relayed to the imitator in terms
of a 'non-verbal' feedback signal. This feedback signal and the longer
term communicative effectiveness of a vowel category (defined as the ratio
of the number of successful uses of a vowel to the number of times it is
used) determine how the vowel inventory of the agents is updated after
every game. Vowel targets can be shifted in articulatory and auditory
space, and vowel categories can be introduced, merged, or discarded. The
mapping between feedback (history) and specific update operations
introduces a bias in the model towards a vowel system that is shared by all
members of the population (the first level of self-organisation identified
above). This mapping favours high communicative effectiveness indices for
all individual vowel targets in the inventories of all individual agents
and the sum of these indices is maximal if all agents share the same
inventory.
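
The flow of a single imitation game can be sketched roughly as below.
This is a simplified illustration, not the author's implementation: the
Agent class and its methods are hypothetical names, agents hold auditory
prototypes only (no articulatory synthesiser), the initiator's vowel is
passed in rather than drawn at random, no noise is added, and the
inventory update step is left out. Effectiveness is taken to be
successful uses divided by total uses.

```python
import math


class Agent:
    """Hypothetical agent; prototypes are auditory (F1, F2') points in Bark."""

    def __init__(self, prototypes):
        self.prototypes = list(prototypes)
        self.uses = [0] * len(self.prototypes)
        self.successes = [0] * len(self.prototypes)

    def nearest(self, signal):
        """Return the index of the prototype closest to the signal."""
        return min(
            range(len(self.prototypes)),
            key=lambda i: math.dist(self.prototypes[i], signal),
        )

    def record(self, index, success):
        """Update the use and success counts of one vowel."""
        self.uses[index] += 1
        self.successes[index] += int(success)

    def effectiveness(self, index):
        """Successful uses divided by total uses (0.0 if never used)."""
        if self.uses[index] == 0:
            return 0.0
        return self.successes[index] / self.uses[index]


def imitation_game(initiator, imitator, chosen):
    """Play one noise-free game; return True if the imitation is recognised."""
    stimulus = initiator.prototypes[chosen]
    response_index = imitator.nearest(stimulus)
    response = imitator.prototypes[response_index]
    success = initiator.nearest(response) == chosen
    initiator.record(chosen, success)         # feedback known to the initiator
    imitator.record(response_index, success)  # 'non-verbal' feedback signal
    return success


initiator = Agent([(3.0, 14.0), (7.0, 11.0), (3.0, 6.0)])
imitator = Agent([(3.0, 14.0), (7.0, 11.0)])  # lacks the third vowel

first_game = imitation_game(initiator, imitator, chosen=0)
second_game = imitation_game(initiator, imitator, chosen=2)
```

With these made-up inventories the first game succeeds and the second
fails, because the imitator has no category near the initiator's third
vowel; it is failures of this kind that drive the update operations
described above.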

The second form of self-organisation (i.e. auditory dispersion in vowel
inventories) emerges as the result of two further properties of de Boer's
model. The first is the addition of 'noise' to the vowel signals
transmitted between the agents. This noise effectively consists of
transforming the signals in the F1 and F2' domains randomly, but within
fixed bounds that represent the 'noise level'. This means that vowel
targets with overlapping noise ranges run the risk of being confused
during imitation games. Given the communicative pressure described in the
previous paragraph, noise addition therefore creates a bias towards
auditory dispersion. Secondly, and essentially to keep systemic pressure
on the model, a random vowel is added to the inventory of an agent with a
probability of .01 per game. This pushes the model away from a shared
inventory with a highly effective single vowel.
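
How bounded noise penalises neighbouring vowels can be illustrated with a
small simulation. The sketch assumes additive, uniformly distributed
noise in Bark on both dimensions and nearest-prototype classification;
the book's own noise transformation may well differ, and the vowel
coordinates and the function name are hypothetical.

```python
import math
import random


def confusion_rate(target_a, target_b, noise_level, trials=10_000, seed=0):
    """Fraction of noisy tokens of target_a that end up closer to target_b."""
    rng = random.Random(seed)
    confused = 0
    for _ in range(trials):
        # Shift each coordinate by a random amount within +/- noise_level.
        token = tuple(
            x + rng.uniform(-noise_level, noise_level) for x in target_a
        )
        if math.dist(token, target_b) < math.dist(token, target_a):
            confused += 1
    return confused / trials


# Hypothetical targets: a close pair and a distant pair, same noise level.
close_pair = confusion_rate((3.0, 14.0), (3.5, 13.5), noise_level=1.0)
distant_pair = confusion_rate((3.0, 14.0), (7.0, 11.0), noise_level=1.0)
```

Under these assumptions the distant pair is never confused, since the
noise range cannot reach the category boundary, whereas a substantial
share of tokens of the close pair is misclassified. Combined with the
pressure for communicative success, this is what favours dispersion.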

Chapter 5, 'Results', reports on a series of simulations with the model
introduced in chapter 4. Section 1 of this chapter demonstrates how after
a certain number of games (4000 in the example that is discussed in depth)
the agents converge on shared and finite inventories of vowels that appear
to be well-dispersed in auditory space. Finiteness of vowel inventories is
not derived by Liljencrants and Lindblom or refined versions of Dispersion
Theory (e.g. Schwartz et al. 1997b) in which vowel categories are
represented as points in phonetic space. In de Boer's model, on the other
hand, finiteness is a by-product of the mechanism responsible for
dispersion: in a sense, the addition of noise to the transmission process
creates vowel 'categories' that consist of areas rather than points in
phonetic space and since (articulatory) phonetic space is finite, so is
the number of vowel categories. The relation between noise and vowel
categorisation in the model is best illustrated by the fact that the
average size of the emergent inventories is inversely related to the noise
level.

Section 5.2 compares the average results of 1000 simulations each for two
noise levels with inventories in which the vowels are randomly distributed
in phonetic space, using the optimal systems predicted by Dispersion
Theory as a benchmark. This comparison is based on two criteria: imitation
success and energy, a measure of global auditory dispersion used by
Liljencrants and Lindblom (1972). It indicates that the model approximates
the systems predicted by Dispersion Theory better than the random
inventories do, especially at the higher noise level (i.e. for smaller
inventories). Section 5.3 shows that the simulations also produce shared
inventories of articulatory categories, especially with regard to tongue
height and
position. Section 5.4 extends the model by considering the effects of
population change. Population change is modelled by introducing new agents
with empty vowel inventories and randomly eliminating agents, both with a
certain (low) probability per agent per game. Two types of cases are
considered: (1) the change of a vowel system stabilised in a static
population after the onset of population change; (2) the emergence of
stable inventories in populations that are dynamic from the start. The
conclusion of this section is that, generally speaking, changing
populations support smaller vowel inventories than static ones, in a way
that depends on the replacement rate of the population.
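
The population dynamics described for section 5.4 amount to something
like the following sketch. The function name, the representation of an
agent as a plain list of vowel prototypes, and the probabilities used in
the example are all assumptions made for illustration; the review only
states that births and deaths each occur with a low probability per agent
per game.

```python
import random


def population_step(population, birth_prob, death_prob, rng):
    """Apply one round of turnover to a list of agents.

    Each agent is a list of vowel prototypes (hypothetical representation).
    Existing agents are removed with probability death_prob, and for each
    existing agent a newcomer with an empty inventory is added with
    probability birth_prob.
    """
    survivors = [agent for agent in population if rng.random() >= death_prob]
    births = sum(1 for _ in population if rng.random() < birth_prob)
    return survivors + [[] for _ in range(births)]


rng = random.Random(0)
stable = [[(3.0, 14.0), (7.0, 11.0), (3.0, 6.0)] for _ in range(20)]

unchanged = population_step(stable, birth_prob=0.0, death_prob=0.0, rng=rng)
replaced = population_step(stable, birth_prob=1.0, death_prob=1.0, rng=rng)
```

The two extreme settings bracket the cases the book considers: with no
turnover the shared system simply persists, while with complete turnover
every agent has to acquire its vowels from scratch.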

Sections 5.5 and 5.6 discuss the typology of vowel systems generated by a
series of simulations of 250,000 games each at 4 different noise levels.
The phonetic composition of inventories consisting of 3-9 vowels is
assessed against the typological generalisations of Crothers (1978) and
Schwartz et al. (1997a). This reveals a number of non-trivial
correspondences between real and simulated systems. An interesting example
is the frequent occurrence of mid central vowels (e.g. schwa), although
this is possibly related to the compression of the F2' space relative to
human vowel space. Mid central vowels are not derived by classical
Dispersion Theory and Schwartz et al. (1997) explicitly put schwa beyond
the scope of their theory. Finally, section 5.6.9 describes an experiment
which indicates that within a range centred on inventories of size 3-9,
systems of 4 vowels are the most common. Although 5 is the most common
number of vowels in human inventories, the emergence of a clear peak at
the appropriate (lower) end of the range seems an encouraging result.

Chapter 6, 'Simulated Evolution of Other Parts of Language', is essentially
a review of studies using similar methodologies to the one pursued by de
Boer. It discusses work on syntax, semantics, experiments with robots,
and the predictions of simulations of change and spatial dispersion of
populations. In conclusion, chapter 7, 'Implications for Other Parts of
Language', attempts to spell out in an informal fashion how the tools used
for the modelling of vowel inventories might be extended to a range of
other linguistic phenomena including fixed word order and tonogenesis.

The properties of the model developed in the book are interesting enough
to make extensions to other domains look worthwhile. Without built-in
global teleologies or pre-existing vowel categories, the model developed
by the author is able to derive vowel inventories that are finite and have
configurational properties that are reminiscent of human systems.
Moreover, in part because an ecological advantage is attributed to the
sharing of systems among individual agents, the model can converge on
phonetically suboptimal systems. The survival of such systems bears some
resemblance to so-called 'crazy' phonological rules (Bach and Harms 1969;
Anderson 1981): rules that have no synchronic phonetic motivation but
which are nevertheless preserved in a language.

De Boer's model extends earlier work on auditory dispersion in vowel
systems by suggesting how dispersed systems could emerge in a speech
community without the intervention of innate phonological or phonetic
categories. However, the import of this work depends on the possibility of
justifying its many assumptions on independent grounds and on the accuracy
of its predictions, and I think this is where the book's main weaknesses
lie. It is true that the author acknowledges many of the abstractions and
limitations of his approach, and I do not think that the lack of
independent evidence for certain assumptions should stop anyone from
exploring its possibilities. But although I share its functionalist
perspective, I think that the argument put forward by this book could have
benefited considerably from a more careful assessment of its assumptions
and a more balanced and detailed scrutiny of its predictions.

For example, in the light of its brevity and lack of detail, it seems
unlikely that chapter 2 was designed to convince the reader of the book's
functionalist underpinnings. (Distinctive) feature theory is introduced
and dismissed as inadequate in little more than a page of text, without so
much as bibliographical pointers to more detailed critiques or defences of
formalist approaches to phonology. A number of drawbacks of classical
Dispersion Theory noted by e.g. Schwartz et al. (1997b) and Boersma (1998)
are not examined despite the appearance of these studies in the
bibliography and despite their potential relevance for the design of the
computational model, e.g. with respect to the set of spectral properties
that play a role in human vowel categorisation or such technical issues as
the weighting of F1 and F2' in the distance measure used in the perception
model.

Furthermore, no real justification is offered for the choice not to
incorporate articulatory effort as a parameter in the model. Strong cases
have been made by other functional linguists for the role of effort
minimisation in speech production and (by extension) the shape of sound
inventories (Boersma 1998, Kirchner 1998). If, as the author himself
suggests (e.g. in chapter 3), effort minimisation helps to shape sound
inventories, then excluding it from the model would seem to be a possibly
critical abstraction, and consequently in need of some justification.

As a final example, consider the use of noise in the vowel transmission
process. The human speech chain is certainly noisy in a number of respects
and it seems likely that the way we categorise speech sounds is adapted to
this state of affairs. But de Boer does not attempt to relate the noise
levels in the model to the types and levels of noise that affect human
speech communication, which leaves the question unanswered to what degree
noise addition in the model is realistic and to what degree it is a brute
force way of making the agents use finite and dispersed vowel systems.

With regard to the predictions of his model, De Boer focuses almost
entirely on the typology of vowel inventories. This means that the results
of the experiments reported in section 5.4, on the effects of population
variation, are not tested against any language change (or acquisition)
data. The assessment of the typological predictions itself is not
unproblematic either. In section 5.5 the author ostensibly adopts the
generalisations of Crothers (1978) as a touchstone for the typology
generated by his model. This entails a classification of (small) vowel
inventories mainly in terms of configurational properties (number of
contrasts per dimension, front-back asymmetries etc.) rather than the
precise localisation of vowel targets in phonetic space, and the way
phonetic symbols are assigned to the scatter plots in section 5.6
generally bears out this approach. But then these coarse-grained
classifications are also used to interpret the results in terms of the
typological generalisations formulated in Schwartz et al. (1997a), which
presuppose a more precise assignment of phonetic values to vowel
symbols.

This results in inconsistencies. For example, even if the compressed F2'
space is taken into account, several of the vowels labelled [o] and some
of the [u]s in the top left panel of figure 5.23 (p. 95) should be
classified as unrounded mid back or mid central vowels according to
Schwartz et al.'s approach (extrapolating from the mapping defined in
Schwartz et al. 1997b). Consequently, for the purpose of evaluating the
quality of the simulations against the generalisations of Schwartz et al.
(1997a), the systems containing these vowels should be classified as
containing a non-peripheral vowel and therefore as similar or identical to
the system in the top right panel of figure 5.23. It may well be the case
that Schwartz et al.'s assignment of phonetic content to the symbols in
Maddieson (1984) puts dangerous trust in the phonetic accuracy and
consistency of UPSID, and it may also be that 88% of five-vowel systems
pattern as in figure 5.23, but given the available data, the author's
claim that "[t]he results of the simulation with five vowel systems ...
correspond very well with what is found in human languages" (p. 94) is
clearly too strong.

A more general methodological issue concerns the way in which the data in
section 5.6 is sampled. It seems that there is a many-to-many mapping
between noise levels and inventory sizes, but for each inventory size
between 3 and 9 the author only considers cases generated at one
particular noise level. In addition, the conclusion in section 5.6.9 that
the system has a preference for 4-vowel inventories is based on a
different set of experiments than the rest of section 5.6. Unfortunately,
no rationale is offered for this methodology, which raises questions about
the consistency across noise levels of the effects reported in this
section.

The final section of chapter 3 tries to justify the book's title in an
evolutionary sense with the argument that excluding biological evolution
from the computational model can help to establish to what extent (the
development of) language has to be genetically determined and to what
extent it could be attributed to functional drives and self-organisation
in a population. However, the agents are endowed with (a simplified model
of) the phonetic capabilities of modern man. It is possible that these
capacities are prelinguistic or extralinguistic in an evolutionary sense,
but since no independent evidence to this effect is offered, this
assumption simply pre-empts the question whether or not some phonetic
skills developed in response to the demands of an emerging language. De
Boer might also have explored how the performance of the model is affected
by starting from non-empty vowel inventories (simulating some form of
innate knowledge).

I think that this part of the argument is therefore typical of the book as
a whole, which raises interesting possibilities based on the behaviour of
a computational model but, on a number of points, does not succeed in
relating the model to what it is intended to represent.

Bach, E. and R. Harms (1969) How do languages get crazy rules? In R.
Stockwell and K. Macaulay (eds) Linguistic change and generative theory.
Bloomington: Indiana University Press.

Boersma, P. (1998) Functional Phonology. PhD dissertation, University of

Briscoe, E. (2000) Grammatical acquisition: inductive bias and coevolution
of language and the language acquisition device. Language 76: 245-296.

Chistovich, L. and V. Lublinskaya (1979) The center of gravity effect in
vowel spectra and critical distance between formants. Hearing Research 1:

Crothers, J. (1978) Typology and universals of vowel systems. In J.
Greenberg, C. Ferguson and E. Moravcsik (eds) Universals of Human Language

de Boer, B. (1999) Self-organization in vowel systems. PhD dissertation,
Free University of Brussels.

Kirchner, R. (1998) An Effort-Based Approach to Consonant Lenition. PhD
dissertation, UCLA.

Kirby, S. (1999) Function, Selection, and Innateness: The Emergence of
Language Universals. Oxford: OUP.

Liljencrants, J. and B. Lindblom (1972) Numerical simulation of vowel
quality systems: the role of perceptual contrast. Language 48: 839-862.

Maddieson, I. (1984) Patterns of Sounds. Cambridge: CUP.

Schwartz, J.-L., L.-J. Boe, N. Vallee and C. Abry (1997a) Major trends in
vowel system inventories. Journal of Phonetics 25: 233-253.

Schwartz, J.-L., L.-J. Boe, N. Vallee and C. Abry (1997b) The
Dispersion-Focalization Theory of vowel systems. Journal of Phonetics 25:

I am currently trying to finish my PhD dissertation, which develops a
functional account of the phonetics and phonology of laryngeal contrast in
obstruent clusters, mainly in the Germanic languages. My other research
includes an OT account of English phrase and compound stress and work on
the phonetics of discourse prosody. I have (co-)taught several courses in
phonology and phonetics in the departments of Linguistics and Dutch
Language and Literature at the University of Groningen.

Thanks to Zoe Toft for comments on a draft of this review.