De Boer, Bart (2001) The Origins of Vowel Systems. Oxford University Press, xii+168pp, hardback ISBN 0-19-829965-6, Studies in the Evolution of Language 1.
Wouter Jansen -- Department of Linguistics, University of Groningen, and Department of Linguistics, University of Tuebingen
INTRODUCTION Liljencrants and Lindblom's (1972) Dispersion Theory claims that a number of typological trends in the phonetic composition of vowel inventories can be derived from the assumption that the phonetic implementation of vowel categories is maximally dispersed in the available auditory space. For instance, if a language has three contrastive vowels they are highly likely to be implemented as [i], [a], and [u], which represent the extremities of the F1-F2 triangle (roughly corresponding to height and backness in articulatory terms) in which vowels can be realised. The underlying idea is that dispersion leads to less confusion of contrastive vowels and thereby to more effective communication.
The main claim of the book under review is that, on a number of assumptions, vowel dispersion can emerge in a language without the need for the assessment by individual speakers of the global properties of their inventories. The computational model on which this claim is based derives a number of additional predictions with varying degrees of accuracy about the typology of vowel inventories.
The book is based on the author's (1999) computer science PhD thesis but has been adapted to suit an interdisciplinary readership. It consists of three main parts. Chapters 1, 2 and 3 set the scene by outlining the conceptual basis of the approach (self-organisation) and introducing some of the core facts it is intended to capture (statistical generalisations about the typology of vowel inventories). The core of the book is formed by chapters 4 and 5 which describe the architecture of the computational model and the results of the simulation experiments carried out with it. Chapters 6 and 7 comprise the third part of the book in trying to position the work reported in the previous two chapters in the broader field of computational language evolution studies.
SYNOPSIS Chapter 1, 'Introduction', provides a brief synopsis of the book and defines its goals and approach. The goals are "first to investigate what mechanisms might be necessary for explaining the universals of human vowel systems and secondly to investigate what the role of the population might be in the explanation of linguistic universals". The method used is the simulation of interaction among language users by means of a computer model. Unlike studies using a broadly similar methodology (e.g. Kirby 1999, Briscoe 2000), De Boer sets out to investigate the development of language in a population of fully developed language users rather than the evolution of the language user in response to the (learnability) challenges posed by language structure, or the coevolution of language and language user.
Chapter 2, 'Universal Tendencies of Human Sound Systems', introduces some well-established generalisations about the typology of 'primary' vowel systems (systems or the parts of larger systems that are implemented in terms of height, backness and rounding, rather than e.g. duration or nasalisation), such as the fact that the most common number of vowels in an inventory is 5. It then sketches a number of paradigms that have tried to explain typological patterns, including classical binary feature theory and Dispersion Theory, and assesses two attempts to embed the insights of (later versions of) Dispersion Theory in computer simulations of interacting language users. The chapter concludes with a review of what is known about the acquisition of phonology in (young) children.
Chapter 3, 'Self-Organization', describes the conceptual basis of the model defended by the book and explains why it is appropriate to apply this framework to the study of vowel inventories. Self-organisation broadly refers to the emergence of regular patterns caused by mechanisms that are not directed in any sense by or towards such patterns. This notion can be illustrated by the hexagonal grid pattern of a honeycomb, which is not the product of bees aiming to construct a regular pattern but a by-product of the simultaneous building activity of large numbers of bees of roughly the same size and physical ability. With regard to the modelling of vowel inventories, two forms of self-organisation can be distinguished: first, the dispersion of vowels in the inventories of individual speakers, and second, the sharing of a vowel inventory among the members of a speech community. De Boer argues that it is appropriate to model the typology of vowel systems as the product of a self-organising dynamic system because it seems implausible that speakers have dedicated inventory grammars that perform global assessments of the amount of auditory dispersion in their vowel inventories. Given the author's stance that generalisations about vowel systems should not be attributed to innate (phonological) feature systems, such generalisations should therefore derive from the use (production and perception) of individual vowels by individual speakers.
Chapter 4, 'The Simulation', provides a fairly non-technical description of the computational model developed by the author. The architecture consists of a population of 20 agents representing human language users. Every agent is endowed with a(n initially empty) inventory of paired articulatory and auditory vowel targets, a vowel synthesizer (articulation model) and a vowel recogniser (perception model). Articulatory targets are represented in terms of height, position and rounding; the auditory target is represented as a set of coordinates in F1-F2 space expressed on the Bark scale. The second formant is calculated as the perceptual F2 or F2' (F2-prime), which takes on board the contribution of higher formants in the acoustic spectrum to the perceived frequency of the second resonance peak (cf. Chistovich and Lublinskaya 1979).
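For readers unfamiliar with the Bark scale, the following minimal sketch converts a frequency in Hz to Bark. It uses Traunmüller's (1990) approximation, which is one common formula among several; this choice is an assumption made for illustration, and the book may well use a different conversion. The function name hz_to_bark is hypothetical.

```python
def hz_to_bark(freq_hz: float) -> float:
    """Convert a frequency in Hz to Bark.

    Uses Traunmueller's (1990) approximation; this is an illustrative
    assumption, not necessarily the conversion used in the book.
    """
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


if __name__ == "__main__":
    # Rough formant values for an [i]-like and an [a]-like vowel (illustrative only).
    for label, f1 in (("[i]-like F1", 280.0), ("[a]-like F1", 750.0)):
        print(label, round(hz_to_bark(f1), 2), "Bark")
```

The point of such a scale is that equal steps in Bark correspond more closely to equal perceptual steps than equal steps in Hz do.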
The agents engage in one-on-one imitation games which start with an initiator transmitting a vowel sound generated from a randomly selected articulatory target in its inventory. The receiver, or 'imitator', classifies this signal in terms of the perceptually nearest vowel in its own system, synthesises the corresponding articulatory target and sends the resulting signal back to the initiator. An imitation game is labelled as successful if the response signal is classified as identical to the stimulus by the initiator, and the success or failure is relayed to the imitator in terms of a 'non-verbal' feedback signal. This feedback signal and the longer term communicative effectiveness of a vowel category (defined as the ratio of the number of successful uses of a vowel to the number of times it is used) determine how the vowel inventory of the agents is updated after every game. Vowel targets can be shifted in articulatory and auditory space, and vowel categories can be introduced, merged, or discarded. The mapping between feedback (history) and specific update operations introduces a bias in the model towards a vowel system that is shared by all members of the population (one of the two forms of self-organisation identified above). This mapping favours high communicative effectiveness indices for all individual vowel targets in the inventories of all individual agents, and the sum of these indices is maximal if all agents share the same inventory.
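The core of a single imitation game as summarised above can be sketched as follows. This is a strongly simplified illustration and not the author's implementation: each agent is assumed to be just a list of auditory prototypes (F1, F2') in Bark, articulatory synthesis is skipped, no noise is added, and the inventory update step is left out. The names nearest and imitation_game are hypothetical.

```python
import math
import random


def nearest(inventory, signal):
    """Return the index of the prototype closest to the signal (Euclidean distance)."""
    return min(range(len(inventory)), key=lambda i: math.dist(inventory[i], signal))


def imitation_game(initiator, imitator, rng=random):
    """Play one noiseless game; return True if the imitation counts as successful.

    Both arguments are lists of (F1, F2') tuples. This flat representation is
    an assumption made for illustration only.
    """
    chosen = rng.randrange(len(initiator))        # initiator picks a vowel at random
    heard = nearest(imitator, initiator[chosen])  # imitator classifies the signal
    echoed = nearest(initiator, imitator[heard])  # initiator classifies the reply
    return echoed == chosen                       # success if it recognises its own vowel


if __name__ == "__main__":
    shared = [(3.0, 14.0), (7.0, 10.0), (3.0, 6.0)]
    print(imitation_game(shared, list(shared)))   # identical inventories always succeed
```

In the full model the outcome of each game would feed back into both agents' inventories; here it is merely returned.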
The other form of self-organisation (i.e. auditory dispersion in vowel inventories) emerges as the result of two further properties of De Boer's model. The first is the addition of 'noise' to the vowel signals transmitted between the agents. This noise effectively consists of transforming the signals in the F1 and F2' domains randomly, but within fixed bounds that represent the 'noise level'. This means that vowel targets with overlapping noise ranges run the risk of being confused during imitation games. Given the communicative pressure described in the previous paragraph, noise addition therefore creates a bias towards auditory dispersion. Secondly, and essentially to keep systemic pressure on the model, a random vowel is added to the inventory of an agent with a probability of .01 per game. This pushes the model away from a shared inventory with a highly effective single vowel.
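The link between bounded noise and the pressure towards dispersion can be made concrete with a small Monte Carlo sketch. It assumes additive uniform noise of at most plus or minus the noise level on each coordinate of an (F1, F2') point in Bark; the book's actual noise model may differ, and the names add_noise and confusion_rate are hypothetical.

```python
import math
import random


def add_noise(point, level, rng):
    """Shift each coordinate by a uniform random amount within +/- level.

    Additive uniform noise is an illustrative assumption, not the book's model.
    """
    return tuple(x + rng.uniform(-level, level) for x in point)


def confusion_rate(a, b, level, trials=10_000, seed=0):
    """Estimate how often a noisy token of vowel a lands closer to vowel b than to a."""
    rng = random.Random(seed)
    confused = 0
    for _ in range(trials):
        token = add_noise(a, level, rng)
        if math.dist(token, b) < math.dist(token, a):
            confused += 1
    return confused / trials


if __name__ == "__main__":
    level = 0.5
    print("close pair:", confusion_rate((3.0, 14.0), (3.4, 14.0), level))
    print("far pair:  ", confusion_rate((3.0, 14.0), (7.0, 10.0), level))
```

Under these assumptions, prototypes whose noise ranges overlap are confused in a substantial share of trials, whereas well-separated prototypes are never confused, which is the sense in which noise favours dispersed inventories.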
Chapter 5, 'Results', reports on a series of simulations with the model introduced in chapter 4. Section 1 of this chapter demonstrates how after a certain number of games (4000 in the example that is discussed in depth) the agents converge on shared and finite inventories of vowels that appear to be well-dispersed in auditory space. Finiteness of vowel inventories is not derived by Liljencrants and Lindblom or refined versions of Dispersion Theory (e.g. Schwartz et al. 1997b) in which vowel categories are represented as points in phonetic space. In De Boer's model, on the other hand, finiteness is a by-product of the mechanism responsible for dispersion: in a sense, the addition of noise to the transmission process creates vowel 'categories' that consist of areas rather than points in phonetic space, and since (articulatory) phonetic space is finite, so is the number of vowel categories. The relation between noise and vowel categorisation in the model is best illustrated by the fact that the average size of the emergent inventories is inversely related to the noise level.
Section 5.2 compares the average results of 1000 simulations each for two noise levels with inventories in which the vowels are randomly distributed in phonetic space, using the optimal systems predicted by Dispersion Theory as a benchmark. This comparison is based on two criteria: imitation success and energy, a measure of global auditory dispersion used by Liljencrants and Lindblom (1972). It indicates that the model approximates the systems predicted by dispersion better than the random inventories especially at the higher noise level (i.e. for smaller inventories). Section 5.3 shows that the simulations also produce shared inventories of articulatory categories especially with regard to tongue height and position. Section 5.4 extends the model by considering the effects of population change. Population change is modelled by introducing new agents with empty vowel inventories and randomly eliminating agents, both with a certain (low) probability per agent per game. Two types of cases are considered: (1) the change of a vowel system stabilised in a static population after the onset of population change; (2) the emergence of stable inventories in populations that are dynamic from the start. The conclusion of this section is that, generally speaking, changing populations support smaller vowel inventories than static ones, in a way that depends on the replacement rate of the population.
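The energy criterion mentioned in connection with section 5.2 can be illustrated with a short sketch. In Liljencrants and Lindblom's formulation, energy is the sum over all vowel pairs of the inverse squared distance between them, so that lower values indicate better dispersion. The coordinate space used below (F1 and F2' in Bark) and the function name dispersion_energy are assumptions made for illustration.

```python
import math
from itertools import combinations


def dispersion_energy(vowels):
    """Sum of inverse squared distances over all vowel pairs.

    Lower energy means a more dispersed inventory. Two identical points
    would raise ZeroDivisionError, since their distance is zero.
    """
    return sum(1.0 / math.dist(a, b) ** 2 for a, b in combinations(vowels, 2))


if __name__ == "__main__":
    # Illustrative coordinates only; not taken from the book.
    corners = [(3.0, 14.0), (7.0, 10.0), (3.0, 6.0)]
    clustered = [(4.0, 10.0), (4.5, 10.5), (5.0, 10.0)]
    print(dispersion_energy(corners) < dispersion_energy(clustered))
```

A three-vowel system at the corners of the space yields a lower energy than one whose vowels are bunched together, which is why the measure can serve as a benchmark for simulated inventories.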
Sections 5.5 and 5.6 discuss the typology of vowel systems generated by a series of simulations of 250,000 games each at 4 different noise levels. The phonetic composition of inventories consisting of 3-9 vowels is assessed against the typological generalisations of Crothers (1978) and Schwartz et al. (1997a). This reveals a number of non-trivial correspondences between real and simulated systems. An interesting example is the frequent occurrence of mid central vowels (e.g. schwa), although this is possibly related to the compression of the F2' space relative to human vowel space. Mid central vowels are not derived by classical Dispersion Theory and Schwartz et al. (1997) explicitly put schwa beyond the scope of their theory. Finally, section 5.6.9 describes an experiment which indicates that within a range centred on inventories of size 3-9, systems of 4 vowels are the most common. Although 5 is the most common number of vowels in human inventories the emergence of a clear peak at the right end of the range seems an encouraging result.
Chapter 6, 'Simulated Evolution of Other Parts of Language', is essentially a review of studies using similar methodologies to the one pursued by De Boer. It discusses work on syntax, semantics, experiments with robots, and the predictions of simulations of change and spatial dispersion of populations. In conclusion, chapter 7, 'Implications for Other Parts of Language', attempts to spell out in an informal fashion how the tools used for the modelling of vowel inventories might be extended to a range of other linguistic phenomena including fixed word order and tonogenesis.
CRITICAL EVALUATION The properties of the model developed in the book are interesting enough to make extensions to other domains look worthwhile. Without built-in global teleologies or pre-existing vowel categories, the model developed by the author is able to derive vowel inventories that are finite and have configurational properties that are reminiscent of human systems. Moreover, in part because an ecological advantage is attributed to the sharing of systems among individual agents, the model can converge on phonetically suboptimal systems. The survival of such systems bears some resemblance to so-called 'crazy' phonological rules (Bach and Harms 1969; Anderson 1981): rules that have no synchronic phonetic motivation but which are nevertheless preserved in a language.
De Boer's model extends earlier work on auditory dispersion in vowel systems by suggesting how dispersed systems could emerge in a speech community without the intervention of innate phonological or phonetic categories. However, the import of this work depends on the possibility of justifying its many assumptions on independent grounds and on the accuracy of its predictions, and I think this is where the book's main weaknesses lie. It is true that the author acknowledges many of the abstractions and limitations of his approach, and I do not think that the lack of independent evidence for certain assumptions should stop anyone from exploring its possibilities. But although I share its functionalist perspective I think that the argument put forward by this book could have benefited considerably from a more careful assessment of its assumptions and a more balanced and detailed scrutiny of its predictions.
For example, in the light of its brevity and lack of detail, it seems unlikely that chapter 2 was designed to convince the reader of the book's functionalist underpinnings. (Distinctive) feature theory is introduced and dismissed as inadequate in little more than a page of text, without so much as bibliographical pointers to more detailed critiques or defences of formalist approaches to phonology. A number of drawbacks of classical Dispersion Theory noted by e.g. Schwartz et al. (1997b) and Boersma (1998) are not examined despite the appearance of these studies in the bibliography and despite their potential relevance for the design of the computational model, e.g. with respect to the set of spectral properties that play a role in human vowel categorisation or such technical issues as the weighting of F1 and F2' in the distance measure used in the perception model.
Furthermore, no real justification is offered for the choice not to incorporate articulatory effort as a parameter in the model. Strong cases have been made by other functional linguists for the role of effort minimisation in speech production and (by extension) the shape of sound inventories (Boersma 1998, Kirchner 1998). If, as the author himself suggests (e.g. in chapter 3), effort minimisation helps to shape sound inventories, then excluding it from the model would seem to be a possibly critical abstraction, and consequently in need of some justification.
As a final example, consider the use of noise in the vowel transmission process. The human speech chain is certainly noisy in a number of respects and it seems likely that the way we categorise speech sounds is adapted to this state of affairs. But de Boer does not attempt to relate the noise levels in the model to the types and levels of noise that affect human speech communication, which leaves the question unanswered to what degree noise addition in the model is realistic and to what degree it is a brute force way of making the agents use finite and dispersed vowel systems.
With regard to the predictions of his model, De Boer focuses almost entirely on the typology of vowel inventories. This means that the results of the experiments reported in section 5.4, on the effects of population variation, are not tested against any language change (or acquisition) data. The assessment of the typological predictions itself is not unproblematic either. In section 5.5 the author ostensibly adopts the generalisations of Crothers (1978) as a touchstone for the typology generated by his model. This entails a classification of (small) vowel inventories mainly in terms of configurational properties (number of contrasts per dimension, front-back asymmetries etc.) rather than the precise localisation of vowel targets in phonetic space, and the way phonetic symbols are assigned to the scatter plots in section 5.6 generally bears out this approach. But then these coarse-grained classifications are also used to interpret the results in terms of the typological generalisations formulated in Schwartz et al. (1997a), which presuppose a more precise assignment of phonetic values to vowel categories.
This results in inconsistencies. For example, even if the compressed F2' space is taken into account, several of the vowels labelled [o] and some of the [u]s in the top left panel of figure 5.23 (p. 95) should be classified as unrounded mid back or mid central vowels according to Schwartz et al.'s approach (extrapolating from the mapping defined in Schwartz et al. 1997b). Consequently, for the purpose of evaluating the quality of the simulations against the generalisations of Schwartz et al. (1997a), the systems containing these vowels should be classified as containing a non-peripheral vowel and therefore as similar or identical to the system in the top right panel of figure 5.23. It may well be the case that Schwartz et al.'s assignment of phonetic content to the symbols in Maddieson (1984) puts dangerous trust in the phonetic accuracy and consistency of UPSID, and it may also be that 88% of five-vowel systems pattern as in figure 5.23, but given the available data, the author's claim that "[t]he results of the simulation with five vowel systems ... correspond very well with what is found in human languages" (p. 94) is clearly too strong.
A more general methodological issue concerns the way in which the data in section 5.6 is sampled. It seems that there is a many-to-many mapping between noise levels and inventory sizes, but for each inventory size between 3 and 9 the author only considers cases generated at one particular noise level. In addition, the conclusion in section 5.6.9 that the system has a preference for 4-vowel inventories is based on a different set of experiments than the rest of section 5.6. Unfortunately, no rationale is offered for this methodology, which raises questions about the consistency across noise levels of the effects reported in this section.
The final section of chapter 3 tries to justify the book's title in an evolutionary sense with the argument that excluding biological evolution from the computational model can help to establish to what extent (the development of) language has to be genetically determined and to what extent it could be attributed to functional drives and self-organisation in a population. However, the agents are endowed with (a simplified model of) the phonetic capabilities of modern man. It is possible that these capacities are prelinguistic or extralinguistic in an evolutionary sense, but since no independent evidence to this effect is offered, this assumption simply pre-empts the question whether or not some phonetic skills developed in response to the demands of an emerging language. De Boer might also have explored how the performance of the model is affected by starting from non-empty vowel inventories (simulating some form of innate knowledge).
I think that this part of the argument is therefore typical of the book as a whole, which raises interesting possibilities based on the behaviour of a computational model but, on a number of points, does not succeed in relating the model to what it is intended to represent.
BIBLIOGRAPHY Bach, E. and R. Harms (1969) How do languages get crazy rules? In R. Stockwell and K. Macaulay (eds) Linguistic change and generative theory. Bloomington: Indiana University Press.
Boersma, P. (1998) Functional Phonology. PhD dissertation, University of Amsterdam.
Briscoe, E. (2000) Grammatical acquisition: inductive bias and coevolution of language and the language acquisition device. Language 76: 245-296.
Chistovich, L. and V. Lublinskaya (1979) The center of gravity effect in vowel spectra and critical distance between formants. Hearing Research 1: 185-195.
Crothers, J. (1978) Typology and universals of vowel systems. In J. Greenberg, C. Ferguson and E. Moravcsik (eds) Universals of Human Language.
de Boer, B. (1999) Self-organization in vowel systems. PhD dissertation, Free University of Brussels.
Kirby, S. (1999) Function, Selection, and Innateness: The Emergence of Language Universals. Oxford: OUP.
Kirchner, R. (1998) An Effort-Based Approach to Consonant Lenition. PhD dissertation, UCLA.
Liljencrants, J. and B. Lindblom (1972) Numerical simulation of vowel quality systems: the role of perceptual contrast. Language 48: 839-862.
Maddieson, I. (1984) Patterns of Sounds. Cambridge: CUP.
Schwartz, J.-L., L.-J. Boe, N. Vallee and C. Abry (1997a) Major trends in vowel system inventories. Journal of Phonetics 25: 233-253.
Schwartz, J.-L., L.-J. Boe, N. Vallee and C. Abry (1997b) The Dispersion-Focalization Theory of vowel systems. Journal of Phonetics 25: 255-286.
BIOGRAPHICAL SKETCH I am currently trying to finish my PhD dissertation, which develops a functional account of the phonetics and phonology of laryngeal contrast in obstruent clusters, mainly in the Germanic languages. My other research includes an OT account of English phrase and compound stress and work on the phonetics of discourse prosody. I have (co-)taught several courses in phonology and phonetics in the departments of Linguistics and Dutch Language and Literature at the University of Groningen.
ACKNOWLEDGEMENT Thanks to Zoe Toft for comments on a draft of this review.