Date: Fri, 30 Apr 2004 11:24:23 +0200
From: Lourdes Aguilar
Subject: Phonetics and Phonology in Language Comprehension and Production
EDITORS: Schiller, Niels O.; Meyer, Antje S.
TITLE: Phonetics and Phonology in Language Comprehension and Production
SUBTITLE: Differences and Similarities
SERIES: Phonology and Phonetics, 6
PUBLISHER: Mouton de Gruyter
YEAR: 2003
Lourdes Aguilar, Autonomous University of Barcelona
In the domain of linguistics, the relation between phonetic and phonological representation has been a controversial issue since the seminal works of Saussure, Baudouin de Courtenay and the Prague Circle (Anderson, 1985). Only recently have researchers coming from phonetics, drawing on data from production, acoustic or perception studies, examined the relations between abstract representations and concrete realizations of speech, while phonologists have tested their models by designing experiments (Beckman (ed.), 1990, and the series of Papers in Laboratory Phonology: Kingston & Beckman, 1990; Docherty & Ladd, 1992; Keating, 1994; Connell & Arvaniti, 1995; Broe & Pierrehumbert, 2000; Local, Ogden & Temple, 2004). In the domain of psycholinguistics, owing to the obvious differences in processing, much research has concentrated on either speech production or speech comprehension without considering the other process in detail. The book under review is an attempt to fill this gap, paying special attention to the relation between the processes involved in speech.
The book opens with a clear introduction written by the editors, Niels O. Schiller and Antje S. Meyer, which helps us to understand the main aim of the book: to ask exactly how speech production and comprehension are similar to or different from each other. In the first chapter ("Neighbors in the lexicon: Friends or foes"), Gary S. Dell and Jean K. Gordon discuss the effects of neighborhood density (the number of words that are phonologically similar to the target word), a property that appears to have opposite effects in speech production and perception: high neighborhood density has detrimental effects in comprehension tasks, but facilitatory effects in production tasks. These findings are explained in terms of the two-step interactive-activation model of lexical access (Dell et al., 1997), using a normal version of the model and versions designed to reflect aphasic lesions. The authors argue that the interactive property of the model (activation feeds back from phonological units to lexical units during production) allows a target word's neighbors to increase the probability with which that word is selected. Production and comprehension differ in their response to neighborhood density because production and comprehension tasks create different competitive environments. In comprehension tasks, where phonological neighbors are serious competitors, a dense neighborhood is a disadvantage for accurate retrieval. In production tasks, by contrast, the set of competitors is defined on semantic grounds and, as a consequence, neighborhood density facilitates the retrieval of the target. In summary, whether neighbors are competitive or cooperative depends on the task: they are costly in recognition but beneficial in production.
The chapter "Continuity and gradedness in speech processing" (James McQueen, Delphine Dahan and Anne Cutler) focuses on the mapping of the speech signal onto stored lexical knowledge in speech comprehension, and compares it to the mapping of stored representations onto articulatory commands in speech production. Their proposal is that the two processing systems are fine-tuned to the different task demands of speech decoding and encoding. They argue that speech decoding is continuous and graded: information flows through the recognition system in cascade all the way up to the meaning level, with no discrete processing stages. Furthermore, the authors review empirical evidence for the assumptions of multiple activation and relative evaluation of lexical candidates: the multiple candidate words compete with each other, and their activation is modulated by the subphonemic detail in the speech signal. The way that phonetic and phonological information is processed in speech encoding appears to be very different. Lexical access is a two-stage process, with strict seriality on some accounts (e.g. the WEAVER++ production model) and limited cascade between levels on others (e.g. the DSMSG model). No current production model assumes massive parallel activation of word forms. Furthermore, it appears that subphonemic detail need not be specified; instead, this type of detail could be filled in by post-lexical rules. As noted above, these differences may be explained by the different task demands of speech production and comprehension: a system that continuously processes phonetic detail is well suited to speech comprehension, but inefficient for production.
In the next chapter ("The internal structure of words: Consequences for listening and speaking"), Pienie Zwitserlood discusses the representation and processing of morphological information. The main questions are whether linguistically motivated distinctions between morphological classes translate to language processing, and where morphemes play a role during speaking and listening. She argues in favour of an independent morphological level, distinct from the syntactic, semantic and phonological levels, and shared by speech production and comprehension processes. There is evidence for such a shared level, even if the units selected are different: whereas lemmas are selected in production, word forms are selected in comprehension, i.e., the word forms of speech perception are larger than individual morphemes.
The issue examined in the chapter written by Ardi Roelofs ("Modeling the relation between the production and recognition of spoken word forms") is whether a single system participates in phonetic and phonological processing during both production and recognition or whether, instead, there are separate phonetic and phonological systems for production and recognition. The question is addressed within the context of computationally implemented models of spoken word recognition (TRACE, Shortlist) and production (DSMSG, WEAVER++). A shared system must have bidirectional links between sublexical and lexical units: this would imply feedback from the sublexical to the lexical level in production, and from the lexical to the sublexical level in comprehension. To test this prediction, Roelofs reviews findings from a variety of sources (picture naming tasks, chronometric tasks, neuroimaging studies) and concludes that there is no positive evidence for feedback. Instead, the available evidence supports the idea of separate but closely linked feed-forward systems for comprehension and production.
In the chapter "Articulatory Phonology: A Phonology for public language use", Louis Goldstein and Carol A. Fowler pursue two goals: (a) to develop a realistic understanding of language forms as language users know them, produce them and perceive them, and (b) to understand how the forms might have emerged in the evolutionary history of humans, and how they arise developmentally, as the child interacts with speakers in the environment. Their central idea is that language forms are kinds of public action, not exclusively mental categories. The relation between speech production and speech perception is discussed from the point of view of Articulatory Phonology. In Articulatory Phonology, vocal tract activity is analyzed into constriction actions (gestures) of distinct vocal organs, and constriction formation is modeled by dynamical systems. The authors argue that phonological knowledge can be conceived as abstract constraints on gestural coordination, and that the language forms that language users know, produce and perceive must be the same. From the point of view of Articulatory Phonology, gestures are the common phonological currency between producers and listeners: gestures are preserved from language planning, via production, to perception, and directly structure the acoustic signal. Listeners, therefore, perceive gestures, not acoustic cues. This claim has not been generally accepted in the speech research community, but Goldstein and Fowler think that there is enough indirect evidence for it.
The chapter "Neural control of speech movements" by Frank H. Guenther addresses the representations required for producing the syllables that make up a spoken utterance (including auditory, tactile, proprioceptive and muscle command representations) and their interactions, with reference to a model of the neural processes involved in the production of speech sounds. Guenther proposes a neural network model of speech motor skill acquisition and speech production (DIVA). This model provides a unified explanation for a wide range of data on articulator kinematics and motor skill development that were previously addressed individually, including functional brain imaging, psychophysical, physiological, anatomical and acoustic data. One advantage of the neural network approach is that it allows the brain regions involved in speech to be analyzed in terms of a well-defined theoretical framework. According to the model, speech perception and production are supported by separate but closely linked cortical areas.
Miranda van Turennout, Bernadette Schmitt and Peter Hagoort ("When words come to mind: Electrophysiological insights on the time course of speaking and understanding words") focus on testing one specific model (WEAVER++) to demonstrate the value of electrophysiological data for speech comprehension and production research. They describe how event-related brain potentials (ERPs) can be used to determine: (a) the time that is needed for the retrieval of distinct types of lexical information, and (b) the order in which semantic, syntactic and word form information is retrieved in speaking, listening and reading. The authors compare the empirical data with the time course estimates derived from the WEAVER++ model of speech production. The ERP comprehension data, together with the production data, provide support for the tested model. The findings indicate that during speech production semantic processing precedes syntactic processing and phonological encoding. Furthermore, the results of ERP studies support the assumption made by WEAVER++ that in speech comprehension word form information is accessed before syntactic and semantic information, i.e., that the ordering of the retrieval processes is the reverse of that in production.
The last two chapters discuss different approaches to the acquisition and representation of phonetic and phonological categories in bilingual speakers. Núria Sebastián-Gallés and Judith F. Kroll ("Phonology in bilingual language processing: Acquisition, perception and production") examine the organisation of sound systems in bilingual infants and adults. A central research issue is to what extent bilinguals possess one or two phonological systems. The results of the production and perception studies reviewed by the authors converge closely in showing that lexical access is nonselective with regard to language. One interesting aspect of the comparison between perception and production is the difference in the part of the lexicon that becomes activated. In the perception studies, it is the phonology of lexical form relatives that is engaged: i.e., words that sound like the target word are activated regardless of the language from which they are drawn. In the production studies, it is the phonology of the translation equivalent that is active, in addition to the phonological neighbors of the utterance itself.
James E. Flege ("Assessing constraints on second-language segmental production and perception") examines theory and evidence related to the production and perception of phonetic segments by second language (L2) learners and monolingual native speakers of the same language. He presents the Speech Learning Model (SLM), developed by him and his colleagues. This model focuses explicitly on L2 speech acquisition, and aims to account for changes across the life span in the learning of segmental production and perception. The SLM proposes that native vs. non-native differences are more likely to arise as the result of interference from prior phonetic learning than from a loss of neural plasticity: that is, adults retain the ability to form new phonetic categories for speech sounds in L2, but phonetic category formation becomes more difficult with increasing age because the phonetic systems of the two languages are not fully separate. To support these hypotheses, Flege provides empirical evidence from production and perception studies of L2 vowel acquisition.
The book under review is not presented as a unified theory accounting for speech production and comprehension; instead, contributors tackle the central problem of the similarities and differences between speech comprehension and production in different ways, employing contemporary approaches such as neuroimaging studies, electrophysiological data or computational models. Nevertheless, the book is cohesive, thanks to the effort made by the authors to furnish new data and explanations concerning the processes involved in speech.
Having read the book, we can infer that there is agreement across authors on at least one important point: regardless of the different communicative demands faced by listeners and speakers, nobody assumes that there are entirely independent representations used exclusively in production and comprehension. On the contrary, the different task demands can account for the different patterns found in production and comprehension research, as shown in the chapters written by Dell and Gordon and by McQueen, Dahan and Cutler. Though for different reasons, Zwitserlood and Roelofs argue that there must be separate phonological and phonetic components for production and comprehension, while the other levels are likely to be shared.
The units and levels involved in the models are also well covered. For instance, Guenther proposes a neural network model which captures the control of processes from the syllabic level to the level of muscle commands, whereas Roelofs' WEAVER++ model specifies speech-planning processes up to the syllabic level.
Interestingly, the outstanding methodological standard of all the papers raises the issue of whether the experimental design is likely to constrain the nature of the conclusions. In this connection, van Turennout, Schmitt and Hagoort present very similar experimental paradigms that can be used to study word production and comprehension, which will be welcomed by all those interested in investigating the similarities and differences between speech production and comprehension.
The book achieves a high scientific standard: the individual chapters are written by authorities in their respective academic subdisciplines, and topics are examined in depth, relating the empirical evidence to computational or theoretical models. Furthermore, the structure of the text makes the contents easier to assimilate: all the chapters begin with a brief summary and end with a conclusion in which the main arguments of the chapter are recapitulated. To support the data discussed, each chapter ends with a complete list of references. This relative independence makes it possible to read the book as a whole or one chapter at a time.
As noted above, a strength of the book is its inclusion of various approaches, which gives a comprehensive overview of research in the domain of speech production and comprehension. Nevertheless, readers may perceive this advantage as a drawback, since the differences in treatment and terminology can make the reading difficult. For this reason, a glossary of the concepts and terms used in the book would be helpful.
To conclude, the volume under review constitutes an important contribution to the study of the representation of phonetic and phonological knowledge in speech comprehension and production, and of their interface. It will be of interest to a wide range of researchers in phonetics, phonology, psycholinguistics, cognitive science and the study of speech disorders. With guidance and assistance from a teacher, it is accessible to undergraduate students, while for postgraduate students and speech researchers it is an excellent state-of-the-art survey of existing work and a text full of suggestions for additional research.
Anderson, Stephen R. (1985) Phonology in the Twentieth Century. Theories of Rules and Theories of Representations, The University of Chicago Press, Chicago and London.
Beckman, Mary E. (ed.) (1990) "Phonetic representation", Journal of Phonetics, 18.
Broe, Michael B. & Janet B. Pierrehumbert (eds) (2000) Papers in Laboratory Phonology V. Language Acquisition and the Lexicon, Cambridge, Cambridge University Press.
Connell, Bruce & Amalia Arvaniti (eds) (1995) Papers in Laboratory Phonology IV. Phonology and Phonetic Evidence, Cambridge, Cambridge University Press.
Dell, Gary S., Myrna F. Schwartz, Nadine Martin, Eleanor M. Saffran and Debra A. Gagnon (1997) "Lexical access in aphasic and nonaphasic speakers", Psychological Review 104: 801-838.
Docherty, Gerard J. & D. Robert Ladd (eds) (1992) Papers in Laboratory Phonology II. Gesture, Segment, Prosody, Cambridge, Cambridge University Press.
Keating, Patricia A. (ed.) (1994) Papers in Laboratory Phonology III. Phonological Structure and Phonetic Form, Cambridge, Cambridge University Press.
Kingston, John & Mary E. Beckman (eds) (1990) Papers in Laboratory Phonology I. Between the Grammar and Physics of Speech, Cambridge, Cambridge University Press.
Local, John, Richard Ogden & Rosalind Temple (eds) (2004) Papers in Laboratory Phonology VI. Phonetic Interpretation, Cambridge, Cambridge University Press.
ABOUT THE REVIEWER:
Lourdes Aguilar is a lecturer of Spanish language and linguistics in the Department of Hispanic Philology at the Autonomous University of Barcelona. Her research interests include phonetics and phonology and their interface, discourse structure, and speech and language technologies.