Editor for this issue: Andrew Carnie <carnie
linguistlist.org>
Regier, Terry (1996) The Human Semantic Potential: Spatial Language and Constrained Connectionism, MIT Press, Cambridge, Mass. 220 pages. ISBN 0-262-18173-8.

Reviewed by Anne Reboul, CRIN-CNRS, France

Regier's book approaches the general problem of the human ability to categorize through an investigation that relies on a computer simulation of the acquisition of spatial linguistic concepts. The simulation uses a new strategy, constrained connectionism.

SUMMARY

0. Foreword

The foreword by George Lakoff stresses the links between Regier's approach and cognitive linguistics. Lakoff concludes that Regier's approach, in that it relies on perceptual and language acquisition mechanisms, is especially well adapted to deal with the semantics of spatial relations, which largely escape a purely logical approach.

1. Introduction

The goal of the book is to characterize what Regier calls the "human semantic potential", that is, the human ability to categorize and its expression in language. Given that languages are not always semantically and conceptually equivalent, it is important to circumscribe the possible range of variation. The problem of language variation is approached from the angle of language acquisition and the learnability of linguistic concepts. The book also deals with the problem of the use of connectionism in cognitive science, and the solution advocated by Regier is constrained connectionism.

Languages categorize space in many different ways, and some languages in which space is conceived relative to absolute orientations such as north, south, east and west (notably Guugu Yimithirr, a native Australian language) may affect their speakers' very perception of space. Regier nonetheless claims that there are limits to language variation.
Those limits do not correspond to linguistic universals but rather to perceptual universals: in other words, the limit to cross-linguistic variation derives directly from the limits of the human perceptual system. This strong hypothesis is what dictated Regier's answer to the problem of connectionist modelling in cognitive research: a model into which cognitive constraints are built (hence the name Constrained Connectionism); in the case of Regier's model, these are perceptual constraints. Thus the goal of the book is twofold. It is scientific, in that it is an investigation of a cognitive faculty, the human semantic potential. It is also methodological, as it proposes a new strategy of cognitive modelling, constrained connectionism, which relies both on simulation and on independently motivated structures that make the system as a whole easier to analyse.

Regier has based his inquiry into the human semantic potential on the semantics of space in a few languages. He has chosen space because it is a foundational ontological category, often expressed in closed-class forms (classes of linguistic items which contain few members and very rarely admit new members), which structures other parts of the conceptual system through metaphor and exhibits cross-linguistic variation. The model concentrates on the acquisition of spatial semantics and as such has to face the "no-negative-evidence" problem, i.e. the lack of negative evidence in language acquisition. The solution proposed by Regier, and motivated by language acquisition studies, is to take each explicit positive instance of a concept as an implicit negative instance of all other concepts.

The model takes as input simple movies of two 2-dimensional convex objects moving relative to one another, each movie being labelled as a positive instance of a spatial term from a given language; for example, a given movie is labelled as a positive instance of "through".
The model learns the spatial relations through back-propagation (a standard connectionist learning strategy). It should then be able to label new movies correctly. Each movie shows a static object (the landmark, LM) and another object (the trajector, TR).

Though the classic connectionist framework has some advantages, notably the flexibility and plasticity which allow connectionist networks to simulate a wide range of cognitive and non-cognitive behaviours, it has the important drawback that its explanatory power is very limited, which severely reduces its utility in cognitive enquiries. The solution advocated by Regier is constrained connectionism: the construction of connectionist models which retain much of the plasticity and flexibility of connectionism but in which the incorporation of independently motivated structural devices enhances the analysability of the models and hence their explanatory power.

2. The linguistic categorization of space

The goal of Regier is to do for space more or less what Berlin and Kay (1969) did for colours. Very roughly, Berlin and Kay showed that there are semantic universals in the domain of colours. Their work was supported by further studies which showed that there are partial links between the basic colour categories and the neurophysiology of the visual system. The ambition of Regier is to provide similar insights in the domain of space. His main hypothesis is that linguistic categorization in general, and in the domain of space in particular, is constrained by experience and, in this case, by visual perception. This is very similar to the position of cognitive linguistics, which claims that language is influenced by non-linguistic faculties and hence cannot be studied alone. It means that what is relevant is not so much the world or reality itself as the mental representation of it.
The notions mainly associated with cognitive linguistics are prototypicality (the rejection of Aristotelian semantics and its replacement by graded membership in a category), deixis (the semantic dependency between linguistic items and the physical setting in which they are used) and polysemy (the fact that a word has several meanings).

3. Connectionism and cognitive models

Connectionism is an alternative to the classic Von Neumann model of cognitive functioning (or GOFAI: Good Old Fashioned Artificial Intelligence). It relies on Parallel Distributed Processing (PDP), that is, on the use of massive parallelism and the distributed nature of representation. The first contrasts with GOFAI in that it relies on the simultaneous processing of numerous units, while GOFAI relies on linear processing; the second contrasts with GOFAI in that computing units are characterized subsymbolically rather than symbolically, i.e. there is no symbolic interpretation of the function of a single computing unit. Finally, it uses back-propagation as a learning algorithm.

Classic connectionist models do not incorporate any prestructuring; as a result they are very good at a great variety of learning tasks, but have very poor analysability. Structured connectionist models incorporate quite a lot of prestructuring, and as a result their processing units are interpreted symbolically (as in GOFAI) rather than subsymbolically as in classic connectionism. They have much better analysability but they lose a lot of the flexibility and learning power of classic connectionism. Regier's suggestion, constrained connectionism, is an attempt to capture the best of both worlds: the learning ability of classic connectionism and the analysability of structured connectionism. "The essence of the idea is to build PDP networks that have built-in structural devices that constrain their operation" (p 44), and the network is trained under back-propagation.
The structural devices incorporated must be independently motivated. The benefits of constrained connectionism are that the model, through its in-built structural devices, is better motivated as a whole, its learnability is good and its analysability is better.

4. Learning without explicit evidence

The no-negative-evidence problem is the problem of the limits of generalisation: in the absence of negative evidence, how can over-generalisation be avoided? Regier uses the mutual exclusivity heuristic (positive evidence for an instance of a term is taken as equivalent to negative evidence for all other terms). However, the application of this heuristic is not straightforward, and the difficulties it encounters are discussed and solved in this chapter. The main difficulty arises when there is a semantic overlap between terms, which is obviously the case with spatial terms. In such a case, the risk is the generation of a great number of false implicit negatives. The solution is to view mutual exclusivity "not as an absolute rule governing acquisition, but as a probabilistic bias that can be overridden" (p 65). On this view, explicit positive instances and implicit negative instances are treated differently during training: implicit negative instances provide weak evidence, whereas explicit positive instances provide strong evidence. This idea was implemented through the incorporation of prior knowledge (in keeping with the constrained connectionism philosophy), allowing a distinction between antonyms (where implicit negative evidence is good) and non-antonyms (where it is weak).

5. Structures

The design of the structures is crucial in constrained connectionism, inasmuch as a good design both enhances the performance of the network and motivates it, whereas a bad one hampers performance and provides no motivation. Regier proposes three structures: orientation combination, map comparison and source-path-destination.
The first is a weighted combination of different orientations: the direction of potential motion (which relies on the computer equivalent of the mental representation of the forces acting on the object), the proximal orientation (the imaginary straight segment joining TR and LM where they are nearest) and the center-of-mass orientation (the same between the centers of mass of TR and LM). Orientational alignment, which is another part of the same mechanism, is the degree to which an orientation, either relational, like those mentioned above, or a reference orientation (upright vertical, for instance), aligns with another. The second structure is essentially topological and is used in the detection of contact and inclusion. It operates through the comparison of a boundary map for TR, a boundary map for LM and an interior map for LM. The third deals with motion and distinguishes the first frame of the movie, conserved as the source, the last frame of the movie, the destination, and the path, calculated from the frames occurring between the source and the destination. There is no record of the exact time at which a specific event in the motion occurred, but the event is recorded nonetheless. All three structures are at least partly motivated by work in the neurology and psychology of perception.

6. A model of spatial semantics

The three principles behind the architecture and design of the model are adequacy (performance), motivation and simplicity. "The model's task is to acquire visually grounded semantics for spatial terms" (p 122). The movies which the model is presented with are arbitrary in length, and the model's response to the last frame is taken to be its response to the movie as a whole. Yet the model responds to each frame (not knowing in advance the length of the movie). The model is incomplete in that it does not account for object segmentation, but, as Regier points out, this is not one of its goals.
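
To make the center-of-mass orientation and orientational alignment of section 5 concrete, here is a minimal Python sketch. It is not Regier's implementation, which the book describes and this review does not reproduce. The function names, the use of the mean of an object's vertices as a stand-in for its center of mass, and the clipped cosine as a measure of alignment are all assumptions made for illustration.

```python
import math

def centroid(points):
    """Mean of a list of (x, y) vertices; an assumed stand-in for the center of mass."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def center_of_mass_orientation(lm_points, tr_points):
    """Unit vector pointing from the landmark's centroid to the trajector's centroid."""
    lx, ly = centroid(lm_points)
    tx, ty = centroid(tr_points)
    dx, dy = tx - lx, ty - ly
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("centroids coincide; the orientation is undefined")
    return (dx / length, dy / length)

def alignment(orientation, reference):
    """Assumed alignment measure: cosine of the angle between two unit vectors,
    clipped at 0, so 1.0 means perfectly aligned and 0.0 means orthogonal or opposed."""
    cosine = orientation[0] * reference[0] + orientation[1] * reference[1]
    return max(0.0, cosine)

# Hypothetical reference orientation: upright vertical.
UPRIGHT = (0.0, 1.0)

# Hypothetical scene: a landmark rectangle, one trajector directly above it
# and one directly to its right.
lm = [(0, 0), (2, 0), (2, 1), (0, 1)]
tr_above = [(0.5, 3), (1.5, 3), (1.5, 4), (0.5, 4)]
tr_right = [(4, 0), (5, 0), (5, 1), (4, 1)]

above_score = alignment(center_of_mass_orientation(lm, tr_above), UPRIGHT)
right_score = alignment(center_of_mass_orientation(lm, tr_right), UPRIGHT)
print(above_score, right_score)
```

Under these assumptions, a trajector directly above the landmark is fully aligned with the upright vertical and one directly beside it is not aligned at all. A graded score of this kind is the sort of quantity that could feed a learned category such as "above" and give rise to prototype effects.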
Each successive frame is treated by the structures described above, which feed their results to the PDP layer. This layer then produces the output, a categorisation of the frame, which counts as the categorisation of the movie as a whole if the frame happens to be the last one. Given this, the system is trained for several linguistic items simultaneously, for instance "in" and "through". The model was trained on spatial terms from several languages (Mixtec, German, Japanese, Russian and English), so its results bear on cross-linguistic variation. In English, it was trained for "above", "below", "left", "right", "around", "in", "on", "out of", "through" and "over". As hoped, the model exhibits some prototype effects. It can adapt itself to a range of dissimilar spatial systems and does so without negative evidence, via mutual exclusivity. There is, however, as Regier acknowledges, something non-trivial missing from it: an account of non-linguistic spatial conceptual development and its impact on the linguistic categorization of space. This could affect the necessity of learning several terms simultaneously: for instance, if the notion of inclusion is learned prelinguistically, the acquisition dependency between "in" and "through" would disappear.

7. Extensions

In this chapter, Regier proposes several extensions to his model, pointing out that the model was mainly conceived as a framework in which other issues, such as deixis or polysemy, can be approached. He begins with polysemy, distinguishing between the cases where polysemy allows a single abstract sense subsuming the different meanings of the term and those where it does not. In the first case, it helps the system to output the required variations in meaning if some terms with related meanings are learned simultaneously. In the second case, the best thing seems to be to allow the system to learn all the items in the contrast set simultaneously.
This is not in itself sufficient to indicate a grouping mechanism inside a contrast set, though it does seem to indicate that paradigmatic and syntagmatic contexts play a role in the development of polysemous representations. He then deals with deixis, for which an extension of his model was actually constructed via the inclusion in the movies of a simple indication. This shows that the model as it is can deal with deixis.

The main concern of Regier is with prelinguistic conceptual development. The principal problem for his model is that in order to deal with it, and notably with the acquisition of non-linguistic concepts of space, the model would have to incorporate desire, action and reaction. He makes a few tentative proposals toward such an extension. He also deals with the notion of key events, noting that it would amount to the system's selecting a few frames in the movies it is presented with and that it might allow the system to give graded judgments. The notion of distance could be treated by the simple device of measuring the length of the proximal orientation and of the center-of-mass orientation, yielding two measures, proximal distance and center-of-mass distance. These, combined with the focus of attention, could extend the system's learning ability to distance terms such as "near" and "far". The notion of convex hull could help to treat concave objects, as well as spatial terms such as "between". The use of implicit paths, finally, would help the system to deal with combinations of spatial terms.

8. Discussion

"In [Regier's] estimation, the degree to which the work succeeds varies considerably depending on the particular aspect of the work examined. From some standpoints, the model lives up to expectations; from others, it does not" (p 186).
Regier begins by comparing his model to other approaches: Chomsky's and Kay and McDaniel's as far as the basic ideas behind his model are concerned, and Miller and Johnson-Laird's as well as Landau and Jackendoff's for purely spatial analysis. The main discussion in this chapter, however, centers on falsifiability. Given that both Chomskyan linguistics and cognitive linguistics, from which Regier has borrowed some of his central hypotheses, have been attacked as not falsifiable, his concern is with whether his model could be open to the same reproach. It is not, in that it could fail to learn a new spatial linguistic system. Indeed, as Regier points out, the model is in fact false, because it could not, in its present state, learn the spatial system of Guugu Yimithirr, which uses absolute coordinates. Regier distinguishes between shallow failures and profound failures (failures which could be easily remedied through a minor modification of the model and failures that could not) and between informative and non-informative failures. He then shows that the failure of his model to learn Guugu Yimithirr is a moderately informative failure and a shallow one and, hence, does not discredit the model. In his view, however, the main weakness of the system is its failure to accommodate non-linguistic concepts of space. Yet the model has quite a few advantages: it is neurally, psychologically and linguistically motivated; it illustrates constrained connectionism, which is an advance on both classic and structured connectionism; and it seems to avoid the difficulties which connectionism has met with.

CRITICAL EVALUATION

Regier's book is, to my mind, excellent. It deals seriously, intelligently and honestly with a lot of central issues in current cognitive science: those of computer modelling, language acquisition, space and its categorization, and, as he calls it, the human semantic potential. It is well-informed and has a very good bibliography.
It acknowledges its own weaknesses and highlights its failures. It never overplays its results. I cannot see any criticism of it which would not be unfair.

However, its very excellence serves to illustrate the present difficulties in Artificial Intelligence. Regier does not claim to give a semantics for spatial terms and indeed does not attempt to do so. It is nonetheless slightly disappointing that, in the end, and despite his system's success in learning spatial terms, we do not seem to be nearer to a semantic definition for spatial terms that is not "ad hoc and incohesive" (two criticisms which he applies to Miller and Johnson-Laird's definitions) than we were before. Though, as pointed out previously, this was not a part of his goal, it is nonetheless a serious problem, in that it seems to show that the semantics embodied in his model is either insufficient to account by itself for the whole semantics of a spatial term (the structural devices do not by themselves provide a semantic definition of any sort: they only provide their results to the learning part of the system, i.e. the PDP layer) or has very poor analysability (the PDP layer). It could be said that a semantics for spatial terms can be provided, despite its poor analysability, through the analysis of the workings of the PDP layer. But this would require the movies in the training set to provide an exhaustive covering of instances of the term, something which presumably is not the case given that the movies only represented convex objects (and the notion of convex hull given in chapter 7, though interesting, would not be sufficient for concave objects). Again, this cannot be seen as a criticism of Regier's enterprise, as it was not a part of it to provide a semantics of spatial terms, but it is a good indication of the sort of difficulties computational linguistics meets with all the time.
Along the same lines, I am not sure that Regier's model yields truly original insights on language learning in general and the acquisition of spatial terms in particular (the most interesting suggestion, that some terms must be learned simultaneously, is seriously weakened by the absence of an account of non-linguistic spatial concepts). However, there is no doubt that it is a very good test (and a successful one, within the limits which he himself indicates) of the fundamental hypotheses on which the system is built, as well as of the particular structures which are incorporated in it. In other words, human learning may not function exactly in the way he assumes, but it could function that way, and that, in itself, is an interesting result and one which cannot be ignored by anyone trying to build a semantics for spatial terms. And, as George Lakoff says, it shows the importance of non-linguistic data for language acquisition.

This book SHOULD be read by any informed reader interested in cognitive science, the semantics of space, language acquisition, artificial intelligence and linguistics, cognitive or otherwise.

References

Berlin, B. & Kay, P. (1969), Basic color terms: Their universality and evolution, Berkeley, University of California Press.
Kay, P. & McDaniel, C.K. (1978), "The linguistic significance of the meanings of basic color terms", Language 54, 610-646.
Landau, B. & Jackendoff, R. (1993), "'What' and 'where' in spatial language and spatial cognition", Behavioral and Brain Sciences 16, 217-265.
Miller, G.A. & Johnson-Laird, P.N. (1976), Language and perception, Cambridge, Mass., Harvard University Press.

Reviewer: Anne Reboul, Research Fellow at the CNRS (National Center for Scientific Research), France. PhD in Linguistics, PhD in Philosophy, currently working at the Center for Computer Research in Nancy, in the team dedicated to man-machine dialogue.
Has written quite a few papers in both French and English. Co-author of the Dictionnaire Encyclopedique de Pragmatique (Paris, Le Seuil; English translation in preparation for Basil Blackwell, Oxford).

Reviewer's address:
Anne Reboul
CRIN
BP 239
54506 Vandoeuvre-les-Nancy
FRANCE
<Anne.Reboul loria.fr>