Tesar, Bruce and Paul Smolensky. 2000. Learnability in Optimality Theory. Cambridge, MA: MIT Press. 138 pp. $25.00/£16.95
Reviewed by: Larry L. LaFond, University of South Carolina, Columbia.
This small but intriguing book focuses on a particular problem in language learning: learners, who often receive overtly ambiguous language data, face a serious paradox. They cannot determine a grammar's hidden structure until they have constructed a grammar based upon their interpretation of the overt forms they hear, but they cannot construct a grammar without some analysis of the hidden structure. To address this paradox, Tesar and Smolensky (T & S) propose a learning procedure in which learners' first guesses at a structural analysis are used to improve their grammar, and this improved grammar is then used to improve the analysis. In other words, through successive approximation, learners acquire progressively better interpretations and a progressively better grammar simultaneously. T & S look to Optimality Theory (OT) for the core principles that inform this learning strategy, and in this book they evaluate their proposed model, Robust Interpretive Parsing/Constraint Demotion (RIP/CD), for both accuracy and computational efficiency, through a series of computer simulations and a set of formal proofs.
The organization of the book:
Chapter 1 (18 pp.) presents the central claim of the book--that OT provides the learning mechanism (RIP/CD) through which the interdependence of grammars and structural descriptions is overcome, allowing the learner both to assign structure and to learn grammar at the same time. This chapter also gives a broader context for this claim through a review of the issues surrounding learnability and Universal Grammar, a terse introduction to the tenets of OT, and a decomposition of the learning problem into several parts--deducing hidden structure in language data, using the data to improve the existing model, assigning an improved hidden structure to the original overt data, and once again learning the grammar (using a 'robust' parser). This decomposition divides the problem into two subproblems: parsing and grammar learning.
Chapter 2 (19 pp.) develops the overview of OT begun in Chapter 1, and illustrates the tenets of OT through a phonological example, an analysis of basic CV syllable structure, and a syntactic example, based on the analysis of null subjects given by Grimshaw and Samek-Lodovici (1995).
Chapter 3 (20 pp.) is devoted to a discussion of 'Constraint Demotion', i.e., that constraints violated by grammatical structural descriptions must be demoted (not promoted), in the total ranking of constraints, below constraints violated by competing (ungrammatical) structural descriptions. The same phonological and syntactic examples used in Chapter 2 are again employed here to demonstrate how learners use the interaction of violable principles to converge upon the target structure. Chapter 3 includes an important discussion of the relationship between data complexity, the number of constraints, and the learnability of a grammar. T & S demonstrate that, although the total number of possible rankings in an OT system may be quite high with even a limited number of constraints, the restrictiveness of the structure OT places on the grammar permits learners to efficiently arrive at a target grammar in a reasonable number of learning steps.
Chapter 4 (22 pp.) applies the proposed iterative learning algorithm to the domain of metrical stress. The goal of the chapter is to present an empirical test demonstrating that the RIP/CD algorithm can overcome ambiguity in overt forms. To accomplish this, T & S use a computer simulation in which 124 languages are presented with 62 overt forms from a target language. These forms are processed by the languages via the learning algorithm T & S propose. Each performance of Constraint Demotion is counted as a learning step. The results show that 120 of the 124 simulations resulted in successful learning of the grammar in an average of 7 steps, a number that, T & S stress, is well below the number of constraints.
Chapter 5 (11 pp.) addresses, albeit very briefly, two central issues in language learning: first, how the learner is constrained to select the most restrictive language consistent with the data (the subset principle) and, second, how the language-specific inventory of lexical underlying forms is learned. With regard to the subset principle, T & S propose that in learners' initial hierarchies, all markedness constraints dominate all faithfulness constraints. With regard to the lexicon, T & S attempt to extend the iterative strategy used for grammar learning/parsing to encompass the simultaneous learning of constraint rankings and the underlying representations of the lexicon.
Chapter 6 (6 pp.) is entitled 'Learnability and Linguistic Theory' and serves as a concise apologia for the use of an OT approach to address issues of language learning. In it, T & S argue that OT learning algorithms are derived solely from general grammatical structure and are informed by a specific theory of grammar. They contrast this with generic search procedures and with theories such as Principles and Parameters, where T & S see conflicting needs for parameters to be independent, with restricted effects, but also explanatory, with wide-ranging effects.
Chapter 7 (20 pp.) is quite dense, consisting primarily of formal proofs regarding the correctness and data complexity of CD. The focus of these proofs is to show, first, that given an adequate data set, the RIP/CD algorithm is guaranteed to converge upon a correct ranking and, second, that the amount of data needed to form an adequate data set is never more than N(N-1) informative examples, where N is the number of constraints.
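To put this bound in perspective, the following minimal sketch compares it with the size of the space of total rankings, which for N constraints is N! (a standard combinatorial fact, not a figure taken from the book). The function names and the sample values of N are chosen here purely for illustration.

```python
from math import factorial


def total_rankings(n: int) -> int:
    """Number of distinct total rankings of n constraints (n!)."""
    return factorial(n)


def max_informative_examples(n: int) -> int:
    """Upper bound on informative examples as reported in the review: N(N-1)."""
    return n * (n - 1)


# Sample constraint counts are arbitrary illustrative values.
for n in (5, 10, 12):
    print(n, total_rankings(n), max_informative_examples(n))
```

The factorial grows far faster than the quadratic bound, which is the sense in which the search space can be very large while the data requirement stays modest.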
Chapter 8 (18 pp.) examines production-directed parsing, the process T & S consider responsible for learners' ability to efficiently compute informative competing structural descriptions. In this chapter, T & S argue that production-directed parsing uses the same computational procedure as robust interpretive parsing, discuss its use in language learning, and supply algorithms for performing it. In so doing, they treat 'parsing' not as an issue related solely to comprehension but as a process that more generally assigns structure to an input and is thus important for both comprehension and production.
The book concludes with 4 pages of endnotes, divided by chapter, a list of references (6 pp.) and an index (2 pp.).
Comments
This book represents the long-anticipated result of years of collaborative research by the authors, reported in a series of earlier papers (1993, 1995, 1996, 1998). The end result is a much clearer and more accessible presentation than any of the previous treatments. The book now represents a solidly presented application of formal learning theory to the problem of language acquisition.
T & S clearly view their proposal as proceeding from the central principles of OT and, in turn, supporting OT as a theory of language. The closeness of the ties between this learning proposal and OT is an asset from the standpoint of producing a coherent account of how linguistic theory and the issue of learnability relate, but this same closeness may be a liability for the broader acceptance of T & S's ideas beyond an audience already amenable to OT.
The RIP/CD algorithm requires constraint interaction (and, hence, violable constraints) for its operation, since evaluation of whether a form is 'best' in comparison to its competitors is through an operation that assesses the number of violations incurred by a pair of candidates, scores out marks common to the winning and losing candidates, and demotes the constraints violated by the winner down in the hierarchy so they can be dominated by the constraints violated by the losing candidate. The algorithm demotes constraints only as far as necessary and, although the learning process operates within a hypothesis space consisting of stratified hierarchies, the end result is a total ranking of a hierarchy that correctly converges on the target grammar. Since the operation of the grammar and learnability are so closely connected in this process, T & S's proposal is intuitively appealing.
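The demotion step described above can be sketched roughly as follows. This is a simplified illustration and not T & S's own implementation: it assumes a stratified hierarchy is represented as a list of sets (highest stratum first) and each candidate as a mapping from constraint names to violation counts. The function names are hypothetical, and the violation profiles in the example are invented for illustration rather than taken from the book.

```python
def cancel_marks(winner, loser):
    """Strike out violation marks shared by winner and loser."""
    w, l = {}, {}
    for c in set(winner) | set(loser):
        diff = winner.get(c, 0) - loser.get(c, 0)
        if diff > 0:
            w[c] = diff      # marks left only on the winner
        elif diff < 0:
            l[c] = -diff     # marks left only on the loser
    return w, l


def stratum_of(hierarchy, constraint):
    """Index of the stratum containing the constraint (0 = highest)."""
    for i, stratum in enumerate(hierarchy):
        if constraint in stratum:
            return i
    raise KeyError(constraint)


def demote(hierarchy, winner, loser):
    """Return a new hierarchy after one demotion step for a winner/loser pair."""
    w, l = cancel_marks(winner, loser)
    if not l:
        raise ValueError("loser has no uncancelled marks")
    strata = [set(s) for s in hierarchy]
    # Highest-ranked constraint still violated by the loser.
    top = min(stratum_of(strata, c) for c in l)
    for c in w:
        i = stratum_of(strata, c)
        if i <= top:  # not yet dominated: demote only as far as necessary
            strata[i].discard(c)
            if top + 1 == len(strata):
                strata.append(set())
            strata[top + 1].add(c)
    return [s for s in strata if s]


# Illustrative run: all constraints start in one stratum (an assumption).
start = [{"ONSET", "NOCODA", "PARSE", "FILL"}]
step1 = demote(start, winner={"FILL": 1}, loser={"PARSE": 1})
step2 = demote(step1, winner={"NOCODA": 1, "FILL": 1},
               loser={"ONSET": 1, "FILL": 1})
print(step1)
print(step2)
```

Note that a constraint is moved only to the stratum immediately below the highest-ranked constraint violated by the loser, which reflects the "only as far as necessary" property mentioned above.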
As with many aspects of OT, research on learnability in OT is still in its infancy, and T & S's proposal represents not only a pioneering effort, but also one of the most fully developed proposals to date. Other proposals for ranking algorithms (e.g., Broihier 1995; Pulleyblank and Turkel 1995; Boersma 1997) have addressed various problems encountered in this line of research. For example, the 'Gradual Learning Algorithm' developed by Boersma (1997) claims certain advantages over T & S's proposal, namely that it can handle free variation and noisy learning data, and that it can account for gradient well-formedness judgments.
All of these approaches, especially insofar as they also wish to account for syntactic data within an OT framework, still require further explanation concerning the nature of input in OT. Under this system, learners must have access to input data and GEN. McMahon (2000:52) notes that this must also involve access to both candidates and their violation marks, a problem given the paucity of negative evidence normally available in learning data (Kager 1999:302). Still, it is not yet fully clear whether learners assume everything outside the input is suboptimal, whether they begin with an unranked set of constraints, or how variable input may provide data rich enough for grammar learning.
These issues are not a unique challenge for the present volume, however, and T & S have succeeded in providing a clear presentation of OT applied to language acquisition problems. T & S's major claim is tightly argued, with few departures and fewer annoyances to distract the reader from the main point (one exception readers will note is a typographical error in the text on p. 27: the reference to 'table 1.1' in the second paragraph should read 'table 2.1'). It is perhaps the stringency of the argumentation that sometimes leaves the reader wanting more, particularly in the less developed 'Learnability and Linguistic Theory' chapter of only 6 pages.
This book may find use in introductory courses, especially as an introduction to OT (since it provides an introduction to the theory, an explanation of its usefulness for issues of learnability, and illustrative examples of its application), although many will find parts of Chapters 7 and 8 rather inaccessible. This book is certainly appropriate for graduate linguistics courses or seminars. In such courses, it will no doubt serve a discussion-provoking purpose as robust as the interpretive parser it proposes.
References
Boersma, P. 1997. How we learn variation, optionality, and probability. Ms. University of Amsterdam. ROA-221.
Broihier, K. 1995. Optimality-theoretic rankings with tied constraints: Slavic relatives, resumptive pronouns and learnability. Ms., Department of Brain and Cognitive Sciences, MIT. ROA-46.
Grimshaw, J. and V. Samek-Lodovici. 1995. Optimal subjects. University of Massachusetts Occasional Papers in Linguistics (UMOP), 589-605.
Kager, R. 1999. Optimality Theory. Cambridge: Cambridge University Press.
McMahon, A. 2000. Change, chance, and optimality. Oxford: Oxford University Press.
Pulleyblank, D. and W. J. Turkel. 1995. Traps in constraint ranking space. Paper presented at Maryland Mayfest 95: Formal Approaches to Learnability, University of Maryland, College Park.
Tesar, B. and P. Smolensky. 1993. The learnability of Optimality Theory: An algorithm and some basic complexity results. Technical Report CU-CS-678-93. Department of Computer Science, University of Colorado, Boulder. ROA-2.
Tesar, B. and P. Smolensky. 1995. The learnability of Optimality Theory. In Proceedings of the 13th West Coast Conference on Formal Linguistics, ed. R. Aranovich, W. Byrne, S. Preuss, and M. Senturia, 122-137. Stanford, CA: CSLI Publications.
Tesar, B. and P. Smolensky. 1996. Learnability in Optimality Theory. Johns Hopkins University Technical Report JHU-CogSci-96-3.
Tesar, B. and P. Smolensky. 1998. Learnability in Optimality Theory. Linguistic Inquiry 29:229-268.
Larry LaFond, a Ph.D. candidate at the University of South Carolina, has research interests in second language acquisition theory, discourse analysis, and intercultural pragmatics. His dissertation research employs an OT framework in a developmental account of the acquisition of null subjects, inversion, and that-trace effects by native speakers of English learning Spanish.