Date: 05-Apr-2005
From: Richard Sproat <rws uiuc.edu>
Subject: New: A Challenge to the Minimalist Community
BACKGROUND Over the past fifteen years there has been significant progress in the field of statistical parsing. Much of the work has focussed on supervised methods, where by ''supervised'' we mean that the training data consists of sentences and their associated syntactic trees (for example, Charniak 1997, Collins 1999, Roark and Johnson 1999). There are a number of treebank corpora, of which the Penn Treebank, based largely on ''Wall Street Journal'' text, and available from the Linguistic Data Consortium, is the most widely used. As long as one stays within genre (e.g. newspaper text), it is now possible to train such parsers to perform on held-out test data with a weighted F-score accuracy of between 80% and 90%, measured by comparing the trees that the system produces with the hand- constructed trees for the test data. More impressively, there has been recent work by Dan Klein and Chris Manning on machine learning procedures (Klein & Manning 2002, Klein & Manning 2004) that infer parsers in an unsupervised fashion from raw text annotated with part-of- speech tags. These systems attempt to do what a child does when it learns his/her language. They infer structure from distributional patterns in a more or less unannotated sequence of tokens. The Klein-Manning procedures do not yet perform as well on held out test data as the supervised systems, but they are quickly converging on supervised results. What is particularly notable about the Klein-Manning grammar induction procedures is that they do what Chomsky and others have argued is impossible: They induce a grammar using general statistical methods which have few, if any, built-in assumptions that are specific to language. There has also been significant progress in the realm of hand-built parsers that adopt certain grammatical frameworks. For example, the PARC LFG parser, developed by Ron Kaplan and colleagues over the past twenty years, with intensive manual labor, performs at a level comparable to current statistical parsers on Penn Treebank data (Riezler et al. 2002). Surprisingly, one approach to grammar that is not represented in work on robust parsing is the Principles and Parameters model, or what has evolved over the last ten years into the ''Minimalist Program'' (MP). At the risk of being accused of nomenclatural solecism, we will simply refer to it as ''Principles and Parameters'' (P&P). While there have been P&P-based parsers (Fong 1991, 2005), and even attempts to build parsers that learn P&P-style grammars (Berwick, 1982), all of these systems have one thing in common: they are small-scale implementations that make no attempt at broad linguistic coverage. They are not even up to the task of parsing arbitrary Dr. Seuss books, let alone the ''Wall Street Journal''. For example, Fong's English parser, which is arguably the best piece of work of this kind, handles, in its current incarnation, just the example sentences in Lasnik and Uriagereka's textbook (1988). Crucially, there has been, to our knowledge, no interest in developing a broad-coverage P&P parser. Even more puzzling is the lack of any serious attempt to build a P&P-style parser that is able to learn from unannotated input (as Klein and Manning's systems do). What is odd about this is that it is practically de rigueur when one writes a paper in the P&P framework, to invoke the old arguments about ''poverty of stimulus'' and how the only feasible explanation for the way in which children acquire language is by having a substantial portion of the grammatical knowledge already hard-wired in the form of grammatical principles and parameters. Why, if this is the case, has nobody tested this claim by building a computational model that is able to do what every child is able to do: namely learn a language well enough that reasonable structures can be assigned to any grammatical (and, yes, even ungrammatical) sentence given to it? It seems to us that if the claims on behalf of P&P approaches are to be taken seriously, it is an obvious requirement that someone provide a computational learner that incorporates P&P mechanisms, and uses it to demonstrate learning of the grammar of a natural language. With this in mind, we offer the following challenge to the community. THE CHALLENGE We challenge someone to produce, by May of 2008, a working P&P parser that can be trained in a supervised fashion on a standard treebank, such as the Penn Treebank, and perform in a range comparable to state-of-the-art statistical parsers. Our selection of May 2008 is motivated by the observation that this is about three years in the future, and a committed graduate student who starts thinking about this problem now, could, by May of 2008 reasonably be expected to produce a Ph.D. dissertation that solves this problem. Let us now lay out what we feel would be the minimal requirements for meeting this challenge. First, the system must use P&P in a non-trivial way. So for example, using a standard machine learning algorithm to extract a statistical parser like those in existence, supplemented by a transducer that maps Penn Tree Bank structures into P&P annotations would not satisfy the challenge. For a system to qualify, it would have to be the case that the P&P component is an integral part of the learning mechanism. Removing the P&P component must seriously degrade performance. Second, the particular choice of parameters and their possible settings, must conform to some recognized version of a P&P theory proposed in the literature. We recognize that it may be necessary to augment the set of accepted parameters with additional conditions that are motivated by particular problems in learning grammatical structures. However, the greater the number of ad hoc principles and parameters that are used, the less adequate the implemented system will be. Third, we recognize that the assumptions about syntactic structure used by the Penn Treebank (and other treebanks) differ in many details from those of the P&P tradition. In particular, P&P work generally assumes a far more articulated syntactic structure than is generally employed in other frameworks. The P&P parser that we are asking for can produce this more articulated structure, without being penalized for such re-analysis of the data. In fact many robust parsers map Penn Tree Bank analyses into alternative formal representations. All that is required is that the designer of the parser also produce code that converts from the P&P parser's syntactic structures back into appropriate treebank structures, so that proper evaluation of its output is possible. It is worth pointing out that our challenge allows the P&P model a considerable, if illicit advantage. We are only asking for supervised grammar induction, when, in fact, unsupervised learning on the basis of parameterized principles would be the more reasonable test of the model's viability. P&P advocates have long eschewed the existence of negative evidence in the learning process, and insisted that grammar induction takes place with very little external data. In allowing supervised learning we have liberalized the conditions of the acquisition problem well beyond the stringent restriction invoked in the P&P literature. POSSIBLE OBJECTIONS We outline here some possible objections to our challenge that might be raised by proponents of P&P approaches to grammar. These objections are anticipated, in part, on the basis of the experience of one of the authors with a prior debate on the scientific foundations of Minimalism. Objection 1: You're confusing performance and competence. Grammars are models of competence, parsers are models of performance. Reply: No we are not. Please note that we have been careful to impose no requirements on algorithms for learning or for the resulting parser. Thus we make no claims and set no expectations on mode of implementation (i.e. performance). The only requirement we insist on is the obvious condition that the learning method and the resulting parser make non- trivial use of the assumptions of the P&P approach to syntax. Again, this is the point of the challenge, given that P&P makes very strong claims about how these grammatical assumptions are essential to language learning. Objection 2: There have been very few people working in P&P parsing. So naturally progress has been slow. Reply: Certainly this is true. The obvious question here is why this has been the case. Positing a theory that makes such far reaching claims about mechanisms underlying language learning would seem to us to commit the adherents of such a theory to the task of demonstrating its viability through implementation of a large scale model. Why, in the past quarter century of work on this topic, has no one attempted this? Objection 3: The MP is a research program, not a fully developed theory. Therefore it can't be expected to yield a model that can be implemented as a broad coverage processing device. Reply: This is a remarkable dodge. The MP has been around at least since the early 1990s. Chomsky sketched this view in ''Some Notes on Economy of Derivation and Representation'' in 1991, and then presented a detailed account in his book ''The Minimalist Program'' in 1995. Much subsequent theoretical work has been done within the MP. The P&P model was explicitly proposed in Lectures on Government and Binding in 1981. The antecedents for this general approach to grammar and language acquisition go back to ''Syntactic Structures'' (1957) and ''Aspects of the Theory of Syntax'' (1964). The P&P view replaced a model of an innate language faculty consisting of a grammar evaluation metric applied to a set of grammars generated by a universal schema of grammar. The idea of a language- specific device/set of constraints as the basis of language learning has, therefore, been at the center of this line of research for close to fifty years. Surely it is long past time to ask for some substantive evidence in the way of a robust grammar induction system that this view of grammar induction is computationally viable. Most other major theoretical models of grammar have succeeded in yielding robust parsers in far less time, despite the fact that they do not, as far we know, make analogous claims about the nature of language learning. Objection 4: It would be trivially easy to devise a procedure to convert Penn treebank structures into MP representations and then use machine learning methods to extract a grammar that generates the latter for Penn Treebank test data. Why bother? Reply: This is exactly the case that we exclude as not satisfying the challenge. The P&P approach makes claims not only about the nature of syntactic representation but the way in which grammar is acquired. Because it purports to be first and foremost a learning theory, it is necessary to show that this model can yield a robust grammar learning device in order for the framework to sustain any credibility. So far, it has not done this. EPILOGUE We will close by noting that we are in no way suggesting that the construction of a trainable wide-coverage P&P-based parser is impossible. Since no one has attempted this, and no one has even sketched a proposal for how to do it, we simply do not know if it is possible. As scientists, we do not speculate about the impossibility of something of which we have no knowledge. In fact, we would be delighted if someone succeeds in meeting our challenge. Such success would convince us that the P&P enterprise is, after all, a testable theory with genuine scientific content. -------------------------------------------------------- Richard Sproat, Department of Linguistics Department of Electrical and Computer Engineering Beckman Institute University of Illinois at Urbana-Champaign Shalom Lappin, Department of Computer Science King's College, London -------------------------------------------------------- REFERENCES Berwick, Robert. 1982. Locality Principles and the Acquisition of Syntactic Knowledge. PhD Dissertation, MIT. Charniak, Eugene. 1997. ''Statistical parsing with a context-free grammar and word statistics'', Proceedings of the Fourteenth National Conference on Artificial Intelligence AAAI Press/MIT Press, Menlo Park. Collins, Michael. 1999. Head-Driven Statistical Models for Natural Language Parsing. PhD Dissertation, University of Pennsylvania. Fong, Sandiway. 1991. Computational Properties of Principled-Based Grammatical Theories, AI Laboratory, MIT. Fong, Sandiway. 2005. ''Computation with Probes and Goals: A Parsing Perspective.'' In Di Sciullo, A. M. and R. Delmonte (Eds.) UG and External Systems. Amsterdam: John Benjamins. Klein, Dan and Manning, Christopher. 2002. ''Natural Language Grammar Induction using a Constituent-Context Model.'' In Thomas G. Dietterich, Suzanna Becker, and Zoubin Ghahramani (eds), Advances in Neural Information Processing Systems 14 (NIPS 2001). Cambridge, MA: MIT Press, vol. 1, pp. 35-42. Klein, Dan and Manning, Christopher. 2004. ''Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency.'' Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004). Lasnik, Howard and Uriagereka, Juan. 1988. A Course in GB Syntax: Lectures on Binding and Empty Categories. MIT Press. Roark, Brian and Johnson, Mark. 1999. Efficient probabilistic top-down and left-corner parsing. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 421-428. Riezler, Stefan; Maxwell, John; King, Tracy; Kaplan, Ronald; Crouch, Richard and Johnson, Mark. 2002 ''Parsing the Wall Street Journal using a lexical- functional grammar and discriminative estimation techniques.'' Proceedings 40th Meeting Association for Computational Linguistics, Philadelphia.
Linguistic Field(s):
Computational Linguistics
Discipline of Linguistics
Language Acquisition
Syntax
Text/Corpus Linguistics
Respond to list|Read more issues|LINGUIST home page|Top of issue
|