Date: 12-May-2005
From: Richard Sproat <rws xoba.com>
Subject: Re: A Challenge to the Minimalist Community
Again we wish to thank the many people who have responded to our challenge. It seems to us that most of the recent postings on both sides of the debate have restated points made in earlier rounds, and we will not take these up again. Readers can now come to their own conclusions concerning the issues that have been raised in this discussion. However, we do feel compelled to respond to several of the assertions that Charles Yang makes in his posting.

1. First, Yang claims:

"[T]he challenge probably has been met - and many years ago. Broad coverage parsers based on Government Binding / Minimalism DO exist. The earliest commercial application I am aware of was Bob Kuhns' GB parser that was used to summarize newswire stories in the 1980s, published at the COLING conference in 1990. A more glaring omission is Dekang Lin's Principles & Parameters based parsers - unambiguously dubbed PRINCIPAR and MINIPAR respectively - which have been used in a variety of applications, and have figured prominently in computational linguistics."

Unfortunately, this claim seriously misrepresents the facts at two levels. Most importantly, our challenge was to produce a TRAINABLE broad-coverage parser based on Minimalist Program/P&P principles that could perform at the same level as statistical parsers on a standard test set. Neither Kuhns' nor Lin's system includes a learning component, so for that reason alone they do not meet the challenge as we stated it.

But there would still seem to be an issue here, since we had claimed in our original posting:

"While there have been P&P-based parsers (Fong 1991, 2005), and even attempts to build parsers that learn P&P-style grammars (Berwick, 1982), all of these systems have one thing in common: they are small-scale implementations that make no attempt at broad linguistic coverage. They are not even up to the task of parsing arbitrary Dr. Seuss books, let alone the ''Wall Street Journal''."
Yang's claim above would thus seem to be most relevant to this portion of what we said. But actually the facts are far from clear. Kuhns' paper (Robert Kuhns, "A PARLOG Implementation of Government Binding Theory", COLING 1990) describes a parser that does encode certain GB constraints. But, as we have already noted, it does not perform grammar induction from a corpus, nor, crucially, does the paper offer any evaluation of the parser against test data. Lin's PRINCIPAR system (Dekang Lin, "PRINCIPAR--An Efficient, Broad Coverage, Principle-based Parser", COLING 1994) does incorporate a number of GB conditions, but again there is no reported evaluation of the parser on a corpus, though Lin does report on efficiency issues and cites some sentences that the system gets correct.

Lin's later system, MINIPAR (Dekang Lin, "Dependency-based Evaluation of MINIPAR", Workshop on the Evaluation of Parsing Systems, Granada, Spain, 1998), was evaluated on a test set of 7,103 sentences from the SUSANNE corpus (a subset of the Brown corpus). According to these results, the parser achieves an average of approximately 88% recall and 78% precision on head-dependency relations across various categories of text. According to Lin, MINIPAR is a descendant of PRINCIPAR that has "adopted some of the ideas in the Minimalist Program such as bare phrase structure and economy principles." But the actual mechanisms of MINIPAR are not described in the cited paper, so it is unclear to what extent it is in fact a P&P or MP parser in any meaningful sense.

As a matter of fact, the first author evaluated MINIPAR some years ago for another project and still has a copy of the system. The basic functionality of the parser is available only as a compiled library, so it is not possible to examine those portions. However, many portions of the system are viewable as plain text, including various lexicons and rules.
An examination of the plain-text portions reveals very little, if anything, that is specifically P&P or MP in nature. The vast majority of the information is what would be needed in any parser, such as what categories words belong to or what types of arguments they expect. There are also some rather specific functions, such as THERE-INSERTION and IT-INSERTION, suggesting that not all aspects of the system's behavior are derived from deep principles, as might be expected of a P&P/MP parser. Also, the parameter settings that are present seem to be hard-wired for English.

Both of the present authors have also written parsing systems that incorporate ideas found in the P&P or MP literature (though also found in other syntactic theories). However, we would not label our systems as P&P, since their inner mechanics do not derive from manipulations of parameter settings in the way that, say, Fong's system clearly does. That said, we grant Yang that Kuhns' and Lin's work should at least have received mention in our original posting. No matter: what is clear is that our challenge is in no sense met by these two systems, for the simple reason that neither of them learns from data.

2. Yang goes on to say:

"In my experience, most linguists don't give a damn about parsing, or computers, for that matter: they are not paid to develop technologies that may one day interest Microsoft."

We note that all of the work on machine learning (ML) that we cite in our challenge was done by university researchers who are interested in the scientific questions involved in determining the capacities and limitations of different types of computational learning models. It is entirely reasonable to disagree, on scientific grounds, with our challenge or with the view of grammar acquisition that we have suggested as a possible alternative to the P&P model. We may well be wrong, and vigorous debate of the relevant issues is to be welcomed.
However, it is at best a distraction to impute professional motives to computational linguists with whom one disagrees. Even if it were true that most NLP researchers were working in industry, which, as far as we know, is not the case, this fact would not be at all relevant to the question under discussion. We continue to take this question to be whether it is possible to demonstrate the computational viability of the P&P model by actually constructing a P&P grammar induction system that equals the performance of ML systems.

3. Yang then goes on to state:

"On my view, the improvement in parsing quality over the past decade or so has less to do with breakthroughs in machine learning, but rather with the enrichment in the representation of syntactic structures over which statistical induction can take place. The early 1990s parsers using relatively unconstrained stochastic grammars were disastrous (Charniak 1993). By the mid 90s, notions like head and lexical selection, both of which are tried and true ideas in linguistics, had been incorporated in statistical parsers (de Marcken 1996, Collins 1997). The recent, and remarkable, work of Klein and Manning (2002) takes this a step further. So far as I can tell, in the induction of a grammatical constituent, Klein & Manning's model not only keeps track of the constituent itself, but also its aunts and sibling(s) in the tree structure. These additional structures is what they refer to as ''context''; those with a more traditional linguistics training may recall ''specifier'', ''complement'', ''c-command'', and ''government''."

Yang seems to be suggesting that the use of increasingly complex representations in training data is the major factor in the success of ML methods. This may be partly true in the case of supervised learning from treebank corpora, although this assumption would not explain the rapid improvement of supervised ML methods on the same annotated Wall Street Journal texts over the past ten years.
However, Yang's comment on Klein and Manning's work seems to involve a serious misunderstanding of unsupervised learning in general and of their systems in particular. Klein and Manning achieve recognition of binary-branching constituent structure and head-dependency structure on the basis of Penn Treebank input annotated only with part-of-speech information. None of the parse structure in these corpora, nor any of the syntactic relations that Yang mentions, is part of the input to their learning algorithms. Moreover, in several of their experiments the POS tagging is itself contributed by a previous cycle of unsupervised learning.

Following on the previously quoted passage, Yang states:

"If this interpretation is correct, then the rapid progress in statistical parsing offers converging evidence that the principles and constraints linguists have discovered are right on the mark, and if one wishes, can be put into use for practical purposes."

This comment represents an interesting shift of ground. It is true that linguistic insights such as "head" and "lexical selection" have made it into the statistical parsing literature, and these are indeed insights that have been productively used in theoretical linguistics. But since when were these concepts the unique province of P&P approaches to syntax? On the contrary, notions like "head", "lexical selection", and perhaps X-bar theory seem to be among the few things that theoretical syntacticians of any persuasion can all agree on. So, contrary to Yang's implication, the fact that statistical parsing work has picked up on these linguistic ideas offers no solace to P&P/MP grammarians.

4. Finally, Yang ends with the following plea:

"This, then, would seem to be a time to rejoice and play together, rather than driving a wedge of ''challenge'' between the two communities."

Needless to say, we have no objection to the notion that we should all play together.
Unfortunately, though, a great many responses to our challenge from the P&P/MP community have involved various attempts to argue that our challenge is irrelevant to the goals of that version of theoretical syntax. This does not suggest to us that there is any widespread desire to play together. If our challenge was a "wedge", then it must have fallen to the ground already, since the gap we drove it into was already far too large.

Richard Sproat
Shalom Lappin
Linguistic Field(s):
Computational Linguistics
Discipline of Linguistics
Linguistic Theories