|Title:||A Challenge to the Minimalist Community|
Over the past fifteen years there has been significant progress in the field
of statistical parsing. Much of the work has focussed on supervised
methods, where by ''supervised'' we mean that the training data consists of
sentences and their associated syntactic trees (for example, Charniak
1997, Collins 1999, Roark and Johnson 1999). There are a number of
treebank corpora, of which the Penn Treebank, based largely on ''Wall
Street Journal'' text, and available from the Linguistic Data Consortium, is
the most widely used. As long as one stays within genre (e.g. newspaper
text), it is now possible to train such parsers to perform on held-out test
data with a weighted F-score accuracy of between 80% and 90%, measured
by comparing the trees that the system produces with the hand-
constructed trees for the test data. More impressively, there has been
recent work by Dan Klein and Chris Manning on machine learning
procedures (Klein & Manning 2002, Klein & Manning 2004) that infer
parsers in an unsupervised fashion from raw text annotated with part-of-
speech tags. These systems attempt to do what a child does when it learns
his/her language. They infer structure from distributional patterns in a
more or less unannotated sequence of tokens. The Klein-Manning
procedures do not yet perform as well on held out test data as the
supervised systems, but they are quickly converging on supervised results.
What is particularly notable about the Klein-Manning grammar induction
procedures is that they do what Chomsky and others have argued is
impossible: They induce a grammar using general statistical methods which
have few, if any, built-in assumptions that are specific to language.
There has also been significant progress in the realm of hand-built parsers
that adopt certain grammatical frameworks. For example, the PARC LFG
parser, developed by Ron Kaplan and colleagues over the past twenty years,
with intensive manual labor, performs at a level comparable to current
statistical parsers on Penn Treebank data (Riezler et al. 2002).
Surprisingly, one approach to grammar that is not represented in work on
robust parsing is the Principles and Parameters model, or what has evolved
over the last ten years into the ''Minimalist Program'' (MP). At the risk of
being accused of nomenclatural solecism, we will simply refer to it
as ''Principles and Parameters'' (P&P). While there have been P&P-based
parsers (Fong 1991, 2005), and even attempts to build parsers that learn
P&P-style grammars (Berwick, 1982), all of these systems have one thing in
common: they are small-scale implementations that make no attempt at
broad linguistic coverage. They are not even up to the task of parsing
arbitrary Dr. Seuss books, let alone the ''Wall Street Journal''. For example,
Fong's English parser, which is arguably the best piece of work of this kind,
handles, in its current incarnation, just the example sentences in Lasnik
and Uriagereka's textbook (1988). Crucially, there has been, to our
knowledge, no interest in developing a broad-coverage P&P parser.
Even more puzzling is the lack of any serious attempt to build a P&P-style
parser that is able to learn from unannotated input (as Klein and Manning's
systems do). What is odd about this is that it is practically de rigueur when
one writes a paper in the P&P framework, to invoke the old arguments
about ''poverty of stimulus'' and how the only feasible explanation for the
way in which children acquire language is by having a substantial portion of
the grammatical knowledge already hard-wired in the form of grammatical
principles and parameters. Why, if this is the case, has nobody tested this
claim by building a computational model that is able to do what every child
is able to do: namely learn a language well enough that reasonable
structures can be assigned to any grammatical (and, yes, even
ungrammatical) sentence given to it? It seems to us that if the claims on
behalf of P&P approaches are to be taken seriously, it is an obvious
requirement that someone provide a computational learner that
incorporates P&P mechanisms, and uses it to demonstrate learning of the
grammar of a natural language.
With this in mind, we offer the following challenge to the community.
We challenge someone to produce, by May of 2008, a working P&P parser
that can be trained in a supervised fashion on a standard treebank, such as
the Penn Treebank, and perform in a range comparable to state-of-the-art
Our selection of May 2008 is motivated by the observation that this is about
three years in the future, and a committed graduate student who starts
thinking about this problem now, could, by May of 2008 reasonably be
expected to produce a Ph.D. dissertation that solves this problem.
Let us now lay out what we feel would be the minimal requirements for
meeting this challenge.
First, the system must use P&P in a non-trivial way. So for example, using a
standard machine learning algorithm to extract a statistical parser like
those in existence, supplemented by a transducer that maps Penn Tree
Bank structures into P&P annotations would not satisfy the challenge. For a
system to qualify, it would have to be the case that the P&P component is an
integral part of the learning mechanism. Removing the P&P component
must seriously degrade performance.
Second, the particular choice of parameters and their possible settings,
must conform to some recognized version of a P&P theory proposed in the
literature. We recognize that it may be necessary to augment the set of
accepted parameters with additional conditions that are motivated by
particular problems in learning grammatical structures. However, the
greater the number of ad hoc principles and parameters that are used, the
less adequate the implemented system will be.
Third, we recognize that the assumptions about syntactic structure used by
the Penn Treebank (and other treebanks) differ in many details from those
of the P&P tradition. In particular, P&P work generally assumes a far more
articulated syntactic structure than is generally employed in other
frameworks. The P&P parser that we are asking for can produce this more
articulated structure, without being penalized for such re-analysis of the
data. In fact many robust parsers map Penn Tree Bank analyses into
alternative formal representations. All that is required is that the designer
of the parser also produce code that converts from the P&P parser's
syntactic structures back into appropriate treebank structures, so that
proper evaluation of its output is possible.
It is worth pointing out that our challenge allows the P&P model a
considerable, if illicit advantage. We are only asking for supervised
grammar induction, when, in fact, unsupervised learning on the basis of
parameterized principles would be the more reasonable test of the model's
viability. P&P advocates have long eschewed the existence of negative
evidence in the learning process, and insisted that grammar induction takes
place with very little external data. In allowing supervised learning we have
liberalized the conditions of the acquisition problem well beyond the
stringent restriction invoked in the P&P literature.
We outline here some possible objections to our challenge that might be
raised by proponents of P&P approaches to grammar. These objections are
anticipated, in part, on the basis of the experience of one of the authors
with a prior debate on the scientific foundations of Minimalism.
Objection 1: You're confusing performance and competence. Grammars are
models of competence, parsers are models of performance.
Reply: No we are not. Please note that we have been careful to impose no
requirements on algorithms for learning or for the resulting parser. Thus
we make no claims and set no expectations on mode of implementation
(i.e. performance). The only requirement we insist on is the obvious
condition that the learning method and the resulting parser make non-
trivial use of the assumptions of the P&P approach to syntax. Again, this is
the point of the challenge, given that P&P makes very strong claims about
how these grammatical assumptions are essential to language learning.
Objection 2: There have been very few people working in P&P parsing. So
naturally progress has been slow.
Reply: Certainly this is true. The obvious question here is why this has been
the case. Positing a theory that makes such far reaching claims about
mechanisms underlying language learning would seem to us to commit the
adherents of such a theory to the task of demonstrating its viability through
implementation of a large scale model. Why, in the past quarter century of
work on this topic, has no one attempted this?
Objection 3: The MP is a research program, not a fully developed theory.
Therefore it can't be expected to yield a model that can be implemented as
a broad coverage processing device.
Reply: This is a remarkable dodge. The MP has been around at least since
the early 1990s. Chomsky sketched this view in ''Some Notes on Economy of
Derivation and Representation'' in 1991, and then presented a detailed
account in his book ''The Minimalist Program'' in 1995. Much subsequent
theoretical work has been done within the MP. The P&P model was explicitly
proposed in Lectures on Government and Binding in 1981. The antecedents
for this general approach to grammar and language acquisition go back
to ''Syntactic Structures'' (1957) and ''Aspects of the Theory of Syntax''
(1964). The P&P view replaced a model of an innate language faculty
consisting of a grammar evaluation metric applied to a set of grammars
generated by a universal schema of grammar. The idea of a language-
specific device/set of constraints as the basis of language learning has,
therefore, been at the center of this line of research for close to fifty years.
Surely it is long past time to ask for some substantive evidence in the way of
a robust grammar induction system that this view of grammar induction is
computationally viable. Most other major theoretical models of grammar
have succeeded in yielding robust parsers in far less time, despite the fact
that they do not, as far we know, make analogous claims about the nature
of language learning.
Objection 4: It would be trivially easy to devise a procedure to convert Penn
treebank structures into MP representations and then use machine learning
methods to extract a grammar that generates the latter for Penn Treebank
test data. Why bother?
Reply: This is exactly the case that we exclude as not satisfying the
challenge. The P&P approach makes claims not only about the nature of
syntactic representation but the way in which grammar is acquired. Because
it purports to be first and foremost a learning theory, it is necessary to
show that this model can yield a robust grammar learning device in order
for the framework to sustain any credibility. So far, it has not done this.
We will close by noting that we are in no way suggesting that the
construction of a trainable wide-coverage P&P-based parser is impossible.
Since no one has attempted this, and no one has even sketched a proposal
for how to do it, we simply do not know if it is possible. As scientists, we do
not speculate about the impossibility of something of which we have no
In fact, we would be delighted if someone succeeds in meeting our
challenge. Such success would convince us that the P&P enterprise is, after
all, a testable theory with genuine scientific content.
Department of Linguistics
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
Department of Computer Science
King's College, London
Berwick, Robert. 1982. Locality Principles and the Acquisition of Syntactic
Knowledge. PhD Dissertation, MIT.
Charniak, Eugene. 1997. ''Statistical parsing with a context-free grammar
and word statistics'', Proceedings of the Fourteenth National Conference on
Artificial Intelligence AAAI Press/MIT Press, Menlo Park.
Collins, Michael. 1999. Head-Driven Statistical Models for Natural
Language Parsing. PhD Dissertation, University of Pennsylvania.
Fong, Sandiway. 1991. Computational Properties of Principled-Based
Grammatical Theories, AI Laboratory, MIT.
Fong, Sandiway. 2005. ''Computation with Probes and Goals: A Parsing
Perspective.'' In Di Sciullo, A. M. and R. Delmonte (Eds.) UG and External
Systems. Amsterdam: John Benjamins.
Klein, Dan and Manning, Christopher. 2002. ''Natural Language Grammar
Induction using a Constituent-Context Model.'' In Thomas G. Dietterich,
Suzanna Becker, and Zoubin Ghahramani (eds), Advances in Neural
Information Processing Systems 14 (NIPS 2001). Cambridge, MA: MIT Press,
vol. 1, pp. 35-42.
Klein, Dan and Manning, Christopher. 2004. ''Corpus-Based Induction of
Syntactic Structure: Models of Dependency and Constituency.'' Proceedings
of the 42nd Annual Meeting of the Association for Computational
Linguistics (ACL 2004).
Lasnik, Howard and Uriagereka, Juan. 1988. A Course in GB Syntax:
Lectures on Binding and Empty Categories. MIT Press.
Roark, Brian and Johnson, Mark. 1999. Efficient probabilistic top-down and
left-corner parsing. In Proceedings of the 37th Annual Meeting of the
Association for Computational Linguistics, pages 421-428.
Riezler, Stefan; Maxwell, John; King, Tracy; Kaplan, Ronald; Crouch, Richard
and Johnson, Mark. 2002 ''Parsing the Wall Street Journal using a lexical-
functional grammar and discriminative estimation techniques.''
Proceedings 40th Meeting Association for Computational Linguistics,
Discipline of Linguistics