Title: Re: A Challenge to the Minimalist Community
Again we wish to thank the many people who have responded to our
challenge. It seems to us that most of the recent postings on both
sides of the debate have restated points made in earlier rounds, and
we will not take these up again. Readers can now come to their own
conclusions concerning the issues that have been raised in this
discussion. However, we do feel compelled to respond to several of
the assertions that Charles Yang makes in his posting.
1. First, Yang claims:
"[T]he challenge probably has been met - and many years ago.
Broad coverage parsers based on Government Binding / Minimalism
DO exist. The earliest commercial application I am aware of was Bob
Kuhns' GB parser that was used to summarize newswire stories in
the 1980s, published at the COLING conference in 1990. A more
glaring omission is Dekang Lin's Principles & Parameters based
parsers - unambiguously dubbed PRINCIPAR and MINIPAR
respectively - which have been used in a variety of applications, and
have figured prominently in computational linguistics."
Unfortunately this claim seriously misrepresents the facts at two levels.
Most importantly, our challenge was to produce a TRAINABLE broad-
coverage parser based on Minimalist Program/P&P principles that
could perform at the same level as statistical parsers on a standard
test set. Neither Kuhns' nor Lin's systems include a learning
component. Therefore, for that reason alone they do not meet the
challenge as we had stated it.
But there would still seem to be an issue here since we had claimed in
our original posting:
"While there have been P&P-based parsers (Fong 1991, 2005), and
even attempts to build parsers that learn P&P-style grammars
(Berwick, 1982), all of these systems have one thing in common: they
are small-scale implementations that make no attempt at broad
linguistic coverage. They are not even up to the task of parsing
arbitrary Dr. Seuss books, let alone the ''Wall Street Journal''."
Yang's claim above would thus seem to be most relevant to this
portion of what we said. But actually the facts are far from clear.
Kuhns' paper (Robert Kuhns, "A PARLOG Implementation of
Government Binding Theory", COLING 1990) describes a parser that
does encode certain GB constraints. But, as we already noted, it
does not perform grammar induction from a corpus, nor, crucially, does
the paper offer any evaluation of the parser against test data.
Lin's PRINCIPAR system (Dekang Lin, "PRINCIPAR--An Efficient,
Broad Coverage, Principle-based Parser", COLING 1994) does
incorporate a number of GB conditions, but again there is no reported
evaluation of the parser on a corpus, though Lin does report on
efficiency issues, and cites some sentences that the system gets right.
Lin's later system, MINIPAR (Dekang Lin, "Dependency-based
Evaluation of MINIPAR", Workshop on the Evaluation of Parsing
Systems, Granada, Spain, 1998) was evaluated on a test set of 7,103
sentences from the SUSANNE corpus (a subset of the Brown corpus).
According to these results the parser achieves an average of
(approximately) 88% recall and 78% precision on head-dependency
relations across various categories of text. According to Lin, MINIPAR
is a descendant of PRINCIPAR that has "adopted some of the ideas in
the Minimalist Program such as bare phrase structure and economy
principles." But the actual mechanisms of MINIPAR are not described
in the cited paper so it is unclear to what extent it is in fact a P&P or
MP parser in any meaningful sense.
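For readers unfamiliar with the recall and precision figures quoted above, the following is a minimal sketch of how such scores are computed over head-dependency relations. It is not Lin's evaluation code: the function name precision_recall, the toy sentence, and the representation of a relation as a (head, dependent) pair of words are all assumptions made for illustration. A real evaluation would identify tokens by position rather than by word form.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of (head, dependent) pairs."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # relations the parser got right
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy sentence: "the dog chased a cat"
gold = {("dog", "the"), ("chased", "dog"), ("chased", "cat"), ("cat", "a")}
predicted = {
    ("dog", "the"),
    ("chased", "dog"),
    ("chased", "cat"),
    ("chased", "a"),    # wrong attachment
    ("chased", "the"),  # spurious relation
}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case the parser proposes five relations, three of which are in the gold standard of four, so recall (3/4) exceeds precision (3/5). A parser that over-generates relations shows the same qualitative pattern as the MINIPAR figures, with recall higher than precision.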
As a matter of fact, the first author evaluated MINIPAR some years
ago for another project, and still has a copy of the system. The basic
functionality of the parser is only available as a compiled library so it is
not possible to examine those portions. However, many portions of the
system are viewable as plain text, including various lexicons and rules.
An examination of the plain text portions reveals very little if anything
that is specifically P&P or MP in nature, with the vast majority of the
information being information that would be needed in any parser,
such as what categories words belong to or what types of arguments
they expect; there are also some rather specific functions such as
THERE-INSERTION and IT-INSERTION suggesting that not all
aspects of the system's behavior are derived from deep principles as
might be expected from a P&P/MP parser. Moreover, whatever parameter
settings the system does appear to contain seem to be hard-wired for English.
Both of the present authors have also written parsing systems that
incorporate ideas that are found in the P&P or MP literature (though
found in other syntactic theories too). However, we would not label our
systems as P&P since their inner mechanics do not derive from
manipulations of parameter settings in the way that, say, Fong's
system clearly does.
However, we will grant Yang that Kuhns' and Lin's work should at
least have received mention in our original posting. No matter: what is
clear is that our challenge is in no sense met by these two systems for
the simple reason that neither of them learns from data.
2. Yang goes on to say:
"In my experience, most linguists don't give a damn about parsing,
or computers, for that matter: they are not paid to develop
technologies that may one day interest Microsoft."
We note that all of the work on machine learning (ML) that we cite in
our challenge was done by university researchers who are interested
in the scientific questions involved in determining the capacities and
limitations of different types of computational learning models. It is
entirely reasonable to disagree with our challenge or the view of
grammar acquisition that we have suggested as a possible alternative
to the P&P model on scientific grounds. We may well be wrong, and
vigorous debate of the relevant issues is to be welcomed. However, it
is at best a distraction to impute professional motives to computational
linguists with whom one disagrees. Even if it were true that most NLP
researchers were working in industry, which, as far as we know, is not
the case, this fact is not at all relevant to the question under
discussion. We continue to take this question to be whether it is
possible to demonstrate the computational viability of the P&P model
by actually constructing a P&P grammar induction system that equals
the performance of ML systems.
3. Yang then goes on to state:
"On my view, the improvement in parsing quality over the past
decade or so has less to do with breakthroughs in machine learning,
but rather with the enrichment in the representation of syntactic
structures over which statistical induction can take place. The early
1990s parsers using relatively unconstrained stochastic grammars
were disastrous (Charniak 1993). By the mid 90s, notions like head
and lexical selection, both of which are tried and true ideas in
linguistics, had been incorporated in statistical parsers (de
Marcken 1996, Collins 1997). The recent, and remarkable, work of
Klein and Manning (2002) takes this a step further. So far as I can
tell, in the induction of a grammatical constituent, Klein & Manning's
model not only keeps track of the constituent itself, but also its
aunts and sibling(s) in the tree structure. These additional structures
is what they refer to as ''context''; those with a more traditional
linguistics training may recall ''specifier'', ''complement'', ''c-
command'', and ''government''."
Yang seems to be suggesting that the use of increasingly complex
representations in training data is the major factor in the success of
ML methods. This may be partly true in the case of supervised
learning from treebank corpora, although this assumption would not
explain the rapid improvement of supervised ML methods on the same
Wall Street Journal annotated texts over the past ten years.
However, Yang's comment on Klein and Manning's work seems to
involve a serious misunderstanding of unsupervised learning in
general and their systems in particular. Klein and Manning achieve
binary-branching constituent structure and head-dependency
structure recognition on the basis of Penn Treebank input annotated
only with part-of-speech information. None of the parse structure in
these corpora or the syntactic relations that Yang mentions are part of
the input to their learning algorithms. Moreover, in several of their
experiments the POS tagging is itself contributed by a previous cycle
of unsupervised learning.
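To make concrete what "annotated only with part-of-speech information" means, here is a minimal sketch that reduces a bracketed tree to the bare tag sequence an unsupervised learner of this kind would see. It is an illustration only and not Klein and Manning's actual preprocessing: the function name pos_sequence and the toy tree are hypothetical, and real treebank files contain traces and other annotations that this sketch ignores.

```python
import re


def pos_sequence(bracketed_tree):
    """Return the part-of-speech tags of a bracketed tree, in order.

    Words and all phrase structure (S, NP, VP, ...) are discarded.
    """
    # A preterminal has the form "(TAG word)" with no nested parentheses.
    preterminals = re.findall(r"\(([^\s()]+)\s+([^\s()]+)\)", bracketed_tree)
    return [tag for tag, _word in preterminals]


# Hypothetical toy tree in Penn Treebank bracketing style
tree = "(S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT a) (NN cat))))"
tags = pos_sequence(tree)
print(tags)  # ['DT', 'NN', 'VBD', 'DT', 'NN']
```

The output contains no constituent labels, no bracketing, and no head or complement relations. Whatever structure the learner ends up assigning has to be induced from sequences like this one.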
Following on the previously quoted passage, Yang states:
"If this interpretation is correct, then the rapid progress in statistical
parsing offers converging evidence that the principles and
constraints linguists have discovered are right on the mark, and if
one wishes, can be put into use for practical purposes."
This comment represents an interesting shift of ground. It is true that
linguistic insights such as "head" and "lexical selection" have made it
into the statistical parsing literature. And these are indeed insights that
have been productively used in theoretical linguistics. But since when
were these concepts the unique province of P&P approaches to
syntax? On the contrary, notions like "head", "lexical selection", and
perhaps X-bar theory, seem to be among the few things that
theoretical syntacticians of any persuasion can all agree on. So
contrary to Yang's implication, the fact that statistical parsing work has
picked up on these linguistic ideas offers no solace to P&P/MP
4. Finally, Yang ends with the following plea:
"This, then, would seem to be a time to rejoice and play together,
rather than driving a wedge of ''challenge'' between the two
Needless to say, we have no objection to the notion that we should all
play together. Unfortunately though, a great many responses to our
challenge from the P&P/MP community have involved various attempts
to argue that our challenge is irrelevant to the goals of that version of
theoretical syntax. This does not suggest to us that there is any
widespread desire to play together. If our challenge was a "wedge"
then it must have fallen to the ground already, since the gap we drove
it into was already far too large.
Discipline of Linguistics