LINGUIST List 16.1439

Thu May 05 2005

Disc: Re: A Challenge to the Minimalist Community

Editor for this issue: Michael Appleby <michael@linguistlist.org>


To post to LINGUIST, use our convenient web form at http://linguistlist.org/LL/posttolinguist.html.
Directory
        1.    Carson Schütze, Re: A Challenge to the Minimalist Community
        2.    Richard Sproat, Re: A Challenge to the Minimalist Community


Message 1: Re: A Challenge to the Minimalist Community
Date: 05-May-2005
From: Carson Schütze <cschutze@ucla.edu>
Subject: Re: A Challenge to the Minimalist Community


Ash Asudeh [LL 16.1364] said:

> Some confusion has arisen in the subsequent discussion of the
> Sproat-Lappin challenge. Most of the subsequent posts discuss
> statistical parsing versus P&P parsing. However, the challenge has
> nothing to do with statistical parsers per se

I would like to do some clarifying of my own. Let's see how the
challenge was worded: [LL 16.1156]

> We challenge someone to produce, by May of 2008, a working P&P
> parser that can be trained in a supervised fashion on a standard
> treebank, such as the Penn Treebank, and perform in a range
> comparable to state-of-the-art statistical parsers.

So, statistical parsers were relevant as an existence proof that the
assigned task is doable using current technology. If there were no
systems that could parse the Treebank 90% correctly (or whatever the
standard is), then asking P&P to do so would be a very different kind
of challenge; Sproat and Lappin frame their challenge thus: other
approaches have reached this milestone, we challenge you to catch
up. From that perspective it is entirely relevant what the capabilities
and design goals of those approaches are, compared to those of P&P.
[Of course it is true that one could challenge P&P to do as well as
some nonstatistical parser, in which case that would be the system
whose capabilities/goals would be relevant. In fact one could invent a
new challenge by simply omitting the word "statistical" from the
original, but (a) Sproat and Lappin explicitly included it; (b) I think that
would make it harder to establish a metric for state-of-the-art-hood,
because it would involve apples-and-oranges comparisons, but I'm
sure others will disagree here.]
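
For readers unsure what "whatever the standard is" usually amounts to: Treebank parsers are conventionally scored by labeled bracket precision and recall (the PARSEVAL measures). The following is only a minimal sketch of that computation in Python, assuming each tree has already been reduced to (label, start, end) spans; the function name and the toy spans are illustrative and come from no particular evaluation tool.

```python
from collections import Counter

def bracket_scores(gold, predicted):
    """Labeled bracket precision, recall and F1 for one sentence.

    gold, predicted: iterables of (label, start, end) constituent spans.
    """
    g, p = Counter(gold), Counter(predicted)
    matched = sum((g & p).values())  # spans present in both analyses
    precision = matched / sum(p.values()) if p else 0.0
    recall = matched / sum(g.values()) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy example: the predicted NP covers one word too few.
gold = [("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)]
pred = [("S", 0, 3), ("NP", 0, 1), ("VP", 2, 3)]
precision, recall, f1 = bracket_scores(gold, pred)
```

Note that a score of this kind is computed only over sentences that have a gold tree, so it says nothing about what a system does with ill-formed input.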

Then Ash says about my previous point on ungrammaticality:

> I don't understand the substance of this objection. All grammars,
> those used in statistical parsing or otherwise, attempt to reject
> ungrammatical sentences: Nobody wants their grammar/parser to
> overgenerate. Even if the claim is true of statistical parsers (I don't
> think it is), it certainly isn't true of the LFG and HPSG parsers and
> grammars noted above.

Let me elaborate on John Goldsmith's [LL 16.1432] defense of my
point. Ash makes a claim about all grammars and about generation,
but the challenge doesn't require the statistical parser to have a
grammar or to generate in the relevant sense, it just requires it to map
well-formed input strings to the "right" trees. If a grammar is defined,
as Ash seems to assume and most would agree, as something that
delineates all and only the well-formed expressions of a language,
then the benchmark systems are certainly not in principle required to
have one. If they provide an output for every possible input string, with
no systematic distinction between the good and the bad, then by this
definition they don't have a grammar, at most just half of one. Even if
they did, the challenge contains nothing that would assess the set of
strings that the grammar rules out, which is why I proposed a second
part of the challenge to do so.

I think Ash and I agree that any "interesting" model (I won't try to
define "interesting", but we know what we mean :-) of human
language will include constraints against overgeneration; in those
terms, my point was that the challenge does not require the
benchmark system to be interesting. [Of course once again one could
invent a different challenge that pits a P&P parser against an HPSG
parser, where the simplest form of my objection would go away: the
benchmark system wouldn't be ignoring an entire ability that the P&P
system is designed to model. I still think it would be interesting to test
in detail whether the two systems rule out the same strings, and
whether those strings are indeed all and only the ungrammatical
strings of the language.]

So I think we agree on the overall point that comparing a P&P parser
to a parser that is committed (in the ways S&L outline) to the claims of
some other linguistic theory would be more meaningful than a
comparison with purely statistical parsers. But for those who disagree I
would still submit that a comparison with a statistical parser would be
more meaningful if it included a comparison of '(un)grammaticality
judgments'.

I do want to clarify something else John Goldsmith said, however:

> There is not universal agreement to the position that the ability to
> distinguish grammatical from ungrammatical sentences is an
> important function to be able to model directly, whether we are
> looking at humans or at software. There are certainly various
> serious parsing systems whose goal is to be able to parse, as best
> they can, any linguistic material that is given to them -- and
> arguably, that is what we speakers do too.

This comment unfortunately conflates two notions that I was at pains
to keep separate in my original posting. One is the idea that a system
will produce *some* parse for every input string you give it, including
the ungrammatical ones, rather than *just* returning "FAIL". The other
is the idea that a system will flag all ungrammatical inputs as
ungrammatical, whatever else it might do with them. The first may or
may not be an ability that humans have in full generality, and
depending on how you think they achieve it when they do, you may or
may not want to model it within your parser. But the second is
something humans unquestionably *can* do for at least the vast
majority of possible strings, and I therefore submit that any
system that purports to be a model of human language ability should
be required to do the same.
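
The two notions can be kept apart in code as two separate measurements. This is a minimal sketch in Python under stated assumptions: the `Analysis` record, the toy `always_parse` system and both scoring functions are hypothetical illustrations, not part of the challenge or of any existing parser.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

@dataclass
class Analysis:
    tree: Optional[str]  # some parse, or None if the system just returned "FAIL"
    flagged_bad: bool    # True if the system marked the input as ungrammatical

def always_parse(sentence: str) -> Analysis:
    """Hypothetical robust system: a flat bracketing for any input, never a flag."""
    return Analysis(tree="(X " + sentence + ")", flagged_bad=False)

def coverage(system: Callable[[str], Analysis], sentences: Iterable[str]) -> float:
    """First notion: share of inputs that receive *some* parse (non-empty input assumed)."""
    sentences = list(sentences)
    return sum(system(s).tree is not None for s in sentences) / len(sentences)

def judgment_accuracy(system: Callable[[str], Analysis],
                      labeled: Iterable[Tuple[str, bool]]) -> float:
    """Second notion: share of inputs whose flag agrees with the gold judgment.

    labeled: (sentence, is_grammatical) pairs; non-empty input assumed.
    """
    labeled = list(labeled)
    hits = sum(system(s).flagged_bad == (not ok) for s, ok in labeled)
    return hits / len(labeled)

data = [("the dog barked", True), ("barked dog the", False)]
cov = coverage(always_parse, [s for s, _ in data])
acc = judgment_accuracy(always_parse, data)
```

On this toy data the always-parse system has full coverage but agrees with only half of the judgments, which is the gap the second measurement is meant to expose.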

My original claim, once again, was that the challenge makes no
requirement on this second point, but that it would be much more
sensible if it did. Of course it also makes no requirement on the first
point, but I did not propose expanding the challenge to incorporate it,
for two reasons. One, which I think was John Goldsmith's main point,
is that there is much less consensus on this as a desideratum of
models of human parsing. The second is that there is almost no
empirical data against which we could test statistical, P&P, HPSG or
any other parsers with regard to how they ought to "interpret"
ungrammatical strings. I know some people can supply some
references, but their scope is extremely limited. If we consider one of
the dumbest ways of generating a test corpus of ungrammatical
sentences, namely by fully reversing the sequence of words in each of
the Treebank sentences, I don't think anyone has a clue how people
would interpret them (if at all).
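
Building such a reversed corpus is trivial, which is part of the point. Below is a minimal sketch in Python, assuming plain whitespace tokenization (Treebank tokenization would differ); the function names are illustrative.

```python
def reverse_words(sentence: str) -> str:
    """Return the sentence with its whitespace-separated tokens in reverse order."""
    return " ".join(reversed(sentence.split()))

def build_reversed_corpus(sentences):
    """Pair each sentence with its fully reversed counterpart.

    Sentences that reversal leaves unchanged (single tokens, palindromic
    token sequences) are skipped, since they add nothing to the test set.
    """
    pairs = []
    for s in sentences:
        original = " ".join(s.split())
        reversed_s = reverse_words(s)
        if reversed_s != original:
            pairs.append((original, reversed_s))
    return pairs

corpus = build_reversed_corpus(["Stop", "the dog barked"])
```

Reversal does not guarantee ungrammaticality ("John saw Mary" becomes "Mary saw John"), so a corpus built this way would still need its labels checked by speakers.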

Finally, on the general relevance of the full set of goals/capabilities of
theories, Ash says:

> The substance of the objections are that P&P is attempting to do
> much more than just parse sentences (Hallman) and that the goals
> of P&P are different to those of computational linguistics (McGinnis).
> I think there is merit to both these statements, but they are ultimately
> non sequiturs to the challenge. ... The requirement of capturing the
> adult grammar also means that it's insubstantial whether the goals of
> P&P are those of computational linguistics: P&P is still expected to
> capture adult grammatical competence in the end, even if this isn't a
> *motivation* for a lot of its practitioners.

Consider the following analogy. You and I both are given the task of
designing a motor vehicle that will get someone from point A to point
B. You come back with a Corvette, I come back with an SUV. Now you
say, "Let's go to a racetrack, I'll bet I can drive a circuit faster than
you, which means I have the better design." I will of course object:
speed was not specified as the desideratum of the vehicle. Both
vehicles can get a person from A to B. Moreover, the SUV can do lots
of things the 'vette can't: carry more than 2 people, hold lots of
luggage, play DVDs for the back seat passengers, transport moderate-
sized pieces of furniture, host a small business meeting, etc. My
motivation in designing it was to make it a multi-purpose family vehicle.
If I were now to go back to the drafting table and modify my SUV
design so that it keeps all its current features but can also go as fast
as a Corvette, surely I will have achieved a much more difficult task
than the person who just designed the Corvette.

I could have worked harder to make the analogy tighter, but the basic
point would still go through.

Carson

--

Prof. Carson T. Schütze Department of Linguistics, UCLA
Web: http://www.linguistics.ucla.edu/people/cschutze


Linguistic Field(s): Computational Linguistics; Discipline of Linguistics


Message 2: Re: A Challenge to the Minimalist Community
Date: 05-May-2005
From: Richard Sproat <rws@xoba.com>
Subject: Re: A Challenge to the Minimalist Community


We thank the people who have responded to our challenge posted in
16.1156, both in private and on the List. A number of the responses
(mostly those offered in private) have been supportive. Others have
raised issues with our challenge. In the interests of brevity, we will
respond to the main objections rather than to individual comments:

1. It is too early to expect P&P to provide a theory that can be
implemented as part of a large-scale parsing system that learns from
data.

RESPONSE: This was our "Objection 3", which we characterized as
a "remarkable dodge". Need we say more?

2. The challenge is the wrong challenge, either because:

A. We rely on the Penn Treebank as our gold standard, whereas
there is no reason to accept the validity of the Penn Treebank
structures; they are not even theoretically interesting.

B. Providing valid structures for sentences is not the only goal or even
the most reasonable goal of syntactic theory: a syntactic theory should
also provide grammaticality judgments for sentences; a syntactic
theory should explain cross-linguistic variation.

C. Statistical approaches have it too easy since they are trained on
data that is similar in genre to the test data.

RESPONSE: If you do not like the Penn Treebank, you are free to use
any other reasonable corpus, and to provide your own annotations
and representations. The task remains the same. Show that a P&P
acquisition system can do at least as well as statistical approaches.

Regarding B, we remind readers that humans do assign structure to
sentences, that assigning structure to sentences is surely a part of
what syntax is about, that humans acquire this knowledge as part of
language acquisition, and that P&P claims to provide an explanation of
how this is achieved. So we are at a loss to understand why inducing
a large-scale working parser from sample data is not a valid test of
P&P.

The claim that statistical approaches have it "too easy" will have some
content when it is accompanied by an implemented P&P device that
matches the performance of machine learning systems. If such a
device cannot be constructed, it suggests not that statistical systems
have it too easy (the same conditions have always been on offer to
those interested in developing a large coverage P&P parser), but that
the P&P framework is not computationally viable as a model for
language acquisition.

3. The challenge could certainly in principle be met by P&P.

RESPONSE: "In principle" doesn't count here. Only "in fact" has any
credibility.

4. The challenge is already being met.

RESPONSE: Oh really, where? We look forward to seeing convincing
evidence of this.

5. Computational linguistics is about engineering rather than science.
It may be useful for us scientists to be more aware of what is going on
in engineering, and similarly the engineers could gain some insights
from us scientists.

RESPONSE: It is true that computational linguistics often has
engineering applications and that these applications often motivate
computational linguists to address certain problems. But let's not
confuse the issue. Many computational linguists, the two present
authors included, are fully trained linguists who happen to be
interested in how computational methods can yield insights on
language. If this is not science, we do not know what is.

6. Machine learning cannot produce constraints that rule out
ungrammatical sentences. Where P&P seeks to characterize the
set of possible natural languages, ML just learns syntactic patterns
exhibited in a particular corpus.

RESPONSE: Machine learning has achieved induction of robust
grammars that can, in fact, be turned into classifiers able to distinguish
between acceptable and ill-formed structures over large linguistic
domains. The fact that after more than half a century of sustained
research the P&P enterprise and its antecedents have failed to
produce a single broad coverage computational system for grammar
learning suggests that its notion of Universal Grammar encoded in a
language faculty may well be misconceived. The increasing success of
unsupervised ML techniques in grammar acquisition lends at least
initial plausibility to the proposal that general learning and induction
mechanisms, together with minimal assumptions concerning basic
linguistic categories and rule hypothesis search spaces, are sufficient
to account for much (perhaps all) of the language acquisition task.

7. You should have offered a monetary prize as a financial incentive
for meeting the challenge.

RESPONSE: We don't see why we need to pay people extra for
demonstrating the viability of a "research program" which has
dominated much of the field for decades, but has yet to produce
anything approaching the results that its rivals have achieved
efficiently in a relatively short period of time.

Finally, since our challenge has actually stimulated relatively little
discussion from the P&P community, we suspect the following may
also be one response:

8. Ignore the challenge because it's irrelevant to the theory and
therefore not interesting.

RESPONSE: This is the "answer" we had most anticipated. It does not
bode well for a field when serious scientific issues are dismissed and
dealt with through silence.

Richard Sproat
Shalom Lappin


Linguistic Field(s): Computational Linguistics; Discipline of Linguistics; Syntax

