Discussion Details

Title: Re: A Challenge to the Minimalist Community
Submitter: Carson Schütze
Description: Ash Asudeh [LL 16.1364] said:

  Some confusion has arisen in the subsequent discussion of the
  Sproat-Lappin challenge. Most of the subsequent posts discuss
  statistical parsing versus P&P parsing. However, the challenge has
  nothing to do with statistical parsers per se

I would like to do some clarifying of my own. Let's see how the
challenge was worded: [LL 16.1156]

  We challenge someone to produce, by May of 2008, a working P&P
  parser that can be trained in a supervised fashion on a standard
  treebank, such as the Penn Treebank, and perform in a range
  comparable to state-of-the-art statistical parsers.

So, statistical parsers were relevant as an existence proof that the
assigned task is doable using current technology. If there were no
systems that could parse the Treebank 90% correctly (or whatever the
standard is), then asking P&P to do so would be a very different kind
of challenge; Sproat and Lappin frame their challenge thus: other
approaches have reached this milestone, we challenge you to catch
up. From that perspective it is entirely relevant what the capabilities
and design goals of those approaches are, compared to those of P&P.
[Of course it is true that one could challenge P&P to do as well as
some nonstatistical parser, in which case that would be the system
whose capabilities/goals would be relevant. In fact one could invent a
new challenge by simply omitting the word "statistical" from the
original, but (a) Sproat and Lappin explicitly included it; (b) I think that
would make it harder to establish a metric for state-of-the-art-hood,
because it would involve apples-and-oranges comparisons, but I'm
sure others will disagree here.]
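
To make "parse the Treebank 90% correctly" a little more concrete, here is a minimal sketch of one way such a score might be computed. It assumes a labeled-bracket (PARSEVAL-style) measure, which the challenge text does not actually specify; the function name `bracket_scores` and the `(label, start, end)` span encoding are illustrative choices, not part of any particular evaluation tool.

```python
from collections import Counter

def bracket_scores(gold, predicted):
    """Labeled bracket precision, recall and F1 for one sentence.

    Each bracket is a (label, start, end) tuple. This is an assumed,
    simplified scoring scheme, not the challenge's official metric.
    """
    gold_counts, pred_counts = Counter(gold), Counter(predicted)
    # Multiset intersection: a bracket counts as matched once per occurrence.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy example: the predicted NP span is wrong, the other two brackets match.
gold = [("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)]
pred = [("S", 0, 3), ("NP", 0, 1), ("VP", 2, 3)]
precision, recall, f1 = bracket_scores(gold, pred)
```

Note that a measure of this kind is only defined over sentences that have a gold tree, so by itself it says nothing about how a system treats ill-formed input.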

Then Ash says, about my previous point on ungrammaticality:

  I don't understand the substance of this objection. All grammars,
  those used in statistical parsing or otherwise, attempt to reject
  ungrammatical sentences: Nobody wants their grammar/parser to
  overgenerate. Even if the claim is true of statistical parsers (I don't
  think it is), it certainly isn't true of the LFG and HPSG parsers and
  grammars noted above.

Let me elaborate on John Goldsmith's [LL 16.1432] defense of my
point. Ash makes a claim about all grammars and about generation,
but the challenge doesn't require the statistical parser to have a
grammar or to generate in the relevant sense, it just requires it to map
well-formed input strings to the "right" trees. If a grammar is defined,
as Ash seems to assume and most would agree, as something that
delineates all and only the well-formed expressions of a language,
then the benchmark systems are certainly not in principle required to
have one. If they provide an output for every possible input string, with
no systematic distinction between the good and the bad, then by this
definition they don't have a grammar, at most just half of one. Even if
they did, the challenge contains nothing that would assess the set of
strings that the grammar rules out, which is why I proposed a second
part of the challenge to do so.

I think Ash and I agree that any "interesting" model (I won't try to
define "interesting", but we know what we mean :-) of human
language will include constraints against overgeneration; in those
terms, my point was that the challenge does not require the
benchmark system to be interesting. [Of course once again one could
invent a different challenge that pits a P&P parser against an HPSG
parser, where the simplest form of my objection would go away: the
benchmark system wouldn't be ignoring an entire ability that the P&P
system is designed to model. I still think it would be interesting to test
in detail whether the two systems rule out the same strings, and
whether those strings are indeed all and only the ungrammatical
strings of the language.]

So I think we agree on the overall point that comparing a P&P parser
to a parser that is committed (in the ways S&L outline) to the claims of
some other linguistic theory would be more meaningful than a
comparison with purely statistical parsers. But for those who disagree I
would still submit that a comparison with a statistical parser would be
more meaningful if it included a comparison of (un)grammaticality.

I do want to clarify something else John Goldsmith said, however:

  There is not universal agreement to the position that the ability to
  distinguish grammatical from ungrammatical sentences is an
  important function to be able to model directly, whether we are
  looking at humans or at software. There are certainly various
  serious parsing systems whose goal is to be able to parse, as best
  they can, any linguistic material that is given to them -- and
  arguably, that is what we speakers do too.

This comment unfortunately conflates two notions that I was at pains
to keep separate in my original posting. One is the idea that a system
will produce *some* parse for every input string you give it, including
the ungrammatical ones, rather than *just* returning "FAIL". The other
is the idea that a system will flag all ungrammatical inputs as
ungrammatical, whatever else it might do with them. The first may or
may not be an ability that humans have in full generality, and
depending on how you think they achieve it when they do, you may or
may not want to model it within your parser. But the second is
something humans unquestionably *can* do for at least the vast
majority of possible strings, and I therefore submit that any
system that purports to be a model of human language ability should
be required to do the same.
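
The two notions can be kept apart quite mechanically. The sketch below is purely illustrative: `ParseResult`, `LICENSED`, `flat_tree`, `robust_and_flagging` and `strict` are hypothetical names, and the tiny set of licensed strings stands in for a real grammar. The point is only that "returns some parse" and "flags the input as ungrammatical" are independent outputs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParseResult:
    tree: Optional[str]   # some bracketed analysis, or None for "FAIL"
    grammatical: bool     # the system's judgment, independent of `tree`

# Hypothetical stand-in for a grammar: a toy set of licensed strings.
LICENSED = {"the dog barks", "the cat sleeps"}

def flat_tree(sentence: str) -> str:
    """A deliberately uninformative analysis: one flat bracket over all words."""
    return "(X " + " ".join(sentence.split()) + ")"

def robust_and_flagging(sentence: str) -> ParseResult:
    """Notion 1 and notion 2 together: always some parse, plus a judgment."""
    return ParseResult(tree=flat_tree(sentence), grammatical=sentence in LICENSED)

def strict(sentence: str) -> ParseResult:
    """Notion 2 only: a parse for licensed strings, otherwise FAIL."""
    ok = sentence in LICENSED
    return ParseResult(tree=flat_tree(sentence) if ok else None, grammatical=ok)

result = robust_and_flagging("barks dog the")
# result.tree is "(X barks dog the)" and result.grammatical is False
```

A system that always returned `grammatical=True` would behave like the first function on well-formed input, which is why a test set containing only well-formed sentences cannot tell the two apart.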

My original claim, once again, was that the challenge makes no
requirement on this second point, but that it would be much more
sensible if it did. Of course it also makes no requirement on the first
point, but I did not propose expanding the challenge to incorporate it,
for two reasons. One, which I think was John Goldsmith's main point,
is that there is much less consensus on this as a desideratum of
models of human parsing. The second is that there is almost no
empirical data against which we could test statistical, P&P, HPSG or
any other parsers with regard to how they ought to "interpret"
ungrammatical strings. I know some people can supply some
references, but their scope is extremely limited. If we consider one of
the dumbest ways of generating a test corpus of ungrammatical
sentences, namely by fully reversing the sequence of words in each of
the Treebank sentences, I don't think anyone has a clue how people
would interpret them (if at all).
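
For concreteness, the reversal procedure just described might look like the following minimal sketch. It assumes each sentence is available as a whitespace-tokenized string; the function names and the two sample sentences are made up for illustration and are not drawn from the Treebank.

```python
def reverse_words(sentence: str) -> str:
    """Return the sentence with its whitespace-separated tokens in reverse order."""
    return " ".join(reversed(sentence.split()))

def reversed_corpus(sentences):
    """Apply full word-order reversal to every sentence in a corpus."""
    return [reverse_words(s) for s in sentences]

# Made-up sample sentences standing in for Treebank text.
sample = ["The dog chased the cat .", "Prices rose sharply ."]
negative_examples = reversed_corpus(sample)
```

Reversal does not guarantee ill-formedness: a one-word sentence comes out unchanged, and a sequence such as "dogs chase dogs" reads the same in both directions, so a corpus built this way would still need checking.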

Finally, on the general relevance of the full set of goals/capabilities of
theories, Ash says:

  The substance of the objections are that P&P is attempting to do
  much more than just parse sentences (Hallman) and that the goals
  of P&P are different to those of computational linguistics (McGinnis).
  I think there is merit to both these statements, but they are ultimately
  non sequiturs to the challenge. ... The requirement of capturing the
  adult grammar also means that it's insubstantial whether the goals of
  P&P are those of computational linguistics: P&P is still expected to
  capture adult grammatical competence in the end, even if this isn't a
  *motivation* for a lot of its practitioners.

Consider the following analogy. You and I both are given the task of
designing a motor vehicle that will get someone from point A to point
B. You come back with a Corvette, I come back with an SUV. Now you
say, "Let's go to a racetrack, I'll bet I can drive a circuit faster than
you, which means I have the better design." I will of course object:
speed was not specified as the desideratum of the vehicle. Both
vehicles can get a person from A to B. Moreover, the SUV can do lots
of things the 'vette can't: carry more than 2 people, hold lots of
luggage, play DVDs for the back seat passengers, transport moderate-
sized pieces of furniture, host a small business meeting, etc. My
motivation in designing it was to make it a multi-purpose family vehicle.
If I were now to go back to the drafting table and modify my SUV
design so that it keeps all its current features but can also go as fast
as a Corvette, surely I will have achieved a much more difficult task
than the person who just designed the Corvette.

I could have worked harder to make the analogy tighter, but the basic
point would still go through.



Prof. Carson T. Schütze Department of Linguistics, UCLA
Web: http://www.linguistics.ucla.edu/people/cschutze
Date Posted: 05-May-2005
Linguistic Field(s): Computational Linguistics
Discipline of Linguistics
LL Issue: 16.1439
Posted: 05-May-2005
