
Discussion Details




Title: Re: 16.1251, Disc: A Challenge to the Minimalist Community
Submitter: Carson Schutze
Description: Following up on Peter's point
 
  So the P&P parser that Sproat and Lappin envision would accomplish
  much more than comparable statistical parsers, which makes the
  proposed accuracy metric a poor yardstick for comparison
 

In addition to capturing the distinction between learnable and unlearnable
languages, P&P has as an important goal capturing the distinction between
well-formed (grammatical) and ill-formed (ungrammatical) sentences
within a language. As I understand it, the challenge demands only correct
parsing of grammatical sentences, not correct rejection of ungrammatical
ones. This represents another case where the P&P system, by virtue of the
goals of the theory, is being subjected to greater demands than the
statistical parsers.

Comp Ling isn't my field either, but I gather it is a desideratum for at least
some statistical parsers that they be robust in the face of noisy input,
certainly during training but perhaps also during parsing, if they are to
avoid being completely thrown off by the occasional typo or unfamiliar
word. So it strikes me as an interesting empirical question whether such
robustness, if indeed the best statistical parsers have it, hinders them from
being able to detect ungrammaticality in general. Of course humans too
can "cope with" ill-formedness of various kinds (as Sproat and Lappin
note), but they mostly know when they are having to do so, i.e.,
ill-formedness is still detected.

So, I would like to suggest a revised version of the challenge that
incorporates a second corpus consisting of ungrammatical sentences that
are to be identified as such. (Earlier P&P parsers such as Fong's were
designed to do this, but it's not obvious that this ability will easily scale up
with broader coverage, so I don't think this is a sucker's bet.) Furthermore,
since the computationalists got to choose the corpus of good sentences, it
would seem only fair that the theoreticians get to choose the corpus of bad
sentences :-)
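The scoring for this revised two-corpus challenge could be sketched roughly as follows. This is a purely illustrative Python sketch, not any actual evaluation harness from the challenge: the parser interface (a callable returning True for sentences it parses as well-formed) and the toy corpora are assumptions for the sake of the example.

```python
# Hypothetical scoring for the revised challenge: a parser must both
# parse the grammatical corpus and reject the ungrammatical one.
# All names and interfaces here are illustrative assumptions.

def evaluate(parser, grammatical, ungrammatical):
    """Score a parser on both corpora.

    `parser` is any callable that returns True if it parses the
    sentence as well-formed and False if it rejects it.
    """
    accepted = sum(1 for s in grammatical if parser(s))
    rejected = sum(1 for s in ungrammatical if not parser(s))
    return {
        "parse_accuracy": accepted / len(grammatical),
        "rejection_accuracy": rejected / len(ungrammatical),
    }

# Toy stand-in parser: treats any starred string as ill-formed.
toy_parser = lambda s: "*" not in s

good = ["the cat sat", "dogs bark"]
bad = ["* cat the sat", "* bark dogs the"]
print(evaluate(toy_parser, good, bad))
# {'parse_accuracy': 1.0, 'rejection_accuracy': 1.0}
```

The point of reporting the two accuracies separately is exactly the asymmetry noted above: a robust statistical parser might score perfectly on the first corpus while scoring near zero on the second.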

P.S. The statistical parsers will still be getting off easy, in my view, because
the unfamiliar sentences they *are* supposed to parse as well-formed are
drawn from the same sample as the training set. The set of novel sentences
humans [and P&P parsers, we hope] parse as grammatical arguably
includes sentence types that do not occur in the language learner's input.

--

Carson T. Schutze
Department of Linguistics, UCLA
Web: http://www.linguistics.ucla.edu/people/cschutze
Date Posted: 22-Apr-2005
Linguistic Field(s): Computational Linguistics
Discipline of Linguistics
LL Issue: 16.1288
