Title: Re: 16.1288, Disc: Re: A Challenge to the Minimalist Communi
So, I would like to suggest a revised version of the challenge that
incorporates a second corpus consisting of ungrammatical
are to be identified as such. (Earlier P&P parsers such as Fong's
designed to do this, but it's not obvious that this ability will easily
with broader coverage, so I don't think this is a sucker's bet.)
since the computationalists got to choose the corpus of good
it would seem only fair that the theoreticians get to choose the
bad sentences :-)
This is a very important point, and negative data have been collected
and are used to evaluate deep linguistic processing.
A nice software package for evaluating systems and working with test
suite databases can be found at:
Test suites for German, English, French, Spanish, and other
languages are also available there.
You may find test suites for German at:
These test suites contain (normalized) examples from the descriptive
literature, P&P, HPSG, and other theoretical literature. With the [incr
TSDB()] software it is possible to get a selection of sentences that is
relevant for a certain phenomenon. Sentences are cross-classified
according to the phenomena they are relevant for.
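
To make the cross-classification concrete, here is a minimal sketch in
Python. It is not the [incr TSDB()] interface; the names SuiteItem,
SAMPLE_ITEMS, and select are hypothetical, and the four sentences are
invented toy items rather than entries from any of the test suites
mentioned above. It only illustrates how an item tagged with several
phenomena can be retrieved under each of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteItem:
    """One test-suite entry (hypothetical structure, for illustration only)."""
    text: str
    wellformed: bool          # False marks a negative (ungrammatical) example
    phenomena: frozenset      # every phenomenon the sentence is relevant for


# Invented toy items, not taken from any real test suite.
SAMPLE_ITEMS = [
    SuiteItem("The dog barks.", True, frozenset({"agreement"})),
    SuiteItem("The dog bark.", False, frozenset({"agreement"})),
    SuiteItem("Who did Kim see?", True, frozenset({"wh-question", "extraction"})),
    SuiteItem("Who did Kim see Sandy?", False, frozenset({"wh-question", "extraction"})),
]


def select(items, phenomenon, wellformed=None):
    """Return the items tagged with `phenomenon`.

    If `wellformed` is True or False, keep only the positive or only the
    negative examples; if it is None, keep both.
    """
    return [
        item
        for item in items
        if phenomenon in item.phenomena
        and (wellformed is None or item.wellformed == wellformed)
    ]
```

For example, select(SAMPLE_ITEMS, "extraction", wellformed=False) returns
only the negative item tagged for extraction, and the same item is also
found under "wh-question".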
The idea is to develop these collections further into a generally
accepted benchmark for linguistic theories in general and for deep
linguistic processing in particular. Of course, the negative sentences
can be used to check what statistical parsers have to say about them
in comparison to the well-formed examples.
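
One way such a comparison might be set up is sketched below in Python.
This is an assumption about a possible evaluation, not a description of
how any existing system is scored: the function evaluate and the
parameter accepts are hypothetical, and accepts stands in for whatever
decision a parser makes (for a statistical parser, perhaps a threshold
on its score). The sketch reports coverage, the share of well-formed
sentences that are accepted, and overgeneration, the share of
ill-formed sentences that are accepted anyway.

```python
def evaluate(accepts, items):
    """Score a recognizer on positive and negative examples.

    `accepts` is any callable mapping a sentence to True/False
    (a hypothetical stand-in for a real parser).
    `items` is an iterable of (sentence, is_wellformed) pairs.
    """
    items = list(items)
    good = [s for s, ok in items if ok]
    bad = [s for s, ok in items if not ok]

    # Fraction of well-formed sentences the recognizer accepts.
    coverage = sum(bool(accepts(s)) for s in good) / len(good) if good else 0.0
    # Fraction of ill-formed sentences the recognizer wrongly accepts.
    overgeneration = sum(bool(accepts(s)) for s in bad) / len(bad) if bad else 0.0

    return {"coverage": coverage, "overgeneration": overgeneration}
```

A recognizer that accepts every input reaches full coverage but also
full overgeneration, which is exactly what a corpus of well-formed
sentences alone cannot reveal.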
So if somebody has a look at the German collection and wants to
contribute, please send me the relevant examples and pointers to the
publications in which the examples are discussed.
Universität Bremen/Fachbereich 10
Discipline of Linguistics