Discussion Details
| Title: | Re: 16.1251, Disc: A Challenge to the Minimalist Community |
| Submitter: | Carson Schutze |
| Description: | Following up on Peter's point
So the P&P parser that Sproat and Lappin envision would accomplish much more than comparable statistical parsers, which makes the proposed accuracy metric a poor yardstick for comparison.

In addition to capturing the distinction between learnable and unlearnable languages, P&P has as an important goal capturing the distinction between well-formed (grammatical) and ill-formed (ungrammatical) sentences within a language. As I understand it, the challenge demands only correct parsing of grammatical sentences, not correct rejection of ungrammatical ones. This represents another case where the P&P system, by virtue of the goals of the theory, is being subjected to greater demands than the statistical parsers.

Comp Ling isn't my field either, but I gather it is a desideratum for at least some statistical parsers that they be robust in the face of noisy input, certainly during training but perhaps also during parsing, if they are to avoid being completely thrown off by the occasional typo or unfamiliar word. So it strikes me as an interesting empirical question whether such robustness, if indeed the best statistical parsers have it, hinders them from being able to detect ungrammaticality in general. Of course humans too can "cope with" ill-formedness of various kinds (as Sproat and Lappin note), but they mostly know when they are having to do so, i.e., ill-formedness is still detected.

So, I would like to suggest a revised version of the challenge that incorporates a second corpus consisting of ungrammatical sentences that are to be identified as such. (Earlier P&P parsers such as Fong's were designed to do this, but it's not obvious that this ability will easily scale up with broader coverage, so I don't think this is a sucker's bet.) Furthermore, since the computationalists got to choose the corpus of good sentences, it would seem only fair that the theoreticians get to choose the corpus of bad sentences :-)

P.S. The statistical parsers will still be getting off easy, in my view, because the unfamiliar sentences they *are* supposed to parse as well-formed are drawn from the same sample as the training set. The set of novel sentences humans [and P&P parsers, we hope] parse as grammatical arguably includes sentence types that do not occur in the language learner's input.

--
Carson T. Schutze
Department of Linguistics, UCLA
Web: http://www.linguistics.ucla.edu/people/cschutze |
| Date Posted: | 22-Apr-2005 |
| Linguistic Field(s): |
Computational Linguistics
Discipline of Linguistics |
| LL Issue: | 16.1288 |
| Posted: | 22-Apr-2005 |

