Discussion Details
| Title: | Re: A Challenge to the Minimalist Community |
| Submitter: | Carson Schütze |
| Description: | Ash Asudeh [LL 16.1364] said:
"Some confusion has arisen in the subsequent discussion of the Sproat-Lappin challenge. Most of the subsequent posts discuss statistical parsing versus P&P parsing. However, the challenge has nothing to do with statistical parsers per se."
I would like to do some clarifying of my own. Let's see how the challenge was worded [LL 16.1156]:
"We challenge someone to produce, by May of 2008, a working P&P parser that can be trained in a supervised fashion on a standard treebank, such as the Penn Treebank, and perform in a range comparable to state-of-the-art statistical parsers."
So statistical parsers were relevant as an existence proof that the assigned task is doable with current technology. If no system could parse the Treebank with 90% accuracy (or whatever the standard is), then asking P&P to do so would be a very different kind of challenge. Sproat and Lappin frame their challenge thus: other approaches have reached this milestone; we challenge you to catch up. From that perspective it is entirely relevant what the capabilities and design goals of those approaches are, compared to those of P&P. [Of course, one could challenge P&P to do as well as some non-statistical parser, in which case that would be the system whose capabilities and goals were relevant. In fact, one could invent a new challenge simply by omitting the word "statistical" from the original, but (a) Sproat and Lappin explicitly included it, and (b) I think omitting it would make it harder to establish a metric for state-of-the-art-hood, because it would involve apples-and-oranges comparisons, though I'm sure others will disagree here.]
Then Ash says, about my previous point on ungrammaticality:
"I don't understand the substance of this objection. All grammars, those used in statistical parsing or otherwise, attempt to reject ungrammatical sentences: Nobody wants their grammar/parser to overgenerate. Even if the claim is true of statistical parsers (I don't think it is), it certainly isn't true of the LFG and HPSG parsers and grammars noted above."
Let me elaborate on John Goldsmith's [LL 16.1432] defense of my point. Ash makes a claim about all grammars and about generation, but the challenge doesn't require the statistical parser to have a grammar or to generate in the relevant sense; it just requires it to map well-formed input strings to the "right" trees. If a grammar is defined, as Ash seems to assume and most would agree, as something that delineates all and only the well-formed expressions of a language, then the benchmark systems are certainly not in principle required to have one. If they provide an output for every possible input string, with no systematic distinction between the good and the bad, then by this definition they don't have a grammar, or at most half of one. Even if they did, the challenge contains nothing that would assess the set of strings that the grammar rules out, which is why I proposed a second part of the challenge to do so. I think Ash and I agree that any "interesting" model (I won't try to define "interesting", but we know what we mean :-) of human language will include constraints against overgeneration; in those terms, my point was that the challenge does not require the benchmark system to be interesting. [Of course, once again, one could invent a different challenge that pits a P&P parser against an HPSG parser, in which case the simplest form of my objection would go away: the benchmark system would no longer be ignoring an entire ability that the P&P system is designed to model. I would still find it interesting to test in detail whether the two systems rule out the same strings, and whether those strings are indeed all and only the ungrammatical strings of the language.]
So I think we agree on the overall point that comparing a P&P parser to a parser that is committed (in the ways S&L outline) to the claims of some other linguistic theory would be more meaningful than a comparison with purely statistical parsers. But for those who disagree, I would still submit that a comparison with a statistical parser would be more meaningful if it included a comparison of "(un)grammaticality judgments". I do want to clarify something else John Goldsmith said, however:
"There is not universal agreement to the position that the ability to distinguish grammatical from ungrammatical sentences is an important function to be able to model directly, whether we are looking at humans or at software. There are certainly various serious parsing systems whose goal is to be able to parse, as best they can, any linguistic material that is given to them -- and arguably, that is what we speakers do too."
This comment unfortunately conflates two notions that I was at pains to keep separate in my original posting. One is the idea that a system will produce *some* parse for every input string you give it, including the ungrammatical ones, rather than *just* returning "FAIL". The other is the idea that a system will flag all ungrammatical inputs as ungrammatical, whatever else it might do with them. The first may or may not be an ability that humans have in full generality, and depending on how you think they achieve it when they do, you may or may not want to model it within your parser. But the second is something humans unquestionably *can* do for the overwhelming majority of possible strings, and I therefore submit that any system that purports to be a model of human language ability should be required to do the same. My original claim, once again, was that the challenge makes no requirement on this second point, but that it would be much more sensible if it did.
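The difference between the two notions, and what a judgment-based second part of the challenge would measure, can be sketched in a few lines. This is a hypothetical illustration only, not part of the Sproat-Lappin challenge: the function names, the toy lexicon, and the use of word-order reversal to produce ungrammatical items are all assumptions made for the example. A system that returns some parse for every string (the first notion) accepts everything, so only a separate rejection rate can show whether it draws the grammatical/ungrammatical distinction (the second notion).

```python
from typing import Callable, List, Tuple

# Hypothetical toy lexicon; a real evaluation would draw its items from a treebank.
LEXICON = {"the": "D", "a": "D", "dog": "N", "cat": "N", "barks": "V", "sleeps": "V"}


def toy_grammar_accepts(sentence: str) -> bool:
    """Toy 'grammar': accept only Det-Noun-Verb strings built from LEXICON."""
    tags = [LEXICON.get(word) for word in sentence.lower().split()]
    return tags == ["D", "N", "V"]


def robust_parser_accepts(sentence: str) -> bool:
    """Stand-in for a system that returns *some* parse for every input
    and never flags anything as ungrammatical."""
    return True


def reverse_words(sentence: str) -> str:
    """Crude way to make an ungrammatical test item: reverse the word order."""
    return " ".join(reversed(sentence.split()))


def judgment_scores(judge: Callable[[str], bool],
                    grammatical: List[str],
                    ungrammatical: List[str]) -> Tuple[float, float]:
    """Return (share of grammatical items accepted, share of ungrammatical items rejected)."""
    accepted = sum(judge(s) for s in grammatical) / len(grammatical)
    rejected = sum(not judge(s) for s in ungrammatical) / len(ungrammatical)
    return accepted, rejected


if __name__ == "__main__":
    good = ["the dog barks", "a cat sleeps"]
    bad = [reverse_words(s) for s in good]  # "barks dog the", "sleeps cat a"
    print(judgment_scores(robust_parser_accepts, good, bad))  # (1.0, 0.0)
    print(judgment_scores(toy_grammar_accepts, good, bad))    # (1.0, 1.0)
```

Keeping the two rates separate, rather than folding them into one accuracy figure, reflects the point that handling every input and rejecting ungrammatical inputs are distinct abilities.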
Of course, the challenge also makes no requirement on the first point, but I did not propose expanding it to incorporate that, for two reasons. The first, which I think was John Goldsmith's main point, is that there is much less consensus on this as a desideratum of models of human parsing. The second is that there is almost no empirical data against which we could test statistical, P&P, HPSG, or any other parsers with regard to how they ought to "interpret" ungrammatical strings. I know some people could supply references, but their scope is extremely limited. If we consider one of the dumbest ways of generating a test corpus of ungrammatical sentences, namely fully reversing the sequence of words in each Treebank sentence, I don't think anyone has a clue how people would interpret the results (if at all).
Finally, on the general relevance of the full set of goals and capabilities of theories, Ash says:
"The substance of the objections are that P&P is attempting to do much more than just parse sentences (Hallman) and that the goals of P&P are different to those of computational linguistics (McGinnis). I think there is merit to both these statements, but they are ultimately non sequiturs to the challenge. ... The requirement of capturing the adult grammar also means that it's insubstantial whether the goals of P&P are those of computational linguistics: P&P is still expected to capture adult grammatical competence in the end, even if this isn't a *motivation* for a lot of its practitioners."
Consider the following analogy. You and I are both given the task of designing a motor vehicle that will get someone from point A to point B. You come back with a Corvette; I come back with an SUV. Now you say, "Let's go to a racetrack. I'll bet I can drive a circuit faster than you, which means I have the better design." I will of course object: speed was not specified as the desideratum for the vehicle. Both vehicles can get a person from A to B.
Moreover, the SUV can do lots of things the 'vette can't: carry more than two people, hold lots of luggage, play DVDs for the back-seat passengers, transport moderate-sized pieces of furniture, host a small business meeting, etc. My motivation in designing it was to make it a multi-purpose family vehicle. If I were now to go back to the drafting table and modify my SUV design so that it keeps all its current features but can also go as fast as a Corvette, surely I would have accomplished a much more difficult task than the person who just designed the Corvette. I could have worked harder to make the analogy tighter, but the basic point would still go through.
Carson
--
Prof. Carson T. Schütze
Department of Linguistics, UCLA
Web: http://www.linguistics.ucla.edu/people/cschutze |
| Date Posted: | 05-May-2005 |
| Linguistic Field(s): |
Computational Linguistics
Discipline of Linguistics |
| LL Issue: | 16.1439 |
| Posted: | 05-May-2005 |

