Editor for this issue: Martin Jacobsen <marty linguistlist.org>
It would seem that Philip Bralich's parser is most closely related to broad-coverage parsers like FIDDITCH (developed by Don Hindle at what was then Bell Labs), the LINK grammar developed at CMU, and work by Steve Abney (now at AT&T Labs). As far as I know, the current best broad-coverage parsers use statistical information, such as the ones described by Collins, ``Three Generative, Lexicalised Models for Statistical Parsing'', in the 1997 ACL conference proceedings, and Charniak, ``Statistical Parsing with a Context-Free Grammar and Word Statistics'', in the 1997 AAAI conference proceedings. In the admittedly small world of academic research on broad-coverage parsing, parser performance is usually evaluated by computing the average of the labelled precision and recall scores obtained by comparing the parser's output with the hand-constructed parse trees of a held-out section (usually section 23) of the Penn II WSJ corpus. This provides a standard quantitative score of parser performance. The authors above claim average precision and recall scores of around 87%. I was wondering what average labelled precision and recall scores Philip Bralich's parser achieves.
Mark Johnson
Cognitive and Linguistic Sciences
Brown University
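As a rough illustration of the labelled precision and recall measure described in the message above, the following Python sketch scores one parser output against one hand-constructed tree. It is a simplified approximation, not the standard scoring tool: the function names, the two example trees, and the decision to skip part-of-speech (preterminal) nodes are assumptions made for this example, and conventions such as punctuation handling are ignored.

```python
from collections import Counter


def parse_brackets(text):
    """Parse a Penn-Treebank-style bracketed string into (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def labelled_spans(tree):
    """Return a multiset of (label, start, end) for every phrasal constituent."""
    spans = Counter()

    def walk(node, start):
        label, children = node
        end = start
        # Assumption: nodes dominating only words are POS tags and are not scored.
        is_preterminal = all(isinstance(c, str) for c in children)
        for child in children:
            end = end + 1 if isinstance(child, str) else walk(child, end)
        if not is_preterminal:
            spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def labelled_precision_recall(gold_trees, test_trees):
    """Labelled precision and recall over paired gold and parser-output trees."""
    matched = gold_total = test_total = 0
    for gold, test in zip(gold_trees, test_trees):
        g = labelled_spans(parse_brackets(gold))
        t = labelled_spans(parse_brackets(test))
        matched += sum((g & t).values())  # constituents with same label and span
        gold_total += sum(g.values())
        test_total += sum(t.values())
    return matched / test_total, matched / gold_total


# Hypothetical example: the parser attaches the PP inside the object NP.
GOLD = ("(S (NP (DT the) (NN dog)) (VP (VBD saw) (NP (DT a) (NN cat)) "
        "(PP (IN in) (NP (DT the) (NN park)))))")
TEST = ("(S (NP (DT the) (NN dog)) (VP (VBD saw) (NP (NP (DT a) (NN cat)) "
        "(PP (IN in) (NP (DT the) (NN park))))))")

if __name__ == "__main__":
    precision, recall = labelled_precision_recall([GOLD], [TEST])
    print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this made-up pair, the parser output contains seven phrasal constituents, six of which match the six in the hand-constructed tree, so precision is 6/7 and recall is 6/6. Averaging such scores over a held-out section gives the kind of figure the message asks about.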
I want to thank Sam Bayer for alerting us to the existence of detailed evaluation criteria for parsers. If every article in recent issues of CL not only refers to these criteria, but also tells us where they can be found, then I agree with him in urging Phil Bralich to test his own parser against these criteria before making such strong claims. Since other parsers have already been tested against the MUC criteria, this appears to provide a reasonably objective basis for comparison. For all I know, the Penn Treebank guidelines may be just as good or better in some sense (it's hard to tell for sure), but I haven't yet heard that they provide a detailed grading system like the MUC system, nor that other parsers have been tested against them. If the Bracket Doctor/Ergo system really turns out to be better than the other ones, I think many linguists would be interested in knowing more about the underlying system.
Pius ten Hacken and Peter Menzel argue that theoretical linguistics is right to be less concerned with data coverage than computational linguistics, since it is more concerned with explanation in terms of mental capacities. It seems to me that the kind of work they are talking about is better considered a kind of psycholinguistics, which by definition is concerned with the relationship between language and the brain. This is certainly a well-established branch of theoretical linguistics, but it is not the same thing as syntax, which is primarily concerned with sentences (as many different kinds as possible) and the relationships between them. On the other hand, ten Hacken and Menzel might prefer to join forces with functionalist approaches to syntax, which aim to explain properties of language in terms of historical development, ambiguity avoidance, processing, etc.
I think these are all valid approaches to language, but none of them is the same thing as formal approaches, which to my mind have a relationship to the rest of linguistics similar to that of mathematics to the natural sciences -- that of a useful servant, whose task is to provide a precise model of the object of inquiry. If this assessment is accurate, formal linguistics has the potential to be important not only for computational implementation, but also for many other areas, although computational implementation is still the best way to find out whether the analysis really works, at least if the implementation is an accurate reflection of the grammar.
I don't follow Menzel's reasoning when he apparently argues that because language functions are spread across various parts of the brain, it is doubtful whether language is algorithmic. I don't doubt the premise of this argument. Clearly, some, but perhaps not all, aspects of language knowledge, production, and understanding are linked to other activities of the brain. But what has that got to do with the question of whether it's algorithmic? I even looked up the word "algorithm" to find out whether I had misunderstood its meaning. According to my dictionary, an algorithm is just a procedure for doing something. I think it is clear that every healthy human being has an algorithm for his/her own language in this sense, even though this algorithm sometimes does not work as well as we want it to -- we sometimes have trouble finding the words we need to express our ideas. But some sort of algorithm is there nevertheless.
Peter Menzel suggests that neural networks are of interest for computational linguists trying to develop nonalgorithmic models, though not for theoretical linguists. But as noted above, he believes that theoretical linguistics is concerned with the relationship of language to the brain.
First of all, I don't see why neural networks shouldn't be part of our language algorithm, but my confusion on this point may be related to the discussion in the previous paragraph. More clearly, since neural network research formulates hypotheses about neurons and how they are linked together in the brain, doesn't it follow that neural networks are of interest to theoreticians as well? Chomsky recently informed me by email that he would like to find a way to use neural networks, but didn't see a way to do this. Well, I think others in the field, including myself, have done some fairly detailed work in this direction.
Dan Maxwell