Editor for this issue: Martin Jacobsen <marty
linguistlist.org>
Contrary to what Phil Bralich suggested at various places in this discussion, silence on certain points he made earlier does not automatically mean agreement with him or surrender. A bare repetition of previous arguments (such as "please look closely at those standards (repeated below)"), however, contributes nothing to the deepening of the discussion and will annoy the readers. It only shows that people mutually reject the relevance of each other's arguments.

Having said this, I would like to make one remark on Phil Bralich's reply in issue 9.342, because I have the impression I can expand and clarify the point I want to make. Text with ">>" is from my contribution in issue 9.328.

>>A syntactic theory is part of linguistics as an empirical science.
>>Empirical sciences are concerned with explaining chosen aspects of a
>>domain of observations in the real world. In the case of
>>linguistics, the domain is natural language, but different aspects
>>of language can be chosen as a goal for explanation,
>>e.g. acquisition in Chomskyan linguistics, processing in LFG. The
>>success of a linguistic theory depends on the degree to which an
>>explanatory account is reached. The fact that different linguistic
>>theories take different questions about language as a basis for
>>research implies that in some cases a common ground for evaluation
>>is missing. <...>
>
>This is quite true, but how can you actually study acquisition or
>processing if you have not yet properly isolated or described what it
>is that is being acquired or processed? You cannot talk about the
>acquisition of language or the processing of language if you are still
>not able to properly describe parts of speech, parts of the sentence,
>statements, questions, subjects, and so on and the relationships
>between them. You simply have not completed the preliminaries. I am
>not saying that syntax is all of linguistics any more than I am saying
>that isolating the periodic table is the whole of chemistry, but
>without the work being substantially completed on those basics neither
>field is really ready to begin. Any theory that attempts to study
>these "domains of observation in the real world" must first
>demonstrate that it has at least isolated and described these domains
>in some tangible sense. My main argument is that the entire field is
>starting off at the wrong end of the stick

I thank Phil Bralich for his acknowledgement that the entire field is on my side, although I suspect this is a slight exaggeration. Without exaggeration, however, I can claim that in philosophy of science it is generally acknowledged that theory-neutral observation and theory-neutral description are impossible. There are just too many different things to observe and too many possible ways of describing them. In order to make a sensible selection, a theory is necessary. (If you think you do it without a theory, your theory is just entirely implicit, which is not a recommendation.)

As a consequence, the question which aspect of language is to be explained cannot wait until after the description, because it interacts with the description. This interaction influences, for instance, the choice of most urgent data to cover, the criteria for a valid coverage, etc. In a theory of linguistics the most urgent data to be described need not be the most frequent ones (i.e. the ones most urgent for a parser), but they should be the ones that provide arguments for decisions on how to extend or modify the theory.

Summarizing: Theoretical linguistics and CL have different goals. The descriptive goal of CL is not a subset of the goal of theoretical linguistics, although there is a significant overlap. Explanation does not follow description in theoretical linguistics, but the two develop in interaction. Therefore,

1. a linguistic theory which does not give a description of all the phenomena Bralich's parser covers need not be a bad theory;
2. if Bralich's underlying theory is only descriptive it is not as good _qua linguistic theory_ as the existing ones.

(N.B. I do not say anything about the evaluation of Bralich's parser as a parser.)

Pius ten Hacken

==================================================================
Dr. Pius ten Hacken
Institut fuer Informatik/Allgemeine Sprachwissenschaft
Universitaet Basel
Petersgraben 51    || Tel. +41-61-267'33'38
CH-4051 Basel      || Fax  +41-61-267'32'51
Switzerland        || email: tenhacken ubaclu.unibas.ch
web page: http://www.unibas.ch/LIlab/staff/tenhacken
==================================================================

I wrote:

>Relative to the Derek Bickerton/Philip Bralich parser adequacy
>criteria: the field of computational linguistics has spent quite a
>number of years developing evaluation criteria for parsers, which I
>recommend looking at before you start reinventing the wheel. See the
>journal Computational Linguistics for the last five or six years or
>so, or for a summary, you can read the chapter that my coworkers and
>I wrote on comparing the theoretical and corpus-based computational
>enterprises, in a book edited by John Lawler called Computers and
>Linguistics, due out in April.

And Philip Bralich responded:

>Yet his organization as well as all others have yet to be able to
>create a BracketDoctor--a device that generates trees and labeled
>brackets in the accepted style for this industry, the style of the
>Penn Treebank II guidelines, or a MemoMaster, a device that increases
>by many thousands the number of commands that are possible for a
>speech rec navigation and control system. In addition, the standards
>that I propose are largely based on functionality that most people
>have assumed that parsers and theories of syntax had handled years
>ago. The standards I have proposed are meant to demonstrate that
>there is a serious problem in the field; that is, very basic levels of
>functionality (see below) that many believed had already been
>achieved, simply have not been reached.

First, I'm disappointed in Dr. Bralich's rather blunt advertising in a supposedly academic debate, as evidenced by the obvious plug for his software in the first sentence of this paragraph. Second, "my organization", the MITRE Corporation, has been an active participant in helping define standards for a sequence of DARPA-sponsored evaluations, MUC (the Message Understanding Conferences, to which Dr. Bralich alludes), which has now been ongoing for almost ten years and has been regarded by virtually everyone in the field as a singular success.
In particular, the government funders and organizers of this evaluation, who have real-world needs for digesting and understanding large amounts of text (such as Wall Street Journal and the AP newswire), have judged the program to be extremely successful. The MITRE language research group has participated in these evaluations, as have groups from BBN, SRI, and a number of other major university and industry research labs (consult any of the 7 MUC proceedings for details). How Dr. Bralich can dismiss this community-wide effort so casually is absolutely beyond me.

In fact, the discovery that Dr. Bralich seems to have so recently made - namely, that many linguistic theories are emperors with no clothes - was a discovery which almost everyone in the MUC program made years ago. Dr. Bralich is absolutely right in observing that "very basic levels of functionality ... that many believed had already been achieved, simply have not been reached", and I believe personally that the field of theoretical linguistics is becoming increasingly irrelevant because of this, since we now have the computational resources to test the accuracy, speed and coverage of these systems. And I sympathize tremendously with Dr. Bralich's crusade.

However, I can't begin to express how damaging it is to dismiss a decade of well-thought-out attention to this problem in a closely related field. In particular, Dr. Bralich claims that "the standards that I propose are largely based on functionality that most people have assumed that parsers and theories of syntax had handled years ago." I don't disagree with this assertion; but the implication is that the assumptions about what parsers and theories of syntax are doing are essentially right.
The value of the approach followed by computational linguistics is that it is task-based, where the task is not parsing, but rather doing something useful with the text: topic identification, person/location/organization identification, identification of events for entry into a database, etc. The field discovered two very important things: (1) that many problems which syntactic theories took as central were by and large unimportant because of their infrequent occurrence (like quantifier scope disambiguation), and (2) that many problems which syntactic theories ignored were crucial and interesting (like part-of-speech identification and segmentation of text into sentences).

My concern is that Dr. Bralich's set of criteria, while apparently fairly basic, is distilled from a tradition in the literature which takes it as given that the sort of syntactic analysis of sentences which generative linguists do is a useful and important goal. That is not necessarily the case; for many tasks, many of these criteria can be ignored, and the important thing for making this technology useful is not whether it conforms to what we think a syntactic theory should do, but whether what a parser does contributes constructively to the accomplishment of some task-oriented analysis of linguistic data.

By the standards of evaluation which have evolved in computational linguistics, Dr. Bralich's standards are unacceptably vague. When he says that a system should identify parts of speech, which parts of speech does he mean? The Brown Corpus has about 40 of them; linguists tend to assume about 7, with some feature breakdown, and no two frameworks use the same set. What about systems which aren't perfect (since none of them are)? Is precision (percentage of answers found which are correct) more important than recall (percentage of correct answers which are found)? I could observe the same problems with his definition of terms in many other places in his standards.
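
To make the two measures named in the parentheses above concrete, here is a minimal sketch. Everything in it is illustrative: the function name precision_recall, the tag labels, and the toy answer key are assumptions made up for this example, not the MUC scoring software or any published criteria. It only applies the two definitions to sets of (token position, tag) answers.

```python
def precision_recall(found, correct):
    """Return (precision, recall) for a set of system answers.

    found   -- answers the system produced (hypothetical data)
    correct -- answers in the answer key (hypothetical data)
    """
    found, correct = set(found), set(correct)
    hits = len(found & correct)  # answers that are both found and correct
    # Precision: share of the answers found which are correct.
    precision = hits / len(found) if found else 0.0
    # Recall: share of the correct answers which are found.
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


# Toy data: (token index, part-of-speech tag) pairs, invented for illustration.
key = {(0, "DET"), (1, "NOUN"), (2, "VERB"), (3, "NOUN")}
system = {(0, "DET"), (1, "NOUN"), (2, "NOUN")}

p, r = precision_recall(system, key)
print(f"precision={p:.2f} recall={r:.2f}")
```

A system that answers only when it is sure will tend to score high on precision and low on recall, and one that guesses everywhere will tend to do the reverse, which is why an evaluation has to state how the two are weighed.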
The point of the MUC evaluation is that the criteria which have been laid down are precise enough to measure progress on the given task, and detailed and explicit enough to forestall any argument about whether scores are comparable. Dr. Bralich's criteria fail on both these counts.

>Standards proposed by MUC, Computational Linguistics and so on do not
>address the fact that they are not requiring their members to meet
>those very basic standards.
>Rather there is an end-run around that expectation that takes one to a
>world of non-existent parsers that are meeting standards that are "out
>there somewhere" among five or six years of publications (no dates or
>page numbers), and an unpublished book. How can standards of any sort
>be of value if the ones I have proposed (reprinted below) have not yet
>been met? These basic levels of functionality should be met long before
>anyone attempts comparisons or evaluations of different systems. If
>you just look at them you will see they make a good qualifying round
>from which to begin the discussion of "mature parsers" and
>"mature theories of syntax."

First, the idea that MUC ought to be handing out a "Good Housekeeping Seal of Parsing" is absurd on its face. The scores for the systems which are participating in MUC are out there for the world to see; every one of the proceedings has been published, and references to them can be found in any article in CL in the recent past. The reason I don't list page numbers and articles is that so much of the journal is devoted to this problem, as anyone who has spent any time reading it will attest (not to mention the annual ACL proceedings and, of course, the MUC proceedings). The idea that I've alluded to some "cheat" involving nonexistent articles and language analysis systems is simply insulting. All the MUC proceedings list the participants and their scores, as well as describing the tasks in detail. Anyone who suggests or implies otherwise simply hasn't read the literature.
[And the only reason you can't download the article we wrote from the Web is that the publisher insists on first publishing rights. And there's nothing in it which is new; it is simply a description and distillation of the work that's already been done in the field.]

Finally, the claim that "these basic levels of functionality should be met long before anyone attempts comparisons or evaluations of different systems" is simply dogma, and ignores the progress in the field which I outlined above. The reason we don't evaluate or compare systems based on these criteria is that the criteria are spectacularly ill-defined, and in many cases irrelevant.

>Before reading the standards I propose below, please note that there
>is no list of standards given in Mr. Bayer's letter, just a reference
>to standards that have supposedly been dispersed "somewhere"
>throughout five or six years of this publication or in a yet to be
>published book. There is also no reference to working parsers that
>could meet any standards therein proposed and certainly no discussion
>of parsers that could actually meet the standards I have proposed.

As I've said, the literature speaks for itself. If I were to refer to Chomsky's Minimalist Program, would Dr. Bralich challenge its existence simply because I failed to provide a bibliography? I hope not. And it's Dr. Bayer, by the way, and, yes, the doctorate is in theoretical linguistics.

>Instead of paging through those many periodicals, ask yourself one
>question. "Why isn't there a single reference to a list of standards
>in this field or from this journal such as proposed below?"

What if I said, "Instead of paging through those many periodicals about GB, why not just make up your own theory? Why isn't there a single list of rules in GB that I can refer to, like Cliff Notes?"
The answer is obvious: because the field is large and complicated, and the problems are difficult, and you should read the literature, because it's important and relevant and that's what scholarship is about. All this is also true of evaluation of language systems. To suggest otherwise is insulting to all the theorists who have slaved over these difficult problems.

>The answer I think is that there is a painful awareness on the part of
>the computational linguistics community of their lack of success after
>35 years in which millions of dollars and hundreds maybe thousands of
>man years have been invested. Certainly, there is not the pride that I
>feel in asserting in black and white in a list appended to this
>message exactly what is possible in this field, and the value it will
>have not only for my company but for creating jobs and projects for
>students and linguists in this area for many years to come--not only
>in English, but in all the languages of the world. It is not wise for
>the field overall to shun the one parser that shows any promise at all
>of making good on 35 years of empty promises. All that does is
>guarantee that in the long run the jobs, the projects, the profits,
>and so on will all belong to Ergo. I would like to see this company
>continue to profit in this field certainly, but I will feel guilty if
>the entire field of computational linguistics tosses the whole of the
>jobs and projects into our laps simply because they were unwilling to
>admit they cannot meet the standards we propose.

Let's not argue about who's failed; in general, computational linguists have made far more progress in evaluation (and in producing usable systems) than most people working in the field. I have a counterproposal for Dr. Bralich: run the Ergo system against the MUC-6 or MUC-7 evaluation set, and tell us what your score is relative to the other systems which participated in the evaluation.

I won't bother responding to the rest of Dr.
Bralich's message; it continues to imply that I am alluding to evaluation criteria and software systems which don't exist, based on Dr. Bralich's apparent unwillingness to read a body of commonly-available literature. It is not my duty to educate Dr. Bralich about what's been going on in computational linguistics; the literature is there and anyone can look at it, and I, for one, won't demean the complexity of the field by suggesting that it ought to be summarizable in a one-page note.

My larger concern is that Dr. Bralich's criteria will be taken as characteristic of the way evaluation of language systems ought to be done, when in truth they're badly and imprecisely defined, and focused on details which have yet to have a demonstrable impact on the sort of tasks which language systems are currently capable of doing. We all know that there's a lot about language we don't understand, and as a result computational linguists have discovered that the dream of robust, full-coverage, detailed analysis of language is far, far away. It turns out that many of the detailed phenomena of the sort that theoretical linguistics is fond of examining either turn up very seldom in corpora, or can be ignored for the purposes of a wide range of tasks, or can't be evaluated yet due to disagreements about terms, etc., etc.

There's no doubt that in the long run, language systems will have to do many of the things which Dr. Bralich alludes to; but it has turned out to be massively unproductive to focus on such a theoretically-inspired list in the near term. This result has grown from many, many years of consideration of the problem by a wide range of well-trained and intelligent researchers whose contribution Dr. Bralich seems to regard as worthless. No greater disservice could be done to the notion of evaluable language systems than for theoretical linguistics to take Dr. Bralich's criteria as its primary goal.

Samuel Bayer
The MITRE Corporation

The discussion on this topic is a familiar one in linguistics and in science in general. As my contribution to it, I should like to clarify some of the underlying assumptions made by Bralich and other computational linguists on the one hand, and by theoretical linguists like ten Hacken and Arnold on the other.

First, and most basic, is the question of what's been called "the purpose of linguistics" (or any other discipline, for that matter). The traditional and obvious formulation of the goal as "describing language" has at least two serious shortcomings (which are really two aspects of one and the same area): that of universality and that of prediction. In other words, how can we make certain that the sample of a language we happen to be describing is representative of human language, and that in two senses: How can we generalize from our sample to the rest of the language in question, and how can we generalize to other languages?

Chomsky's reformulation of our goal as "accounting for the native speaker's (linguistic) intuitions" successfully enlarged the discipline by including the area I mentioned. In doing this, it also moved linguistics squarely into psychology. I believe that here, already, we've come to a division of interests between computational linguists and what one might want to call "speaker linguists"; for the former are not, as a rule, interested in all aspects of "native speaker intuition", though they are, of course, interested in generalizing their sample to the rest of the language in question; and, to a lesser extent, in generalizing to human language. With respect to the latter, they mostly seem to assume that the language they're describing is representative of human language.

But, as Chomsky pointed out, and as we all learned in grad school, "accounting for native speaker intuition" also has serious implications for the acceptability of a proposed analysis.
Although we don't all agree just what we mean by this phrase and how seriously we have to apply it, in the last forty years or so linguists have appealed to it often, and analyses were supported by the claim that they were "intuitively obvious", or disparaged by being called "counterintuitive".

Computational linguists do not appear to be concerned with accounting for native speaker intuition, except in the sense that they assume that computer and software model brain and thought (mind) respectively, and that, moreover, they do so in some simple one-to-one way. The earliest argument for a particular model of grammar based on this analogy that I know of is Sidney Lamb's claim for what he then (the mid sixties) called "stratificational grammar". It was accepted by other computational linguists working within his paradigm; e.g., the group working in Santa Monica at that time.

Now, if there is one thing that recent work in neurophysiology, psychology, psychobiology, and related areas has shown, it is that the human brain (or that of any other living being we have examined, for that matter) does NOT work like a general purpose serial computer. As far as I can tell, there are three possible reasons why anyone might today still assume that syntactic analysis is (or should be) based on algorithms or other mathematical models.

In the simplest case, the researcher is not aware of work in the above mentioned areas, and should thus be directed to the appropriate readings.

In the more complex case, what linguists call "psychological reality" (another way of saying that we must account for native speaker intuition) does not interest her or him, as it does not seem to interest most computational linguists. In this case, they and we work in different, non-commensurable paradigms, and there is little use in arguing with each other concerning the adequacy of the other's analysis. Both groups would do well, however, to be aware of this, and thus avoid needless mud slinging.

The most complex case is the one where a researcher believes that the human linguistic ability is different from all our other abilities and can, indeed, be accounted for by using algorithmic models, while our other abilities cannot. While in this area all the evidence is not yet in, it is becoming clear where we should look for it: Paleoanthropology can help shed some light on the question (cf., e.g., Steven Mithen's recent book "The Prehistory of the Mind"), but most important should be work in neurophysiology and psychobiology. If our language processing activities could be shown to occur only in certain areas of the brain and no other processing occurred there, then we should have a strong case for claiming that our language ability is different from our other abilities.

I need hardly point out that not only do we not have such evidence but, more important, the evidence we do have indicates that language processing is "mixed" with all sorts of other processing, both in the sense that it occurs not only in specific areas (Broca's and Wernicke's areas as tradition has it), and in the sense that other types of processing occur there as well. The evidence on the various types of aphasia is by no means as clear-cut as our textbooks make it out to be! A recent study showed, for example, that for every case corresponding more or less to the traditional categories of trauma and (type of) aphasia there was one that did not: either the lesion(s) in Broca's or Wernicke's area(s) did not produce aphasia, or lesions in other areas did. Other, non-invasive procedures on healthy brains (PET, tomography, etc.) also show that language input calls forth distributed processing activities, not exclusive to, or even mainly in, the so-called language areas. There was one study (sorry, can't remember the reference) that showed right (i.e., non-dominant) hemisphere activities when subjects were dealing with ambiguous input!

The most interesting, not to say amazing, work I know of in this respect is that of Robert Lefroy, a Western Australian teacher, who worked with youngsters having difficulties acquiring literacy skills (i.e., reading and writing). He reports considerable improvement by training various of their motor skills, like throwing a ball, skipping rope, jumping on a trampoline, etc. Of course, they also worked on their literacy skills, but this they had done before, in other schools, without success.

In other words, the strong versions of modularity of the mind (a la Fodor) are unlikely to be correct, since there is no evidence for such strong modularity in the brain. On the contrary, while the brain is undoubtedly modular, particularly in the cortical areas these modules are largely multi-functional and communicate with extremely large numbers of other modules, and do so not only within the cortex but also with modules in the midbrain and in the brainstem. This also makes a truly separate language facility and language processing unlikely.

Therefore, models of language claiming to be based on psychological reality, native speaker intuition, or some such, should, at least for the present (i.e., until neurophysiological evidence of the type I indicated above is found), not be algorithmic-mathematical. Linguists in search of non-algorithmic models I refer to work in what might loosely be called "constructivist linguistics"; computational linguists in search of such models I refer to work on neural networks; the philosophically inclined, lastly, I refer to the work of the Churchlands. (All of this has been going on since at least 1980.)

Sorry, but this turned out to be longer than I anticipated.

Peter Menzel
e-mail: pmenzel club-internet.fr