Editor for this issue: Martin Jacobsen <marty@linguistlist.org>
> ... the best theory of syntax must necessarily be the one that demonstrates itself to be most completely implemented in a programming language. ... the best independent and objective measure of a theory of syntax's overall effectiveness is its ability to generate, in a computer program, standard grammatical structures ...
> ... 2) that any theory that cannot be fully implemented in a programming language as described in the standards outlined above, is flawed in some way; and 3) that the best independent and objective measure of a theory's scope, efficiency, and effectiveness is the degree to which it can be implemented in a programming language. (Of course, the basis for judgement will be the Penn Treebank II guidelines and the standards described above).

I'd like to make some points about Phil Bralich's recent message, summarised above in his own words.

1. He seems to take it that the goal of NLP, and syntactic theory in general, is to produce syntactic analyses, and that the only valid type of analysis is the same as, or equivalent to, that of the Penn Treebank. This is itself a "theory of syntax", and a controversial one. For a human being using language, the only function of syntax is to fit word meanings together to make sentence meanings: syntax has no value on its own, it is just a key to the semantics. This is true for most NLP applications too. Grammatical formalisms which do not analyse sentences in terms of trees or labelled bracketings (e.g. some types of categorial, dependency, and systemic grammar) are capable of producing accurate semantic interpretations and are computationally implementable and efficient. In this sense, Phil Bralich's characterisation of the ideal parser as one which produces Penn Treebank syntactic structures quickly and accurately covers only part of the field, and not necessarily the part which will turn out to describe language most successfully in the long run.

2. Suitability for implementation in a programming language is not the only measure of the success of a grammatical formalism. Language is (presumably) designed to work with human brains. We don't know how brains work, but they are obviously not the same as serial computers. The most natural description of the grammar of a language, then, is not necessarily the one that works best on a serial computer. Even implementability itself is not always relevant: e.g. a syntax which has a 30-word sentence producing 30 million analyses is unimplementable with current techniques; but in a system which analysed incrementally, one or a few words at a time, excluded 99.99% of analyses on semantic grounds before their structure was built, and represented the remaining ambiguities as vagueness in a single analysis (all of which the brain perhaps does), it would be fine.

3. One of the contributors to this list may well have the ideal grammatical formalism half worked out on their desk, but lack the time and programming skill (or the money to employ people) to compile the 50,000 word dictionary and full-scale grammar needed to pass Bralich's test. As has often been said about machine translation, any half-way decent syntactic formalism will serve as the base for a good system, given the time and money for development. Bralich's argument here seems to me only partly relevant to linguistics. He's talking largely about programming skill and availability of resources, not about the adequacy of theories of linguistic description.

4. I'm sure many others too will have taken exception to the following:

> computational linguistics departments who do not mention these tools or use tools of this calibre are remiss in their duty

I requested a copy of the Bracket Doctor to use in my classes, but it seems a Unix version does not exist - and like most University CL departments, we use Unix machines here.

John Phillips
Dept. of Linguistics
Yamaguchi University

At 03:59 PM 2/22/98 +0900, John Phillips wrote:

> I'd like to make some points about Phil Bralich's recent message, summarised above in his own words. 1. He seems to take it that the goal of NLP, and syntactic theory in general, is to produce syntactic analyses, and that the only valid type of analysis is the same as, or equivalent to, that of the Penn Treebank. This is itself a "theory of syntax", and a controversial one.

You have the point rather backward here. I am saying that if a theory of syntax (whatever other things might be done in NLP) is to consider itself a mature theory of syntax, it should be able to produce programs that meet the minimum standards I have outlined (appended at the end of this message). Look at those standards very closely. I am sure you will agree that most people have believed that current theories can already program those very basic requirements, but in reality they cannot. After reviewing those standards, ask yourself if it is reasonable to expect theories of syntax to handle this minimal level of programming after 35 years of work (and millions of dollars in grants, salaries and other expenses in academia and in industry). Then ask yourself: if there is ONLY ONE theory that can meet those standards, what does that mean for other theories of syntax? (I chose to say "35 years" based on the number of years of existence of the Association for Computational Linguistics.)

The Penn Treebank II guidelines are widely established as the standard for this area of linguistics. Researchers in both academia and industry always ask me about our ability to handle the Penn Treebank styles. I assume they ask one another for the same proof of ability in this area. It is for that reason that we created programs that could generate those trees and labeled brackets: so that we could use a standard developed in this field to demonstrate our abilities.
By the way, there is no one else doing this at this time except Satoshi Sekine at NYU, whose parser handles only the simplest of sentences. The fact that the major theories, which are supported by hundreds of researchers and sizable grants, cannot do this is a cause for serious concern on the part of the entire field of linguistics. Institutions like MIT and Stanford are not lacking in researchers, funds, or graduate students. Why can they not generate this standard, which the field itself has established?

The controversial nature of the Penn Treebank styles should not be a problem. Certainly the theory that underlies our parser is very different, but the fact that we have thoroughly worked out our theory and our algorithms means that it is a simple matter to translate our output into the Penn Treebank style. Any other theory that can meet the standards outlined in my post (appended below) should be able to do the same. That is, its proponents should be able to translate their output into the Penn Treebank styles with relative ease.

Again, the fact that Stanford, MIT, Microsoft, IBM and so on are not meeting these standards or generating output that follows the Penn Treebank guidelines is a cause for serious concern, not only among linguists but also among those who fund such projects and the stockholders of those companies. It is all the more so given that one theory has succeeded in this area where they have not. They cannot argue that it is impossible, and they cannot argue that they have already done it if they cannot show it.

> For a human being using language, the only function of syntax is to fit word meanings together to make sentence meanings: syntax has no value on its own, it is just a key to the semantics. This is true for most NLP applications too. Grammatical formalisms which do not analyse sentences in terms of trees or labelled bracketings (e.g. some types of categorial, dependency, and systemic grammar) are capable of producing accurate semantic interpretations and are computationally implementable and efficient.

This simply has not been shown. As a matter of fact, since no NLP device (based on semantics, syntax, or anything else) before now has been able to produce the Penn Treebank trees or meet the very minimal standards that I have proposed, we can state confidently that the opposite is true -- that the above clearly are NOT computationally implementable and efficient.

> In this sense, Phil Bralich's characterisation of the ideal parser as one which produces Penn Treebank syntactic structures quickly and accurately covers only part of the field, and not necessarily the part which will turn out to describe language most successfully in the long run.

Let's not forget that any characterization of a string or sentence or set of strings and sentences is going to have to be able to account for some very regular, very predictable facts of structure. As a matter of fact, no description of semantics, pragmatics, or discourse is going to be properly grounded until such time as the basic facts of structure have been properly understood. This is a reality that all of linguistics has to live with. We cannot talk about quantum physics without first having a thorough understanding of atoms, electrons, protons, and so on. In a similar manner, the linguistics field is not ready to talk about these other areas until the basic facts are understood. I am not saying that linguists cannot talk about or research these areas; I am only saying that until basic structure is understood sufficiently, those other studies will never be more than speculation, much as the work in alchemy could not evolve into modern chemistry until the basic building blocks of nature were understood.
The alchemists had every right to hypothesize and speculate, but it was not until the basics were understood that we saw the advances that were possible from chemistry. This will be true for linguistics as well. As soon as the basic building blocks and their basic relationships are understood, then and only then will we see major advances in other areas of linguistics.

> 2. Suitability for implementation in a programming language is not the only measure of the success of a grammatical formalism. Language is (presumably) designed to work with human brains. We don't know how brains work, but they are obviously not the same as serial computers. The most natural description of the grammar of a language, then, is not necessarily the one that works best on a serial computer. Even implementability itself is not always relevant: e.g. a syntax which has a 30-word sentence producing 30 million analyses is unimplementable with current techniques; but in a system which analysed incrementally, one or a few words at a time, excluded 99.99% of analyses on semantic grounds before their structure was built, and represented the remaining ambiguities as vagueness in a single analysis (all of which the brain perhaps does), it would be fine.

Yes, but take a look at those standards and the theoretical mechanisms of current theories of syntax and see if you can find any principled reason why they cannot be brought together to form a proving ground of a theory's scope and efficiency. Surely you aren't going to argue that there must be better theories of math because math programs can make calculators? Nor, surely, are you going to argue that there is a better theory of mathematics out there that takes more of the brain into account, and that the proof of its validity is the fact that it cannot be programmed to make calculators?

> 3. One of the contributors to this list may well have the ideal grammatical formalism half worked out on their desk, but lack the time and programming skill (or the money to employ people) to compile the 50,000 word dictionary and full-scale grammar needed to pass Bralich's test. As has often been said about machine translation, any half-way decent syntactic formalism will serve as the base for a good system, given the time and money for development. Bralich's argument here seems to me only partly relevant to linguistics. He's talking largely about programming skill and availability of resources, not about the adequacy of theories of linguistic description.

The dictionary can be obtained from the Linguistic Data Consortium for about $2,500. And there should be plenty of grad students and programmers around who would see this as a good resume-building project. After 35 years this should be common knowledge, and there should have been plenty of opportunity to try it. If there are new and untried theories out there, we at Ergo would be interested in looking at them with an eye toward joint development efforts.

More important, though, is the fact that MIT, Stanford, Microsoft, and IBM are not saddled with these financial and personnel problems, yet they also have not produced any devices that can meet the standards I have proposed. This is in spite of the fact that most people (even you, it seems) have been led to believe these standards are easily met and have already been met.

> 4. I'm sure many others too will have taken exception to the following:
>> computational linguistics departments who do not mention these tools or use tools of this calibre are remiss in their duty
> I requested a copy of the Bracket Doctor to use in my classes, but it seems a Unix version does not exist - and like most University CL departments, we use Unix machines here.

I think any CL department should be able to afford at least one Windows 95 machine.
Maybe you can find one at the University library. The executable fits on one disk and is installed from a standard setup program, so just copy it to a disk and take it to the nearest Windows 95 machine if you want to try it. We decided to use Windows 95 because it was the easiest and most convenient way to get these tools to students, researchers, programmers, marketers, and so on in both academia and industry. We cannot limit ourselves to academia alone in the current intellectual climate.

The original discussion can be found at Linguist 9.255

Philip A. Bralich, President
Ergo Linguistic Technologies
2800 Woodlawn Drive, Suite 175
Honolulu, HI 96822
tel:(808)539-3920
fax:(880)539-3924

Although mindful of the risk of exposing myself to Saint Anthony's Fire, I wish to respond to the criticism of the EAGLES initiative made by Dr Bralich. I do so in my function as co-Chief Editor of EAGLES.

There is a simple reason why EAGLES does not mention the criteria espoused by Dr Bralich: EAGLES has not so far concerned itself with proposing standards in the area of parsers. That is the short answer; interested readers, please read on.

It is unfair to criticise us for not doing something we had not included in our (public) programme of work. One may criticise the initial selection of topics, however. The topics retained were those where there was wide agreement that some kind of useful consensus could be obtained in the near term. The set of topics we actually worked on was furthermore constrained by factors such as availability of voluntary labour. We have worked on the following topics of immediate relevance to the current debate:

* morphosyntactic annotation of text corpora
* syntactic annotation of text corpora
* morphosyntactic description of lexical items
* syntactic subcategorisation of lexical items (verbs)
* comparative survey of implemented computational linguistic formalisms
* linguistic adequacy of CL formalisms
* development of an evaluation framework for NLP products

At present, we are working on, among other topics, semantic subcategorisation of lexical items, pragmatic annotation of corpora (text and spoken language) and developing proposals that complement ISO 9126 from the point of view of NLP products and quality in use.

As Dr Bralich has found it difficult to find appropriate discussions in the EAGLES literature and as there are presumably others who have experienced such difficulty, let me point out a few areas of relevance, which also indicate the limits of our work.

The introduction to our document on syntactic annotation of corpora states:

"The scope of this report is syntactic annotation of corpora. At first glance, a study of such annotation practices is difficult to distinguish from a study of parsers, parsing, grammars, the representation of parses, and the formalisms adopted for such representations. Clearly, the syntactic annotation of corpora has a close interrelation with parsing (indeed, a major function of a syntactically annotated corpus is to provide a test-bed or a training-bed for wide-coverage parsers). This cannot be ignored in the report: but what we are ultimately interested in is the parsing schemes in use to date (i.e. the set of symbols used in the annotation scheme and guidelines for their application), although how the corpus is parsed (the parsing system) is relevant, albeit indirectly, to our task."

As we are also working in a multilingual environment, where different languages have different linguistic representational traditions and needs, we find that there are issues in practical application of guidelines for syntactic representation:

"Since the approach to syntactic annotation is to a large extent influenced by the language to be annotated, our guidelines do not give any preference either to a phrase structure annotation or to a dependency annotation. The phrase structure annotation, however, is in certain ways the more demanding of the two, which is why this report covers phrase structure in more detail. This should not be construed, however, as expressing a preference for phrase structure annotation. We will propose notations for both approaches."

In their work, the corpus group took into consideration various projects including UPenn Treebank, ULancaster Treebank and the SUSANNE corpus, all of which heavily influenced the shape and content of our proposals.

This same concern with linguistic representation is met with throughout the EAGLES reports. Here is an extract from the document on syntactic subcategorisation, for example:

"The most important concern for EAGLES is linguistic substance.
Consequently, the group is building on the results of the ET-7 feasibility study (Heid & McNaught, 1991) which recommended the following methodology: to break up the complex descriptive devices into `minimal observable facts' in order to arrive at the most fine-grained, common set of features underlying different theoretical frameworks or systems. EAGLES results are therefore based on a careful and detailed analysis of different linguistic theories and frameworks, but aiming at reaching a consensus at the level of these `minimal observable facts'. Connected with this basic objective is the approach chosen towards its achievement, an approach which can be defined as looking for an edited union (a term due to Gazdar) of the features proposed in the various major theories and systems. This approach tries to capture all the relevant distinctions made by the major lexical theories/systems, without taking a theoretical stand, thereby giving to features labels which are as neutral as possible.

In an attempt to be as theory-compatible as possible, there are a few points where choices were left open, especially for those aspects of grammatical description which tend to be more theory-bound (e.g. grammatical relations and control). There are practical drawbacks to this decision -- especially with regard to the implementation of the proposed standard -- but, at least in this first phase, more importance was given to avoiding commitment to specific theories of lexical description. We recognise that there is a tension between the decision to be flexible and open to more than one choice and the real and effective usability of the proposed standards. Without abandoning the principle of flexibility and openness, we provide an indication of usage by exemplifying the implementation of critical choices. In general, the EAGLES results are achieved in a dynamic way, with a cyclical process of revision after one or more phases of testing and feedback, possibly in large projects.

The difference between the European approach and other approaches to standards should be pointed out here, to be taken as a description of a general tendency. While in, say, the USA, a sort of de facto standard is somehow made available to the community through the provision of publicly available data, in Europe we try to arrive at consensually agreed standards. This implies a considerable effort in trying to involve the relevant experts in the different areas of concern, either in the phase of producing the standards, or at least in the successive phases of testing the proposals and providing feedback. This approach also involves a large amount of overheads in terms of activities and work necessary to arrive at a consensus as well as a slower process of arriving at the aimed-for results."

I also draw attention to preparatory work carried out on computational linguistic formalisms. In particular, the group charged with this work organised two workshops that brought together numerous practitioners from industry and academia. One concentrated on intensive comparison of implemented formalisms: a common framework for comparison was agreed on and systems were put through their paces as part of the information gathering process. The other focussed on the linguistic adequacy of implemented formalisms. I stress that this work was preparatory and was not continued in phase II of EAGLES, for reasons of little interest in this context. However, the results did reveal a great degree of convergence among formalisms and gave indications of how grammars associated with one formalism could be rendered reusable by another.

The work on evaluation did not specifically include parsers: it was oriented at developing a general framework for evaluation of NLP products and focussed initially on adequacy evaluation.
There has been increasing cooperation between EAGLES and those responsible for ISO 9126, as EAGLES has been instrumental in providing guidelines that neatly complement the ISO work. The recent book on NLP evaluation by Sparck Jones & Galliers recognised the contribution of EAGLES to evaluation.

Lastly, I note that numerous projects and initiatives in Europe have chosen to adopt EAGLES guidelines especially for the representation and annotation of text corpora and dictionaries. ELRA also works with EAGLES guidelines. The initiative thus receives constant feedback from such widespread take-up of its results. (EAGLES also has an important activity in the development of guidelines for spoken language resources and processing and the speech community has responded warmly to our efforts in this area.)

Those who wish to find out more about how EAGLES guidelines are being used, debated, developed, applied, etc., are welcome to attend (or to acquire the proceedings of) the First International Conference on Language Resources and Evaluation (Granada, May 1998) where many papers and workshops will refer to EAGLES results, and in a constructively critical way into the bargain. (Conference URL: http://www.icp.inpg.fr/ELRA/conflre.html)

The EAGLES initiative will produce a new round of publications in the 4th quarter of 1998, which will be available from http://www.ilc.pi.cnr.it/EAGLES/home.html

I trust this explanation has served to put EAGLES work and results in context. One may naturally disagree with our approach; no-one is imposing standards on anyone in EAGLES. However, we are encouraged that large numbers of people from industry and academia have become involved in this initiative and have given freely of their time to develop recommendations and guidelines that, by most accounts, meet with widespread approval and adoption. We must be doing something right...

JMcN - John McNaught jock@ccl.umist.ac.uk
(Co-Chief Editor, EAGLES)
Centre for Computational Linguistics
Department of Language Engineering
UMIST
PO Box 88
Sackville Street
Manchester, UK
M60 1QD
tel: +44.161.200.3098 (direct)
fax: +44.161.200.3099