Editor for this issue: Brett Churchill <brett@linguistlist.org>
This message was originally submitted by annes@HTDC.ORG to the LINGUIST list at LINGUIST.LDC.UPENN.EDU.

To the readers:

On March 17th I will be giving a talk at the University of Hawaii's Linguistics Department Tuesday Seminar called "The Best Theory of Syntax." In this talk I intend to make the rather non-controversial point that the best theory of syntax must necessarily be the one that demonstrates itself to be most completely implemented in a programming language.

I am writing to the group to ask for references, obscure or otherwise, where this basic proposition has been put forth before in the literature or through personal communications. Comments, criticism, and discussion of this argument are also welcome. I will post a summary of the references to the list. (Be sure to say so if you do not want your name mentioned in the summary.)

Some might argue that I am merely putting complex arguments into simple language, but these arguments have substance and effect in either simple or complex language. This is especially true when we are dealing with the application of syntax to a multi-billion dollar industry such as NLP.

More specifically, I intend to present the argument that the best independent and objective measure of the overall effectiveness of a theory of syntax is its ability to generate, in a computer program, standard grammatical structures and to manipulate these structures in the same way as users of the language being described. That is, I intend to argue that the best theory of syntax is the one that produces the best parsers. Following that, I will present a very ordinary set of standards for the evaluation of parsers and then, based on the comparison of theories using those standards, I will argue that the theory of syntax that underlies the Ergo Linguistic Technologies parser is the best theory of syntax and that all others should be relegated to the scrap heap of "wannabe" theories until such time as they can produce equal or better parsers.

The logic that I will present to support this is:

1) if there is ever to be a way to determine which of the competing, extant theories of syntax is preferable to the others, there must be an independent and objective means of weighing the relative value and completeness of these theories in terms of their ability to accomplish the tasks they were originally designed for. Specifically, there must be an independent and objective means of verifying which theories are indeed most capable of expressing all and only those generalizations about language that describe and explain the observed facts of its structure.

2) since computers have the ability to represent and execute binary algorithms, any theory that is composed of binary algorithms should be able to be implemented in a programming language. Thus, any theory of syntax that has reached a level of maturity should be able to represent its generalizations in working parsers.
In fact, all programming languages and compilers are based on early syntactic discoveries such as phrase structure rules, for which Noam Chomsky is the default reference for much of the early work, and they have already demonstrated their aptness for this sort of comparison.

3) the degree to which a theory of syntax and its algorithms cannot be implemented in a programming language is the degree to which that theory and its algorithms have not been completely or correctly worked out and should not be considered a mature enough theory to be included in the discussion of which theory is to be preferred.

4) the theory which is most thoroughly worked out will naturally have the most thorough and comprehensive parsing programs associated with it, and for that reason is to be considered the best theory of syntax as determined by this independent, objective criterion.

I will also propose a method for judging which theories have been "best" implemented in a programming language. Specifically, I will argue that the standards described below are the minimum that a parser based on a theory of syntax would have to meet before that theory could be said to have reached some level of maturity, and that this same set of criteria would be used to determine exactly which theories of syntax had most effectively accomplished the task of modeling the mechanisms that generate all and only the sentences of a language.

In addition, the comparison of individual parses will of course use the Penn Treebank II guidelines established by the Linguistic Data Consortium at the University of Pennsylvania. Of course, any theory of syntax, whatever its assumptions and methods, should be able to translate its structures into the Penn Treebank style if the work behind it is thorough and complete. The ability to generate these labeled brackets and trees in itself constitutes a good test of a theory's maturity.

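As a rough illustration of what "generating labeled brackets" means in practice, the following minimal Python sketch serializes a hand-built constituency tree into Penn Treebank-style bracket notation. The names `Node` and `to_brackets` are hypothetical, the tree and its tags were assigned by hand for the example sentence "A dog is on the porch" (used later in this message), and the function tags of the Penn Treebank II guidelines (such as -SBJ) are omitted. It is not the output of any parser discussed here.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """A constituent: a label plus child constituents or word strings (hypothetical type)."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def to_brackets(node: Node) -> str:
    """Serialize a tree as a single-line labeled bracketing."""
    parts = []
    for child in node.children:
        if isinstance(child, Node):
            parts.append(to_brackets(child))  # recurse into sub-constituents
        else:
            parts.append(child)               # a leaf word
    return "(" + node.label + " " + " ".join(parts) + ")"


# Hand-annotated example tree; the tag choices are an assumption of this sketch.
tree = Node("S", [
    Node("NP", [Node("DT", ["A"]), Node("NN", ["dog"])]),
    Node("VP", [
        Node("VBZ", ["is"]),
        Node("PP", [
            Node("IN", ["on"]),
            Node("NP", [Node("DT", ["the"]), Node("NN", ["porch"])]),
        ]),
    ]),
])

print(to_brackets(tree))
# (S (NP (DT A) (NN dog)) (VP (VBZ is) (PP (IN on) (NP (DT the) (NN porch)))))
```

Producing the bracket string from a finished tree is the easy part; the test proposed above lies in whether a parser can build the correct tree in the first place.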
The motivation for such comparisons and standards is of course to provide an independent and objective means of evaluating the merits and relative success of research in this area, one that can be judged and discussed not only by those with a particular theoretical orientation, but also by those with different theoretical backgrounds, those in different areas of linguistics, and of course those from fields outside of linguistics who need to evaluate and discuss such materials.

THE STANDARDS:

In addition to using the Penn Treebank II guidelines for the generation of trees and labeled brackets, having a dictionary of at least 35,000 words, working in real time, and handling sentences of up to 15 to 20 words in length, we suggest that NLP parsers should also meet standards in the following seven areas before being considered "complete." The seven areas are: 1) the structural analysis of strings, 2) the evaluation of acceptable strings, 3) the manipulation of strings, 4) question/answer, statement/response repartee, 5) the recognition of the essential identity of ambiguous structures, 6) command and control, and 7) lexicography. (These same criteria have been proposed for the coordination of animations with NLP with the Virtual Reality Modeling Language Consortium, a consortium designed to standardize 3D environments, whose standards were recently accepted by the ISO. See http://www.vrml.org/WorkingGroups/NLP-ANIM.)

It is important to recognize that EAGLES and the MUC conferences, groups that are charged with the responsibility of developing standards for NLP, do not mention any of the following criteria and instead limit themselves to largely general characteristics of user acceptance or vague categories such as "rejects ungrammatical input" rather than specific proposals detailed in terms of the syntactic and grammatical structures and functions that are to be rejected or accepted.
The EAGLES site is made up of hundreds of pages of introductory material that is very confusing and difficult to navigate; however, once you actually find the few standards that are being proposed, you will find that they do not come close to the level of precision and depth that is being proposed here, and for that reason they should be rejected until such time as these higher and more demanding levels of expectation of NLP systems are included there as well. These are serious matters, and a group like EAGLES should not ignore extant NLP tools simply because they are not mainstream or because mainstream parsers cannot meet these requirements (even though the Ergo parser is better known than almost all other parsers). Just go through their pages and try to find EXACTLY what a parser is expected to do under these guidelines. There is almost no reference to specific grammatical structures, to the Penn Treebank II guidelines, or to current working parsers as models (http://www.ilc.pi.cnr.it/EAGLES/home.html).

If the EAGLES standards are ever to gain any credibility and respect, they are going to have to be far more specific about the grammatical and syntactic phenomena that a system can and cannot support. There should also be some requirement that the systems being judged offer a demonstration of their ability to generate labeled brackets and trees in the style of the Penn Treebank II guidelines. I suggest the following as a far more exacting and far more demanding test of systems than is offered by EAGLES or any of the MUC conferences.

HERE IS A BRIEF PRESENTATION OF STANDARDS IN THOSE SEVEN AREAS:

1.
At a minimum, from the point of view of the STRUCTURAL ANALYSIS OF STRINGS, the parser should: 1) identify parts of speech, 2) identify parts of sentence, 3) identify internal clauses (what they are and what their role in the sentence is, as well as the parts of speech, parts of sentence, and so on of these internal clauses), 4) identify sentence type (without using punctuation), 5) identify tense and voice in main and internal clauses, and 6) do 1-5 for internal clauses.

2. At a minimum, from the point of view of EVALUATION OF STRINGS, the parser should: 1) recognize acceptable strings, 2) reject unacceptable strings, 3) give the number of correct parses identified, 4) identify what sort of items succeeded (e.g. sentences, noun phrases, adjective phrases, etc.), 5) give the number of unacceptable parses that were tried, and 6) give the exact time of the parse in seconds.

3. At a minimum, from the point of view of MANIPULATION OF STRINGS, the parser should: 1) change yes/no and information questions to statements and statements to yes/no and information questions, 2) change actives to passives in statements and questions and change passives to actives in statements and questions, and 3) change tense in statements and questions.

4. At a minimum, based on the above basic set of abilities, from the point of view of QUESTION/ANSWER, STATEMENT/RESPONSE REPARTEE, the parser should also: 1) identify whether a string is a yes/no question, wh-word question, command, or statement, 2) identify tense (and recognize which tenses would provide appropriate responses), 3) identify relevant parts of sentence in the question or statement and match them with the needed relevant parts in text or databases, 4) return the appropriate response as well as any sound, graphics, or other files that are associated with it, and 5) recognize the essential identity between structurally ambiguous sentences (e.g.
recognize that either "John was arrested by the police" or "The police arrested John" is an appropriate response to either "Was John arrested (by the police)?" or "Did the police arrest John?").

5. At a minimum, from the point of view of RECOGNITION OF THE ESSENTIAL IDENTITY OF AMBIGUOUS STRUCTURES, the parser should recognize and associate structures such as the following: 1) existential "there" sentences with their non-there counterparts (e.g. "There is a dog on the porch," "A dog is on the porch"), 2) passives and actives, 3) questions and related statements (e.g. "What did John give Mary?" can be identified with "John gave Mary a book."), 4) possessives in three forms ("John's house is big," "The house of John is big," "The house that John has is big"), 5) heads of phrases in non-modified and modified versions ("the tall thin man in the office," "the man in the office," and "the tall man in the office" should be recognized as referring to the same man, assuming the text does not include a discussion of another "short man" or "fat man," in which case the parser should request further information when asked simply about "the man"), and 6) others to be decided by the group.

6. At a minimum, from the point of view of COMMAND AND CONTROL, the parser should: 1) recognize commands, 2) recognize the difference between commands for the operating system and commands for characters or objects, and 3) recognize the relevant parts of the commands in order to respond appropriately.

7.
At a minimum, from the point of view of LEXICOGRAPHY, the parser should: 1) have a minimum of 50,000 words, 2) recognize single and multi-word lexical items, 3) recognize a variety of grammatical features such as singular/plural, person, and so on, 4) recognize a variety of semantic features such as +/-human, +/-jewelry, and so on, 5) have tools that facilitate the addition and deletion of lexical entries, 6) have a core vocabulary that is suitable to a wide variety of applications, 7) be extensible to 75,000 words for more complex applications, and 8) be able to mark and link synonyms.

THE CONCLUSIONS I WILL DRAW FROM THIS ARE:

1) the theory that underlies the software at Ergo Linguistic Technologies is not only the best theory of syntax, but is the ONLY theory of syntax that has reached a sufficiently developed state to even attempt the standards described here.

2) those who do not mention this theory in their research proposals, grant applications, publications, and so on are guilty of negligence (and could be sued if there are grants, contracts, jobs, or other such items of material value at stake and the offerer of these jobs, grants, etc. has reason to expect that the applicant is an expert in his field and is providing an accurate picture of the competitive environment). In addition, computational linguistics departments that do not mention these tools or use tools of this calibre are remiss in their duty to present the full range of available materials to their students.

3) all current theories of syntax, such as Chomsky's latest theory or even older versions of it, HPSG, LFG, etc., should be relegated to the scrap heap of "wannabe" systems until such time as they have been worked out in sufficient detail to allow the creation of programs that can execute their algorithms to the degree required by the above standards.

(I do not want to imply that the use of these theories to analyze the world's languages cannot contribute, or has not contributed, greatly to the store of knowledge about the nature of those languages. As a matter of fact, the theory that we are working with owes a tremendous debt to all the work that has come before it in the form of these earlier theories. The only problem is that these other theories have not yet completed their basic research and have not yet reached a level of sufficient maturity to work with the standards described above, and for that reason can only be considered works in progress or "wannabe" theories.)

I will finish my UH talk with a demonstration of the software that has been developed from our theory of syntax, focusing on demonstrations of the seven standards described above and handouts showing the output of other parsers. In addition to our standard demo (as seen on our web site, http://www.ergo-ling.com), I will use the tools called "The BracketDoctor" (a device that generates labeled brackets and trees in the style of the Penn Treebank II guidelines), "The English Sentence Enhancer" (an ESL grammar checker), "The Logic Doctor" (a program that handles first order predicate calculus, syllogistic reasoning, inferencing, and basic logic), and "The Q&A Demo" (a program that shows our ability to handle question/answer, statement/response repartee). These will demonstrate our strengths using Penn Treebank II style trees and labeled brackets, as well as practical illustrations of the abilities of our theory of syntax in those seven areas. (All these tools except the "Logic Doctor" and the "Q&A Demo" are available for free download from our web site at http://www.ergo-ling.com or by email by writing me at bralich@hawaii.edu. These are Windows 95 programs that fit on one disk and can be installed with a standard setup function from WIN95.) Please be advised that these programs are copyrighted and patent pending.

In sum, I would like to know of references and to receive comments in support of or against the following argument: 1) that computers are the ideal devices for comparing different theories' abilities to model the phenomena they seek to describe (all and only the grammatical sentences of a language); 2) that any theory that cannot be fully implemented in a programming language, as described in the standards outlined above, is flawed in some way; and 3) that the best independent and objective measure of a theory's scope, efficiency, and effectiveness is the degree to which it can be implemented in a programming language. (Of course, the basis for judgement will be the Penn Treebank II guidelines and the standards described above.) Then, based on the ability of the Ergo Linguistic Technologies tools to compete in all the standards, I suggest that the theories of Brame, Chomsky, Kaplan and Bresnan, Pollard and Sag, Starosta, et al. be set aside until such time as they can be shown to generate programs that are as good as or better than those produced at Ergo Linguistic Technologies' offices.

Phil Bralich

P.S. We recommend that you download these tools and take them with you (on a laptop is best, of course) to any linguistics, NLP, Computational Linguistics, MT, or logic conference or workshop that will discuss work in these areas. They should provide you with an interesting source of comparison material as well as with some interesting and challenging questions for the presenters. Of course, this may also be of value for students in their classes.
Linguistics and Computer Science departments that are currently not committed to any particular theory of syntax or approach might want to consider collaborative involvement with this theory as a means of producing commercially viable products and as a source of research grants. You may also wish to compare results in published reports with the results that these tools provide. You may also want to email copies of one or more of these tools to classmates, teachers, and co-workers (please avoid sending them to competitors like a big bunch of unordered pizzas).

P.P.S. As the field of linguistics is dominated by very intelligent, very informed individuals who are also quite competitive, you can measure the success of this argument in the field overall by the reactions of the readers to this post: the smaller the response, the higher the acceptance (begrudging though it may be). That is, people are certainly willing to criticize any argument they can, but they merely keep quiet if they cannot. Praise for a competitor's arguments is not likely. Thus, a lack of criticism should be interpreted as acceptance of these arguments.

Philip A. Bralich, President
Ergo Linguistic Technologies
2800 Woodlawn Drive, Suite 175
Honolulu, HI 96822
tel: (808) 539-3920
fax: (880) 539-3924