Editor for this issue: James German <james
linguistlist.org>
Philip A. Bralich wrote:

> Thus, a lack of criticism should be interpreted as acceptance of these arguments.

Here it is:

1) Bralich's first mail was 80% advertisement for his company's products. The follow-up mails were 90%-100% advertisement.
2) Bralich's view is Anglocentric.
3) Bralich's view of syntax is tied to some historically grown traditions that are not necessarily right.
4) Bralich lists the properties of his system and tries to define those as the standard.
5) Bralich does not contribute to the field.

1) and 4) are obvious. I will try to explain 2), 3) and 5) a little further.

> Of course, any theory of syntax, whatever its assumptions and
> methods, should be able to translate its structures into the Penn
> Treebank style if their work is thorough and complete. The ability
> to generate these labeled brackets and trees in itself constitutes a
> good test of a theories maturity.

This statement proves the Anglocentric viewpoint of the author. What about Russian or Chinese? What about theories that describe fragments of languages other than English? Are the structures they produce supposed to be mapped to the Penn Treebank? And who dares to say that trees are the appropriate structure for language?

> THE STANDARDS: In addition to using the Penn Treebank II guidelines
> for the generation of trees and labeled brackets and a dictionary
> that is at least 35,000 words in size and works in real time and
> handles sentences up to 15 to 20 words in length, we suggest that
> NLP parsers should also meet standards in the following seven areas
> before being considered "complete." The seven areas are: 1) the
> structural analysis of strings, 2) the evaluation of acceptable
> strings, 3) the manipulation of strings, 4) question/answer,
> statement/response repartee, 5) command and control, 6) the
> recognition of the essential identity of ambiguous structures, and
> 7) lexicography.

The construction and maintenance of a lexicon is a costly process. So does Bralich claim that a syntactic theory whose proponents cannot afford to build a system with at least 35,000 lexical entries is a bad theory? This question was raised by John Phillips (9.276), and Bralich answered that one can buy a lexicon from the Linguistic Data Consortium. Again, what about languages other than English?

> It is important to recognize that EAGLES and the MUC conferences,
> groups that are charged with the responsibility of developing
> standards for NLP do not mention any of the following criteria and
> instead limit themselves to largely general characteristics of user
> acceptance or vague categories such as "rejects ungrammatical input"
> rather than specific proposals detailed in terms of syntactic and
> grammatical structures and functions that are to be rejected or
> accepted.
> There is almost no reference to specific grammatical structures, the
> Penn Treebank II guidelines, or references to current working
> parsers as models (http://www.ilc.pi.cnr.it/EAGLES/home.html).

Why should there be? The purpose of a standard is to fix certain things that a group of people agrees about. Is the right structure to assign to sentences agreed upon yet? So it is rather the other way round: a standardization of grammatical structures at the present time would be a big mistake. And what would the Treebank predict about Chinese sentences?

5) The main problem that I had with Bralich's posting is that it does not contribute anything to the field of linguistics or computational linguistics. His company has implemented a program that can perform certain tasks. This is a good and interesting thing. But what does this tell the scientific community?

In the last century there was a machine that could play chess. It turned out that a little dwarf was sitting inside the machine, making reasonable moves. Bralich can argue that there is no dwarf in his laptop. Okay, there is another example. In this century a machine was built that really could play chess. It even defeated the world champion. But what does this tell us about the nature of chess? What did the moves the machine made tell us about the algorithms used? Nothing. And one problem of the competition was that it was not possible to study the behavior of the machine. The machine is top secret.

So what is the point of somebody turning up at a conference with a black box, saying this program can handle phenomenon XY? How do we know that this program handles phenomenon AB as well? Shall we sit there and try all sentences with Bralich's program to guess what the theory behind this program is? Following Bralich's logic, there is no underlying linguistic theory, because if there were, documentation of it could be found on their web pages.

Unless he reveals his assumptions about linguistic theory, his mails have to be regarded as pure advertisements for his products and should be banned from the Linguist Mailing List.

Stefan Mueller
Language Technology Lab
DFKI GmbH
Tel.: (+49 - 681) 302 - 5295
Fax: (+49 - 681) 302 - 5338
Stuhlsatzenhausweg 3
D-66123 Saarbruecken
http://www.dfki.de/~stefan/
http://www.dfki.de/~stefan/Babel/Interaktiv/Babajava/
At 09:30 PM 2/25/98 +0000, Phil Bralich wrote:

>You are obviously not reading the standards. Take a look at them
>closely (appended below for convenience). They are very simple and
>you will note that most people have assumed that that much at least
>had already been accomplished by those theories and yet, in spite of
>the fact that there is nothing that prevents their formalisms from
>being programmed they cannot produce parsers that meet even those
>minimal standards.

[much snipped]

List of standards:

>1. At a minimum, from the point of view of the STRUCTURAL ANALYSIS OF
>STRINGS, the parser should:, 1) identify parts of speech, 2) identify
>parts of sentence, 3) identify internal clauses (what they are and
>what their role in the sentence is as well as the parts of speech,
>parts of sentence and so on of these internal clauses), 4) identify
>sentence type (without using punctuation), 5) identify tense and voice
>in main and internal clauses, and 6) do 1-5 for internal clauses.

I strongly suspect (and in this I agree with Steven Spackman, in the same post) that machine implementation of even such an elementary analysis as _4) sentence type_ is beyond the ability of any parser that does not share a knowledge of human culture sufficient to include complex pragmatic analyses. I base this on examples like the following (which I owe to a brilliant talk by the late Dwight Bolinger at a CLS some years ago):

1) \I'm sure. (single falling tone)
2) \You're sure. (single falling tone)

As Bolinger pointed out, 1) is a statement, while 2) is a question. His point was that intonation is only loosely coupled to syntax, and that sentence type is equally so. Only some entity with the knowledge of pragmatics that human beings have could keep track of this stuff.

It may well be that a parsing program could recognize the syntactic import of utterances at the level at which people interact with computers (`Open pod bay six, HAL'), but such utterances are a small subset of overall language, and therefore it is unlikely that being able to implement a particular parser will be a test of _real_ language parsing.

Geoff Nathan

Geoffrey S. Nathan
Department of Linguistics
Southern Illinois University at Carbondale,
Carbondale, IL, 62901 USA
Phone: +618 453-3421 (Office)
FAX +618 453-6527
+618 549-0106 (Home)
There have been several more reactions to my post concerning whether
or not the ability of a theory of syntax to be implemented in a
programming language constitutes a fair, accurate, independent, and
objective test of a theory's scope and efficiency. In order to save
bandwidth I will respond one last time to this thread and try to
cover the widest possible range of criticisms.
I am sorry to be the one to have to bring to you news of a serious
problem in your field, but the fact remains that the theories that you
have grown to know and love over the last 30+ years have a dirty
little secret: They cannot be programmed to save their lives. This
thread has taken on more of a life than I expected, so if all are
agreed I will make this the last post for this particular thread
(though not this subject I am sure). Please do not see this as an
opportunity to let your venom fly as I will respond to posts that I
feel must be responded to. I think it is easiest to frame this in
terms of arguments that are "out there" and my responses to them.
The garden path arguments and my responses:
1. The standards I have proposed have already been met. (They have
not). Not by a long shot. Just print out the standards, put a copy
of Ergo software in your pocket and then go and compare them with any
parsing system anywhere.
2. The standards I propose are idiosyncratic to Ergo's theory or they
are somehow unfair. Look at them yourself and ask if you and most of
the field haven't believed they are commonplace expectations for any
theory or any parser.
3. Current problems with NLP have to do with working with the last
10%. That is, the pretense is they can already handle 90% of what
needs to be done but more is required. This is dead wrong. Parsers
outside of Ergo hardly begin to touch the standards we have proposed:
few of them doing anything more than part of speech analysis. If you
look at the output on speech rec systems you will see their NLP
abilities are well under 1% of the task (handling only a few hundred
commands). Ergo can improve that by another 60-80% increasing the
number of possible commands to many thousands, making the first spoken
language operating systems possible.
3. Parsing is not a good test of a theory even though there has never
been a theoretical mechanism proposed that in principle could not be
programmed. Note that other NLP researchers are not anxious to argue
that their theories are better BECAUSE they cannot be programmed.
That would end virtually any hope of funding that may exist for them
in the NLP arena. Thus, I believe it is safe to say that all other
syntactic theoreticians agree wholeheartedly that programming is a
good test of a theory. I have yet to see one theoretical syntactician
dispute this claim. Though it does seem that there are those in the
field who believe parsing is not a good test. (Statisticians
probably--the last thing they would want is for a theory of syntax to
do better than their number crunching). Perhaps syntacticians with
other theories would like to take up the debate. Would a theory of
math that could not be programmed to make calculators then be a better
theory of math because it was using less mundane criteria than formal
consistency?
4. Statistics alone is sufficient to analyze the facts of human
language:
Wrong: statistics will never provide sufficient information about the
internal structure of strings to manipulate structures or to do
question/answer, statement/response repartee. (Aside: Does a vote for
Ergo equal a vote against statistics? Perhaps.)
5. People will not accept NLP until disfluencies and other gaps are
handled.
This is more than a little bizarre. By this logic speech recognition
should have sold nothing to date and even current products should be
stamped as not fit for human consumption. Believe me, when you can
type or speak the following to your search engine, people will forget
about the disfluencies and gaps.
Who was the eighth President of the United States?
Hey Mickey, what time is it?
6. Parsers are too cumbersome to be made readily available to the
general public. Again not true: Ours is a standard Windows 95 program
that fits on one disk (including the 75,000 word dictionary) and will
run on any 486 or better PC. If it is NOT superior to the others they
should be able to do the same.
7. There is something inherently wrong with the Penn Treebank
standard.
Doesn't matter: it is a true demonstration of a parser's ability to do
part of speech tagging as well as to do a thorough analysis of
internal structure. If this is done it shouldn't take more than a few
weeks for the programmers to convert their parser's output into the
Penn Treebank style. That is just not a big programming task.
Besides, the Penn Treebank II guidelines are the standards accepted by
this field. (Of course, we also need equivalent standards for other
languages.)
8. Changing one structure into another or doing q&a makes untenable
theoretical claims about the relationships between structures. Again
not so--if you have properly analyzed the internal structure of
strings you should be able to change a question to a statement and a
statement to a question whether or not you believe this is what goes
on in the brain. The structures are so totally predictable, one from
the other, that this too should only take a programmer a week or so
(if the analysis of internal structure has been done correctly in the
first place).
9. People could respond intelligently to my claims, they are just too
busy with other things or too put off by my arrogance (accuracy?).
Wrong: this is a written record respected in the community and as
available as a library book (just type my name in a Net Search if you
want to find these arguments): not to respond is to acquiesce.
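
An illustrative aside on the claim that, once internal structure has
been analyzed, turning a statement into a question is predictable.
The following is a minimal Python sketch of that one operation. It is
not Ergo's method, which this post does not describe. It assumes a
hand-built parse given as nested (label, children) tuples with Penn
Treebank-style tags, it covers only sentences whose verb phrase
begins with a form of "be" or a modal, and it does not attempt
do-support ("Did the police arrest John?"). The names BE_FORMS,
leaves, and yes_no_question are hypothetical.

```python
# Hypothetical helper names; a toy sketch, not any real parser's API.
BE_FORMS = {"am", "is", "are", "was", "were"}


def leaves(tree):
    """Return (word, tag) pairs for a (label, children) tree, left to right."""
    label, children = tree
    pairs = []
    for child in children:
        if isinstance(child, str):
            pairs.append((child, label))
        else:
            pairs.extend(leaves(child))
    return pairs


def yes_no_question(sentence):
    """Front the auxiliary of a parsed (S NP VP) statement."""
    label, children = sentence
    if label != "S" or len(children) != 2:
        raise ValueError("expected a sentence of the form (S NP VP)")
    subject, predicate = children
    if subject[0] != "NP" or predicate[0] != "VP":
        raise ValueError("expected a sentence of the form (S NP VP)")
    aux_tag, aux_children = predicate[1][0]
    aux = aux_children[0]
    if aux_tag != "MD" and aux.lower() not in BE_FORMS:
        raise ValueError("no frontable auxiliary; do-support is not covered")
    subject_words = []
    for index, (word, tag) in enumerate(leaves(subject)):
        # The old sentence-initial capital is dropped unless the word is
        # a proper noun or the pronoun "I".
        if index == 0 and not tag.startswith("NNP") and word != "I":
            word = word.lower()
        subject_words.append(word)
    rest = [word for node in predicate[1][1:] for word, _ in leaves(node)]
    words = [aux.capitalize()] + subject_words + rest
    return " ".join(words) + "?"


# Hand-written parse of "John was arrested by the police" (an assumption,
# not the output of any parser discussed in this thread).
statement = (
    "S",
    [
        ("NP", [("NNP", ["John"])]),
        ("VP", [
            ("VBD", ["was"]),
            ("VP", [
                ("VBN", ["arrested"]),
                ("PP", [("IN", ["by"]),
                        ("NP", [("DT", ["the"]), ("NN", ["police"])])]),
            ]),
        ]),
    ],
)
print(yes_no_question(statement))  # Was John arrested by the police?
```

The sketch works only because the parse is supplied by hand. Producing
that parse for arbitrary input, and handling main verbs that need
do-support, is the hard part that the surrounding debate is about.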
There is still a serious problem underlying the lack of response from
people who know this field. For syntacticians, if they say that
theories can be tested by their ability to be implemented as a parser
they have to produce a parser of at least equal quality to the Ergo
parser or concede ours is best; however, if they say that there are
more important issues than parsing (thereby demonstrating their theory
CANNOT be implemented in a parser) they must forever write off funds
for parsing until such time as they have amended their theory or their
opinion.
For statisticians, if they say that a theory of syntax can be parsed
at all, they are in danger of admitting there is no particular need
for statistical parsers. If they say that theories of syntax cannot
create parsers or cannot create parsers equal to statistical parsers
they must come up with a statistical parser that can meet or beat
those very ordinary standards that I have proposed. This is
especially difficult for them because there is no way that a
statistical parser will ever analyze internal structure to a
significant enough degree to do q&a or manipulate structures
(otherwise they would have developed a theory of syntax and would once
again remove the need for statistical parsers).
Finally, download a BracketDoctor (perhaps these arguments as well),
take it to classes or to presentations or to conferences, and ask
questions based on what it can do. If you are given straight answers
with evidence of better results from other parsers you will KNOW I am
wrong. If anything else occurs (e.g. dead silence, dirty looks,
accusations of political incorrectness, shunning, or whatever) you
know there is substance in my arguments. Gauge my arguments not by
the intellectualized cloudiness of responses, but by the lack or
presence of physical evidence (don't go by oral reports alone) from
other parsers that can meet the standards I have provided. I have
provided very ordinary standards (repeated below) such that anyone
should be able to judge this. Look closely at the standards; you will
see they are fair and relatively simple. Then, BracketDoctor and
arguments in hand, go out and find the physical evidence yourself.
Phil Bralich
THE STANDARDS: In addition to using the Penn Treebank II guidelines
for the generation of trees and labeled brackets and a dictionary that
is at least 35,000 words in size and works in real time and handles
sentences up to 15 to 20 words in length, we suggest that NLP parsers
should also meet standards in the following seven areas before being
considered "complete." The seven areas are: 1) the structural analysis
of strings, 2) the evaluation of acceptable strings, 3) the
manipulation of strings, 4) question/answer, statement/response
repartee, 5) command and control, 6) the recognition of the essential
identity of ambiguous structures, and 7) lexicography. (These same
criteria have been proposed for the coordination of animations with
NLP with the Virtual Reality Modeling Language Consortium--a
consortium (whose standards were recently accepted by the ISO)
designed to standardize 3D environments. See
http://www.vrml.org/WorkingGroups/NLP-ANIM.)
It is important to recognize that EAGLES and the MUC conferences,
groups that are charged with the responsibility of developing
standards for NLP do not mention any of the following criteria and
instead limit themselves to largely general characteristics of user
acceptance or vague categories such as "rejects ungrammatical input"
rather than specific proposals detailed in terms of syntactic and
grammatical structures and functions that are to be rejected or
accepted. The EAGLES site is made up of hundreds of pages of
introductory material that is very confusing and difficult to
navigate; however, once you actually find the few standards that are
being proposed you will find that they do not come close to the level
of precision and depth that is being proposed here and for that reason
should be rejected until such time as these higher and more demanding
levels of expectation of NLP systems are included there as well.
These are serious matters and a group like EAGLES should not ignore
extant NLP tools simply because they are not mainstream or because
mainstream parsers cannot meet these requirements (even though the Ergo
parser is better known than almost all other parsers). Just go
through their pages and try to find EXACTLY what a parser is expected
to do under these guidelines. There is almost no reference to
specific grammatical structures, the Penn Treebank II guidelines, or
references to current working parsers as models
(http://www.ilc.pi.cnr.it/EAGLES/home.html).
If the EAGLES' standards are ever to gain any credibility and respect
they are going to have to be far more specific about grammatical and
syntactic phenomena that a system can and cannot support. There
should also be some requirement that the systems being judged offer a
demonstration of their abilities to generate labeled brackets and
trees in the style of the Penn Treebank II guidelines. I suggest the
following as a far more exacting and far more demanding test of
systems than is offered by EAGLES or any of the MUC conferences.
HERE IS A BRIEF PRESENTATION OF STANDARDS IN THOSE SEVEN AREAS:
1. At a minimum, from the point of view of the STRUCTURAL ANALYSIS OF
STRINGS, the parser should: 1) identify parts of speech, 2) identify
parts of sentence, 3) identify internal clauses (what they are and
what their role in the sentence is as well as the parts of speech,
parts of sentence and so on of these internal clauses), 4) identify
sentence type (without using punctuation), 5) identify tense and voice
in main and internal clauses, and 6) do 1-5 for internal clauses.
2. At a minimum from the point of view of EVALUATION OF STRINGS, the
parser should: 1) recognize acceptable strings, 2) reject unacceptable
strings, 3) give the number of correct parses identified, 4) identify
what sort of items succeeded (e.g. sentences, noun phrases, adjective
phrases, etc), 5) give the number of unacceptable parses that were
tried, and 6) give the exact time of the parse in seconds.
3. At a minimum, from the point of view of MANIPULATION OF STRINGS,
the parser should: 1) change yes/no and information questions to
statements and statements to yes/no and information questions, 2)
change actives to passives in statements and questions and change
passives to actives in statements and questions, and 3) change tense
in statements and questions.
4. At a minimum, based on the above basic set of abilities, from the
point of view of QUESTION/ANSWER, STATEMENT/RESPONSE REPARTEE, the
parser should: 1) identify whether a string is a yes/no question,
wh-word question, command or statement, 2) identify tense (and
recognize which tenses would provide appropriate responses), 3)
identify relevant parts of sentence in the
question or statement and match them with the needed relevant parts in
text or databases, 4) return the appropriate response as well as any
sound or graphics or other files that are associated with it, and 5)
recognize the essential identity between structurally ambiguous
sentences (e.g. recognize that either "John was arrested by the
police" or "The police arrested John" are appropriate responses to
either, "Was John arrested (by the police)" or "Did the police arrest
John?").
5. At a minimum from the point of view of RECOGNITION OF THE
ESSENTIAL IDENTITY OF AMBIGUOUS STRUCTURES, the parser should
recognize and associate structures such as the following: 1)
existential "there" sentences with their non-there counterparts
(e.g. "There is a dog on the porch," "A dog is on the porch"), 2)
passives and actives, 3) questions and related statements (e.g. "What
did John give Mary" can be identified with "John gave Mary a book."),
4) Possessives should be recognized in three forms, "John's house is
big," "The house of John is big," "The house that John has is big," 5)
heads of phrases should be recognized as the same in non-modified and
modified versions ("the tall thin man in the office," "the man in the
office," "the tall man in the office" and "the tall thin man in the
office" should be recognized as referring to the same man (assuming
the text does not include a discussion of another, "short man" or "fat
man" in which case the parser should request further information when
asked simply about "the man")), and 6) others to be decided by the
group.
6. At a minimum from the point of view of COMMAND AND CONTROL, the
parser should: 1) recognize commands, 2) recognize the difference
between commands for the operating system and commands for characters
or objects, and 3) recognize the relevant parts of the commands in
order to respond appropriately.
7. At a minimum from the point of view of LEXICOGRAPHY, the parser
should: 1) have a minimum of 50,000 words, 2) recognize single and
multi-word lexical items, 3) recognize a variety of grammatical
features such as singular/plural, person, and so on, 4) recognize a
variety of semantic features such as +/-human, +/-jewelry and so on,
5) have tools that facilitate the addition and deletion of lexical
entries, 6) have a core vocabulary that is suitable to a wide variety
of applications, 7) be extensible to 75,000 words for more complex
applications, and 8) be able to mark and link synonyms.
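
The standards above lean on Penn Treebank II labeled brackets. As a
rough illustration of what that notation looks like to a program, the
Python sketch below reads one bracketed string into nested tuples,
lists the word/tag pairs, and writes the brackets back out. The
function names are hypothetical, and the example parse was written by
hand with common Treebank tags (it is not output from any parser
named in this thread). Real treebank files also carry traces,
function tags, and other annotation that this sketch does not handle.

```python
def parse_brackets(text):
    """Parse one labeled-bracket string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def tagged_words(tree):
    """Return (word, part-of-speech) pairs in sentence order."""
    label, children = tree
    pairs = []
    for child in children:
        if isinstance(child, str):
            pairs.append((child, label))
        else:
            pairs.extend(tagged_words(child))
    return pairs


def to_brackets(tree):
    """Write a (label, children) tree back out as a labeled-bracket string."""
    label, children = tree
    parts = [c if isinstance(c, str) else to_brackets(c) for c in children]
    return "(" + " ".join([label] + parts) + ")"


example = "(S (NP (DT The) (NN police)) (VP (VBD arrested) (NP (NNP John))))"
tree = parse_brackets(example)
print(tagged_words(tree))
# [('The', 'DT'), ('police', 'NN'), ('arrested', 'VBD'), ('John', 'NNP')]
print(to_brackets(tree) == example)  # True
```

Reading and writing the notation is the easy part. The sketch says
nothing about how a parser arrives at the bracketing, which is the
point under dispute.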
Philip A. Bralich, Ph.D.
President and CEO
Ergo Linguistic Technologies
2800 Woodlawn Drive, Suite 175
Honolulu, HI 96822
Tel: (808)539-3920
Fax: (808)539-3924
I apologize if others have already made these points---perhaps I've come to the NLP/syntax discussion mid-stream.

Given my perspective as a generative linguist, I would like to suggest that Philip Bralich add a couple of items to his list of NLP parser "tests"; such additions are related to various points about what should constitute the "best" or most "mature" theory of syntax.

1.) A parser must hypothesize structural representations word-by-word, i.e. the parser cannot wait until all words have been encountered in order to assign parts of speech, assign brackets/constituent structure, anticipate the correct analysis of potential ambiguities (or any of the other tasks Bralich claims that a parser should do).

2.) As a corollary to (1), a parser should be "garden pathed" by the same sentences which garden path the Natural Language Processors it is meant to model (i.e. humans); furthermore, the parser should be able to re-parse those sentences which humans can re-parse, and the parser should utterly fail to re-parse the garden path sentences which are---though technically grammatical---totally opaque to humans' re-parsing. It should also be the case that the parser accepts those ungrammatical sentences which humans seem to "parse" just fine even though the sentence is ungrammatical, e.g. (the classic example) "More people have been to Paris than I have"---nothing about that sentence is glaringly unparsable, but when you stop to think about what it means, it makes no sense.

3.) The structure of the parser, particularly in the way it encodes the structures/mechanisms of a given syntactic theory, should provide an explanation as to why certain sentences are difficult for humans to parse and others---though superficially more "complex"---are easy to parse.

That last point constitutes the connection to syntactic theory; otherwise the parser is nothing more than a technical re-description of the problem.
Allow me to offer the following additions to Bralich's notion of a "mature" theory. A "mature" theory of syntax should have within its mechanisms an inherent capacity to:

4.) provide a principled account of the range of variation in human language (where "principled" is taken to mean that you don't simply code variation in your programs, but rather the range of variation follows as a natural consequence from the interactions of the mechanisms of the theory);

5.) provide a principled account of the natural acquisition of grammar (for example, why is it that children learning Italian, a language with a full verbal inflectional paradigm, acquire subject-verb inversion earlier than children learning English, with its impoverished inflectional paradigm, acquire subject-auxiliary inversion?);

6.) provide a principled account of language change---especially the means to delineate in a principled way those changes which are sociological in nature from those which are fundamental changes in the grammars of the speakers (for example, the theory should be able to explain why "Went John to London?" was replaced by "Did John go to London?", and as a consequence of explaining that change, the theory should also make clear whether the chronologically parallel loss of quasi-double object constructions like "Mary gave to John a book" was purely coincidental or was actually another surface manifestation of a core change in the grammars);

7.) make explicit the relationship between the issues in (5) and (6), i.e. to the extent that a given theory of syntax provides an accurate model of acquisition, the theory should thereby inherently provide a model for how the recursive process of acquisition generates specific changes in grammatical systems.

And finally, as an addendum to the specific metric Bralich defines for evaluating syntactic theories (i.e. that they must be encodable in a computer language), let's further stipulate that any implementation of a syntactic theory in a computer language should be breakable in ways which produce the same varieties of dysfunction which we see in aphasia.

A tall order? No doubt. But here's the point (in case the implicitness of my commentary has too thoroughly masked the message for any one given person): proclaiming that a theory should do X, Y, and Z, and then concluding that your theory is best because it does X, Y, and Z better than anyone else's theory is, well (how to put this....?), limited.

Frankly, hats off to Philip Bralich for creating a software application which can do everything he says it can. (I, for one, have about as much spare time as it takes to write this note; testing his claims doesn't make the cut for things I make time for. (So why have I taken the time to write this note? Because independently of the accuracy of Bralich's specific claims, there's a much bigger issue here---))

So hats off to the successes, but let's be clear about how a research program unfolds: if "ability to be encoded in a computer language" is one of the rubrics you use in developing your theory, then in all likelihood you will make decisions about any number of components of the theory which are based, at least in part, on the inherent structures and mechanisms which define computer systems. If what you want out of your theory is something that will give you an edge in a multi-billion dollar industry, then it doesn't matter if your syntactic theory and resultant parser make use of computer structures/mechanisms which humans don't possess. But if you want a theory which provides insights into the mechanisms of Natural Language Processing in the sense that you're modeling how human beings do it, then it seems to me that computers can provide a magnificently sophisticated and powerful blackboard which we can use in the exploration of theoretical issues.
Crucially, such explorations can---and do---happen even though the particular theory in question is not able to be encoded in such a way that it produces a parser which meets Bralich's specs. The bottom line is that it's a mistake of scope to equate "encodability in a computer language and the computerized parser that it can produce" with "the best theory of syntax".

Respectfully,
Mark Arnold