LINGUIST List 9.413

Thu Mar 19 1998

Disc: NLP and Syntax

Editor for this issue: Martin Jacobsen <martylinguistlist.org>


Directory

  1. Philip A. Bralich, Ph.D., Re: 9.395, Disc: NLP and Syntax
  2. Philip A. Bralich, Ph.D., Re: 9.396, Disc: NLP and Syntax

Message 1: Re: 9.395, Disc: NLP and Syntax

Date: Thu, 19 Mar 1998 11:17:33 -1000
From: Philip A. Bralich, Ph.D. <bralichhawaii.edu>
Subject: Re: 9.395, Disc: NLP and Syntax

At 12:41 PM 3/18/98 -1000, Samuel L. Bayer wrote:

>> Here you are talking about doing something useful with huge numbers
>> of documents of unrestricted text whereas I am speaking (primarily)
>> about doing question/answer, statement/response repartee, grammar
>> checking, the improvement of machine translation devices and of
>> course about a significant, overnight improvement in navigation
>> and control devices. Nothing in the MUC standards speaks to any of
>> this. The MUC standards are actually quite narrow compared to the
>> very wide realm of what is possible with NLP.
>
>> This problem is not unknown in the field. Take a look at what is
>> said in _The State of the Art in Human Language Technology_, Ron
>> Cole Editor-in-Chief. A 1996 report commissioned by the National
>> Science Foundation. In this report, Ted Briscoe (Section 3.7,
>> p. 1) states, "Despite over three decades of research effort, no
>> practical domain independent parser of unrestricted text has been
>> developed." In addition, in that same report, Hans Uszkoreit and
>> Anne Zaenen state (Section 3.1, p. 1), "Currently, no methods exist
>> for efficient distributed grammar engineering [parsing]. This
>> constitutes a serious bottleneck in the development of language
>> technology products."
>
>[ ... and finally, in response to someone else ... ]
>
>> However, I do not see how anyone could come even close to meeting
>> the standards I propose without first having a fully worked out
>> theory of syntax. Even if programmers were the ones to develop a
>> program that met those standards we would have to admit that
>> somewhere in those lines of code was a true theory of syntax. Even
>> if you just created a huge series of jury rigs, either they would
>> not work or they would merge into a theory. The phenomena to be
>> described are complex, subtle, and intricate. Only a completely
>> worked out theory of syntax will result in such programs.


>First, I welcome Dr. Bralich's more subdued tone. But I still
>disagree with him, and I think that the final paragraph I've cited
>here illustrates the crux of our disagreement. I agree, of course,
>that there is a theory of language, no matter how simple and crude,
>embedded in ANY language-processing system. There used to be people
>in NLP who denied that, and claimed that their approaches were purely
>statistical, but I'm pretty sure most of those folks have finally
>admitted that such a goal is impossible (and counterproductive). And
>I certainly agree that the MUC criteria for evaluation bear on a tiny
>subset of the potential applications of language processing
>systems. Dr. Bralich lists a number of other potential applications,
>which I've repeated above. However, these observations do NOT imply,
>together, that Dr. Bralich's standards are very useful, and the
>problem is that Dr. Bralich cannot conceive of a situation in which
>his desiderata do not entail his conclusions.

I agree with the first part of this paragraph in that the research in
statistical analysis of language may indeed take us full circle and
point out that it is impossible to provide the NLP tools that are
required for this next age of computer interactivity without a fully
worked out, thoroughly efficient theory of syntax. This I think is
inevitable and just around the corner. I say this largely because of
the recent declarations of success for speech rec programs, which now
change the focus to what to do with the speech that has been
recognized. Certainly, 200 two- or three-word commands are not HAL, and
HAL is the goal. However, I do not completely agree with what is said
about the standards I have proposed. Let's continue with Dr. Bayer's
comments.

>For instance, Dr. Bralich lists question-answering as one of the
>potential applications which MUC does not address. This is
>true. However, the DARPA community HAS addressed question-answering
>systems in a related evaluation: the ATIS air travel reservation
>domain for spoken language understanding systems. In this evaluation,
>an attempt was made to define a syntactic/shallow semantic level of
>evaluation; the effort was called SEMEVAL, and it failed
>miserably. The problem was that there are too many different theories
>of language, and settling on an intermediate level of evaluation
>which embraced a particular one (and it was of course impossible to
>settle on such a level without embracing a particular one) was
>counterproductive and irrelevant, for a number of reasons:

First I would recommend that readers get a copy of the ATIS sentences
(I can provide one for those who are interested--if it doesn't violate
any copyrights), and then sit down with the ergo parser at
http://www.ergo-ling.com and see what you think of our ability to
provide q&a repartee with this genre of sentences.

>(1) the point was answering the question, not parsing the
>sentence. The intermediate representation was unimportant as an
>evaluation criterion.

This seems fine, and the percentage of correct answers should form
a good criterion for judging success.

>(2) there were many people who were using a simple frame-based
>approach which did not use an articulated intermediate
>representation, and they (rightly) observed that they would be
>unfairly penalized for using an alternative approach to reaching the
>same goal.

Fair enough, so why not base the evaluation on practical results?

>(3) the range of syntactic constructions encountered in a corpus of
>14,000 spontaneous utterances on air travel is pretty large, and
>linguistic theories do not yet address the majority of them. In other
>words, not only was the proposed evaluation level unimportant and
>biased, it was also impossible to determine the "right" answer,
>because we don't know it yet.

For those who are seriously interested we have an in-house demo called
"Q&A demo" which allows you to type in sentences (ATIS or otherwise)
and then ask questions of it and receive responses. I think this
particular application will give the reader a good sense of what I
would like to propose as a fair measure of a system's efficacy. Just
email me at "bralichhawaii.edu" and I'll get you one you can try. My
main point is that I wrote the standards not so much to slant
things toward particular theories or structural analyses but rather
toward practical results. From there I do indeed argue that only a
fully worked out theory of syntax would really have a chance of wide
success. However, if you look at the standards I believe you will see
that practical results really are the goal.

>All these objections apply to Dr. Bralich's proposed criteria. I
>sympathize tremendously with Dr. Bralich's goals. However, the
>problem of standards for language processing systems is far more
>complex than he's anticipating. There are three purposes one might
>conjure up for defining such standards: (a) it helps determine
>whether systems are behaving in linguistically well-justified ways;
>(b) it allows one to compare systems; (c) it contributes to the
>determination of whether these systems can contribute to tasks which
>require linguistic processing. The problem is that (a) is ill-defined,
>and there's no evidence that it bears any relationship to (c), and
>only bears on (b) if you're trying to evaluate a system without a
>task in mind (which turns out to be nearly impossible). Dr. Bralich
>seems not to be able to imagine a scenario in which (a) and (c) do
>not entail each other, and my point is that the entire history of
>evaluation of language systems has failed to demonstrate that they
>are more than peripherally connected. [Just to clarify, I say (a) is
>ill-defined because theories currently vastly underanalyze the
>available data, and conflict on the data that they do analyze;
>resolving these conflicts MIGHT be useful for evaluation if (a) were
>to imply (c) and there was no other way to get to (c), but this has
>never been shown.]

I believe I am trying to demonstrate this through practical results.
The claim that I make that there must be a theory of syntax to achieve
the practical results suggested in the standards is a verifiable one:
just find a parser of any other sort that does a better job than Ergo's
parser and I will have to back down.

>Now, like Dr. Bralich, I don't believe for a second that we can get
>100% of language analysis for ANY application without a detailed
>theory of syntax; but (a) I don't really care which one it is (well,
>I do, but not for the purposes of this discussion :-) ) and (b) we're
>nowhere near 100% analysis for any task (and demonstrating that
>Dr. Bralich's system matches the Penn Treebank at 100% accuracy
>indicates nothing relative to this goal), and committing to
>Dr. Bralich's criteria counterproductively biases the path that we
>take toward this goal.

But there is no requirement that anyone use a theory of syntax; it is
only suggested by me that that is where the strongest results are
going to come from. The practical nature of the standards is neutral
as to what theory is chosen. The requirement that the parser generate
Penn Treebank trees is largely coming from the industry overall. On
most occasions when I have met with linguists in the industry, I have
been asked to provide demonstrations using the ATIS sentences and the
Penn Treebank guidelines. We currently do that, and because it is
required by industry I have included them in the standard.

However, I do not see how one could be able to meet the other
standards I propose and NOT find it trivially easy to generate the
Penn Treebank style brackets and trees.

>By way of conclusion, let me elaborate on this final
>point. Tremendous progress has been made in computational linguistics
>over the last ten years by ABANDONING the commonly-held convictions
>in theoretical linguistics on how to make progress in this
>area. Theoretically well-motivated systems perform no better, and in
>many situations perform more poorly (both in speed and accuracy),
>than pragmatically-constructed systems which fundamentally change the
>assumptions about how language research is to be conducted. These
>systems have a theory of language behind them; it's just a sort of
>theory which theoretical linguists aren't very interested
>in. Dr. Bralich's standards impose a bias from theoretical
>linguistics on what systems MUST do in order to be successful; so
>far, this bias has been demonstrated by the computational linguistics
>community to be false and counterproductive. I personally think this
>result has vast implications for linguistic theory, and I would hate
>to see a set of standards adopted which effectively eliminated this
>alternative branch of investigation.

This may be true if you mean that the standards insist on using the
Penn Treebank styles. In that case, I think it reasonable to make
them an optional standard. However, the practical nature of all the
others satisfies Dr. Bayer's suggestions and avoids all his
criticisms. For that reason, I think it is more than reasonable to
suggest to people in this area that Penn Treebank styles would be
valuable, but if a system does not generate them, it can be excused
from any judging on that criterion. All the other standards will
still be of value whatever method was used to generate the practical
results that are described there.

Phil Bralich

>In vol-9-396, Mike_Maxwellsil.org wrote:
>In vol-9-383, Philip A. Bralich <bralichhawaii.edu> wrote:
>
>>...at some point we have to step back and ask ourselves which
>>theories, whatever their motivation, are capable of accounting for
>>the data. That is, which theories after three decades of trying have
>>done the best job and how can we demonstrate that? ...Nor am I
>>saying they will not at some point arrive at a proper description of
>>the data that does indeed meet their goals (whether that be a
>>description of processing, acquisition, or whatever) but I am saying
>>that as long as they do not have a demonstrably satisfying account
>>of the basics, they can make no strong claim to being a mature or
>>effective theory. They can call themselves psychologically
>>motivated, or learning theory motivated or whatever, but they cannot
>>make many claims to being able to account for the data.
>
>[Bralich quotes Pius ten Hacken:]
>
>1. a linguistic theory which does not give a description of all the
>phenomena Bralich's parser covers need not be a bad theory;
>
>[Bralich again:]
>>Correct, but it is also not a mature theory.
>
>"The data" which a theory might claim to be able to "account for"
>encompasses a wide range of phenomena. Quite apart from the fact
>that Bralich has only claimed that his parser + grammar works for
>English, Chomsky outlined a long time ago (1965, in "Aspects of the
>Theory of Syntax"), three levels of adequacy for theories, namely
>observational, descriptive and explanatory. So far as I have seen in
>this discussion, Bralich is saying his parser + grammar does a good
>job at observational adequacy--this is what he appears to be calling
>"the basics"--but I have seen nothing about descriptive or
>explanatory adequacy.

I believe I am talking about both observational and descriptive
adequacy, but explanatory adequacy has not been discussed. I am
saying that the basics comprise both observational and descriptive
adequacy without saying anything particular about explanatory
adequacy, while other theories (like alchemists) are working on
explanatory adequacy without having an observationally or
descriptively adequate account.

>Over the years, there have been many parser + grammar combinations
>that have achieved a reasonably high degree of observational
>adequacy. (The fact that they have not been put "in the reviewers'
>hands", as Bralich says they should be, does not negate this claim.
>Many of these programs are proprietary to the companies that
>developed them, and they are not about to release them as stand-alone
>parsers.) But observational adequacy is not the only goal for most
>modern theories of linguistics, and a theory which achieved only
>observational adequacy--if there were such--cannot claim to be a
>mature theory, either.

This simply is not borne out by the facts. Take a look at other
parsers yourself and you will see little more than incomprehensible
theory-specific trees and no other output. And this for a very
limited number of structures. There is no demonstration of q&a, no
ability to manipulate structures, and no Penn Treebank output. But I
have said all this before. I want to remind the reader that these
discussions are quite vacuous if you do not take the time to look at a
few parsers yourself. For anyone who is interested I have a list of
URLs I will be happy to send out. These include parsers at MIT,
Stanford, Georgetown, Carnegie Mellon and so on. I have about 25 on
the list that I distribute. Just contact me at "bralichhawaii.edu"
and I'll email it to you.

>So why not try to compare parsing programs on the level of
>descriptive adequacy? Because that presupposes we know what the
>correct analysis of every sentence in some large (and "complete")
>test corpus is, and we don't. That isn't to say people haven't tried
>to do such comparisons over the years, just that it has turned out to
>be more difficult than expected. While many parsing programs achieve
>reasonably similar levels of observational adequacy, i.e. they
>recognize more or less the same sets of sentences, they do not assign
>comparable constituent structures, and it is not apparent which
>constituent structure is correct. (Or worse for purposes of
>comparison, an LFG parser, say, will assign two different sorts of
>parallel structures where a GPSG parser will assign only one.)

This is precisely where the Ergo parser gets its advantage. 

>I could give a long listing here of syntactic constructions of
>English for which the appropriate structure is not apparent, but I'll
>content myself with one example: In the sentence "I wonder who came",
>is there a gap after "who"? (For that matter, is there a gap, in the
>sense of a phrasal node which does not dominate any terminals, in
>_any_ wh-construction?)

For the Penn Treebank guidelines there is a gap after "who". If
whatever theory you are using has done a complete analysis of the
string, it will be trivially easy to convert it into the Penn Treebank
style--unless of course there has not been a thorough analysis of the
string.

Message 2: Re: 9.396, Disc: NLP and Syntax

Date: Thu, 19 Mar 1998 14:04:23 -1000
From: Philip A. Bralich, Ph.D. <bralichhawaii.edu>
Subject: Re: 9.396, Disc: NLP and Syntax

On Wed, 18 Mar 1998 13:50:06, Mark Johnson mj1lx.cog.brown.edu wrote:

>It would seem that Philip Bralich's parser is most closely related to
>broad-coverage parsers like FIDDITCH (developed by Don Hindle at the
>then Bell Labs), the LINK grammar developed at CMU and work by Steve
>Abney (now at AT&T Labs).

In some sense yes, except that these parsers do not cover the wide
range of NL functions that ours does. That is, they provide very
little or nothing in the practical areas of q&a, statement/response
repartee, the identification of synonymous structures, the ability to
change one structure into another and so on. In other words, these
parsers offer little beyond the ability to do a part-of-speech
analysis or in some cases a constituent structure analysis. Once they
have these they cannot go the next step to produce the practical
applications that are required, such as increasing the number of
commands in a speech rec system, or adding q&a to a speech rec system
or to a search engine for large or small databases. Without these
abilities these parsers have no value outside of linguistics.

>In the admittedly small world of academic research on broad-coverage
>parsing, parser performance is usually evaluated by computing the
>average of the labelled precision and recall scores comparing the
>parser's output with the hand-constructed parse trees of a held-out
>section (usually section 23) of the Penn II WSJ corpus. This
>provides a standard quantitative score of parser performance. The
>authors above claim average precision and recall scores of around
>87%. I was wondering what average labelled precision and recall
>scores Philip Bralich's parser achieves?

I haven't run the parser across that section. I know we currently
parse 93% of the ATIS sentences; that is, we get a parse for about
91%. We get a correct parse for about 90% of that. This is based on
the analyses that we provide on our on-line demo (except that our
in-house parsers are a little faster and cover a wider range of
phenomena).
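
For readers unfamiliar with the labelled precision and recall scores
that Mark Johnson asks about, the following is a minimal Python sketch
of how such scores are commonly computed from bracketed trees. It is an
illustration only, not the scoring software behind any published
figure: the function names (labelled_spans, labelled_pr) and both
example trees are hypothetical and invented here, and the choice to
ignore part-of-speech brackets is an assumption of the sketch.

```python
from collections import Counter


def labelled_spans(bracketed):
    """Collect (label, start, end) for every phrasal bracket in a tree string.

    Assumption of this sketch: a bracket that directly wraps a single word
    (a part-of-speech tag) is not counted as a constituent.
    """
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    spans = Counter()
    stack = []   # open brackets: [label, start_word_index, has_bracket_child]
    pos = 0      # index of the next word in the sentence
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append([tokens[i + 1], pos, False])
            i += 2           # skip the label that follows "("
        elif tok == ")":
            label, start, has_child = stack.pop()
            if has_child:    # phrasal node: it contains at least one bracket
                spans[(label, start, pos)] += 1
            if stack:
                stack[-1][2] = True
            i += 1
        else:
            pos += 1         # an ordinary word
            i += 1
    return spans


def labelled_pr(gold, test):
    """Return (precision, recall) of the test brackets against the gold ones."""
    g, t = labelled_spans(gold), labelled_spans(test)
    matched = sum((g & t).values())   # brackets with same label and same span
    precision = matched / sum(t.values()) if t else 0.0
    recall = matched / sum(g.values()) if g else 0.0
    return precision, recall


# Hypothetical ATIS-style sentence; both analyses are invented for illustration.
gold = ("(S (NP (PRP I)) (VP (VBP want) (NP (NP (DT a) (NN flight)) "
        "(PP (TO to) (NP (NNP Boston))))))")
test = ("(S (NP (PRP I)) (VP (VBP want) (NP (DT a) (NN flight)) "
        "(PP (TO to) (NP (NNP Boston)))))")

p, r = labelled_pr(gold, test)
print(f"precision={p:.3f} recall={r:.3f} average={(p + r) / 2:.3f}")
```

On this invented pair, every bracket the test analysis proposes also
appears in the gold tree, but the gold tree contains one NP that the
test analysis lacks, so precision is 6/6 and recall is 6/7.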

We will be making a full-on effort with the WSJ corpus and other
corpora in the Penn Treebank collection as soon as our current
projects with the practical tools and structures I have described
elsewhere are complete. But truth be told we do very little with Wall
Street Journal sentences, as we believe there is an awful lot that
needs to be done within the standards we provide for the practical
tools we describe long before there is any clear reason to approach
newspaper English. For us that is like asking speech rec researchers
to begin their work with a legal debate among 8 angry lawyers from 8
different dialect areas before they work out the smaller monolingual
problems. You and anyone else are certainly welcome to criticize us
on this point, but only once we have our parser working for navigation
and control, q&a, SQL and a few other areas will we look at newspaper
English. We also want to increase our dictionary to 250,000 words
before we get involved there. Our current 75,000 word dictionary
really is not up to the task of such analyses. If these other parsers
you mention have dictionaries of the size necessary to handle these
sentences they are to be applauded for that alone; however, we at Ergo
will wait until we can bring our full set of parsing tools to this
arena before we get into it. You will probably see posts on this in
early '99.

Phil Bralich


>Date: Wed, 18 Mar 1998 17:38:07 -0500
>From: Dan Maxwell <100101.2276compuserve.com>
>Subject: 9.368, Disc: NLP and Syntax
>
>I want to thank Sam Bayer for alerting us to the existence of detailed
>evaluation criteria for parsers. If every article in recent issues
>of CL not only refers to these criteria, but also tells us where they
>can be found, then I agree with him in urging Phil Bralich to
>test his own parser against these criteria before making such strong
>claims. Since other parsers have already been tested against the MUC
>criteria, this appears to provide a reasonably objective basis for
>comparison.

I just want to remind the readers that the MUC criteria are for a very
limited NL domain, that of Information Retrieval and Information
Extraction (see http://cs.nyu.edu/cs/faculty/grishman/muc6.html), and
do not cover any of the very important, very practical areas that are
described in the standards I propose. As a matter of fact I proposed
those standards precisely because there was this huge area of very
practical, very immediately needed NLP that simply was not being
addressed by the industry. Certainly, MUC makes no attempt to provide
standards in these areas: q&a, the recognition of synonymous
structures, the generation of Penn Treebank sentences, and the ability
to manipulate structures. All of these are crucial for the
development of the sort of NL tools that are required for speech rec
devices, for database searching and so on.

>For all I know, the Penn Treebank guidelines may be just as good or
>better in some sense (it's hard to tell for sure), but I haven't yet
>heard that they provide a detailed grading system like the MUC
>system, nor that other parsers have been tested against them. If the
>Bracket Doctor/Ergo system really turns out to be better than the
>other ones, I think many linguists would be interested in knowing
>more about the underlying system.

The Penn Treebank system provides a demonstration that a parser has
accurately and thoroughly completed a constituent analysis of a
particular input string. MUC does not require this, but it is very
important for all other areas of NLP.

>Pius ten Hacken and Peter Menzel argue that theoretical linguistics
>is right to be less concerned with data coverage than computational
>linguistics, since it is more concerned with explanation in terms of
>mental capacities. It seems to me that the kind of work they are
>talking about can better be considered a kind of psycholinguistics,
>which by definition is concerned with the relationship between
>language and the brain. This is certainly a well established branch
>of theoretical linguistics, but it is not the same thing as syntax,
>which is primarily concerned with sentences (as many different kinds
>as possible) and the relationships between them. On the other hand,
>ten Hacken and Menzel might prefer to join forces with functionalist
>approaches to syntax, which aim to explain properties of language in
>terms of historical development, ambiguity avoidance, processing,
>etc.

But whatever sort of analysis one chooses to do, there should be some
expectation that the basics have been accounted for. That is, the
periodic table and basic properties should be isolated before making
claims about explanatory power and so on.

>I think these are all valid approaches to language, but none of them
>is the same thing as formal approaches, which to my mind have a
>relationship to the rest of linguistics similar to that of
>mathematics to the natural sciences --that of a useful servant, whose
>task is to provide a precise model of the object of inquiry. Given
>the accuracy of this assessment, formal linguistics has the potential
>to be important not only for computational implementation, but also
>for many other areas, although computational implementation is still
>the best way to find out if the analysis really works, at least if
>the implementation is an accurate reflection of the grammar.

I would agree with this. 

>I don't follow Menzel's reasoning when he apparently argues that
>because language functions are spread across various parts of the
>brain, it is doubtful whether it is algorithmic. I don't doubt the
>premise of this argument. Clearly, some, but perhaps not all,
>aspects of language knowledge, production, and understanding are
>linked to other activities of the brain. But what has that got to do
>with the question of whether it's algorithmic? I even looked up the
>word "algorithm" to find out whether I had misunderstood the meaning
>of this word. According to my dictionary, an algorithm is just a
>procedure for doing something. I think it is clear that every
>healthy human being has an algorithm for his/her own language in this
>sense, even though this algorithm sometimes does not work as well as
>we want it to -- we sometimes have trouble finding the words we need
>to express our ideas. But some sort of algorithm is there
>nevertheless.

I would agree here too. 


Phil Bralich

Philip A. Bralich, Ph.D.
President and CEO
Ergo Linguistic Technologies
2800 Woodlawn Drive, Suite 175
Honolulu, HI 96822

Tel: (808)539-3920
Fax: (808)539-3924