LINGUIST List 9.368

Thu Mar 12 1998

Disc: NLP and Syntax

Editor for this issue: Martin Jacobsen <>


  1. Pius ten Hacken, Re: 9.342, Disc: NLP and Syntax
  2. Samuel L. Bayer, Re: 9.342, Disc: NLP and Syntax
  3. Peter Menzel, NLP and syntax

Message 1: Re: 9.342, Disc: NLP and Syntax

Date: Mon, 09 Mar 1998 14:21:05 +0100
From: Pius ten Hacken <>
Subject: Re: 9.342, Disc: NLP and Syntax

Contrary to what Phil Bralich suggested at various places in this
discussion, silence on certain points he made earlier does not
automatically mean agreement with him or surrender. A bare repetition
of previous arguments (such as "please look closely at those standards
(repeated below)"), however, contributes nothing to the deepening of
the discussion and will annoy the readers. It only shows that people
mutually reject the relevance of each other's arguments.

Having said this, I would like to make one remark on Phil Bralich's
reply in issue 9.342, because I have the impression I can expand and
clarify the point I want to make. Text with ">>" is from my
contribution in issue 9.328; text with ">" is Bralich's reply in 9.342.

>>A syntactic theory is part of linguistics as an empirical science.
>>Empirical sciences are concerned with explaining chosen aspects of a
>>domain of observations in the real world. In the case of
>>linguistics, the domain is natural language, but different aspects
>>of language can be chosen as a goal for explanation,
>>e.g. acquisition in Chomskyan linguistics, processing in LFG. The
>>success of a linguistic theory depends on the degree to which an
>>explanatory account is reached. The fact that different linguistic
>>theories take different questions about language as a basis for
>>research implies that in some cases a common ground for evaluation
>>is missing. <...>

>This is quite true, but how can you actually study acquisition or
>processing if you have not yet properly isolated or described what it
>is that is being acquired or processed? You cannot talk about the
>acquisition of language or the processing of language if you are
>still not able to properly describe parts of speech, parts of the
>sentence, statements, questions, subjects, and so on and the
>relationships between them. You simply have not completed the
>preliminaries. I am not saying that syntax is all of linguistics any
>more than I am saying that isolating the periodic table is the whole
>of chemistry, but without the work being substantially completed on
>those basics neither field is really ready to begin. Any theory that
>attempts to study these "domains of observation in the real world"
>must first demonstrate that it has at least isolated and described
>these domains in some tangible sense. My main argument is that the
>entire field is starting off at the wrong end of the stick.

I thank Phil Bralich for his acknowledgement that the entire field is
on my side, although I suspect this is a slight exaggeration. Without
exaggeration, however, I can claim that in philosophy of science it is
generally acknowledged that theory-neutral observation and
theory-neutral description are impossible. There are just too many
different things to observe and too many possible ways of describing
them. In order to make a sensible selection, a theory is
necessary. (If you think you can do it without a theory, your theory
is just entirely implicit, which is not a recommendation.) As a
consequence, the question of which aspect of language is to be explained
cannot wait until after the description, because it interacts with the
description. This interaction influences, for instance, the choice of
most urgent data to cover, the criteria for a valid coverage, etc. In
a theory of linguistics the most urgent data to be described need not
be the most frequent ones (i.e. the ones most urgent for a parser),
but they should be the ones that provide arguments for decisions on
how to extend or modify the theory.

Summarizing: Theoretical linguistics and CL have different goals. The
descriptive goal of CL is not a subset of the goal of theoretical
linguistics, although there is a significant overlap. Explanation does
not follow description in theoretical linguistics, but the two develop
in interaction. Therefore,

1. a linguistic theory which does not give a description of all the
phenomena Bralich's parser covers need not be a bad theory;

2. if Bralich's underlying theory is only descriptive it is not as
good _qua linguistic theory_ as the existing ones. (N.B. I do not say
anything about the evaluation of Bralich's parser as a parser)

Pius ten Hacken


Dr. Pius ten Hacken
Institut fuer Informatik/Allgemeine Sprachwissenschaft
Universitaet Basel
Petersgraben 51 || Tel. +41-61-267'33'38
CH-4051 Basel || Fax +41-61-267'32'51
Switzerland || email:

web page:


Message 2: Re: 9.342, Disc: NLP and Syntax

Date: Mon, 9 Mar 1998 23:14:13 -0500 (EST)
From: Samuel L. Bayer <>
Subject: Re: 9.342, Disc: NLP and Syntax

 I wrote:

>Relative to the Derek Bickerton/Philip Bralich parser adequacy
>criteria: the field of computational linguistics has spent quite a
>number of years developing evaluation criteria for parsers, which I
>recommend looking at before you start reinventing the wheel. See the
>journal Computational Linguistics for the last five or six years or
>so, or for a summary, you can read the chapter that my coworkers and
>I wrote on comparing the theoretical and corpus-based computational
>enterprises, in a book edited by John Lawler called Computers and
>Linguistics, due out in April.

 And Philip Bralich responded:

>Yet his organization as well as all others have yet to be able to
>create a BracketDoctor--a device that generates trees and labeled
>brackets in the accepted style for this industry, the style of the
>Penn Treebank II guidelines, or a MemoMaster, a device that increases
>by many thousands the number of commands that are possible for a
>speech rec navigation and control system. In addition, the standards
>that I propose are largely based on functionality that most people
>have assumed that parsers and theories of syntax had handled years
>ago. The standards I have proposed are meant to demonstrate that
>there is a serious problem in the field; that is, very basic levels of
>functionality (see below) that many believed had already been
>achieved, simply have not been reached.

First, I'm disappointed by Dr. Bralich's blunt advertising in a
supposedly academic debate, as evidenced by the obvious plug for his
software in the first sentence of this paragraph. Second, "my
organization", the MITRE Corporation, has been an active participant
in helping define standards for a sequence of DARPA-sponsored
evaluations, MUC (the Message Understanding Conferences, to which
Dr. Bralich alludes), which has now been running for almost ten years
and has been regarded by virtually everyone in the
field as a singular success. In particular, the government funders and
organizers of this evaluation, who have real-world needs for digesting
and understanding large amounts of text (such as Wall Street Journal
and the AP newswire), have judged the program to be extremely
successful. The MITRE language research group has participated in
these evaluations, as have groups from BBN, SRI, and a number of other
major university and industry research labs (consult any of the 7 MUC
proceedings for details). How Dr. Bralich can dismiss this
community-wide effort so casually is absolutely beyond me.

In fact, the discovery that Dr. Bralich seems to have so recently made
- namely, that many linguistic theories are emperors with no clothes -
was a discovery which almost everyone in the MUC program made years
ago. Dr. Bralich is absolutely right in observing that "very basic
levels of functionality ... that many believed had already been
achieved, simply have not been reached", and I believe personally that
the field of theoretical linguistics is becoming increasingly
irrelevant because of this, since we now have the computational
resources to test the accuracy, speed and coverage of these
systems. And I sympathize tremendously with Dr. Bralich's
crusade. However, I can't begin to express how damaging it is to
dismiss a decade of well-thought-out attention to this problem in a
closely related field.

In particular, Dr. Bralich claims that "the standards that I propose
are largely based on functionality that most people have assumed that
parsers and theories of syntax had handled years ago." I don't
disagree with this assertion; but the implication is that the
assumptions about what parsers and theories of syntax are doing are
essentially right. The value of the approach followed by computational
linguistics is that it is task-based, where the task is not parsing,
but rather doing something useful with the text: topic
identification, person/location/organization identification,
identification of events for entry into a database, and so on. What the
field discovered were two very important things: (1) that many
problems which syntactic theories took as central were by and large
unimportant because of their infrequent occurrence (like quantifier
scope disambiguation), and (2) many problems which syntactic theories
ignored were crucial and interesting (like part-of-speech
identification and segmentation of text into sentences).
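As a toy illustration of why sentence segmentation, one of the problems the field found crucial, is harder than it looks (the splitting rule and example text here are invented for illustration), a naive period-based splitter fails on ordinary abbreviations:

```python
import re

# Naive segmenter: split after ., ! or ? when followed by whitespace
# and a capital letter. Abbreviations such as "Dr." defeat this rule,
# which is why segmentation is a genuine research problem rather than
# a solved preliminary.
def naive_split(text):
    return re.split(r'(?<=[.!?])\s+(?=[A-Z])', text)

text = "Dr. Bralich proposed standards. The field responded."
print(naive_split(text))
# The splitter wrongly breaks after "Dr.", yielding three "sentences"
# instead of two.
```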

My concern is that Dr. Bralich's set of criteria, while apparently
fairly basic, is distilled from a literature tradition which takes the
sort of syntactic analysis of sentences which generative linguists do
as a useful and important goal as given, which is not necessarily the
case; for many tasks, many of these criteria can be ignored, and the
important thing for making this technology useful is not whether it
conforms to what we think a syntactic theory should do, but whether
what a parser does contributes constructively to the accomplishment of
some task-oriented analysis of linguistic data.

By the standards of evaluation which have evolved in computational
linguistics, Dr. Bralich's standards are unacceptably vague. When he
says that a system should identify parts of speech, which parts of
speech does he mean? The Brown Corpus has about 40 of them; linguists
tend to assume about 7, with some feature breakdown, and no two
frameworks use the same set. What about systems which aren't perfect
(since none of them are)? Is precision (percentage of answers found
which are correct) more important than recall (percentage of correct
answers which are found)? I could observe the same problems with his
definition of terms in many other places in his standards. The point
of the MUC evaluation is that the criteria which have been laid down
are precise enough to measure progress on the given task, and detailed
and explicit enough to forestall any argument about whether scores are
comparable. Dr. Bralich's criteria fail on both these counts.
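The precision/recall distinction just defined can be made concrete with a toy computation (the tag set, positions, and answers below are invented for illustration):

```python
# Precision = fraction of answers found that are correct.
# Recall    = fraction of correct answers that are found.
# Answers are modeled as (position, label) pairs, e.g. POS tags.

def precision_recall(system, gold):
    """Score a set of system answers against a gold-standard set."""
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

gold = {(0, "DT"), (1, "NN"), (2, "VBZ")}
# The system misses one gold tag and adds one spurious answer.
system = {(0, "DT"), (1, "NN"), (2, "NN"), (3, "RB")}

p, r = precision_recall(system, gold)
print(f"precision={p:.2f} recall={r:.2f}")
# precision=0.50 recall=0.67
```

A system can trivially maximize one measure at the other's expense (answer everything for high recall, answer almost nothing for high precision), which is why an evaluation must pin down both.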

>Standards proposed by MUC, 
>Computational Linguistics and so on do not address the fact that they
>are not requiring their members to meet those very basic standards.
>Rather there is an end-run around that expectation that takes one to a
>world of non-existent parsers that are meeting standards that are "out
>there somewhere" among five or six years of publications (no dates or
>page numbers), and an unpublished book. How can standards of any sort
>be of value if the ones I have proposed (reprinted below) have not yet
>been met? These basic levels of functionality should be met long before
>anyone attempts comparisons or evaluations of different systems. If
>you just look at them you will see they make a good qualifying round
>from which to begin the discussion of "mature parsers" and 
>"mature theories of syntax."

First, the idea that MUC ought to be handing out a "Good Housekeeping
Seal of Parsing" is absurd on its face. The scores for the systems
which are participating in MUC are out there for the world to see;
every one of the proceedings has been published, and references to
them can be found in any article in CL in the recent past. The reason
I don't list page numbers and articles is because so much of the
journal is devoted to this problem, as anyone who has spent any time
reading it will attest (not to mention the annual ACL proceedings and,
of course, the MUC proceedings). The idea that I've alluded to some
"cheat" involving nonexistent articles and language analysis systems
is simply insulting. All the MUC proceedings list the participants and
their scores, as well as describing the tasks in detail. Anyone who
suggests or implies otherwise simply hasn't read the literature.

[And the only reason you can't download the article we wrote from the
Web is because the publisher insists on first publishing rights. And
there's nothing in it which is new; it is simply a description and
distillation of the work that's already been done in the field.]

Finally, the claim that "these basic levels of functionality should be
met long before anyone attempts comparisons or evaluations of
different systems" is simply dogma, and ignores the progress in the
field which I outlined above. The reason we don't evaluate or compare
systems based on these criteria is because the criteria are
spectacularly ill-defined, and in many cases irrelevant.

>Before reading the standards I propose below, please note that there
>is no list of standards given in Mr. Bayer's letter, just a reference
>to standards that have supposedly been dispersed "somewhere"
>throughout five or six years of this publication or in a yet to be
>published book. There is also no reference to working parsers that
>could meet any standards therein proposed and certainly no discussion
>of parsers that could actually meet the standards I have proposed.

As I've said, the literature speaks for itself. If I were to refer to
Chomsky's Minimalist Program, would Dr. Bralich challenge its
existence simply because I failed to provide a bibliography? I hope
not. And it's Dr. Bayer, by the way, and, yes, the doctorate is in
theoretical linguistics.

>Instead of paging through those many periodicals, ask yourself one
>question. "Why isn't there a single reference to a list of standards
>in this field or from this journal such as proposed below?"

What if I said, "Instead of paging through those many periodicals
about GB, why not just make up your own theory? Why isn't there a
single list of rules in GB that I can refer to, like Cliff Notes?" The
answer is obvious: because the field is large and complicated, and the
problems are difficult, and you should read the literature, because
it's important and relevant and that's what scholarship is about. All
this is also true of evaluation of language systems. To suggest
otherwise is insulting to all the theorists who have slaved over these
difficult problems.

>The answer I think is that there is a painful awareness on the part of
>the computational linguistics community of their lack of success after
>35 years in which millions of dollars and hundreds, maybe thousands,
>of man-years have been invested. Certainly, there is not the pride I
>feel in asserting in black and white in a list appended to this
>message exactly what is possible in this field, and the value it will
>have not only for my company but for creating jobs and projects for
>students and linguists in this area for many years to come--not only
>in English, but in all the languages of the world. It is not wise for
>the field overall to shun the one parser that shows any promise at all
>of making good on 35 years of empty promises. All that does is
>guarantee that in the long run the jobs, the projects, the profits,
>and so on will all belong to Ergo. I would like to see this company
>continue to profit in this field certainly, but I will feel guilty if
>the entire field of computational linguistics tosses the whole of the
>jobs and projects into our laps simply because they were unwilling to
>admit they cannot meet the standards we propose.

Let's not argue about who's failed; in general, computational
linguists have made far more progress in evaluation (and in producing
usable systems) than most people working in the field. I have a
counterproposal for Dr. Bralich: run the Ergo system against the
MUC-6 or MUC-7 evaluation set, and tell us what your score is relative
to the other systems which participated in the evaluation. 

I won't bother responding to the rest of Dr. Bralich's message; it
continues to imply that I am alluding to evaluation criteria and
software systems which don't exist, based on Dr. Bralich's apparent
unwillingness to read a body of commonly-available literature. It is
not my duty to educate Dr. Bralich about what's been going on in
computational linguistics; the literature is there and anyone can look
at it, and I, for one, won't demean the complexity of the field by
suggesting that it ought to be summarizable in a one-page note.

My larger concern is that Dr. Bralich's criteria will be taken as
characteristic of the way evaluation of language systems ought to be
done, when in truth they're badly and imprecisely defined, and focused
on details which have yet to have a demonstrable impact on the sort of
tasks which language systems are currently capable of doing. We all
know that there's a lot about language we don't understand, and as a
result computational linguists have discovered that the dream of
robust, full-coverage, detailed analysis of language is far, far
away. It turns out that many of the detailed phenomena of the sort
that theoretical linguistics is fond of examining either turn up very
seldom in corpora, or can be ignored for the purposes of a wide range
of tasks, or can't be evaluated yet due to disagreements about terms,
etc., etc. There's no doubt that in the long run, language systems
will have to do many of the things which Dr. Bralich alludes to; but it
has turned out to be massively unproductive to focus on such a
theoretically-inspired list in the near term. This result has grown
from many, many years of consideration of the problem by a wide range
of well-trained and intelligent researchers whose contribution
Dr. Bralich seems to regard as worthless. No greater disservice could
be done to the notion of evaluable language systems than for
theoretical linguistics to take Dr. Bralich's criteria as its primary
standard.

Samuel Bayer
The MITRE Corporation

Message 3: NLP and syntax

Date: Wed, 11 Mar 1998 22:00:20 +0100
From: Peter Menzel <>
Subject: NLP and syntax

The discussion on this topic is a familiar one in linguistics and in
science in general. As my contribution to it, I should like to
clarify some of the underlying assumptions made by Bralich and other
computational linguists on the one hand, and by theoretical linguists
like ten Hacken and Arnold on the other.

First, and most basic, is the question of what's been called "the
purpose of linguistics" (or any other discipline, for that matter).
The traditional and obvious formulation of the goal as "describing
language" has at least two serious shortcomings (which are really two
aspects of one and the same area): That of universality and that of
prediction. In other words, how can we make certain that the sample
of a language we happen to be describing is representative of human
language, and that in two senses: How can we generalize from our
sample to the rest of the language in question, and how can we
generalize to other languages? Chomsky's reformulation of our goal as
"accounting for the native speaker's (linguistic) intuitions"
successfully enlarged the discipline by including the area I
mentioned. In doing this, it also moved linguistics squarely into the
domain of psychology.

I believe that here, already, we've come to a division of interests
between computer linguists and what one might want to call "speaker
linguists"; for the former are not, as a rule, interested in all
aspects of "native speaker intuition", though they are, of course,
interested in generalizing their sample to the rest of the language in
question; and, to a lesser extent, in generalizing to human language.
With respect to the latter, they mostly seem to assume that the
language they're describing is representative of human language. But,
as Chomsky pointed out, and as we all learned in grad school,
"accounting for native speaker intuition" also has serious
implications for the acceptability of a proposed analysis.

Although we don't all agree just what we mean by this phrase and how
seriously we have to apply it, in the last forty years or so linguists
have appealed to it often, and analyses were supported by the claim
that they were "intuitively obvious", or disparaged by being called
"counterintuitive". Computational linguists do not appear to be
concerned with accounting for native speaker intuition, except in the
sense that they assume that computer and software model brain and
thought (mind) respectively, and that, moreover, they do so in some
simple one-to-one way. The earliest argument for a particular model
of grammar based on this analogy that I know of is Sidney Lamb's claim
for what he then (the mid-sixties) called "stratificational grammar".
It was accepted by other computational linguists working within his
paradigm; e.g., the group working in Santa Monica at that time.

Now, if there is one thing that recent work in neurophysiology,
psychology, psychobiology, and related areas has shown, it is that the
human brain (or that of any other living being we have examined, for
that matter) does NOT work like a general purpose serial computer. As
far as I can tell, there are three possible reasons why anyone might
today still assume that syntactic analysis is (or should be) based on
algorithms or other mathematical models. In the simplest case, the
researcher is not aware of work in the above mentioned areas, and
should thus be directed to the appropriate readings. In the more
complex case, what linguists call "psychological reality" (another way
of saying that we must account for native speaker intuition) does not
interest her or him, as it does not seem to interest most
computational linguists. In this case, they and we work in different,
non-commensurable paradigms, and there is little use in arguing with
each other concerning the adequacy of the other's analysis. Both
groups would do well, however, to be aware of this, and thus avoid
needless mud slinging.

The most complex case is the one where a researcher believes that the
human linguistic ability is different from all our other abilities and
can, indeed, be accounted for by using algorithmic models, while our
other abilities cannot. While in this area all the evidence is not
yet in, it is becoming clear where we should look for it:
Paleoanthropology can help shed some light on the question (cf., e.g.,
Steven Mithen's recent book "the Prehistory of Mind"), but most
important should be work in neurophysiology and psychobiology. If our
language processing activities could be shown to occur only in certain
areas of the brain and no other processing occurred there, then we
should have a strong case for claiming that our language ability is
different from our other abilities. I need hardly point out that not
only do we not have such evidence but, more important, the evidence we
do have indicates that language processing is "mixed" with all sorts
of other processing, both in the sense that it occurs not only in
specific areas (Broca's and Wernicke's areas as tradition has it), and
in the sense that other types of processing occur there as well.

The evidence on the various types of aphasia is not nearly as
clear-cut as our textbooks make it out to be! A recent study showed, for
example, that for every case corresponding more or less to the
traditional categories of trauma and (type of) aphasia there was one
that did not: either the lesion(s) in Broca's or Wernicke's area(s)
did not produce aphasia, or lesions in other areas did. Other,
non-invasive procedures on healthy brains (PET, tomography, etc.) also
show that language input calls forth distributed processing activities,
not exclusive to, or even mainly in, the so-called language areas.
There was one study (sorry, can't remember the reference) that showed
right (i.e., non-dominant) hemisphere activities when subjects were
dealing with ambiguous input! The most interesting, not to say
amazing, work I know of in this respect is that of Robert Lefroy, a
Western Australian teacher, who worked with youngsters having
difficulties acquiring literacy skills (i.e., reading and writing).
He reports considerable improvement by training various of their motor
skills, like throwing a ball, skipping rope, jumping on a trampoline,
etc. Of course, they also worked on their literacy skills, but this
they had done before, in other schools, without success.

In other words, the strong versions of modularity of the mind (a la
Fodor) are unlikely to be correct, since there is no evidence for such
strong modularity in the brain. On the contrary, while the brain is
undoubtedly modular, particularly in the cortical areas these modules
are largely multi-functional and communicate with extremely large
numbers of other modules, and do so not only within the cortex but
also with modules in the midbrain and in the brainstem. This also
makes a truly separate language faculty and language processing
unlikely. Therefore, models of language claiming to be based on
psychological reality, native speaker intuition, or some such, should,
at least for the present (i.e., until neurophysiological evidence of
the type I indicated above is found), not be algorithmic-mathematical.

Linguists in search of non-algorithmic models I refer to work in what
might loosely be called "constructivist linguistics"; computational
linguists in search of such models I refer to work on neural networks;
the philosophically inclined, lastly, I refer to the work of the
Churchlands. (All of this has been going on since at least 1980.)

Sorry, but this turned out to be longer than I anticipated.

Peter Menzel