LINGUIST List 9.396

Wed Mar 18 1998

Disc: NLP and Syntax

Editor for this issue: Martin Jacobsen <marty@linguistlist.org>


Directory

  1. Mark Johnson, Re: 9.342, Disc: NLP and Syntax
  2. Dan Maxwell, 9.368, Disc: NLP and Syntax

Message 1: Re: 9.342, Disc: NLP and Syntax

Date: Wed, 18 Mar 1998 13:50:06 -0500 (EST)
From: Mark Johnson <mj1lx.cog.brown.edu>
Subject: Re: 9.342, Disc: NLP and Syntax


It would seem that Philip Bralich's parser is most closely related to
broad-coverage parsers like FIDDITCH (developed by Don Hindle at what
was then Bell Labs), the Link Grammar parser developed at CMU, and
work by Steve Abney (now at AT&T Labs).

As far as I know, the current best broad-coverage parsers use
statistical information, such as the ones described by Collins ``Three
Generative, Lexicalised Models for Statistical Parsing'' in the 1997
ACL conference proceedings and Charniak ``Statistical Parsing with a
Context-Free Grammar and Word Statistics'' in the 1997 AAAI conference
proceedings.
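
(Very roughly, such parsers attach a probability to each grammar rule,
estimated from a treebank, and prefer the analysis whose rules jointly
score highest. The toy grammar and probabilities below are my own
invention, purely for illustration; Collins' and Charniak's actual
models are lexicalised and condition on far richer context.)

# Toy probabilistic grammar illustrating how a statistical parser
# chooses between competing analyses.  The rules and probabilities are
# invented; real models are lexicalised and far richer than this.

RULE_PROB = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("N",)):       0.8,
    ("NP", ("NP", "PP")): 0.2,
    ("VP", ("V", "NP")):  0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("PP", ("P", "NP")):  1.0,
}

def parse_probability(rules_used):
    """Multiply the probabilities of the rules that build a parse tree."""
    p = 1.0
    for rule in rules_used:
        p *= RULE_PROB[rule]
    return p

# Two analyses of a string with a prepositional phrase: PP attached
# inside the object NP versus PP attached to the VP.  The parser keeps
# the higher-scoring one.
np_attach = [("S", ("NP", "VP")), ("NP", ("N",)), ("VP", ("V", "NP")),
             ("NP", ("NP", "PP")), ("NP", ("N",)), ("PP", ("P", "NP")),
             ("NP", ("N",))]
vp_attach = [("S", ("NP", "VP")), ("NP", ("N",)), ("VP", ("VP", "PP")),
             ("VP", ("V", "NP")), ("NP", ("N",)), ("PP", ("P", "NP")),
             ("NP", ("N",))]
print(parse_probability(np_attach))   # about 0.072
print(parse_probability(vp_attach))   # about 0.108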

In the admittedly small world of academic research on broad-coverage
parsing, parser performance is usually evaluated by computing the
average of the labelled precision and recall scores comparing the
parser's output with the hand-constructed parse trees of a held-out
section (usually section 23) of the Penn Treebank II WSJ corpus. This
provides a standard quantitative score of parser performance. The authors
above claim average precision and recall scores of around 87%. I was
wondering what average labelled precision and recall scores Philip
Bralich's parser achieves.
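
(For readers who have not seen these measures: a parse is reduced to
a set of labelled constituents, each a category label plus the span
of words it covers, and the parser's constituents are compared with
the treebank's. The sketch below shows the arithmetic; the function
and the example figures are my own invention and are not taken from
evalb or any other standard scoring tool.)

# Sketch of labelled precision and recall over constituents.  A parse
# is represented as a set of (label, start, end) triples; the names
# below are illustrative only.

def labelled_scores(gold, parsed):
    matched = len(gold & parsed)              # same label and same span
    precision = matched / len(parsed) if parsed else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall

# Toy example: the treebank tree has three labelled constituents, the
# parser proposes three, and two of them match exactly.
gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)}
parsed = {("S", 0, 5), ("NP", 0, 1), ("VP", 2, 5)}

p, r = labelled_scores(gold, parsed)
print(p, r, (p + r) / 2)   # 0.67 0.67 0.67 -- the "average" figure quoted above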

Mark Johnson
Cognitive and Linguistic Sciences
Brown University

Message 2: 9.368, Disc: NLP and Syntax

Date: Wed, 18 Mar 1998 17:38:07 -0500
From: Dan Maxwell <100101.2276@compuserve.com>
Subject: 9.368, Disc: NLP and Syntax

I want to thank Sam Bayer for alerting us to the existence of detailed
evaluation criteria for parsers. If every article in recent issues of
CL not only refers to these criteria, but also tells us where they
can be found, then I agree with him in urging Phil Bralich to test
his own parser against these criteria before making such strong
claims. Since other parsers have already been tested against the MUC
criteria, this appears to provide a reasonably objective basis for
comparison. For all I know, the Penn Treebank guidelines may be just
as good or better in some sense (it's hard to tell for sure), but I
haven't yet heard that they provide a detailed grading system like the
MUC system, nor that other parsers have been tested against them. If
the Bracket Doctor/Ergo system really turns out to be better than the
other ones, I think many linguists would be interested in knowing more
about the underlying system.

Pius ten Hacken and Peter Menzel argue that theoretical linguistics is
right to be less concerned with data coverage than computational
linguistics, since it is more concerned with explanation in terms of
mental capacities. It seems to me that the kind of work they are
talking about can better be considered a kind of psycholinguistics,
which by definition is concerned with the relationship between
language and the brain. This is certainly a well established branch
of theoretical linguistics, but it is not the same thing as syntax,
which is primarily concerned with sentences (as many different kinds
as possible) and the relationships between them. On the other hand,
ten Hacken and Menzel might prefer to join forces with functionalist
approaches to syntax, which aim to explain properties of language in
terms of historical development, ambiguity avoidance, processing, etc.

I think these are all valid approaches to language, but none of them
is the same thing as formal approaches, which to my mind have a
relationship to the rest of linguistics similar to that of mathematics
to the natural sciences -- that of a useful servant, whose task is to
provide a precise model of the object of inquiry. If this assessment
is accurate, formal linguistics has the potential to be
important not only for computational implementation, but also for many
other areas, although computational implementation is still the best
way to find out if the analysis really works, at least if the
implementation is an accurate reflection of the grammar.

I don't follow Menzel's reasoning when he apparently argues that
because language functions are spread across various parts of the
brain, it is doubtful whether language is algorithmic. I don't doubt the
premise of this argument. Clearly, some, but perhaps not all, aspects
of language knowledge, production, and understanding are linked to
other activities of the brain. But what has that got to do with the
question of whether it's algorithmic? I even looked up the word
"algorithm" to find out whether I had misunderstood the meaning of
this word. According to my dictionary, an algorithm is just a
procedure for doing something. I think it is clear that every healthy
human being has an algorithm for his/her own language in this sense,
even though this algorithm sometimes does not work as well as we want
it to -- we sometimes have trouble finding the words we need to
express our ideas. But some sort of algorithm is there nevertheless.
 
Peter Menzel suggests that neural networks are of interest for
computational linguists trying to develop nonalgorithmic models,
though not for theoretical linguists. But as noted above, he
believes that theoretical linguistics is concerned with the
relationship of language to the brain. First of all, I don't see why
neural networks shouldn't be part of our language algorithm, but my
confusion on this point may be related to the discussion in the
previous paragraph.

More clearly, doesn't it follow that, since neural network research
formulates hypotheses about neurons and how they are linked together
in the brain, neural networks are of interest for theoreticians as
well?
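
(To make the point concrete: once a network's weights are fixed,
computing its output is an ordinary step-by-step procedure, i.e. an
algorithm in the dictionary sense discussed above. The tiny network
below is a made-up illustration, not a model of anything in the brain
nor of any particular connectionist proposal.)

import math

# A tiny fixed-weight feed-forward network: two inputs, two hidden
# units, one output unit.  The weights are invented; the point is only
# that evaluating the network is a deterministic, step-by-step procedure.

W_HIDDEN = [[0.5, -0.3],   # weights into hidden unit 1
            [0.8,  0.1]]   # weights into hidden unit 2
W_OUTPUT = [1.2, -0.7]     # weights from the hidden units to the output

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs):
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in W_HIDDEN]
    return sigmoid(sum(w * h for w, h in zip(W_OUTPUT, hidden)))

print(forward([1.0, 0.0]))   # the same inputs always give the same output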

Chomsky recently informed me by email that he would like to find a way
to use neural networks, but didn't see a way to do this. Well, I
think others in the field, including myself, have done some fairly
detailed work in this direction.
 
Dan Maxwell 