
Discussion Details

Title: Re: A Challenge to the Minimalist Community
Submitter: Charles Yang
Description: I would like to add two points to the current discussion.

First, the challenge has probably been met - and many years ago.
Broad-coverage parsers based on Government-Binding theory and
Minimalism DO exist. The earliest commercial application I am aware
of was Bob Kuhns' GB parser, which was used to summarize newswire
stories in the 1980s and was published at the COLING conference in
1990. A more glaring omission is Dekang Lin's pair of Principles &
Parameters-based parsers - unambiguously dubbed PRINCIPAR and
MINIPAR respectively - which have been used in a variety of
applications and have figured prominently in computational
linguistics. For instance, on the task of pronoun antecedent
resolution, Lin's P&P-based system compared favorably against much
larger and more expensive programs at DARPA's 6th Message
Understanding Conference (MUC-6) in 1995. One of the reasons for its
success was the implementation of - God forbid - the binding theory,
in addition to other discourse constraints on pronoun resolution.
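
To make the role of the binding filter concrete, here is a minimal,
hypothetical sketch in Python of how a Principle B constraint can
prune antecedent candidates before discourse preferences apply. This
is an illustration only, not Lin's MUC-6 system; the Mention fields,
the resolve() helper, and the clause-based approximation of the local
domain are my own simplifications.

from dataclasses import dataclass

@dataclass
class Mention:
    text: str
    sent: int                      # sentence index in the discourse
    clause: int                    # index of the minimal clause
    gender: str                    # 'm', 'f', or 'n'
    number: str                    # 'sg' or 'pl'
    c_commands_pronoun: bool = False

def agrees(pron, cand):
    # Morphological agreement: gender and number must match.
    return pron.gender == cand.gender and pron.number == cand.number

def principle_b_ok(pron, cand):
    # Principle B: a non-reflexive pronoun must be free in its local
    # domain, approximated here as its minimal clause.
    return not (cand.clause == pron.clause and cand.c_commands_pronoun)

def resolve(pron, candidates):
    # Keep candidates that agree and survive the binding filter, then
    # fall back to a crude recency preference (most recent wins).
    viable = [c for c in candidates
              if agrees(pron, c) and principle_b_ok(pron, c)]
    return max(viable, key=lambda c: (c.sent, c.clause), default=None)

# ''John saw him'': 'him' cannot be the clause-mate 'John' that
# c-commands it, but a 'John' from the previous sentence is fine.
john_local = Mention("John", sent=2, clause=5, gender="m", number="sg",
                     c_commands_pronoun=True)
john_prior = Mention("John", sent=1, clause=3, gender="m", number="sg")
him = Mention("him", sent=2, clause=5, gender="m", number="sg")
print(resolve(him, [john_prior, john_local]).text)  # -> the sentence-1 John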

MINIPAR is a parsing system based on the Minimalist formalism, and
has been around for at least 8 years: I evaluated - and recommended -
the parser for a major computer company in the summer of 1997.
According to Lin's website,
http://www.cs.ualberta.ca/~lindek/minipar.htm, ''MINIPAR is a broad-
coverage parser for the English language. An evaluation with the
SUSANNE corpus shows that MINIPAR achieves about 88% precision
and 80% recall with respect to dependency relationships. MINIPAR is
very efficient, on a Pentium II 300 with 128MB memory, it parses about
300 words per second.'' You can even download a copy. I suspect
that no reward is necessary: Dekang Lin is currently at Google, Inc.
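
Since the quoted evaluation reports precision and recall over
dependency relationships, a short sketch may help readers outside
computational linguistics see what those figures measure. The metric
definitions below are the standard ones; the function and the toy
data are mine, not Lin's evaluation setup.

def dependency_prf(gold, predicted):
    # A dependency is a (head, dependent, label) triple; precision is
    # the fraction of proposed triples found in the gold standard,
    # recall the fraction of gold triples that were proposed.
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Toy example: gold annotation vs. parser output for one sentence.
gold = {("saw", "John", "subj"), ("saw", "dog", "obj"),
        ("dog", "the", "det")}
pred = {("saw", "John", "subj"), ("saw", "the", "obj")}
p, r = dependency_prf(gold, pred)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.33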

My second point has to do with the success of statistical parsing. In
my experience, most linguists don't give a damn about parsing, or
computers, for that matter: they are not paid to develop technologies
that may one day interest Microsoft. Yet I invite those who are in the
business of (statistical) parsing to reflect on their success. In my
view, the improvement in parsing quality over the past decade or so
has less to do with breakthroughs in machine learning than with the
enrichment of the representation of syntactic structures over which
statistical induction can take place. The early 1990s parsers using
relatively unconstrained stochastic grammars were disastrous
(Charniak 1993). By the mid 90s, notions like head and lexical
selection, both of which are tried and true ideas in linguistics, had
been incorporated into statistical parsers (de Marcken 1995, Collins
1997). The recent, and remarkable, work of Klein and Manning (2002)
takes this a step further. So far as I can tell, in the induction of a
grammatical constituent, Klein & Manning's model keeps track not only
of the constituent itself but also of its aunts and sibling(s) in the
tree structure. These additional structures are what they refer to
as ''context''; those with a more traditional linguistics training may
recall ''specifier'', ''complement'', ''c-command'', and ''government''.
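
For readers who want to see what ''context'' amounts to
computationally, here is a minimal sketch in the spirit of Klein &
Manning's constituent-context model: every span of a part-of-speech
sequence is described both by its yield and by the tags immediately
to its left and right. The actual model induces bracketings by
running EM over these two signatures; that machinery is omitted, and
the code is my own illustration, not the authors' implementation.

BOUNDARY = "#"   # marker for contexts at the sentence edges

def span_signatures(tags):
    # Enumerate all contiguous spans of a tag sequence, pairing each
    # span's yield with its local context (preceding tag, following tag).
    n = len(tags)
    for i in range(n):
        for j in range(i + 1, n + 1):
            span_yield = tuple(tags[i:j])
            left = tags[i - 1] if i > 0 else BOUNDARY
            right = tags[j] if j < n else BOUNDARY
            yield span_yield, (left, right)

# POS tags for ''the dog saw John'': the NP span ('DT', 'NN') appears
# with context ('#', 'VBD'), the kind of signal the model exploits.
for y, ctx in span_signatures(["DT", "NN", "VBD", "NNP"]):
    print(y, ctx)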

If this interpretation is correct, then the rapid progress in statistical
parsing offers converging evidence that the principles and constraints
linguists have discovered are right on the mark and, if one wishes, can
be put to use for practical purposes. (And perhaps linguists deserve
a share of the far larger pot of research funds available to natural
language engineers.) This, then, would seem to be a time to rejoice
and play together, rather than to drive a wedge of ''challenge''
between the two communities.

Charles Yang
Yale University

References

Charniak, E. 1993. Statistical Language Learning. Cambridge, MA:
MIT Press.

Collins, M. 1997. Three generative, lexicalized models for statistical
parsing. Proceedings of ACL 1997, Madrid.

de Marcken, C. 1995. On the unsupervised induction of phrase
structure grammars. Proceedings of the 3rd Workshop on Very Large
Corpora, Cambridge, MA.

Klein, D. & Manning, C. 2002. Natural language grammar induction
using a constituent-context model. Advances in Neural Information
Processing Systems 14 (NIPS 2001).

Date Posted: 11-May-2005
Linguistic Field(s): Computational Linguistics; Linguistic Theories;
Discipline of Linguistics
LL Issue: 16.1505
