Discussion Details
| Title: | Re: A Challenge to the Minimalist Community |
| Submitter: | Charles Yang |
| Description: | I would like to add two points to the current discussion.
First, the challenge has probably been met, and many years ago. Broad-coverage parsers based on Government-Binding/Minimalism DO exist. The earliest commercial application I am aware of was Bob Kuhns' GB parser, which was used to summarize newswire stories in the 1980s and was published at the COLING conference in 1990. A more glaring omission is Dekang Lin's Principles & Parameters-based parsers, unambiguously dubbed PRINCIPAR and MINIPAR respectively, which have been used in a variety of applications and have figured prominently in computational linguistics. For instance, on the task of pronoun antecedent resolution, Lin's P&P-based system compared favorably against much larger and more expensive programs at DARPA's 6th Message Understanding Conference (MUC) in 1995. One of the reasons for its success was the implementation of (God forbid) the binding theory, in addition to other discourse constraints on pronoun resolution.
MINIPAR is a parsing system based on the Minimalist formalism and has been around for at least 8 years: I evaluated, and recommended, the parser for a major computer company in the summer of 1997. According to Lin's website, http://www.cs.ualberta.ca/~lindek/minipar.htm, "MINIPAR is a broad-coverage parser for the English language. An evaluation with the SUSANNE corpus shows that MINIPAR achieves about 88% precision and 80% recall with respect to dependency relationships. MINIPAR is very efficient, on a Pentium II 300 with 128MB memory, it parses about 300 words per second." You can even download a copy. I suspect that no reward is necessary: Dekang Lin is currently at Google, Inc.
My second point has to do with the success of statistical parsing. In my experience, most linguists don't give a damn about parsing, or about computers for that matter: they are not paid to develop technologies that may one day interest Microsoft. Yet I invite those who are in the business of (statistical) parsing to reflect on their success.
In my view, the improvement in parsing quality over the past decade or so has less to do with breakthroughs in machine learning than with the enrichment of the syntactic representations over which statistical induction takes place. The early-1990s parsers using relatively unconstrained stochastic grammars were disastrous (Charniak 1993). By the mid-1990s, notions like head and lexical selection, both tried-and-true ideas in linguistics, had been incorporated into statistical parsers (de Marcken 1995, Collins 1997). The recent, and remarkable, work of Klein and Manning (2002) takes this a step further. So far as I can tell, in the induction of a grammatical constituent, Klein & Manning's model keeps track not only of the constituent itself but also of its aunts and sibling(s) in the tree structure. These additional structures are what they refer to as "context"; those with a more traditional linguistics training may recall "specifier", "complement", "c-command", and "government". If this interpretation is correct, then the rapid progress in statistical parsing offers converging evidence that the principles and constraints linguists have discovered are right on the mark and, if one wishes, can be put to practical use. (And perhaps linguists deserve a share of the far larger pot of research funds available to natural language engineers.) This, then, would seem to be a time to rejoice and play together, rather than to drive a wedge of "challenge" between the two communities.
Charles Yang
Yale University
References
Charniak, E. 1993. Statistical Language Learning. Cambridge, MA: MIT Press.
Collins, M. 1997. Three generative, lexicalized models for statistical parsing. In Proceedings of ACL-97, Madrid.
de Marcken, C. 1995. On the unsupervised induction of phrase-structure grammars. In Proceedings of the 3rd Workshop on Very Large Corpora, Cambridge, MA.
Klein, D. & Manning, C. 2002. Natural language grammar induction using a constituent-context model. NIPS 2001. |
| Date Posted: | 11-May-2005 |
| Linguistic Field(s): | Computational Linguistics; Linguistic Theories; Discipline of Linguistics |
| LL Issue: | 16.1505 |

