LINGUIST List 2.46

Saturday, 23 Feb 1991

FYI: Reversible grammars, Consortium for Lexical Research



Directory

  1. Dale Gerdemann, Dissertation on reversible grammars
  2. Dale Gerdemann, Consortium for Lexical Research

Message 1: Dissertation on reversible grammars

Date: Thu, 14 Feb 91 12:13:06 CST
From: Dale Gerdemann <dale@kahane.cogsci.uiuc.edu>
Subject: Dissertation on reversible grammars
My dissertation "Parsing and generation of unification grammars" is
available as Beckman Institute: Cognitive Science Technical Report
Series CS-91-06. To obtain a copy, contact Linda May by e-mail
at may@kant.cogsci.uiuc.edu, or write to her by regular mail at
Beckman Institute, 405 N. Mathews, Urbana, IL 61801.



 Parsing and generation of unification grammars
 by Dale Gerdemann
 Ph.D. Dissertation, University of Illinois at Urbana-Champaign
 Erhard Hinrichs, Advisor

In this dissertation, it is shown that declarative, feature-based
unification grammars can be used efficiently for both parsing and
generation. It is also shown that radically different algorithms are
not needed for these two modes of processing. Given this similarity between
parsing and generation, it will be easier to maintain consistency
between input and output in interactive natural language interfaces. A
Prolog implementation of the unification-based parser and DAG unifier
is provided. The DAG unifier includes extensions to handle disjunction
and negation.
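
For readers unfamiliar with the operation, the core of feature-structure
unification can be sketched as follows. This is only an illustration in
Python, not the Prolog implementation from the dissertation. It assumes
feature structures are plain nested dictionaries with string atoms, and
it leaves out reentrancy (shared substructure), disjunction and negation.
The name unify is hypothetical.

```python
def unify(a, b):
    """Return the unification of two feature structures, or None on a clash.

    Assumption: structures are nested dicts with string atoms, so there is
    no reentrancy, disjunction or negation as in a full DAG unifier.
    """
    # The empty structure carries no information and unifies with anything.
    if a == {}:
        return b
    if b == {}:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feat, val in b.items():
            if feat in result:
                merged = unify(result[feat], val)
                if merged is None:
                    return None  # clash somewhere below this feature
                result[feat] = merged
            else:
                result[feat] = val
        return result
    # At least one side is an atom: the two must be identical.
    return a if a == b else None


np_rule = {"cat": "np", "agr": {"num": "sg"}}
lexical = {"agr": {"num": "sg", "per": "3"}}
combined = unify(np_rule, lexical)
clash = unify(np_rule, {"agr": {"num": "pl"}})
```

Here combined holds the information from both inputs, while clash is
None because the two values for num are incompatible.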

The parser presented in this thesis is based on Stuart Shieber's
extensions of Earley's algorithm. This algorithm is further extended
in order to incorporate traces and compound lexical items. Also, the
algorithm is optimized by performing the subsumption test on
restricted DAGs rather than on the full DAGs that are kept in the
chart. Since the subsumption test can be very time-consuming, this is
a significant optimization, particularly for grammars with a
considerable number of (nearly) left-recursive rules. A grammar which
handles quantifier scoping is presented as an example of such a
grammar.
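
The idea of testing subsumption on restricted structures can be sketched
as follows. This is a simplified illustration in Python, not the
dissertation's Prolog code. It assumes feature structures are nested
dictionaries without reentrancy and that a restrictor is a list of
feature paths given as tuples. The names subsumes and restrict, and the
sample features, are hypothetical.

```python
def subsumes(general, specific):
    """True if `general` carries no information that `specific` lacks."""
    if isinstance(general, dict):
        if not isinstance(specific, dict):
            return False
        return all(
            feat in specific and subsumes(val, specific[feat])
            for feat, val in general.items()
        )
    # Atoms subsume only themselves.
    return general == specific


def restrict(fs, paths):
    """Keep only the features of `fs` that lie along one of `paths`.

    A complex value reached at the end of a path is cut down to {}.
    """
    if not isinstance(fs, dict):
        return fs
    out = {}
    for feat, val in fs.items():
        tails = [p[1:] for p in paths if p and p[0] == feat]
        if tails:
            out[feat] = restrict(val, tails)
    return out


restrictor = [("cat",), ("agr", "num")]
full = {"cat": "np", "agr": {"num": "sg", "per": "3"}, "sem": {"pred": "dog"}}
small = restrict(full, restrictor)

# A new prediction is redundant if an existing restricted item subsumes it.
new_item = restrict({"cat": "np", "agr": {"num": "sg"}, "sem": {"pred": "cat"}},
                    restrictor)
redundant = subsumes(small, new_item)
```

Because the restricted structures are much smaller than the full ones
kept in the chart, each subsumption check has less to compare. In this
sketch the two items differ only in their sem value, which the restrictor
discards, so the second prediction counts as redundant.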

For generation, the algorithm is modified in order to optimize the use
of both top-down and bottom-up information. Sufficient top-down
information is ensured by modifying the restriction procedure so that
semantic information is not lost. Sufficient bottom-up information is
ensured by making the algorithm head-driven. Generation also requires
that the chart be modified so that identical phrases are not generated
at different string positions. It is shown how readjustments to the
chart can be made whenever a duplicate phrase is predicted. The
generator in this thesis does not perform equally well with all types
of grammars. Grammars employing type raising may cause the generator
to go into an unconstrained search. However, given the independently
motivated principles of "minimal type assignment" and "type
raising only as needed", it is shown how such unconstrained searches
can be avoided.

Finally, suggestions are made as to how unification grammars can be
developed in order to handle difficult problems such as partially free
word order, bound variables for semantic interpretation, and the
resolution of feature clashes in agreement.

Message 2: Consortium for Lexical Research

Date: Mon, 18 Feb 91 11:48:17 MST
From: Dale Gerdemann <dale@kahane.cogsci.uiuc.edu>
Subject: Consortium for Lexical Research

 Sylvia Candelaria de Ram
 Natural Language Research
 Computing Research Lab
 New Mexico State University
 Las Cruces, NM 88003
 sylvia@nmsu.edu OR sylvia@nmsu.bitnet
 (505) 522-2978 / 646-5466 / fax: 646-6218

The Consortium for Lexical Research

Rio Grande Research Corridor
Computing Research Laboratory
New Mexico State University
Box 30001, Las Cruces, NM 88003.

lexical@nmsu.edu
(505) 646-5466
Fax: (505) 646-6218

Work in computational linguistics has reached the point where the
performance of many natural language processing systems is limited by
a "lexical bottleneck". That is, such systems could handle much more
text and produce much more impressive application results were it not
for the fact that their lexicons are too small.

The Association for Computational Linguistics has established the
Consortium for Lexical Research (CLR), and DARPA has agreed to fund it.
It will be sited at the Computing Research Laboratory, New Mexico, USA,
under its Director, Yorick Wilks, and an ACL committee
consisting of Roy Byrd, Ralph Grishman, Mark Liberman and
Don Walker.

The Consortium for Lexical Research will be an organization
for sharing lexical data and tools used to perform research on natural
language dictionaries and lexicons, and for communicating the results of
that research. Members of the Consortium will contribute resources
to a repository and withdraw resources from it in order to perform
their research. There is no requirement that withdrawals be
compensated by contributions in kind.

A basic premise of the proposal for cooperation on lexical research
is that the research must be "precompetitive". That is, the CLR will
not have as its goal the creation of commercial products. The goal of
precompetitive research would be to augment our understanding of what
lexicons contain and, specifically, to build computational lexicons
having those contents.

The task of the CLR is primarily to facilitate research, making
available to the whole natural language processing community certain
resources now held only by a few groups that have special
relationships with companies or dictionary publishers.
The CLR would, as far as is practically possible, accept contributions
from any source, regardless of theoretical orientation, and make them
available as widely as possible for research.

There is also an underlying theoretical assumption or hope: that the
contents of major lexicons are very similar, and that some neutral, or
"polytheoretic," form of the information they contain can be at least
a research goal, and would be a great boon if it could be achieved.

A major activity of the CLR will be to negotiate agreements with
"providers" on terms that are reassuring and advantageous to both
suppliers and researchers. Major funders of work in this area in the US
have indicated interest in making participation in the CLR a condition
for financial support of research. An annual fee will be charged for
membership. It is intended that, after an initial start-up period,
the Consortium become self-supporting.

The Computing Research Lab (CRL) already has an active research
program in computational lexicons, text processing, machine
translation, etc., funded by DARPA and NSF, as well as a range of
machines appropriate for advanced computing on dictionaries.

Resources and Services of the Consortium

The following lists of lexical data and tools seem to provide a
reasonable starting content for the repository. We will continually
solicit and encourage additions to these lists.

Data

1. word lists (proper nouns, count/mass nouns, causative verbs, movement verbs,
predicative adjectives, etc.)

2. published dictionaries

3. specialized terminology, technical glossaries, etc.

4. statistical data

5. synonyms, antonyms, hypernyms, pertainyms, etc.

6. phrase lists

Tools

1. lexical database management tools

2. lexical query languages

3. text analysis tools (concordance, KWIC, statistical analysis,
collocation analysis, etc.)

4. SGML tools (particularly tuned to dictionary encoding)

5. parsers

6. morphological analyzers

7. user interfaces to dictionaries

8. lexical workbenches

9. dictionary definition sense taggers

Services

Repository management will involve cataloging and storing material in
disparate formats, and providing for its retransmission (with
conversion, where appropriate tools exist). In addition, it will be
necessary to maintain a library of documentation describing the
repository's contents and containing research papers resulting from
projects that use the material. A brief description of the services
to be provided is as follows:

CRL will provide a catalog of, and act as a clearinghouse for,
utility programs that have been written for existing online lexical data.

CRL will compile a list of known mistakes, misprints, etc. that
occur in each of the major published sources (dictionaries, etc.).

CRL will set up a new memorandum series explicitly devoted
to the lexical center. 

CRL will also be a clearinghouse for preprints and hard-to-find
reprints on machine-readable dictionaries.

CRL also expects to conduct workshops in this area, including an
inaugural workshop in late 1991 or early 1992.

CRL would provide a catalog for access to repositories of
corpus-manipulation tools held elsewhere.

We invite you to participate in the Consortium for Lexical Research.
Anyone interested in participating, even in principle, as a provider
or consumer of data, tools, or services should send a message to

 lexical@nmsu.edu
 or
 lexical@nmsu.bitnet

as should anyone who would like to be on our lexical information list.

