My dissertation "Parsing and generation of unification grammars" is available as Beckman Institute: Cognitive Science Technical Report Series CS-91-06. To obtain a copy, contact Linda May at may@kant.cogsci.uiuc.edu, or write to Linda May, Beckman Institute, 405 N. Mathews, Urbana, IL 61801.

Parsing and generation of unification grammars
by Dale Gerdemann
Ph.D. Dissertation, University of Illinois at Urbana-Champaign
Erhard Hinrichs, Advisor

This dissertation shows that declarative, feature-based unification grammars can be used efficiently for both parsing and generation. It also shows that radically different algorithms are not needed for these two modes of processing. Because parsing and generation are so similar, it will be easier to maintain consistency between input and output in interactive natural language interfaces. A Prolog implementation of the unification-based parser and the DAG unifier is provided. The DAG unifier includes extensions to handle disjunction and negation.

The parser presented in this thesis is based on Stuart Shieber's extensions of Earley's algorithm. This algorithm is further extended to incorporate traces and compound lexical items. The algorithm is also optimized by performing the subsumption test on restricted DAGs rather than on the full DAGs that are kept in the chart. Since the subsumption test can be very time-consuming, this is a significant optimization, particularly for grammars with a considerable number of (nearly) left-recursive rules. A grammar that handles quantifier scoping is presented as an example of such a grammar.

For generation, the algorithm is modified to optimize its use of both top-down and bottom-up information. Sufficient top-down information is ensured by modifying the restriction procedure so that semantic information is not lost.
Sufficient bottom-up information is ensured by making the algorithm head-driven. Generation also requires the chart to be modified so that identical phrases are not generated at different string positions. It is shown how the chart can be readjusted whenever a duplicate phrase is predicted.

The generator in this thesis does not perform equally well with all types of grammars: grammars employing type raising may cause it to go into an unconstrained search. However, it is shown how such unconstrained searches can be avoided, given the independently motivated principles of "minimal type assignment" and "type raising only as needed". Finally, suggestions are made as to how unification grammars can be developed to handle difficult problems such as partially free word order, bound variables for semantic interpretation, and the resolution of feature clashes in agreement.
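The abstract refers to a DAG unifier, to Shieber-style restriction, and to running the subsumption test on restricted DAGs rather than on full ones. Below is a minimal Python sketch of those three operations. It is not the dissertation's Prolog implementation. It assumes feature structures are nested dicts with atomic string values and no structure sharing, so they are trees rather than true DAGs, and it omits disjunction and negation. The names `unify`, `subsumes`, `restrict` and the sample `RESTRICTOR` are hypothetical.

```python
from typing import Any, Dict, FrozenSet, Optional, Tuple, Union

# Simplifying assumption: a feature structure is either an atomic string
# or a dict from feature names to feature structures. There is no structure
# sharing (reentrancy), so these are trees, not true DAGs. {} carries no
# information and unifies with anything.
FS = Union[str, Dict[str, Any]]
Path = Tuple[str, ...]


def unify(a: FS, b: FS) -> Optional[FS]:
    """Combine the information in a and b; return None on a feature clash."""
    if a == {}:
        return b
    if b == {}:
        return a
    if isinstance(a, str) or isinstance(b, str):
        # Atoms unify only if identical; an atom never unifies with a
        # non-empty complex structure.
        return a if a == b else None
    result = dict(a)  # copy so the inputs are not mutated
    for feat, val in b.items():
        if feat in result:
            sub = unify(result[feat], val)
            if sub is None:
                return None
            result[feat] = sub
        else:
            result[feat] = val
    return result


def subsumes(general: FS, specific: FS) -> bool:
    """True if `specific` contains all the information in `general`."""
    if isinstance(general, str):
        return general == specific
    if isinstance(specific, str):
        return general == {}
    return all(feat in specific and subsumes(val, specific[feat])
               for feat, val in general.items())


def restrict(fs: FS, restrictor: FrozenSet[Path], prefix: Path = ()) -> FS:
    """Keep only features whose path is a prefix of some restrictor path."""
    if isinstance(fs, str):
        return fs
    out: Dict[str, Any] = {}
    for feat, val in fs.items():
        path = prefix + (feat,)
        if any(p[:len(path)] == path for p in restrictor):
            out[feat] = restrict(val, restrictor, path)
    return out


# Hypothetical restrictor: keep only the category and number agreement.
RESTRICTOR: FrozenSet[Path] = frozenset({("cat",), ("agr", "num")})

full_np = {"cat": "np", "agr": {"num": "sg", "per": "3"},
           "sem": {"pred": "dog", "quant": "every"}}
predicted = {"cat": "np", "agr": {"num": "sg"}}

# Comparing restricted structures ignores everything outside the
# restrictor (here "agr|per" and all of "sem").
print(subsumes(full_np, predicted))                        # False
print(subsumes(restrict(full_np, RESTRICTOR), predicted))  # True
print(unify(full_np, {"agr": {"num": "pl"}}))              # None
```

The restricted structure mentions only the paths named by the restrictor. A subsumption check on it therefore walks a few features instead of an entire chart entry. That is the intuition behind the optimization the abstract describes, although the dissertation's actual procedure on real DAGs with reentrancy is more involved.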
Sylvia Candelaria de Ram
Natural Language Research
Computing Research Lab
New Mexico State University
Las Cruces, NM 88003
sylvia@nmsu.edu OR sylvia@nmsu.bitnet
(505) 522-2978 / 646-5466 / fax: 646-6218

The Consortium for Lexical Research
Rio Grande Research Corridor
Computing Research Laboratory
New Mexico State University
Box 30001, Las Cruces, NM 88003
lexical@nmsu.edu
(505) 646-5466
Fax: (505) 646-6218

Work in computational linguistics has reached the point where the performance of many natural language processing systems is limited by a "lexical bottleneck". That is, such systems could handle much more text and produce much more impressive application results if their lexicons were not so small.

The Association for Computational Linguistics has established the Consortium for Lexical Research (CLR), and DARPA has agreed to fund it. The CLR will be sited at the Computing Research Laboratory, New Mexico State University, USA, under its Director, Yorick Wilks, and an ACL committee consisting of Roy Byrd, Ralph Grishman, Mark Liberman and Don Walker.

The Consortium for Lexical Research will be an organization for sharing the lexical data and tools used in research on natural language dictionaries and lexicons, and for communicating the results of that research. Members of the Consortium will contribute resources to a repository and withdraw resources from it to carry out their research. There is no requirement that withdrawals be compensated by contributions in kind.

A basic premise of the proposal for cooperation on lexical research is that the research must be "precompetitive". That is, the creation of commercial products will not be a goal of the CLR. The goal of precompetitive research would be to augment our understanding of what lexicons contain and, specifically, to build computational lexicons having those contents. The task of the CLR is primarily to facilitate research by making certain resources available to the whole natural language processing community; these resources are now held only by a few groups that have special relationships with companies or dictionary publishers. The CLR would, as far as practically possible, accept contributions from any source, regardless of theoretical orientation, and make them available as widely as possible for research.
There is also an underlying theoretical assumption, or hope: that the contents of major lexicons are very similar. A related hope is that a neutral, or "polytheoretic," form of the information they contain is at least a worthwhile research goal, and would be a great boon if it could be achieved.

A major activity of the CLR will be to negotiate agreements with "providers" on terms that are reassuring and advantageous to both suppliers and researchers. Major funders of work in this area in the US have indicated interest in making participation in the CLR a condition for financial support of research. An annual fee will be charged for membership. The intention is that, after an initial start-up period, the Consortium will become self-supporting. The Computing Research Lab (CRL) already has an active research program in computational lexicons, text processing, machine translation, etc., funded by DARPA and NSF. It also has a range of machines suited to advanced computing on dictionaries.

Resources and Services of the Consortium

The following lists of lexical data and tools seem a reasonable starting point for the contents of the repository. We will continually solicit and encourage additions to these lists.

Data
1. word lists (proper nouns, count/mass nouns, causative verbs, movement verbs, predicative adjectives, etc.)
2. published dictionaries
3. specialized terminology, technical glossaries, etc.
4. statistical data
5. synonyms, antonyms, hypernyms, pertainyms, etc.
6. phrase lists

Tools
1. lexical database management tools
2. lexical query languages
3. text analysis tools (concordance, KWIC, statistical analysis, collocation analysis, etc.)
4. SGML tools (particularly tuned to dictionary encoding)
5. parsers
6. morphological analyzers
7. user interfaces to dictionaries
8. lexical workbenches
9. dictionary definition sense taggers

Services
Repository management will involve cataloging and storing material in disparate formats, and providing for its retransmission (with conversion, where appropriate tools exist). In addition, it will be necessary to maintain a library of documentation. This library will describe the repository's contents and contain research papers resulting from projects that use the material. The services to be provided are briefly as follows:
1. CRL will provide a catalog of, and act as a clearinghouse for, utility programs that have been written for existing online lexical data.
2. CRL will compile a list of known mistakes, misprints, etc. that occur in each of the major published sources (dictionaries etc.).
3. CRL will set up a new memorandum series explicitly devoted to the lexical center.
4. CRL will also be a clearinghouse for preprints and hard-to-find reprints on machine-readable dictionaries.
5. CRL expects to conduct workshops in this area, including an inaugural workshop in late 1991 or early 1992.
6. CRL would provide a catalog for access to repositories of corpus-manipulation tools held elsewhere.

We invite you to participate in the Consortium for Lexical Research. Anyone interested in participating, even in principle, as a provider or consumer of data, tools, or services should send a message to lexical@nmsu.edu or lexical@nmsu.bitnet. Anyone who would like to be on our lexical information list should also write to either address.