Dissertation Information

Title: Combining Machine Readable Lexical Resources with a Principle Based Parser
Author: Michael McHale
Institution: Syracuse University, Information Studies
Completed in: 1995
Linguistic Subfield(s): Computational Linguistics
Subject Language(s): English
Director(s): Sung Myaeng

Abstract: This research was motivated by the premise that the ability to process unconstrained natural language text would ultimately provide information retrieval (IR) with a very useful tool. To date, most syntax-based Natural Language Processing (NLP) systems that support IR have taken one of two approaches: domain-independent syntactic processing, or syntactic and semantic processing in limited domains. The purpose of this research was to investigate an approach to domain-independent semantic processing: the combination of a principle-based parser (PBP) with a semantically enhanced machine-readable dictionary (MRD).

The parser is an implementation of Chomsky's Government-Binding (GB) theory and therefore provides complete syntactic coverage. The coverage of a parsing system is, however, ultimately a function of the size and richness of its lexicon. To provide both size and richness, the lexicon for the system was extracted from the Longman Dictionary of Contemporary English (LDOCE) and semantically enhanced using Roget's International Thesaurus.

The research investigated: (1) the impact of using an MRD as the lexicon for a PBP; (2) the automatic extraction of thematic roles from the MRD; and (3) methods to enhance those roles using Roget's.

The results show that: (1) an MRD can indeed be used with a PBP, though the larger, more ambiguous lexicon requires controls in the parser to avoid producing a large forest of candidate parse trees. With such controls, the impact of the larger lexicon is no greater for a PBP than for a traditional phrase structure grammar (e.g., ATN, APSG) dealing with lexical ambiguity. (2) LDOCE contains patterns in its definitions that can be exploited to determine thematic roles, a simple form of semantics. The majority of these roles were extracted using simple lexical patterns. (3) The simple thematic roles can be enhanced using semi-automatic methods. A decomposition of Roget's hierarchy allowed for a procedural mapping of the simple thematic roles to over 1000 roles with 7 levels of abstraction. It is anticipated, but not shown here, that the enhanced roles will provide an improvement in IR capabilities over the simpler thematic roles.
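To illustrate what extracting thematic roles from dictionary definitions with simple lexical patterns might look like, the following is a minimal sketch in Python. It is not the dissertation's implementation: the patterns, the role labels, the sample definitions, and the function name `extract_roles` are all hypothetical and chosen only to show the general idea of matching surface patterns in definition text.

```python
import re

# Hypothetical pattern-to-role table. These patterns and labels are
# illustrative assumptions, not the ones used in the dissertation.
PATTERNS = [
    (re.compile(r"^to cause (?:someone|something)\b"), "AGENT"),
    (re.compile(r"^a person who\b"), "AGENT"),
    (re.compile(r"\bused for\b"), "INSTRUMENT"),
    (re.compile(r"^a place (?:where|for)\b"), "LOCATION"),
]


def extract_roles(definition):
    """Return the role labels whose pattern matches the definition text."""
    text = definition.strip().lower()
    roles = []
    for pattern, role in PATTERNS:
        # Record each role once, in the order the patterns are listed.
        if pattern.search(text) and role not in roles:
            roles.append(role)
    return roles


if __name__ == "__main__":
    # Made-up definitions in a dictionary-like style.
    for sample in (
        "A tool used for cutting wood",
        "a person who teaches",
        "a small round fruit",
    ):
        print(sample, "->", extract_roles(sample))
```

A table of compiled patterns keeps the sketch easy to extend; a definition that matches no pattern simply yields an empty list, which corresponds to the roles that simple lexical patterns cannot recover.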