LINGUIST List
Title: Combining Machine Readable Lexical Resources with a Principle Based Parser
Author: Michael McHale
Degree Awarded: Syracuse University, Information Studies
Degree Date: 1995
Linguistic Subfield(s): Computational Linguistics
Subject Language(s): English
Director(s): Sung Myaeng

Abstract:

This research was motivated by the premise that the ability to process unconstrained natural language text would ultimately give information retrieval (IR) a very useful tool. To date, most syntax-based Natural Language Processing (NLP) systems that support IR have taken one of two approaches: domain-independent syntactic processing, or syntactic and semantic processing in limited domains. The purpose of this research was to investigate an approach to domain-independent semantic processing: the combination of a principle-based parser (PBP) with a semantically enhanced machine-readable dictionary (MRD).

The parser is an implementation of Chomsky's Government-Binding (GB) theory and therefore provides, at the level of the grammar, complete syntactic coverage. The coverage of a parsing system, however, is ultimately a function of the size and richness of its lexicon. To provide both size and richness, the lexicon for the system was extracted from the Longman Dictionary of Contemporary English (LDOCE) and semantically enhanced using Roget's International Thesaurus.

The research investigated: (1) the impact of using an MRD as the lexicon for a PBP; (2) the automatic extraction of thematic roles from the MRD; and (3) methods to enhance those roles using Roget's.
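Item (2) rests on the observation that dictionary definitions use recurring phrasings. The Python sketch below illustrates the general idea of pattern-based thematic role extraction under loudly stated assumptions: the regular expressions, the role labels, and the `extract_roles` function are hypothetical illustrations, not the patterns or code used in the dissertation, and the sample glosses are invented rather than quoted from LDOCE.

```python
import re

# HYPOTHETICAL surface patterns over verb-definition text, each paired with a
# coarse thematic role. Illustrative only: these are NOT the patterns used in
# the dissertation, and the glosses below are invented, not LDOCE entries.
ROLE_PATTERNS = [
    ("THEME", re.compile(r"^to \w+ (?:something|someone)\b")),   # object slot right after the verb
    ("INSTRUMENT", re.compile(r"\b(?:with|using) (?:a|an|the) \w+")),
    ("RECIPIENT", re.compile(r"\bto (?:someone|somebody)\b")),
    ("SOURCE", re.compile(r"\bfrom (?:a|an|the) \w+")),
    ("GOAL", re.compile(r"\binto (?:a|an|the) \w+")),
]


def extract_roles(definition: str) -> list[str]:
    """Return the roles whose pattern matches the definition text.

    Roles appear in ROLE_PATTERNS order, each at most once; matching is
    case-insensitive because the text is lowercased first.
    """
    text = definition.lower().strip()
    return [role for role, pattern in ROLE_PATTERNS if pattern.search(text)]


if __name__ == "__main__":
    for gloss in [
        "to cut something with a knife",
        "to give something to someone",
        "to remove something from a container",
    ]:
        print(gloss, "->", extract_roles(gloss))
```

Surface patterns like these are cheap but brittle: they only fire on the phrasings they anticipate, which is consistent with the abstract's claim that the majority (not all) of the roles could be found this way.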

The results show the following. (1) An MRD can indeed be used with a PBP, though the larger, more ambiguous lexicon requires controls in the parser to avoid producing a large forest of candidate parse trees. With such controls, the impact of the larger lexicon on a PBP is no greater than its impact on a traditional phrase structure grammar (e.g., ATN, APSG) dealing with lexical ambiguity. (2) LDOCE definitions contain patterns that can be exploited to determine thematic roles, a simple form of semantics; the majority of these roles were extracted using simple lexical patterns. (3) The simple thematic roles can be enhanced using semi-automatic methods: a decomposition of Roget's hierarchy allowed a procedural mapping of the simple thematic roles to over 1,000 roles with seven levels of abstraction. It is anticipated, but not shown here, that the enhanced roles will improve IR capabilities over the simpler thematic roles.
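One way to picture the procedural mapping in result (3), under strong assumptions, is to pair a simple role with the thesaurus category of the word that fills it and then choose how far down the category hierarchy to go. In this sketch the category tree, the word-to-category table, and `refine_role` are all hypothetical; the category names are invented for illustration, and the tree is far smaller than the seven-level decomposition of Roget's hierarchy that the abstract describes.

```python
from typing import Optional

# HYPOTHETICAL toy category tree standing in for a decomposed thesaurus
# hierarchy: each category maps to its parent (None marks a top-level class).
# Names and structure are invented for illustration, not taken from Roget's.
PARENT: dict[str, Optional[str]] = {
    "PHYSICAL THINGS": None,
    "ARTIFACTS": "PHYSICAL THINGS",
    "TOOLS": "ARTIFACTS",
    "CUTTING TOOLS": "TOOLS",
    "CONTAINERS": "ARTIFACTS",
}

# HYPOTHETICAL word-to-category assignments.
WORD_CATEGORY = {"knife": "CUTTING TOOLS", "saw": "CUTTING TOOLS", "box": "CONTAINERS"}


def category_path(category: str) -> list[str]:
    """Return the categories from the top-level class down to `category`."""
    path: list[str] = []
    current: Optional[str] = category
    while current is not None:
        path.append(current)
        current = PARENT[current]
    path.reverse()
    return path


def refine_role(role: str, filler: str, depth: int) -> str:
    """Specialize a simple role by the filler's category at a given depth.

    depth=1 selects the top-level class; larger depths select more specific
    categories, clamped to the deepest category known for the filler.
    Fillers with no known category keep the simple role unchanged.
    """
    category = WORD_CATEGORY.get(filler.lower())
    if category is None:
        return role
    path = category_path(category)
    index = max(1, min(depth, len(path))) - 1
    return f"{role}:{path[index]}"


if __name__ == "__main__":
    for depth in (1, 2, 4):
        print(depth, refine_role("INSTRUMENT", "knife", depth))
```

With depth 1, `knife` yields `INSTRUMENT:PHYSICAL THINGS`; with depth 4, `INSTRUMENT:CUTTING TOOLS`. Crossing a handful of simple roles with many categories at several depths is one way a small role inventory could grow into a much larger one, though the abstract does not say this is exactly how its roles were derived.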
Page Updated: 29-Nov-2009
