|Title:||Combining Machine Readable Lexical Resources with a Principle Based Parser|
|Author:||Michael McHale|
|Institution:||Syracuse University, Information Studies|
|Linguistic Subfield(s):||Computational Linguistics|
|Abstract:||This research was motivated by the premise that the ability to process unconstrained natural language text would ultimately provide information retrieval (IR) with a very useful tool. To date, most syntax-based Natural Language Processing (NLP) systems that support IR have taken one of two approaches: domain-independent syntactic processing, or syntactic and semantic processing in limited domains. The purpose of this research was to investigate an approach to domain-independent semantic processing: the combination of a principle-based parser (PBP) with a semantically enhanced machine-readable dictionary (MRD).
The parser is an implementation of Chomsky's Government-Binding (GB) theory and therefore provides complete syntactic coverage. The coverage of a parsing system is, however, ultimately a function of the size and richness of its lexicon. To provide both size and richness, the lexicon for the system was extracted from the Longman Dictionary of Contemporary English (LDOCE) and semantically enhanced using Roget's International Thesaurus.
The research investigated: (1) the impact of using an MRD as the lexicon for a PBP; (2) the automatic extraction of thematic roles from the MRD; and (3) methods to enhance those roles using Roget's.
The results show the following. (1) An MRD can indeed be used with a PBP, though the larger, more ambiguous lexicon requires controls in the parser to avoid producing a large forest of candidate parse trees. With such controls, the impact of the larger lexicon is no greater for a PBP than for a traditional phrase structure grammar (e.g., ATN, APSG) dealing with lexical ambiguity. (2) LDOCE contains patterns in its definitions that can be exploited to determine thematic roles, a simple form of semantics. The majority of these roles were extracted using simple lexical patterns. (3) The simple thematic roles can be enhanced using semi-automatic methods. A decomposition of Roget's hierarchy allowed for a procedural mapping of the simple thematic roles to over 1000 roles with 7 levels of abstraction. It is anticipated, but not shown here, that the enhanced roles will improve IR capabilities over the simpler thematic roles.