LINGUIST List 15.2420

Tue Aug 31 2004

Diss: Comp Ling: Hartrumpf: 'Hybrid...'

Editor for this issue: Takako Matsui <tako@linguistlist.org>


Directory

  1. Sven.Hartrumpf, Hybrid Disambiguation in Natural Language Analysis

Message 1: Hybrid Disambiguation in Natural Language Analysis

Date: Tue, 31 Aug 2004 08:33:37 -0400 (EDT)
From: Sven.Hartrumpf <Sven.Hartrumpf@fernuni-hagen.de>
Subject: Hybrid Disambiguation in Natural Language Analysis

Institution: University of Hagen
Program: Computer Science
Dissertation Status: Completed
Degree Date: 2002

Author: Sven Hartrumpf

Dissertation Title: Hybrid Disambiguation in Natural Language Analysis

Dissertation URL: http://pi7.fernuni-hagen.de/hartrumpf/publications

Linguistic Field: Computational Linguistics, Text/Corpus Linguistics 

Subject Language: English (code: ENG), German Standard (code: GER)

Dissertation Director 1: Hermann Helbig
Dissertation Director 2: István Bátori

Dissertation Abstract:

This PhD thesis proposes, formalizes, and evaluates a new hybrid
disambiguation method for ambiguity problems in natural language
processing (NLP): rule-centered multidimensional back-off. The
following eight paragraphs summarize the first eight chapters in
turn.

As an introduction, the context and the embedding of the work are
described. Three main theses are formulated and the work is motivated.

A general classification of ambiguities is given and the concept of
hybridization is introduced. The most important parts of the embedding
of the thesis are described: the employed semantic representation
formalism (MultiNet), the WOCADI parser applied for the disambiguation
modules, the knowledge sources required by the parser (corpora and
lexica), and the tools for lexical knowledge. Basic concepts of
evaluation techniques are explained because the disambiguation method
is thoroughly evaluated for three paradigmatic ambiguity problems in
NLP using annotated corpora of German newspaper articles.

The WOCADI parser is described in more detail because the
disambiguation modules require it to different extents (phrase or
sentence parsing). Its architecture, some principles, and example
results are presented.

Before approaches for specific disambiguation problems are developed,
the general approach is introduced. The solution is hybrid: it
combines interpretation rules that express valuable linguistic
knowledge with statistics about how these rules apply to annotated
corpora. For the statistical part, a new class of statistical models
is defined: multidimensional back-off models, which extend the concept
of traditional back-off models considerably.
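
The abstract does not give the formal definition of multidimensional
back-off, so the following Python sketch only illustrates the general
idea of backing off along several context dimensions at once. It is
not the model from the thesis. The class name BackoffModel, the
wildcard scheme, the min_count threshold, and the toy samples (rule
identifiers paired with German prepositions) are all assumptions made
for this illustration.

```python
from collections import Counter
from itertools import product

WILDCARD = "*"  # assumed marker for a generalized (backed-off) feature


class BackoffModel:
    """Toy back-off estimator over contexts with several features.

    Illustrative assumption only; not the thesis's formal model.
    """

    def __init__(self, min_count=1):
        self.min_count = min_count
        self.counts = Counter()  # (generalized context, label) -> count
        self.totals = Counter()  # generalized context -> count

    @staticmethod
    def generalizations(context):
        """Return every variant of context with any subset of
        features replaced by the wildcard."""
        options = [(value, WILDCARD) for value in context]
        return set(product(*options))

    def train(self, samples):
        """Count each sample under all of its generalized contexts."""
        for context, label in samples:
            for general in self.generalizations(context):
                self.counts[general, label] += 1
                self.totals[general] += 1

    def probability(self, context, label):
        """Estimate P(label | context) from the most specific level
        (fewest wildcards) that has enough training evidence."""
        by_level = {}
        for general in self.generalizations(context):
            level = general.count(WILDCARD)
            by_level.setdefault(level, []).append(general)
        for level in sorted(by_level):
            seen = [g for g in by_level[level]
                    if self.totals[g] >= self.min_count]
            if seen:
                hits = sum(self.counts[g, label] for g in seen)
                total = sum(self.totals[g] for g in seen)
                return hits / total
        return 0.0  # no evidence at any level


# Hypothetical data: (rule id, preposition) -> was the rule's reading correct?
samples = [
    (("r1", "mit"), True),
    (("r1", "mit"), True),
    (("r1", "mit"), False),
    (("r1", "in"), False),
    (("r2", "in"), True),
]
model = BackoffModel(min_count=1)
model.train(samples)
print(model.probability(("r1", "mit"), True))  # seen context, no back-off
print(model.probability(("r2", "mit"), True))  # backs off in one dimension
print(model.probability(("r3", "auf"), True))  # backs off in both dimensions
```

In this sketch, an unseen context such as ("r2", "mit") is estimated
by pooling the counts of ("r2", "*") and ("*", "mit"), that is, by
relaxing one feature at a time in every dimension rather than along a
single fixed back-off chain as in traditional back-off models.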

The three ambiguity problems, whose treatment together with the
general disambiguation approach is the scientific center of the
thesis, are prominent examples that span a whole range of
disambiguation problems. First, the problem of prepositional phrase
attachment and interpretation (a structural ambiguity) is tackled with
an instantiation of the general disambiguation approach and the
addition of bonus factors from syntax and semantics. The resulting
PAIRUDIS system is evaluated on a corpus.

Second, the problem of coreference resolution (a referential
ambiguity) is described and a hybrid solution with rules and
disambiguation statistics, the CORUDIS system, is proposed and
evaluated. The evaluation includes additional bonus factors like
syntactic or semantic parallelism of constituents.

Third, the problem of word sense disambiguation (WSD; a lexical
ambiguity) is treated for a subproblem, namely WSD for words in
prepositional phrase contexts. The resulting system is evaluated and
the improvements achieved for the PAIRUDIS system are determined.

Finally, further ambiguity problems in NLP are briefly introduced. For
the majority of them, a hybrid solution with a rule-centered
multidimensional back-off model is sketched.