Editor for this issue: Takako Matsui <tako linguistlist.org>
Institution: University of Hagen
Program: Computer Science
Dissertation Status: Completed
Degree Date: 2002
Author: Sven Hartrumpf
Dissertation Title: Hybrid Disambiguation in Natural Language Analysis
Dissertation URL: http://pi7.fernuni-hagen.de/hartrumpf/publications
Linguistic Field: Computational Linguistics, Text/Corpus Linguistics
Subject Language: English (code: ENG), German Standard (code: GER)
Dissertation Director 1: Hermann Helbig
Dissertation Director 2: István Bátori

Dissertation Abstract:

This PhD thesis proposes, formalizes, and evaluates a new hybrid disambiguation method for ambiguity problems in natural language processing (NLP): rule-centered multidimensional back-off. The first eight chapters are summarized in the following eight paragraphs in turn.

As an introduction, the context and the embedding of the work are described. Three main theses are formulated and the work is motivated. A general classification of ambiguities is given and the concept of hybridization is introduced.

The most important parts of the embedding of the thesis are described: the employed semantic representation formalism (MultiNet), the WOCADI parser applied for the disambiguation modules, the knowledge sources required by the parser (corpora and lexica), and the tools for lexical knowledge.

Basic concepts of evaluation techniques are explained because the disambiguation method is thoroughly evaluated for three paradigmatic ambiguity problems in NLP using annotated corpora of German newspaper articles.

The WOCADI parser is described in more detail because it is required - to different extents (phrase or sentence parsing) - for the disambiguation modules. Its architecture, some principles, and example results are presented.

Before approaches for specific disambiguation problems are developed, the general approach is introduced.
The solution is hybrid and combines interpretation rules expressing valuable linguistic knowledge and statistics on top of the applicability of these rules for annotated corpora. For the statistical part, a new statistical model is defined: multidimensional back-off models that extend the concept of traditional back-off models considerably. The three ambiguity problems, whose treatment together with the general disambiguation approach is the scientific center of the thesis, are prominent examples that span a whole range of disambiguation problems.

First, the problem of prepositional phrase attachment and interpretation (a structural ambiguity) is tackled with an instantiation of the general disambiguation approach and the addition of bonus factors from syntax and semantics. The resulting PAIRUDIS system is evaluated on a corpus.

Second, the problem of coreference resolution (a referential ambiguity) is described and a hybrid solution with rules and disambiguation statistics, the CORUDIS system, is proposed and evaluated. The evaluation includes additional bonus factors like syntactic or semantic parallelism of constituents.

Third, the problem of word sense disambiguation (WSD; a lexical ambiguity) is treated for a subproblem, namely WSD for words in prepositional phrase contexts. The resulting system is evaluated and the improvements achieved for the PAIRUDIS system are determined.

Finally, further ambiguity problems in NLP are briefly introduced. For the majority of them, a hybrid solution with a rule-centered multidimensional back-off model is sketched.
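
The abstract does not specify how the multidimensional back-off models are defined, so the following Python sketch is only an illustration of the general idea, not the thesis's formalization. It assumes that a context consists of several dimensions, that each dimension lists its values from most specific to most general, and that estimation relaxes the context one generalization step at a time across all dimensions, pooling counts at each step. The class `BackoffModel`, the helper `level_vectors`, the `min_count` threshold, and the toy German data are all hypothetical.

```python
from collections import Counter
from itertools import product


def level_vectors(depths, total):
    """All generalization-level vectors whose entries sum to `total`."""
    return [v for v in product(*(range(d) for d in depths)) if sum(v) == total]


class BackoffModel:
    """Toy back-off estimator over several dimensions (illustrative only)."""

    def __init__(self, min_count=1):
        self.min_count = min_count
        self.counts = Counter()  # (levels, key, outcome) -> frequency
        self.totals = Counter()  # (levels, key) -> frequency

    def train(self, examples):
        # Each context is a tuple of dimensions; each dimension is a tuple
        # of values ordered from most specific to most general.
        for context, outcome in examples:
            depths = [len(dim) for dim in context]
            for levels in product(*(range(d) for d in depths)):
                key = tuple(dim[l] for dim, l in zip(context, levels))
                self.counts[(levels, key, outcome)] += 1
                self.totals[(levels, key)] += 1

    def estimate(self, context, outcome):
        # Try the most specific context first, then generalize step by step.
        depths = [len(dim) for dim in context]
        for total in range(sum(d - 1 for d in depths) + 1):
            hits = seen = 0
            for levels in level_vectors(depths, total):
                key = tuple(dim[l] for dim, l in zip(context, levels))
                hits += self.counts[(levels, key, outcome)]
                seen += self.totals[(levels, key)]
            if seen >= self.min_count:
                return hits / seen
        return None  # no evidence even at the most general level


# Hypothetical training data: (preposition, noun) contexts with an
# attachment decision ("verb" or "noun") as the outcome.
examples = [
    ((("mit", "PREP"), ("Fernglas", "instrument", "NOUN")), "verb"),
    ((("mit", "PREP"), ("Teleskop", "instrument", "NOUN")), "verb"),
    ((("mit", "PREP"), ("Hut", "clothing", "NOUN")), "noun"),
]
model = BackoffModel()
model.train(examples)

unseen_noun = (("mit", "PREP"), ("Lupe", "instrument", "NOUN"))
unseen_both = (("ohne", "PREP"), ("Auto", "vehicle", "NOUN"))
p_instrument = model.estimate(unseen_noun, "verb")
p_general = model.estimate(unseen_both, "verb")
```

In this toy setup, the unseen noun "Lupe" is resolved after one generalization step, because its class "instrument" was observed with the same preposition, whereas the fully unseen context only finds evidence at the most general level. A traditional back-off model would follow a single fixed chain of reduced contexts; the sketch instead explores every combination of per-dimension generalizations at the same distance from the original context.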