Editor for this issue: Takako Matsui <tako
linguistlist.org>
Institution: Korea University
Program: Department of Computer Science and Engineering
Dissertation Status: Completed
Degree Date: 2004

Author: Hoojung Chung

Dissertation Title: Statistical Korean Dependency Parsing Model based on the Surface Contextual Information

Dissertation URL: http://nlp.korea.ac.kr/~hjchung/mypaper/hjchungdissertation.pdf

Linguistic Field: Computational Linguistics, Text/Corpus Linguistics

Subject Language: Korean (code: KKN)

Dissertation Director 1: Hae-Chang Rim

Dissertation Abstract:

Natural language parsing is a key problem for many tasks that require natural language processing. Many language-processing tasks use information about predicate-argument or modifier-modifyee relations. Parsing makes this possible by identifying relations between words or phrases in a sentence. However, it is difficult to parse a sentence correctly because of the ambiguity inherent in natural language. Over the last decade, the statistical approach has become the major trend in natural language parsing. The two most important tasks in statistical natural language parsing are selecting appropriate features that help syntactic disambiguation and designing a statistical model that uses them. This dissertation argues that modification distance and local context influence the parsing of the Korean language. In the proposed parsing model, the preference for a modification distance in a given local context is considered in addition to the preference for lexical bigram dependencies. All of these preferences are expressed as probabilities conditioned on local context. The parsing model is based on dependency theory, which is widely regarded as an adequate formalism for reflecting the syntactic characteristics of Korean and other variable-word-order languages.
The statistical dependency parsing model consists of two probabilities: the lexical dependency probability and the modification distance probability. The lexical dependency probability reflects selectional preference and the preference for each dependency rule. The modification distance probability reflects the preferred length of a dependency relation from a given modifier, based on the context of that modifier. We believe the parameterization of a parsing model for a language should be done with careful consideration of the characteristics of that language. The modification distance probability is designed with the properties of variable-word-order languages, including Korean, in mind, and it is a new way to reflect the distance between two words in a dependency relation. Evaluation on KAIST treebank text shows that the proposed model recovered dependency relations with an F1-score of 86.75%. Considering the modification distance and local context helps select the correct modifyee of a modifier even in a variable-word-order language, and the proposed way of handling distance in the parsing model outperforms other methods of handling distance in statistical models.
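
As a rough illustration of how a lexical dependency probability and a modification distance probability might be combined when choosing a modifyee, the following minimal Python sketch scores each candidate head to the right of a modifier by the product of the two. It is not the dissertation's actual formulation: the token labels, the probability tables, the distance buckets, the use of the modifier itself as its "context", the back-off floor, and the restriction to heads on the right are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

FLOOR = 1e-6  # assumed back-off value for unseen events (hypothetical)

# Hypothetical toy estimates, invented for illustration only.
# A real model would estimate them from a treebank.
LEX_DEP: Dict[Tuple[str, str], float] = {
    ("book-ACC", "read-ADN"): 0.5,  # P(head | modifier), lexical bigram
    ("book-ACC", "saw"): 0.4,
}
MOD_DIST: Dict[Tuple[str, str], float] = {
    ("book-ACC", "1"): 0.7,         # P(distance bucket | modifier context)
    ("book-ACC", "2"): 0.2,
    ("book-ACC", "long"): 0.1,
}


def distance_bucket(distance: int) -> str:
    """Map a word distance to a coarse bucket (assumed bucketing)."""
    return str(distance) if distance <= 2 else "long"


def score(tokens: List[str], mod: int, head: int) -> float:
    """Product of the lexical and the distance probability for one arc."""
    modifier = tokens[mod]
    lexical = LEX_DEP.get((modifier, tokens[head]), FLOOR)
    dist = MOD_DIST.get((modifier, distance_bucket(head - mod)), FLOOR)
    return lexical * dist


def best_head(tokens: List[str], mod: int) -> Optional[int]:
    """Pick the highest-scoring head to the right of the modifier.

    Returns None for the last token, which has no candidate head here.
    """
    candidates = range(mod + 1, len(tokens))
    if len(candidates) == 0:
        return None
    return max(candidates, key=lambda head: score(tokens, mod, head))


# Glossed toy sentence: "(someone) saw the person who read the book".
TOKENS = ["book-ACC", "read-ADN", "person-ACC", "saw"]
print(best_head(TOKENS, 0))  # → 1
```

In this toy case the lexical probabilities for attaching "book-ACC" to "read-ADN" and to "saw" are close (0.5 versus 0.4), and the assumed distance preference for short attachments is what separates the two candidates clearly.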