LINGUIST List 15.1980

Fri Jul 2 2004

Diss: Comp Ling: Chung: 'Statistical Korean...'

Editor for this issue: Takako Matsui


  1. hjchung, Statistical Korean Dependency Parsing Model...

Message 1: Statistical Korean Dependency Parsing Model...

Date: Fri, 2 Jul 2004 03:49:53 -0400 (EDT)
From: hjchung
Subject: Statistical Korean Dependency Parsing Model...

Institution: Korea University
Program: Department of Computer Science and Engineering
Dissertation Status: Completed
Degree Date: 2004

Author: Hoojung Chung 

Dissertation Title: Statistical Korean Dependency Parsing Model based
on the Surface Contextual Information

Dissertation URL:

Linguistic Field: Computational Linguistics, Text/Corpus Linguistics

Subject Language: Korean (code: KKN)

Dissertation Director 1: Hae-Chang Rim

Dissertation Abstract: 

Natural language parsing is a key component of many tasks that require
natural language processing. Many language-processing tasks use
information about predicate-argument or modifier-modifyee relations,
and parsing makes this information available by identifying the
relations between words or phrases in a sentence. However, parsing a
sentence correctly is difficult because of the ambiguity inherent in
natural language. Over the last decade, the statistical approach has
become the dominant trend in natural language parsing. The two most
important issues in statistical parsing are selecting features that
help syntactic disambiguation and designing a statistical model that
uses them.

This dissertation argues for the importance of modification distance
and local context in parsing Korean. In the proposed parsing model,
the preference for a particular modification distance in a given
local context is considered in addition to the preference for a
lexical bigram dependency. All of these preferences are expressed as
probabilities conditioned on local context. The parsing model is based
on dependency grammar, which is widely regarded as an adequate
formalism for capturing the syntactic characteristics of Korean and
other languages with variable word order.
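The abstract does not spell out the model's formula. Purely as an
illustration, a model that multiplies the two conditional preferences
above could choose the dependency tree T for a sentence of n words
w_1 ... w_n as shown below. The notation is assumed here, not taken
from the dissertation: h_i is the index of the modifyee of w_i,
d_i = h_i - i is the modification distance, Phi_i is the local context
of w_i, and each of the first n-1 words is assumed to modify exactly
one word to its right, as is usual for head-final Korean.

```latex
\hat{T} = \arg\max_{T} \prod_{i=1}^{n-1}
  P\left(w_i \rightarrow w_{h_i} \mid \Phi_i\right) \,
  P\left(d_i \mid \Phi_i\right),
\qquad d_i = h_i - i
```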

The statistical dependency parsing model combines two probabilities:
the lexical dependency probability and the modification distance
probability. The lexical dependency probability reflects selectional
preference and the preference for each dependency rule. The
modification distance probability reflects the preferred length of a
dependency relation from a given modifier, based on the context of
that modifier.
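The following is a minimal sketch, not the dissertation's
implementation, of how two such probabilities could be estimated by
relative frequency and multiplied to score a candidate modifyee. Many
details are assumptions: the class name DistanceAwareModel, the use of
the modifier's tag as its "local context", tags instead of word forms
on the lexical side, the smoothing floor, and the hand-made tags and
training trees are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# A sentence is a list of (word, tag) pairs; heads[i] is the index of the word
# that word i modifies, or -1 for the sentence-final word (the root).
Sentence = List[Tuple[str, str]]


class DistanceAwareModel:
    """Toy sketch: score(i -> h) = P_lex(tag_h | ctx_i) * P_dist(h - i | ctx_i).

    The modifier's tag stands in for its local context ctx_i; this is an
    assumption made for illustration, not the dissertation's feature set.
    """

    def __init__(self, floor: float = 1e-6) -> None:
        self.floor = floor  # hypothetical floor so unseen events are not zero
        self.lex: Dict[str, Counter] = defaultdict(Counter)   # context -> head tags
        self.dist: Dict[str, Counter] = defaultdict(Counter)  # context -> distances

    def train(self, corpus: List[Tuple[Sentence, List[int]]]) -> None:
        """Collect relative-frequency counts from gold dependency trees."""
        for words, heads in corpus:
            for i, h in enumerate(heads):
                if h < 0:
                    continue  # the root has no modifyee
                context = words[i][1]
                self.lex[context][words[h][1]] += 1
                self.dist[context][h - i] += 1

    @staticmethod
    def _rel_freq(counts: Counter, key) -> float:
        total = sum(counts.values())
        return counts[key] / total if total else 0.0

    def score(self, words: Sentence, i: int, h: int) -> float:
        """Product of the lexical and distance probabilities for arc i -> h."""
        context = words[i][1]
        p_lex = self._rel_freq(self.lex.get(context, Counter()), words[h][1])
        p_dist = self._rel_freq(self.dist.get(context, Counter()), h - i)
        return max(p_lex, self.floor) * max(p_dist, self.floor)

    def best_head(self, words: Sentence, i: int) -> int:
        """Greedy choice among the words to the right of i (head-final)."""
        if i >= len(words) - 1:
            return -1  # the last word is the root
        return max(range(i + 1, len(words)), key=lambda h: self.score(words, i, h))


# Two hand-made training trees with illustrative, hypothetical tags
# (not the KAIST treebank's tagset).
TRAIN = [
    ([("철수가", "NNP+JKS"), ("빨간", "VA+ETM"), ("사과를", "NNG+JKO"), ("먹었다", "VV+EF")],
     [3, 2, 3, -1]),
    ([("영희가", "NNP+JKS"), ("책을", "NNG+JKO"), ("읽었다", "VV+EF")],
     [2, 2, -1]),
]

if __name__ == "__main__":
    model = DistanceAwareModel()
    model.train(TRAIN)
    sentence = [("민수가", "NNP+JKS"), ("빵을", "NNG+JKO"), ("먹었다", "VV+EF")]
    print([model.best_head(sentence, i) for i in range(len(sentence))])  # prints [2, 2, -1]
```

Choosing each modifyee greedily, as best_head does, can produce crossing
or otherwise ill-formed structures; a real parser would instead search
over well-formed dependency trees for the highest-scoring one.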

We believe that the parameters of a parsing model should be chosen
with careful consideration of the characteristics of the target
language. The modification distance probability is designed to
account for the properties of variable-word-order languages such as
Korean, and it offers a new way to model the distance between two
words in a dependency relation. An evaluation on text from the KAIST
treebank shows that the proposed model recovers dependency relations
with an F1-score of 86.75%. Taking modification distance and local
context into account helps the parser select the correct modifyee for
a modifier even in a variable-word-order language, and the proposed
treatment of distance outperforms other methods of handling distance
in statistical parsing models.
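The abstract reports an F1-score without stating exactly how it was
computed. A minimal sketch of one common definition, assumed here:
treat the gold and predicted dependencies as sets of (modifier,
modifyee) arcs and compute precision, recall, and their harmonic mean.
The function name dependency_f1 is hypothetical. When the parser
assigns exactly one modifyee to every non-root word, precision and
recall coincide and F1 equals attachment accuracy; the two differ only
when some words are left unattached or receive extra arcs.

```python
from typing import Iterable, Tuple

# An arc is (modifier index, modifyee index); for corpus-level scores, a
# sentence id can be prepended so arcs from different sentences stay distinct.
Arc = Tuple[int, ...]


def dependency_f1(gold: Iterable[Arc], predicted: Iterable[Arc]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over dependency arcs treated as sets."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 3), (1, 2), (2, 3)]
    predicted = [(0, 3), (1, 2)]  # the parser left word 2 unattached
    print(dependency_f1(gold, predicted))
```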