LINGUIST List 16.295
Mon Jan 31 2005
Diss: Comp Ling/Syntax: Alves: 'Estimation of ...'
Editor for this issue: Takako Matsui <tako@linguistlist.org>
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.
Directory
1. Eduardo Alves, Estimation of Strength of Association and Its Application to Structural Ambiguity Resolution
Message 1: Estimation of Strength of Association and Its Application to Structural Ambiguity Resolution
Date: 26-Jan-2005
From: Eduardo Alves <Alves.Eduardo@gmail.com>
Subject: Estimation of Strength of Association and Its Application to Structural Ambiguity Resolution
Institution: University of Electro-Communications
Program: Computer Science and Information Mathematics
Dissertation Status: Completed
Degree Date: 1999
Author: Eduardo Alves
Dissertation Title: Estimation of Strength of Association and Its Application to Structural Ambiguity Resolution
Linguistic Field(s):
Computational Linguistics
Syntax
Text/Corpus Linguistics
Subject Language(s): English (ENG)
Japanese (JPN)
Dissertation Director:
Teiji Furugori
Dissertation Abstract:
Ambiguity resolution is a central issue in natural language processing. It
is a necessary step, for instance, for devising robust natural language
understanding or machine translation systems. We propose a corpus-based
method to measure the strength of association between words in linguistic
constructions and then apply it to deciding prepositional phrase
attachments in English, determining the correct dependency structure in
Japanese sentences, and resolving structural ambiguities in Japanese noun
phrases containing the particle 'no'.
There are essentially two approaches to resolving structural ambiguities:
rule-based and corpus-based. In the first, preference rules applicable to
the disambiguation tasks are derived from linguistic observations. In the
second, the preferences for disambiguation are obtained from statistical
measures computed over large-scale corpora.
In this thesis we base our study on statistical information extracted
from the EDR Corpus and a conceptual dictionary. We use the corpus to obtain
co-occurrence information between two or more words and then calculate the
strength of association using mutual information. When the number of
co-occurrences is zero or low, we use the conceptual dictionary and, using
t-scores, automatically substitute the words with the best possible
conceptual classes. By doing so, we avoid the noise introduced by combining
all classes, by using unreliable or unrelated classes, and by clustering
words into classes a priori.
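As a rough illustration of the procedure described above, the following
Python sketch computes mutual information from co-occurrence counts and,
when a word pair is too rare, backs off to the conceptual class with the
highest t-score. The abstract does not give the exact formulas or
thresholds, so the pointwise mutual information formula, the common
collocation t-score approximation, the min_count threshold, and all
function names, table names, and counts below are assumptions made for
illustration only.

```python
import math

def mutual_information(c_xy, c_x, c_y, n):
    """Pointwise mutual information (base 2) from raw counts.

    Assumed formula: log2(P(x, y) / (P(x) * P(y))).
    """
    return math.log2(c_xy * n / (c_x * c_y))

def t_score(c_xy, c_x, c_y, n):
    """Common collocation t-score approximation (an assumption here):
    (observed - expected) / sqrt(observed)."""
    expected = c_x * c_y / n
    return (c_xy - expected) / math.sqrt(c_xy)

def association(x, y, pair, single, classes, n, min_count=2):
    """Association between x and y, backing off to a conceptual class of x.

    pair, single and classes are hypothetical lookup tables:
    co-occurrence counts, unit counts, and candidate classes per word.
    Returns None when no usable evidence exists.
    """
    if pair.get((x, y), 0) >= min_count:
        return mutual_information(pair[(x, y)], single[x], single[y], n)
    best = None  # (t-score, class) of the most reliable class so far
    for c in classes.get(x, []):
        c_xy = pair.get((c, y), 0)
        if c_xy == 0:
            continue
        t = t_score(c_xy, single[c], single[y], n)
        if best is None or t > best[0]:
            best = (t, c)
    if best is None:
        return None
    c = best[1]
    return mutual_information(pair[(c, y)], single[c], single[y], n)

# Toy counts, invented for this example (not taken from the EDR Corpus).
N = 1000
SINGLE = {"eat": 20, "pizza": 5, "FOOD": 50, "OBJECT": 400}
PAIR = {("FOOD", "eat"): 10, ("OBJECT", "eat"): 12}
CLASSES = {"pizza": ["FOOD", "OBJECT"]}

score = association("pizza", "eat", PAIR, SINGLE, CLASSES, N)
```

In the toy data the pair (pizza, eat) is unseen, so the sketch replaces
"pizza" with the class FOOD, whose t-score with "eat" is higher than that
of the broader class OBJECT, and reports the mutual information of
(FOOD, eat).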
We verified the effectiveness of our method by applying the strength of
association measure to three structural ambiguity resolution tasks. In
the first experiment we attempted to determine the attachment for
prepositional phrases in English. In the construction V+N+PP
(verb-noun-prepositional phrase), the PP may attach to N or to V. Here, we
employed the association measure to find the attachment of 500 ambiguous
structures and achieved a success rate of 85.6%. The result is an
improvement over other methods (59.6% to 79.5%), and is comparable to that
of an experiment with human subjects.
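The attachment decision for a V+N+PP construction can be sketched as a
comparison of two association scores. This is a minimal sketch under
stated assumptions: the abstract does not say how the scores are compared
or how ties are broken, so the decide_attachment function, the
three-argument assoc callable, the tie-break toward noun attachment, and
the toy scores are all hypothetical.

```python
def decide_attachment(verb, noun, prep, pp_noun, assoc):
    """Return 'V' if the PP attaches to the verb, otherwise 'N'.

    assoc(head, prep, pp_noun) is a hypothetical callable returning the
    strength of association between a candidate head and the PP.
    Ties fall back to noun attachment (an arbitrary assumption).
    """
    verb_score = assoc(verb, prep, pp_noun)
    noun_score = assoc(noun, prep, pp_noun)
    return "V" if verb_score > noun_score else "N"

# Toy association scores, invented for this example.
TOY_SCORES = {
    ("eat", "with", "fork"): 2.1,
    ("pizza", "with", "fork"): 0.3,
    ("eat", "with", "anchovies"): 0.2,
    ("pizza", "with", "anchovies"): 1.8,
}

def toy_assoc(head, prep, pp_noun):
    """Look up a toy score, defaulting to 0.0 for unseen triples."""
    return TOY_SCORES.get((head, prep, pp_noun), 0.0)

fork_case = decide_attachment("eat", "pizza", "with", "fork", toy_assoc)
anchovy_case = decide_attachment("eat", "pizza", "with", "anchovies", toy_assoc)
```

With these toy scores, "eat pizza with a fork" is resolved as verb
attachment and "eat pizza with anchovies" as noun attachment.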
In a second experiment we tried to resolve ambiguities in Japanese
sentences. Due to the relatively free word order in Japanese, it is quite
difficult to determine the governor-dependent relations in a sentence.
Here, for 75 constructions taken from sentences each containing an average
of 8.68 probable structures, we obtained a success rate of 87.0%, a
significant improvement over other methods whose success rates ranged from
70.6% to 82.6%.
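One simple way to choose among several probable structures is to score
each candidate by the total association of its governor-dependent pairs
and keep the highest-scoring one. The abstract does not state how the
thesis combines scores across a structure, so the summation, the
best_structure function, the two-argument assoc callable, and the toy
data below are assumptions for illustration.

```python
def best_structure(candidates, assoc):
    """Pick the candidate dependency structure with the highest total score.

    Each candidate is a list of (dependent, governor) pairs, and
    assoc(dependent, governor) is a hypothetical association measure.
    Summing pair scores is an assumption, not the thesis's stated method.
    """
    def total(structure):
        return sum(assoc(dep, gov) for dep, gov in structure)
    return max(candidates, key=total)

# Toy scores and candidates with placeholder tokens, invented for this example.
PAIR_SCORES = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 3.5}
CANDIDATES = [
    [("a", "b"), ("b", "c")],  # "a" depends on "b"
    [("a", "c"), ("b", "c")],  # "a" depends on "c"
]

def pair_assoc(dep, gov):
    """Look up a toy pair score, defaulting to 0.0 for unseen pairs."""
    return PAIR_SCORES.get((dep, gov), 0.0)

chosen = best_structure(CANDIDATES, pair_assoc)
```

Here the second candidate is selected because its pairs have the larger
total association.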
In the last experiment, we attempted to find the correct structure for
Japanese noun-phrases containing the particle 'no'. Here, for 429 'no'
constructions, we obtained a success rate of 77.6%. Although this rate is
not especially high, it improves on the performance of other methods on the
same data (72.7% to 73.2%).
The class-based association measure we propose captures the relevant
information effectively by selecting reliable data. It is also general and
widely applicable, since it uses no rules or idiosyncratic processes. The
method may also be applicable to linguistic phenomena other than syntactic
ambiguity.