Title: Estimation of Strength of Association and Its Application to Structural Ambiguity Resolution
Author: Eduardo Alves
Degree Awarded: University of Electro-Communications , Computer Science and Information Mathematics
Degree Date: 1999
Linguistic Subfield(s): Computational Linguistics; Syntax; Text/Corpus Linguistics
Subject Language(s): English; Japanese
Director(s): Teiji Furugori

Abstract:

Ambiguity resolution is a central issue in natural language processing. It is a necessary step, for instance, for devising robust natural language understanding or machine translation systems. We propose a corpus-based method to measure the strength of association between words in linguistic constructions and then apply it to deciding prepositional phrase attachments in English, determining the correct dependency structure in Japanese sentences, and resolving structural ambiguities in Japanese noun phrases containing the particle 'no'.

Essentially there are two methods to resolve structural ambiguities: rule-based and corpus-based. In the first method, preference rules applicable to the disambiguation tasks are derived from linguistic observations. In the second method, the preferences for disambiguation are obtained from statistical measures in large-scale corpora.

In this thesis we base our study on statistical information drawn from the EDR Corpus and a conceptual dictionary. We use the corpus to obtain co-occurrence information between two or more words and then calculate the strength of association using mutual information. When the number of co-occurrences is zero or low, we turn to the conceptual dictionary and, using t-scores, automatically substitute the words with the best possible conceptual classes. This avoids the noise introduced by combining all classes, by using unreliable or unrelated classes, and by clustering words into classes a priori.
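The abstract does not give formulas, so the following is only a minimal sketch of the idea under stated assumptions: association is taken to be pointwise mutual information, the t-score is the common collocation approximation (observed minus expected, divided by the square root of observed), and the back-off threshold `min_count` and the tuple layout of `class_candidates` are hypothetical choices made for illustration, not the thesis's actual procedure.

```python
import math

def mutual_information(pair_count, x_count, y_count, total):
    """Pointwise mutual information: log2(P(x, y) / (P(x) * P(y)))."""
    if pair_count == 0:
        return float("-inf")  # no co-occurrence observed
    return math.log2(pair_count * total / (x_count * y_count))

def t_score(pair_count, x_count, y_count, total):
    """Collocation t-score approximation: (observed - expected) / sqrt(observed)."""
    if pair_count == 0:
        return 0.0
    expected = x_count * y_count / total
    return (pair_count - expected) / math.sqrt(pair_count)

def association(pair_count, x_count, y_count, total,
                class_candidates=(), min_count=3):
    """Word-level MI when counts suffice; otherwise back off to a class.

    class_candidates is a hypothetical iterable of
    (pair_count, class_count, y_count) tuples, one per conceptual class
    that could replace the sparse word. The class with the highest
    t-score is selected, and MI is computed on its counts.
    """
    if pair_count >= min_count:
        return mutual_information(pair_count, x_count, y_count, total)
    best = max(class_candidates,
               key=lambda c: t_score(*c, total),
               default=None)
    if best is None:
        # No class available: fall back to the sparse word-level estimate.
        return mutual_information(pair_count, x_count, y_count, total)
    return mutual_information(*best, total)
```

Selecting a single class by t-score, rather than summing over every class a word belongs to, is one way to reflect the stated goal of avoiding unreliable or unrelated classes.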

We verified the effectiveness of our method by applying the strength of association measure to three structural ambiguity resolution tasks. In the first experiment we attempted to determine the attachment of prepositional phrases in English. In the construction V+N+PP (verb-noun-prepositional phrase), the PP may attach to N or to V. Here, we employed the association measure to find the attachment in 500 ambiguous structures and achieved a success rate of 85.6%. This result is an improvement over other methods, whose success rates range from 59.6% to 79.5%, and is comparable to that obtained in an experiment with human subjects.
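As an illustration only, the attachment decision for a V+N+PP construction can be sketched as a comparison of two association scores. The abstract does not state the decision rule, so the rule below (attach to whichever head has the higher score, with a hypothetical `default` for ties) and the helper `success_rate` are assumptions.

```python
def choose_attachment(verb_score, noun_score, default="N"):
    """Return "V" or "N" depending on which head associates more strongly.

    verb_score: association between the verb and the PP.
    noun_score: association between the noun and the PP.
    default:    hypothetical tie-breaking choice, not taken from the thesis.
    """
    if verb_score > noun_score:
        return "V"
    if noun_score > verb_score:
        return "N"
    return default

def success_rate(predicted, gold):
    """Percentage of predicted attachments that match the gold labels."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)
```

Under this reading, a success rate of 85.6% on 500 structures corresponds to 428 correct attachments.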

In the second experiment we tried to resolve ambiguities in Japanese sentences. Because word order in Japanese is relatively free, it is quite difficult to determine the governor-dependent relations in a sentence. Here, for 75 constructions taken from sentences each containing an average of 8.68 probable structures, we obtained a success rate of 87.0%, a significant improvement over other methods, whose success rates ranged from 70.6% to 82.6%.

In the last experiment, we attempted to find the correct structure for Japanese noun phrases containing the particle 'no'. Here, for 429 'no' constructions, we obtained a success rate of 77.6%. Although this rate is not especially high, it is an improvement over the performance of other methods on the same data (72.7% to 73.2%).

The class-based association measure we propose captures the relevant information effectively by selecting reliable data. It is also general and widely applicable, since it uses no rules or idiosyncratic processes. The method may also be applied to linguistic phenomena other than syntactic ambiguity.
Page Updated: 29-Nov-2009