LINGUIST List 13.875

Fri Mar 29 2002

Diss: Computational Ling: Bond "Determiners..."

Editor for this issue: Karolina Owczarzak <karolina@linguistlist.org>


Directory

  1. bond, Computational Ling: Bond "Determiners and Number in English..."

Message 1: Computational Ling: Bond "Determiners and Number in English..."

Date: Fri, 29 Mar 2002 00:58:18 +0000
From: bond <bond@cslab.kecl.ntt.co.jp>
Subject: Computational Ling: Bond "Determiners and Number in English..."


New Dissertation Abstract

Institution: University of Queensland
Program: Department of English
Dissertation Status: Completed
Degree Date: 2001

Author: Francis Charles Bond 

Dissertation Title: 
Determiners and Number in English contrasted with Japanese, as
exemplified in Machine Translation


Dissertation URL: 
http://www.kecl.ntt.co.jp/icl/mtg/members/bond/pubs/2001-phd.html

Linguistic Field: Translation, Computational Linguistics

Subject Language: Japanese, English

Dissertation Director 1: Roland Sussex
Dissertation Director 2: Rodney Huddleston


Dissertation Abstract:
 
The fact that concepts are grammaticalized differently in different
languages is a major problem for translation, especially for machine
translation. Two major examples of this are syntactic number and the
use of (in)definite articles (a, some, the). In languages such as
English, nouns are marked for number and the choice of article (or of
no article) must be made for every noun phrase. In contrast, for
languages such as Japanese, number distinctions are not normally made,
and there are no articles. This means that whenever a noun phrase is
translated from Japanese to English, even if the denotation is
perfectly understood and a good translation equivalent found,
generating the noun phrase still requires two difficult choices:
should the head noun be singular or plural, and which article, if any,
should be generated?

This thesis proposes a semantic representation and a series of three
heuristic algorithms that make possible the appropriate generation of
articles and number when translating from Japanese to English. The
semantic representation provides a tractable set of features to
represent (1) the referential use of a noun phrase, as either
referential, generic, ascriptive or idiomatic; (2) the interpretation
of the noun phrase's referent as either a countable individual or a
mass, with seven detailed subtypes; and (3) the definiteness of the noun
phrase, as either definite, indefinite, definite and extensive, or
possessed. The three algorithms automatically acquire values for these
features from the analysis of the Japanese text and the lexical
properties of the English translation equivalents, and then use them
to generate English. The first algorithm determines the referential
use of Japanese noun phrases, based on a defeasible hierarchy of
pragmatic rules that are applied top-down, from the clause to the noun
phrase. The second algorithm determines the appropriate interpretation
for English noun phrases, while the third determines which determiner,
if any, should be generated. These algorithms use rules based on the
different referential uses of the noun phrase. 
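
As a reading aid, the three feature groups named above could be written
down as a small data structure. This is only a sketch under stated
assumptions: the class and field names are hypothetical and do not come
from the thesis, and the seven detailed subtypes of the countability
interpretation are not listed in the abstract, so only the top-level
individual/mass split is modelled.

```python
from dataclasses import dataclass
from enum import Enum


class ReferentialUse(Enum):
    """Feature (1): how the noun phrase is used."""
    REFERENTIAL = "referential"
    GENERIC = "generic"
    ASCRIPTIVE = "ascriptive"
    IDIOMATIC = "idiomatic"


class Interpretation(Enum):
    """Feature (2): top-level split only; the seven subtypes are omitted
    because the abstract does not name them."""
    INDIVIDUAL = "countable individual"
    MASS = "mass"


class Definiteness(Enum):
    """Feature (3): definiteness values as listed in the abstract."""
    DEFINITE = "definite"
    INDEFINITE = "indefinite"
    DEFINITE_EXTENSIVE = "definite and extensive"
    POSSESSED = "possessed"


@dataclass(frozen=True)
class NounPhraseFeatures:
    """Hypothetical container for one noun phrase's feature values."""
    referential_use: ReferentialUse
    interpretation: Interpretation
    definiteness: Definiteness


# Example: a noun phrase judged referential, countable and indefinite.
example = NounPhraseFeatures(
    referential_use=ReferentialUse.REFERENTIAL,
    interpretation=Interpretation.INDIVIDUAL,
    definiteness=Definiteness.INDEFINITE,
)
```

In a layout like this, the first algorithm would fill in
referential_use, the second interpretation, and the third would read all
three values when choosing a determiner; the rules themselves are
described in the thesis and are not reproduced here.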

The proposed algorithms are implemented in a Japanese-to-English
machine translation system, and the detailed lexical information is
entered into its lexicon. Using the algorithms raises the
percentage of noun phrases generated with correct articles and
number from 65% to 85%.
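
To put the two reported figures in perspective, the short calculation
below derives the absolute gain and the relative reduction in errors.
Only the 65% and 85% values come from the abstract; the derived numbers
and the variable names are a reader's arithmetic, not results stated in
the thesis.

```python
# Reported percentages of noun phrases with correct articles and number.
before_pct = 65
after_pct = 85

# Absolute improvement in percentage points.
absolute_gain = after_pct - before_pct

# Share of the original errors that were eliminated.
errors_before = 100 - before_pct
errors_after = 100 - after_pct
relative_error_reduction = round(
    100 * (errors_before - errors_after) / errors_before, 1
)

print(absolute_gain, relative_error_reduction)  # prints: 20 57.1
```

On these figures the error rate falls from 35% to 15%, so a little over
half of the original errors are removed.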