Title: A Syntax-Based Statistical Translation Model
Author: Kenji Yamada
Homepage: http://groups.google.com/group/kenji-yamada-publications/?pli=1
Degree Awarded: University of Southern California, Information Sciences Institute
Degree Date: 2002
Linguistic Subfield(s): Computational Linguistics
Subject Language(s): Chinese, Mandarin; English; Japanese
Director(s): Kevin Knight; Daniel Marcu; Paul Rosenbloom; Eduard Hovy

Abstract:

A statistical translation model is a mathematical model for the process of human-language translation. Model parameters are automatically estimated using a corpus of translation pairs. This is in contrast to conventional rule-based machine translation systems, in which lexical, syntactic, and semantic translation rules are manually crafted by language experts over several years.

The idea of statistical machine translation first appeared in the late 1940s, but the computational power available at that time was not sufficient. In the last decade, word-to-word statistical translation models have regained researchers' interest, owing to increasing computational power and the growing volume of online training materials.

This thesis introduces a more advanced statistical translation model that better exploits these growing resources. Most statistical translation models are based on word-to-word translation, i.e., each operation in the model works on a single word independently. We present a new model that translates a syntactic parse tree into a foreign-language sentence, with the model operations working on each node of the parse tree. To obtain the parse tree, we use an existing parser developed elsewhere, which lets us take advantage of available linguistic resources within a statistical framework. By using a syntactic parser, we can exploit the rich syntactic information embedded in a sentence and model more linguistically motivated word movements in translation. We use a parser only for the channel input, so our model works for translation from any linguistically resource-poor language into a resource-rich language such as English.
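
As a rough illustration of what "operations on each node of a parse tree" can look like, the following minimal sketch walks an English parse tree, optionally permutes the children of each node, and translates each leaf word, producing a foreign-language word sequence. Everything in it is an assumption made for illustration: the Node class, the REORDER and LEXICON tables, and the toy English-to-Japanese entries are hypothetical, and the tables are fixed lookups rather than probabilities estimated from a corpus, so this is not the thesis's statistical model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A parse-tree node (hypothetical structure, for illustration only)."""
    label: str                                   # syntactic category or POS tag
    word: str = ""                               # surface word, leaves only
    children: List["Node"] = field(default_factory=list)


# Toy, hand-written tables (assumptions). A statistical model would instead
# hold probability distributions estimated from translation pairs.
REORDER: Dict[Tuple[str, ...], Tuple[int, ...]] = {
    ("VB", "NN"): (1, 0),                        # move the verb after its object
}
LEXICON: Dict[str, str] = {
    "he": "kare",
    "likes": "suki",
    "music": "ongaku",
}


def transform(node: Node) -> List[str]:
    """Apply node-level operations and return the output word sequence."""
    if not node.children:
        # Leaf operation: translate the word, leaving unknown words unchanged.
        return [LEXICON.get(node.word, node.word)]
    labels = tuple(child.label for child in node.children)
    # Internal-node operation: permute the children, defaulting to source order.
    order = REORDER.get(labels, tuple(range(len(labels))))
    output: List[str] = []
    for index in order:
        output.extend(transform(node.children[index]))
    return output


# English parse tree for "he likes music".
tree = Node("S", children=[
    Node("PRP", "he"),
    Node("VP", children=[Node("VB", "likes"), Node("NN", "music")]),
])

print(" ".join(transform(tree)))
```

Because the reordering decision is keyed on the labels of a node's children, a single table entry moves a whole subtree at once, which is the kind of syntax-driven word movement that a model operating on isolated words cannot express directly.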

We have developed an efficient training algorithm and an experimental decoding program for the syntax-based translation model. We demonstrate that alignment accuracy for Japanese-English is more than 30% better with our model than with previous word-to-word models, and that decoding performance is 10-40% better for Chinese-English and Arabic-English translation.
Page Updated: 28-Nov-2009
