LINGUIST List 14.17

Tue Jan 7 2003

Diss: Computational Ling: Davis "Stone Soup..."

Editor for this issue: Karolina Owczarzak <karolinalinguistlist.org>


Directory

  1. pcdavis, Computational Ling: Davis "Stone Soup Translation..."

Message 1: Computational Ling: Davis "Stone Soup Translation..."

Date: Wed, 01 Jan 2003 15:05:17 +0000
From: pcdavis <pcdavisjulius.ling.ohio-state.edu>
Subject: Computational Ling: Davis "Stone Soup Translation..."



New Dissertation Abstract

Institution: Ohio State University
Program: Department of Linguistics
Dissertation Status: Completed
Degree Date: 2002

Author: Paul C. Davis 

Dissertation Title: 
Stone Soup Translation: The Linked Automata Model

Dissertation URL: http://www.ling.ohio-state.edu/~pcdavis/papers/diss.html

Linguistic Field: Computational Linguistics

Dissertation Director 1: Chris Brew
Dissertation Director 2: Detmar Meurers
Dissertation Director 3: Robert Kasper
Dissertation Director 4: Erhard Hinrichs


Dissertation Abstract: 

The automated translation of one natural language to another, known as
machine translation (MT), typically requires successful modeling of
the grammars of the languages and the relationship between
them. Rather than hand-coding these grammars and relationships, some
machine translation efforts employ data-driven methods, where the goal
is to learn from a large amount of training examples of accurate
translations. One such data-driven approach is statistical MT, where
language and alignment models are automatically induced from parallel
corpora. This work has also been extended to probabilistic
finite-state approaches, most often via transducers.

This dissertation introduces and begins an investigation of an MT
model consisting of a novel combination of finite-state devices. The
model proposed is more flexible than transducer models, giving
increased ability to handle word order differences between languages,
as well as crossing and discontinuous alignments between words. The
linked automata MT model consists of a source language automaton, a
target language automaton, and an alignment table---a function which
probabilistically links sequences of source and target language
transitions. It is this augmentation to the finite-state base which
gives the linked automata model its flexibility.
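
As a rough illustration of the three components named above, the
following Python sketch stores two automata as lists of transitions and
keeps an alignment table that maps sequences of source transitions to
weighted sequences of target transitions. It is not the dissertation's
implementation: the class and function names, the example sentence pair,
and the probability values are all assumptions made for illustration,
and the toy translate_path routine assumes a linear target automaton.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transition:
    src: int    # state the arc leaves
    label: str  # word carried by the arc
    dst: int    # state the arc enters


@dataclass
class LinkedAutomata:
    """Hypothetical container: two automata plus an alignment table."""
    source: list
    target: list
    # (source transitions...) -> {(target transitions...): probability}
    links: dict = field(default_factory=dict)

    def link(self, src_seq, tgt_seq, prob):
        """Record that src_seq is linked to tgt_seq with weight prob."""
        self.links.setdefault(tuple(src_seq), {})[tuple(tgt_seq)] = prob

    def best_link(self, src_seq):
        """Return the most probable linked target sequence, or None."""
        options = self.links.get(tuple(src_seq), {})
        if not options:
            return None
        return max(options, key=options.get)


def translate_path(model, src_path):
    """Toy decoding (an assumption, not the dissertation's algorithm):
    pick the best link per source transition, then read the chosen
    target transitions in the order the target automaton imposes."""
    chosen = []
    for t in src_path:
        best = model.best_link([t])
        if best:
            chosen.extend(best)
    chosen.sort(key=lambda t: t.src)  # valid only for a linear automaton
    return [t.label for t in chosen]


# Illustrative English -> Spanish pair with a crossing alignment.
s_the, s_white, s_house = (Transition(0, "the", 1),
                           Transition(1, "white", 2),
                           Transition(2, "house", 3))
t_la, t_casa, t_blanca = (Transition(0, "la", 1),
                          Transition(1, "casa", 2),
                          Transition(2, "blanca", 3))

model = LinkedAutomata(source=[s_the, s_white, s_house],
                       target=[t_la, t_casa, t_blanca])
model.link([s_the], [t_la], 0.9)         # made-up probabilities
model.link([s_white], [t_blanca], 0.8)   # crosses the next link
model.link([s_white], [t_casa], 0.1)
model.link([s_house], [t_casa], 0.85)

print(translate_path(model, [s_the, s_white, s_house]))
```

Because word order on the target side comes from the target automaton
rather than from the source path, the crossing links for "white" and
"house" need no special handling in this sketch.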

The dissertation describes the linked automata model from the ground
up, beginning with a description of some of the relevant MT history
and empirical MT literature, and the preparatory steps for building
the model, including a detailed discussion of word alignment and the
introduction of a new technique for word alignment
evaluation. Discussion then centers on the description of the model
and its use of probabilities, including algorithms for its
construction from word-aligned bitexts and for the translation
process. The focus next moves to expanding the linked automata
approach, first through generalization and techniques for extracting
partial results, and then by increasing the coverage, both in terms of
using additional linguistic information and using more complex
alignments. The dissertation presents preliminary results for a test
corpus of English to Spanish translations, and suggests ways in which
the model can be further expanded as the foundation of a more powerful
MT system.