
LINGUIST List 23.3230

Mon Jul 30 2012

Diss: Comp Ling/ English/ German/ Romanian: Gavrila: 'Improving Recombination...'

Editor for this issue: Lili Xia <lxialinguistlist.org>

Date: 30-Jul-2012
From: Monica Gavrila <gavrilainformatik.uni-hamburg.de>
Subject: Improving Recombination in a Linear EBMT System by Use of Constraints

Institution: Universität Hamburg
Program: Department of Informatics
Dissertation Status: Completed
Degree Date: 2012

Author: Monica Gavrila

Dissertation Title: Improving Recombination in a Linear EBMT System by Use of Constraints

Dissertation URL: http://ediss.sub.uni-hamburg.de/volltexte/2012/5758/

Linguistic Field(s): Computational Linguistics

Subject Language(s): English (eng)
                            German (deu)
                            Romanian (ron)

Dissertation Director:
Walther von Hahn
David Farwell
Wolfgang Menzel

Dissertation Abstract:

(Automatic) machine translation (MT) is one of the most challenging
domains in Natural Language Processing (NLP) and plays an important
role in ensuring global communication, especially in a multilingual world
with access to large amounts of Internet resources. As rule-based MT
approaches need manually developed resources, new MT directions
have been developed over the last twenty years under the heading of
corpus-based machine translation (CBMT): statistical MT (SMT) and
example-based machine translation (EBMT). These directions rely
mainly on the existence of a parallel aligned corpus and can therefore
be easily employed for lower-resourced languages.

In this dissertation we showed how EBMT systems behave when a
lower-resourced inflecting language (i.e. Romanian) is involved in the
translation process. For this purpose we built an EBMT baseline
system based only on surface forms (the Lin-EBMT system). One of
our main goals was to investigate the impact of word-order constraints
on the translation results: we integrated constraints extracted from
generalized examples (i.e. templates) in Lin-EBMT and built an
extended system: Lin-EBMTREC+. Although constraints represent a
well-known method which is employed quite often in NLP, the use of
word-order constraints in an EBMT system is an innovative approach
which can open new paths in the domain of example-based MT. We
ran our experiments for two language-pairs in both directions of
translation: Romanian-German and Romanian-English. This aspect
raises interesting questions, as Romanian and German present
language-specific characteristics, which make the translation process
even more challenging. Both EBMT systems developed are easily
adaptable for other language-pairs. They are platform and language-
pair independent, provided that a parallel aligned corpus for the
language-pair exists and that the tools used for obtaining the needed
intermediate information (e.g. word alignment) are available. As a side
question, we studied how EBMT performs in comparison to SMT. We
compared the EBMT results obtained to results provided by a Moses-
based SMT system and the Google Translate on-line system. To
provide a complete view of CBMT, the performance of each MT
system was assessed in several experimental settings, using different
corpora (type and size), various system settings and additional part-of-
speech (POS) information. We evaluated the translation results by
means of three automatic evaluation metrics: BLEU, NIST and TER. A
subset of the results was manually analyzed for a better overview of
the translation quality.
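
As a rough illustration of what an edit-based metric such as TER measures,
the following minimal Python sketch computes a simplified, shift-free
edit rate: the word-level edit distance between a system output and a
single reference, divided by the reference length. It is not the
evaluation setup used in the dissertation. Full TER also counts block
shifts, which are omitted here, and the function names and example
sentences are hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over word tokens (insert, delete, substitute)."""
    # prev[j] holds the distance between the hypothesis prefix processed
    # so far and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis, reference):
    """Edit rate without shifts (an assumption; real TER allows shifts)."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)


if __name__ == "__main__":
    # Hypothetical sentence pair; lower scores mean fewer edits are needed.
    print(simplified_ter("the house is small", "the house is very small"))
```

Lower values indicate output closer to the reference, which is the
opposite direction from BLEU and NIST, where higher scores are better.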

Our experiments showed that constraints improve translation results,
although no single combination of constraints emerged as clearly
best. Although the SMT system outperformed the EBMT
system in all experiments, the manual analysis provided cases in which
EBMT offered more accurate results. The behavior of the systems
while changing the experimental settings confirmed that (training and
test) data have a substantial impact on both MT approaches. The
difference between the results of the two MT approaches decreased
when a more restricted corpus was used. As expected, both CBMT
approaches worked better for shorter sentences.


Page Updated: 30-Jul-2012
