LINGUIST List 23.3230
Mon Jul 30 2012
Diss: Comp Ling/ English/ German/ Romanian: Gavrila: 'Improving Recombination...'
Editor for this issue: Lili Xia
Monica Gavrila <gavrila
Improving Recombination in a Linear EBMT System by Use of Constraints
Institution: Universität Hamburg
Program: Department of Informatics
Dissertation Status: Completed
Degree Date: 2012
Author: Monica Gavrila
Dissertation Title: Improving Recombination in a Linear EBMT System by Use of Constraints
Dissertation URL: http://ediss.sub.uni-hamburg.de/volltexte/2012/5758/
Subject Language(s): English (eng)
Walther von Hahn
(Automatic) machine translation (MT) is one of the most challenging domains in Natural Language Processing (NLP) and plays an important role in ensuring global communication, especially in a multilingual world with access to large amounts of Internet resources. As rule-based MT approaches need manually developed resources, new MT directions have been developed over the last twenty years, such as corpus-based machine translation (CBMT): statistical MT (SMT) and example-based machine translation (EBMT). These new directions are based mainly on the existence of a parallel aligned corpus and, therefore, can be easily employed for lower-resourced languages.
In this dissertation we showed how EBMT systems behave when a lower-resourced inflecting language (i.e. Romanian) is involved in the translation process. For this purpose we built an EBMT baseline system based only on surface forms (the Lin-EBMT system). One of our main goals was to investigate the impact of word-order constraints on the translation results: we integrated constraints extracted from generalized examples (i.e. templates) in Lin-EBMT and built an extended system: Lin-EBMTREC+. Although constraints are a well-known method that is employed quite often in NLP, the use of word-order constraints in an EBMT system is an innovative approach which can open new paths in the domain of example-based MT. We ran our experiments for two language pairs in both directions of translation: Romanian-German and Romanian-English. This aspect raises interesting questions, as Romanian and German present language-specific characteristics, which make the translation process even more challenging. Both EBMT systems developed are easily adaptable to other language pairs. They are platform and language-pair independent, provided that a parallel aligned corpus for the language pair exists and that the tools used for obtaining the needed intermediate information (e.g. word alignment) are available. As a side question, we studied how EBMT performs in comparison to SMT. We compared the EBMT results obtained to results provided by a Moses-based SMT system and the Google Translate on-line system. To provide a complete view of CBMT, the performance of each MT system was assessed in several experimental settings, using different corpora (type and size), various system settings and additional part-of-speech (POS) information. We evaluated the translation results by means of three automatic evaluation metrics: BLEU, NIST and TER. A subset of the results was manually analyzed for a better overview of the translation quality.
Our experiments showed that constraints improve translation results, although we could not clearly determine which combination of constraints works best. Although the SMT system outperformed the EBMT system in all experiments, the manual analysis provided cases in which EBMT offered more accurate results. The behavior of the systems while changing the experimental settings confirmed that (training and test) data have a substantial impact on both MT approaches. The difference between the results of the two MT approaches decreased when a more restricted corpus was used. As expected, both CBMT approaches worked better for shorter sentences.
Page Updated: 30-Jul-2012