LINGUIST List 21.2291
Thu May 20 2010
Diss: Comp Ling/Translation: Barreiro: 'Make It Simple...'
Editor for this issue: Mfon Udoinyang
<mfon linguistlist.org>
Directory
1. Anabela Barreiro, Make It Simple with Paraphrases: Automated paraphrasing for authoring aids and machine translation
Message 1: Make It Simple with Paraphrases: Automated paraphrasing for authoring aids and machine translation
Date: 19-May-2010
From: Anabela Barreiro <barreiro_anabela hotmail.com>
Subject: Make It Simple with Paraphrases: Automated paraphrasing for authoring aids and machine translation
Institution: Universidade do Porto
Program: Machine Translation
Dissertation Status: Completed
Degree Date: 2008
Author: Anabela Marques Barreiro
Dissertation Title: Make It Simple with Paraphrases: Automated paraphrasing for authoring aids and machine translation
Dissertation URL: http://www.linguateca.pt/Repositorio/AB-Thesis_030409.pdf
Linguistic Field(s):
Translation
Computational Linguistics
Subject Language(s): English (eng)
Portuguese (por)
Dissertation Director:
Belinda Maia
Adam Meyers
Dissertation Abstract:
This dissertation introduces a novel approach to improving machine translation by focusing on the paraphrasing of support verb constructions. The challenge of the research was to paraphrase predicate nominal expressions, such as fazer uma análise (to do an analysis), with predicate verbals, such as analisar (to analyse), applying language paraphrasing capabilities to produce better machine translation results. In particular cases, the paraphrasing consisted in replacing the semantically weak support verb of the predicate nominal construction with lexical-syntactic and stylistic variants, such as realizar uma análise or efectuar uma análise (to perform an analysis). When support verb constructions were identified and replaced with semantically equivalent or similar verbal expressions as a pre-processing step before translation, an average 21% improvement was observed in the evaluated quality of the results of Portuguese-English machine translation, and an average 31% improvement in the results of English-Portuguese machine translation. The research was based on a contrastive linguistic analysis of support verb constructions and of their paraphrases, which were organized into several syntactic-semantic subclasses according to the theoretical and methodological principles of Lexicon-Grammar Theory, established in the Harrisian framework of Transformational Operator Grammar. This study looked into one particular category of multiword expression, the support verb construction, but it was designed to be repeatable and extensible to other types of multiword expression, namely to idiomatic expressions, such as dar o braço a torcer (to give up), and to syntactically free constructions, such as noun phrase coordination or the passive voice. All linguistic information was formalized in dictionaries and grammars developed with the NooJ linguistic environment.
This linguistic information was explored for several natural language processing tasks, from both a monolingual and a bilingual perspective. The Portuguese-English bilingual resources of the open source Port4NooJ natural language processing system were built as groundwork for the study. They integrate the SAL ontology of the OpenLogos system. Based on Port4NooJ, the automated paraphrasing software tools ReWriter and ParaMT were also created to rewrite and translate support verb constructions. ReEscreve, the Portuguese version of ReWriter, is being used as a public online authoring aid service, and its interface is described in this dissertation. The automated paraphrasing of support verb constructions through ReEscreve yields a 40% improvement in the quality of the machine translation results in that context.
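As a rough illustration of the pre-processing idea described in the abstract, the following Python sketch rewrites a support verb construction as a single full verb and generates stylistic variants of the support verb. It is not the dissertation's implementation, which relies on NooJ dictionaries and grammars; the lexicon, the determiner list, and the function names `paraphrase_svc` and `stylistic_variants` are all hypothetical, and only infinitive forms taken from the abstract's own examples are covered.

```python
import re

# Hypothetical toy lexicon: (support verb, predicate noun) -> full verb.
# Infinitive forms only; the single entry comes from the abstract's example.
SVC_TO_VERB = {
    ("fazer", "análise"): "analisar",
}

# Hypothetical stylistic variants of the semantically weak support verb,
# again taken from the abstract's examples.
SUPPORT_VERB_VARIANTS = {
    "fazer": ["realizar", "efectuar"],
}

# Assumed set of determiners allowed between the verb and the noun.
DETERMINERS = "|".join(["uma", "um", "a", "o"])


def paraphrase_svc(text):
    """Replace 'support verb + determiner + noun' with the full verb."""
    for (verb, noun), full_verb in SVC_TO_VERB.items():
        pattern = r"\b%s\s+(?:%s)\s+%s\b" % (
            re.escape(verb), DETERMINERS, re.escape(noun))
        text = re.sub(pattern, full_verb, text)
    return text


def stylistic_variants(text):
    """Return copies of text with the support verb swapped for each variant."""
    variants = []
    for verb, noun in SVC_TO_VERB:
        # The lookahead keeps the determiner and noun untouched.
        pattern = r"\b%s(?=\s+(?:%s)\s+%s\b)" % (
            re.escape(verb), DETERMINERS, re.escape(noun))
        for alternative in SUPPORT_VERB_VARIANTS.get(verb, []):
            new_text, count = re.subn(pattern, alternative, text)
            if count:
                variants.append(new_text)
    return variants


print(paraphrase_svc("Vamos fazer uma análise."))  # Vamos analisar.
print(stylistic_variants("fazer uma análise"))
```

A sketch this small ignores verb inflection and the restructuring of complements (for example, fazer uma análise do texto would need to become analisar o texto), which is the kind of information a full lexicon-grammar description would have to encode.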
Page Updated: 20-May-2010