Academic Paper


Title: Extraction of multi-word expressions from small parallel corpora
Author: Yulia Tsvetkov
Institution: Language Technologies Institute, Carnegie Mellon University
Author: Shuly Wintner
Institution: University of Haifa
Linguistic Field: Computational Linguistics; Text/Corpus Linguistics
Abstract: We present a general, novel methodology for extracting multi-word expressions (MWEs) of various types, along with their translations, from small, word-aligned parallel corpora. Unlike existing approaches, we focus on misalignments; these typically indicate expressions in the source language that are translated into the target language in a non-compositional way. We introduce a simple algorithm that proposes MWE candidates based on such misalignments, relying on 1:1 alignments as anchors that delimit the search space. We use a large monolingual corpus to rank and filter these candidates. Evaluation of the quality of the extraction algorithm reveals significant improvements over naïve alignment-based methods. The extracted MWEs, with their translations, are used in the training of a statistical machine translation system, showing a small but significant improvement in its performance.
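To make the anchoring idea concrete, here is a minimal sketch of how 1:1 alignments can delimit a search space for misaligned spans. It is a simplified illustration under our own assumptions, not the authors' algorithm: alignments are assumed to be a list of (source index, target index) links, sentence boundaries are treated as virtual anchors, crossing anchors are simply skipped, and the function names (`one_to_one_anchors`, `propose_candidates`) and the toy sentence pair are hypothetical. The monolingual ranking and filtering step is not shown.

```python
from collections import defaultdict


def one_to_one_anchors(alignment):
    """Return links (i, j) where source word i and target word j are
    aligned only to each other, sorted by source position."""
    src_links = defaultdict(set)
    tgt_links = defaultdict(set)
    for i, j in alignment:
        src_links[i].add(j)
        tgt_links[j].add(i)
    return sorted(
        (i, j) for i, j in alignment
        if len(src_links[i]) == 1 and len(tgt_links[j]) == 1
    )


def propose_candidates(src, tgt, alignment, min_len=2):
    """Propose (source span, target span) pairs lying strictly between two
    consecutive 1:1 anchors. Every source word in such a gap is, by
    construction, not 1:1-aligned, i.e. it is part of a misalignment."""
    # Hypothetical simplification: sentence boundaries act as virtual anchors.
    anchors = [(-1, -1)] + one_to_one_anchors(alignment) + [(len(src), len(tgt))]
    candidates = []
    for (i1, j1), (i2, j2) in zip(anchors, anchors[1:]):
        # Skip crossing anchors (reordering); the target gap is ill-defined there.
        if j2 <= j1:
            continue
        src_span = src[i1 + 1:i2]
        tgt_span = tgt[j1 + 1:j2]
        # Keep only multi-word source gaps that have some target material.
        if len(src_span) >= min_len and tgt_span:
            candidates.append((" ".join(src_span), " ".join(tgt_span)))
    return candidates


# Toy, hand-made example (not data from the paper).
src = ["he", "kicked", "the", "bucket", "yesterday"]
tgt = ["il", "est", "mort", "hier"]
alignment = [(0, 0), (1, 1), (1, 2), (2, 2), (3, 2), (4, 3)]

print(propose_candidates(src, tgt, alignment))
# → [('kicked the bucket', 'est mort')]
```

Here "he/il" and "yesterday/hier" are the only 1:1 links, so they bracket the many-to-one region "kicked the bucket" / "est mort", which becomes the single candidate. In a fuller system such candidates would then be scored against a monolingual corpus before being kept.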


This article appears in Natural Language Engineering, Vol. 18, Issue 4.
