Academic Paper


Title: Extraction of multi-word expressions from small parallel corpora
Author: Yulia Tsvetkov
Institution: Language Technologies Institute, Carnegie Mellon University
Author: Shuly Wintner
Institution: University of Haifa
Linguistic Field: Computational Linguistics; Text/Corpus Linguistics
Abstract: We present a general, novel methodology for extracting multi-word expressions (MWEs) of various types, along with their translations, from small, word-aligned parallel corpora. Unlike existing approaches, we focus on misalignments; these typically indicate expressions in the source language that are translated to the target in a non-compositional way. We introduce a simple algorithm that proposes MWE candidates based on such misalignments, relying on 1:1 alignments as anchors that delimit the search space. We use a large monolingual corpus to rank and filter these candidates. Evaluation of the quality of the extraction algorithm reveals significant improvements over naïve alignment-based methods. The extracted MWEs, with their translations, are used in the training of a statistical machine translation system, showing a small but significant improvement in its performance.
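The misalignment idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes that an "anchor" is a source token whose only alignment link goes to a target token with no other links, and that a candidate is any maximal run of two or more adjacent source tokens that are not anchors. The function name `mwe_candidates` and the example sentence pair are hypothetical, and the ranking and filtering step that uses a monolingual corpus is omitted.

```python
from collections import Counter


def mwe_candidates(source_tokens, alignment):
    """Propose MWE candidates from a word-aligned sentence pair.

    source_tokens: list of source-language tokens.
    alignment: iterable of (source_index, target_index) links.

    Assumption (not taken from the paper): a source token is an anchor
    when it has exactly one link and its target has exactly one link.
    """
    links = set(alignment)  # drop duplicate links so counts are reliable
    src_counts = Counter(i for i, _ in links)
    tgt_counts = Counter(j for _, j in links)
    anchors = {
        i for i, j in links
        if src_counts[i] == 1 and tgt_counts[j] == 1
    }

    candidates = []
    run = []  # current run of adjacent non-anchor source tokens
    for i, token in enumerate(source_tokens):
        if i in anchors:
            # An anchor closes the current run; keep runs of 2+ tokens.
            if len(run) >= 2:
                candidates.append(tuple(run))
            run = []
        else:
            run.append(token)
    if len(run) >= 2:
        candidates.append(tuple(run))
    return candidates


# Hypothetical example: "kicked the bucket" is linked many-to-one to "mort".
source = ["he", "kicked", "the", "bucket", "yesterday"]
links = [(0, 0), (1, 2), (2, 2), (3, 2), (4, 3)]
result = mwe_candidates(source, links)
```

In this example, "he" and "yesterday" each have a 1:1 link and act as anchors, so the only run left between them is the three-token expression. A real system would then score such candidates, for instance against a large monolingual corpus as the abstract describes.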

This article appears in Natural Language Engineering, Vol. 18, Issue 4.