Academic Paper


Title: A unified alignment algorithm for bilingual data
Author: Christoph Tillmann
Institution: IBM T.J. Watson Research Center
Author: Sanjika Hewavitharana
Institution: Carnegie Mellon University
Linguistic Field: Computational Linguistics
Abstract: The paper presents a novel unified algorithm for aligning sentences with their translations in bilingual data. With the help of ideas from a stack-based dynamic programming decoder for speech recognition (Ney 1984), the search is parametrized in a novel way such that the unified algorithm can be used on various types of data that have been previously handled by separate implementations: the extracted text chunk pairs can be either sub-sentential pairs, one-to-one, or many-to-many sentence-level pairs. The one-stage search algorithm is carried out in a single run over the data. Its memory requirements are independent of the length of the source document, and it is applicable to sentence-level parallel as well as comparable data. With the help of a unified beam-search candidate pruning, the algorithm is very efficient: it avoids any document-level pre-filtering and uses less restrictive sentence-level filtering. Results are presented on a Russian–English, a Spanish–English, and an Arabic–English extraction task. Based on simple word-based scoring features, text chunk pairs are extracted out of several trillion candidates, where the search is carried out on 300 processors in parallel.
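As an illustration of the general idea only, the following is a minimal sketch of monotone sentence alignment by dynamic programming with beam pruning. It is not the authors' algorithm: the move set, the `overlap_score` feature, the toy `lexicon`, the `skip_penalty`, and the `beam` width are all assumptions made for this example. Unlike the method described in the abstract, the sketch keeps every visited state in memory, so its memory use grows with document length.

```python
from typing import Dict, List, Optional, Sequence, Tuple

State = Tuple[int, int]                          # (source sentences consumed, target sentences consumed)
Pair = Tuple[Tuple[int, int], Tuple[int, int]]   # ((src_start, src_end), (tgt_start, tgt_end)), end exclusive

# Hypothetical move set: 1-1, 1-2 and 2-1 chunk pairs, plus skipping one sentence on either side.
MOVES = [(1, 1), (1, 2), (2, 1), (1, 0), (0, 1)]


def overlap_score(src: Sequence[str], tgt: Sequence[str], lexicon: Dict[str, str]) -> float:
    """Toy word-based feature: fraction of source words whose lexicon translation occurs in the target."""
    src_words = {w for s in src for w in s.lower().split()}
    tgt_words = {w for s in tgt for w in s.lower().split()}
    if not src_words or not tgt_words:
        return 0.0
    hits = sum(1 for w in src_words if lexicon.get(w) in tgt_words)
    return hits / max(len(src_words), len(tgt_words))


def beam_align(src: Sequence[str], tgt: Sequence[str], lexicon: Dict[str, str],
               beam: int = 4, skip_penalty: float = -0.2) -> List[Pair]:
    """Return the chunk pairs on the best-scoring monotone path found by the pruned search."""
    n, m = len(src), len(tgt)
    # best[state] = (score of best path reaching state, predecessor state)
    best: Dict[State, Tuple[float, Optional[State]]] = {(0, 0): (0.0, None)}
    for total in range(n + m):
        # All states with i + j == total are final here, because every move increases i + j.
        stage = sorted(((score, st) for st, (score, _) in best.items() if sum(st) == total),
                       reverse=True)
        for score, (i, j) in stage[:beam]:       # beam pruning: expand only the top candidates
            for di, dj in MOVES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    gain = skip_penalty          # sentence left unaligned
                else:
                    gain = overlap_score(src[i:ni], tgt[j:nj], lexicon)
                cand = score + gain
                if (ni, nj) not in best or cand > best[(ni, nj)][0]:
                    best[(ni, nj)] = (cand, (i, j))
    # Trace back from the final state, keeping only moves that paired text on both sides.
    pairs: List[Pair] = []
    state: State = (n, m)
    while best[state][1] is not None:
        prev = best[state][1]
        if prev[0] < state[0] and prev[1] < state[1]:
            pairs.append(((prev[0], state[0]), (prev[1], state[1])))
        state = prev
    pairs.reverse()
    return pairs
```

Called with a small word-for-word lexicon, `beam_align(src, tgt, lexicon)` returns index ranges into the two sentence lists, one entry per extracted chunk pair. A larger `beam` explores more candidates at higher cost.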


This article appears in Natural Language Engineering Vol. 19, Issue 1, which you can read on Cambridge's site or on LINGUIST.


