Academic Paper


Title: Linguistic Knowledge in Statistical Phrase-based Word Alignment
Linguistic Field: Computational Linguistics; Text/Corpus Linguistics
Abstract: This paper presents a novel phrase alignment strategy that combines linguistic knowledge with co-occurrence measures extracted from bilingual corpora. The algorithm has four steps: phrase selection and classification, phrase alignment, one-to-one word alignment, and post-processing. The first stage selects a linguistically derived set of phrases that convey a unified meaning during translation and are therefore aligned together in parallel texts. These phrases include verb phrases, idiomatic expressions and date expressions. The second stage produces very high precision links between the selected phrases in both languages. The third stage performs a statistical word alignment of the remaining unaligned tokens using association measures and link probabilities, and the fourth stage takes final decisions on unaligned tokens based on linguistic knowledge. Experiments are reported for an English-Spanish parallel corpus, with a detailed description of the evaluation measure and manual reference used. Results show that phrase co-occurrence measures convey information complementary to word co-occurrences and provide stronger evidence of a correct alignment, successfully introducing linguistic knowledge into a statistical word alignment scheme. The reported Precision, Recall and Alignment Error Rate (AER) results outperform those of state-of-the-art alignment algorithms.
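The abstract reports Precision, Recall and AER without giving the formulas. As a rough illustration only, the sketch below computes them under the commonly used definition based on "sure" and "possible" reference links; whether the paper's evaluation measure matches this definition exactly is an assumption, and the function name and toy link sets are hypothetical.

```python
def alignment_scores(hypothesis, sure, possible):
    """Return (precision, recall, aer) for a set of hypothesized links.

    Each link is a (source_index, target_index) tuple.
    Assumption: sure links count as possible links too, as in the
    widely used sure/possible reference annotation scheme.
    """
    possible = possible | sure
    hits_sure = len(hypothesis & sure)          # correct sure links found
    hits_possible = len(hypothesis & possible)  # links not contradicted by the reference

    precision = hits_possible / len(hypothesis) if hypothesis else 0.0
    recall = hits_sure / len(sure) if sure else 0.0

    denominator = len(hypothesis) + len(sure)
    aer = 1.0 - (hits_sure + hits_possible) / denominator if denominator else 0.0
    return precision, recall, aer


# Toy example with made-up links (not data from the paper).
sure_links = {(0, 0), (1, 1)}
possible_links = {(2, 1)}
hypothesized = {(0, 0), (2, 1), (2, 2)}

precision, recall, aer = alignment_scores(hypothesized, sure_links, possible_links)
print(f"precision={precision:.3f} recall={recall:.3f} aer={aer:.3f}")
```

Lower AER is better: it is zero only when every sure link is found and no hypothesized link falls outside the possible set.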


This article appears in Natural Language Engineering Vol. 12, Issue 1.


