Featured Linguist!

Jost Gippert: Our Featured Linguist!

"Buenos dias", "buenas noches" -- this was the first words in a foreign language I heard in my life, as a three-year old boy growing up in developing post-war Western Germany, where the first gastarbeiters had arrived from Spain. Fascinated by the strange sounds, I tried to get to know some more languages, the only opportunity being TV courses of English and French -- there was no foreign language education for pre-teen school children in Germany yet in those days. Read more



Donate Now | Visit the Fund Drive Homepage

Amount Raised:

$34168

Still Needed:

$40832

Can anyone overtake Syntax in the Subfield Challenge ?

Grad School Challenge Leader: University of Washington


Publishing Partner: Cambridge University Press CUP Extra Publisher Login
amazon logo
More Info


New from Oxford University Press!

ad

What is English? And Why Should We Care?

By: Tim William Machan

To find some answers Tim Machan explores the language's present and past, and looks ahead to its futures among the one and a half billion people who speak it. His search is fascinating and important, for definitions of English have influenced education and law in many countries and helped shape the identities of those who live in them.


New from Cambridge University Press!

ad

Medical Writing in Early Modern English

Edited by Irma Taavitsainen and Paivi Pahta

This volume provides a new perspective on the evolution of the special language of medicine, based on the electronic corpus of Early Modern English Medical Texts, containing over two million words of medical writing from 1500 to 1700.


Academic Paper


Title: Evaluating vector space models with canonical correlation analysis
Author: Sami Virpioja
Institution: Aalto University School of Science
Author: Mari-Sanna Paukkeri
Institution: Aalto University School of Science
Author: Abhishek Tripathi
Institution: University of Helsinki
Author: Tiina Lindh-Knuutila
Institution: Aalto University School of Science
Author: Krista Lagus
Institution: Aalto University School of Science
Linguistic Field: Computational Linguistics
Abstract: Vector space models are used in language processing applications for calculating semantic similarities of words or documents. The vector spaces are generated with feature extraction methods for text data. However, evaluation of the feature extraction methods may be difficult. Indirect evaluation in an application is often time-consuming and the results may not generalize to other applications, whereas direct evaluations that measure the amount of captured semantic information usually require human evaluators or annotated data sets. We propose a novel direct evaluation method based on canonical correlation analysis (CCA), the classical method for finding linear relationship between two data sets. In our setting, the two sets are parallel text documents in two languages. A good feature extraction method should provide representations that reflect the semantic content of the documents. Assuming that the underlying semantic content is independent of the language, we can study feature extraction methods that capture the content best by measuring dependence between the representations of a document and its translation. In the case of CCA, the applied measure of dependence is correlation. The evaluation method is based on unsupervised learning, it is language- and domain-independent, and it does not require additional resources besides a parallel corpus. In this paper, we demonstrate the evaluation method on a sentence-aligned parallel corpus. The method is validated by showing that the obtained results with bag-of-words representations are intuitive and agree well with the previous findings. Moreover, we examine the performance of the proposed evaluation method with indirect evaluation methods in simple sentence matching tasks, and a quantitative manual evaluation of word translations. The results of the proposed method correlate well with the results of the indirect and manual evaluations.

CUP at LINGUIST

This article appears in Natural Language Engineering Vol. 18, Issue 3, which you can read on Cambridge's site .



Back
Add a new paper
Return to Academic Papers main page
Return to Directory of Linguists main page