Academic Paper


Title: WordICA—emergence of linguistic representations for words by independent component analysis
Author: Timo Honkela
Institution: Aalto University School of Science and Technology
Author: Aapo Hyvärinen
Institution: University of Helsinki
Author: Jaakko J. Väyrynen
Institution: Aalto University School of Science and Technology
Linguistic Field: Applied Linguistics; Computational Linguistics; Text/Corpus Linguistics
Abstract: We explore the use of independent component analysis (ICA) for the automatic extraction of linguistic roles or features of words. The extraction is based on the unsupervised analysis of text corpora. We contrast ICA with singular value decomposition (SVD), which is widely used in statistical text analysis in general and in latent semantic analysis (LSA) in particular. The representations found using SVD, however, cannot easily be interpreted by humans. In contrast, ICA applied to word context data gives distinct features which reflect linguistic categories. In this paper, we provide justification for our approach called WordICA, present the WordICA method in detail, compare the obtained results with traditional linguistic categories and with the results achieved using an SVD-based method, and discuss the use of the method in practical natural language engineering solutions such as machine translation systems. As the WordICA method is based on unsupervised learning and thus provides a general means for efficient knowledge acquisition, we foresee that the approach has clear potential for practical applications.
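
The "word context data" mentioned in the abstract can be pictured as a matrix of co-occurrence counts, with one row per word of interest and one column per context word. The following is a minimal sketch of how such a matrix might be built, not the authors' procedure: the abstract does not state the window size, vocabulary selection, or scaling used, so the function name `context_counts`, the `window` parameter, and the toy sentence are all assumptions made for illustration.

```python
from collections import Counter


def context_counts(tokens, focus_words, context_words, window=1):
    """Count context words occurring within `window` positions of each focus word.

    Hypothetical helper for illustration only; the window size and the
    vocabularies are assumptions, not values taken from the paper.
    Returns a list of rows (one per focus word) with one count per context word.
    """
    focus = set(focus_words)
    context = set(context_words)
    counts = {w: Counter() for w in focus_words}
    for i, tok in enumerate(tokens):
        if tok not in focus:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            # Skip the focus position itself and anything outside the context vocabulary.
            if j != i and tokens[j] in context:
                counts[tok][tokens[j]] += 1
    # Rows follow focus_words order, columns follow context_words order.
    return [[counts[w][c] for c in context_words] for w in focus_words]


# Toy corpus, invented for this example.
tokens = "the cat sat on the mat while the dog sat on the rug".split()
focus_words = ["cat", "dog", "sat"]
context_words = ["the", "sat", "on"]
matrix = context_counts(tokens, focus_words, context_words)
```

In this toy case the rows for "cat" and "dog" come out identical, which is the kind of distributional similarity that a decomposition such as ICA or SVD would then operate on. A matrix like this would be passed to an ICA or SVD implementation of the reader's choice; that step is not shown here.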


This article appears in Natural Language Engineering, Vol. 16, Issue 3.


