Academic Paper


Title: WordICA—emergence of linguistic representations for words by independent component analysis
Author: Timo Honkela
Institution: Aalto University School of Science and Technology
Author: Aapo Hyvärinen
Institution: University of Helsinki
Author: Jaakko J Väyrynen
Institution: Aalto University School of Science and Technology
Linguistic Field: Applied Linguistics; Computational Linguistics; Text/Corpus Linguistics
Abstract: We explore the use of independent component analysis (ICA) for the automatic extraction of linguistic roles or features of words. The extraction is based on the unsupervised analysis of text corpora. We contrast ICA with singular value decomposition (SVD), widely used in statistical text analysis, in general, and specifically in latent semantic analysis (LSA). However, the representations found using the SVD analysis cannot easily be interpreted by humans. In contrast, ICA applied on word context data gives distinct features which reflect linguistic categories. In this paper, we provide justification for our approach called WordICA, present the WordICA method in detail, compare the obtained results with traditional linguistic categories and with the results achieved using an SVD-based method, and discuss the use of the method in practical natural language engineering solutions such as machine translation systems. As the WordICA method is based on unsupervised learning and thus provides a general means for efficient knowledge acquisition, we foresee that the approach has a clear potential for practical applications.

This article appears in Natural Language Engineering Vol. 16, Issue 3, which you can read on Cambridge's site or on LINGUIST.


