Academic Paper
| Title: | WordICA—emergence of linguistic representations for words by independent component analysis |
| Author: | Timo Honkela |
| Institution: | Aalto University School of Science and Technology |
| Author: | Aapo Hyvärinen |
| Institution: | University of Helsinki |
| Author: | Jaakko J Väyrynen |
| Institution: | Aalto University School of Science and Technology |
| Linguistic Field: | Applied Linguistics; Computational Linguistics; Text/Corpus Linguistics |
| Abstract: | We explore the use of independent component analysis (ICA) for the automatic extraction of linguistic roles or features of words. The extraction is based on the unsupervised analysis of text corpora. We contrast ICA with singular value decomposition (SVD), widely used in statistical text analysis, in general, and specifically in latent semantic analysis (LSA). However, the representations found using the SVD analysis cannot easily be interpreted by humans. In contrast, ICA applied on word context data gives distinct features which reflect linguistic categories. In this paper, we provide justification for our approach called WordICA, present the WordICA method in detail, compare the obtained results with traditional linguistic categories and with the results achieved using an SVD-based method, and discuss the use of the method in practical natural language engineering solutions such as machine translation systems. As the WordICA method is based on unsupervised learning and thus provides a general means for efficient knowledge acquisition, we foresee that the approach has a clear potential for practical applications. |
This article appears in Natural Language Engineering Vol. 16, Issue 3, which you can read on Cambridge's site or on LINGUIST.


