Academic Paper


Title: 'Wisdom of crowds versus wisdom of linguists – measuring the semantic relatedness of words'
Author: Torsten Zesch
Institution: 'Technische Universität Darmstadt'
Author: Iryna Gurevych
Institution: 'Technische Universität Darmstadt'
Linguistic Field: 'Computational Linguistics; Semantics; Text/Corpus Linguistics'
Subject Language: 'English; German'
Abstract: 'In this article, we present a comprehensive study aimed at computing semantic relatedness of word pairs. We analyze the performance of a large number of semantic relatedness measures proposed in the literature with respect to different experimental conditions, such as (i) the datasets employed, (ii) the language (English or German), (iii) the underlying knowledge source, and (iv) the evaluation task (computing scores of semantic relatedness, ranking word pairs, solving word choice problems). To our knowledge, this study is the first to systematically analyze semantic relatedness on a large number of datasets with different properties, while emphasizing the role of the knowledge source compiled either by the ‘wisdom of linguists’ (i.e., classical wordnets) or by the ‘wisdom of crowds’ (i.e., collaboratively constructed knowledge sources like Wikipedia).
The article discusses benefits and drawbacks of different approaches to evaluating semantic relatedness. We show that results should be interpreted carefully to evaluate particular aspects of semantic relatedness. For the first time, we apply a vector-based measure of semantic relatedness, relying on a concept space built from documents, to the first paragraph of Wikipedia articles, to English WordNet glosses, and to GermaNet-based pseudo-glosses. Contrary to previous research (Strube and Ponzetto 2006; Gabrilovich and Markovitch 2007; Zesch et al. 2007), we find that ‘wisdom of crowds’ based resources are not superior to ‘wisdom of linguists’ based resources. We also find that using the first paragraph of a Wikipedia article, as opposed to the whole article, leads to better precision but lower recall. Finally, we present two systems that were developed to aid the experiments presented herein and are freely available for research purposes: (i) DEXTRACT, software to semi-automatically construct corpus-driven semantic relatedness datasets, and (ii) JWPL, a Java-based high-performance Wikipedia Application Programming Interface (API) for building natural language processing (NLP) applications.'
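The abstract mentions a vector-based measure of semantic relatedness that relies on a concept space built from documents, and a word choice task among the evaluation settings. The sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation. It assumes each document is one concept, words are weighted by TF-IDF, relatedness is the cosine between two words' concept vectors (a setup in the spirit of Gabrilovich and Markovitch 2007), and a word choice problem is solved by picking the candidate most related to the target. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def build_concept_vectors(documents):
    """Map each word to a sparse vector over concepts (one concept per document).

    Assumption: documents are pre-tokenized lists of words, and each word's
    weight for a concept is its TF-IDF score in that document.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each word once per document
    vectors = {}
    for concept_id, doc in enumerate(documents):
        for word, tf in Counter(doc).items():
            weight = tf * math.log(n_docs / doc_freq[word])
            if weight > 0:  # words occurring in every document carry no signal
                vectors.setdefault(word, {})[concept_id] = weight
    return vectors

def relatedness(vectors, w1, w2):
    """Cosine similarity of two words' concept vectors (0.0 if either is unknown)."""
    v1, v2 = vectors.get(w1, {}), vectors.get(w2, {})
    dot = sum(weight * v2.get(c, 0.0) for c, weight in v1.items())
    norm1 = math.sqrt(sum(x * x for x in v1.values()))
    norm2 = math.sqrt(sum(x * x for x in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)

def solve_word_choice(vectors, target, candidates):
    """Pick the candidate with the highest relatedness to the target word."""
    return max(candidates, key=lambda c: relatedness(vectors, target, c))

# Hypothetical toy corpus: each document stands in for one concept (e.g. an article).
documents = [
    "car engine wheel road".split(),
    "car road traffic driver".split(),
    "bank money loan interest".split(),
    "money interest rate bank".split(),
    "river water fish boat".split(),
]
vectors = build_concept_vectors(documents)
print(solve_word_choice(vectors, "car", ["money", "road", "fish"]))  # → road
```

In this toy corpus "car" and "road" occur in exactly the same documents, so their concept vectors point in the same direction, while "money" and "fish" share no concepts with "car". A real concept space would be built from a large collection such as Wikipedia articles or wordnet glosses, with whatever preprocessing and weighting the chosen method specifies.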


This article appears in Natural Language Engineering, Vol. 16, Issue 1.


