LINGUIST List 25.3316

Tue Aug 19 2014

FYI: New Dataset for Semantic Similarity Measurements

Editor for this issue: Uliana Kazagasheva

Date: 19-Aug-2014
From: Felix Hill
Subject: New Dataset for Semantic Similarity Measurements

We have just published a new dataset of 999 concept pairs rated by 500 annotators for semantic similarity (e.g. beer, ale), as distinct from relatedness (e.g. beer, drink).

It is intended to provide a challenging benchmark for evaluating representation-learning and embedding-learning language models. It should also be of interest to psycholinguists and cognitive scientists working on semantic representation and conceptual concreteness.
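As a rough illustration of how a benchmark of this kind is often used, the sketch below scores a model by the Spearman rank correlation between its cosine similarities and human ratings. This is an assumption about a common evaluation setup, not a procedure prescribed by this announcement. The vectors and ratings are invented toy values, not taken from the dataset, and all function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy word vectors (invented for illustration; a real model would supply these).
toy_vectors = {
    "beer": [0.9, 0.1, 0.3],
    "ale": [0.8, 0.2, 0.3],
    "drink": [0.5, 0.5, 0.1],
    "car": [0.0, 0.1, 0.9],
}

# Toy human ratings (invented; NOT values from the published dataset).
toy_ratings = [("beer", "ale", 9.0), ("beer", "drink", 5.0), ("beer", "car", 0.5)]

model_scores = [cosine(toy_vectors[a], toy_vectors[b]) for a, b, _ in toy_ratings]
human_scores = [rating for _, _, rating in toy_ratings]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation on toy data: {rho:.3f}")
```

A rank correlation is used here because model similarity scores and human ratings need not share a scale; only the ordering of the pairs is compared.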

For more information, and to download the dataset, visit:

Please cite the following paper if you use the dataset in your research:
Hill, F., Reichart, R., and Korhonen, A. 2014. SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation. Preprint published on arXiv. arXiv:1408.3456.

Linguistic Field(s): Cognitive Science; Computational Linguistics; Psycholinguistics; Semantics

Page Updated: 19-Aug-2014