LINGUIST List 25.3316

Tue Aug 19 2014

FYI: New Dataset for Semantic Similarity Measurements

Editor for this issue: Uliana Kazagasheva <uliana@linguistlist.org>


Date: 19-Aug-2014
From: Felix Hill <felix.hill@cl.cam.ac.uk>
Subject: New Dataset for Semantic Similarity Measurements

We have just published a new dataset of 999 concept pairs, rated by 500 annotators for semantic similarity (e.g. beer, ale) as distinct from relatedness (e.g. beer, drink).

It is intended to provide a challenging benchmark for evaluating representation-learning and embedding-learning language models. It should also be of interest to psycholinguists and cognitive scientists working on representation and conceptual concreteness.
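As a rough illustration of how such a benchmark might be used, the sketch below scores a model by the Spearman rank correlation between its cosine similarities and the human ratings. This protocol is an assumption, not something specified in this announcement. The function names, the toy vectors, and the toy ratings are all hypothetical and are not taken from the dataset.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def evaluate(pairs, vectors):
    """Correlate model similarities with human ratings.

    pairs: iterable of (word1, word2, rating) tuples.
    vectors: dict mapping a word to its embedding (list of floats).
    Pairs containing a word missing from `vectors` are skipped.
    Returns (spearman_rho, number_of_pairs_scored).
    """
    gold, predicted = [], []
    for word1, word2, rating in pairs:
        if word1 in vectors and word2 in vectors:
            gold.append(rating)
            predicted.append(cosine(vectors[word1], vectors[word2]))
    return spearman(gold, predicted), len(gold)

# Hypothetical toy data: invented ratings and 2-d vectors, NOT from the dataset.
toy_pairs = [("beer", "ale", 9.0), ("beer", "drink", 4.0), ("beer", "car", 0.5)]
toy_vectors = {
    "beer": [1.0, 0.1],
    "ale": [0.9, 0.2],
    "drink": [0.5, 0.8],
    "car": [0.0, 1.0],
}

rho, n_scored = evaluate(toy_pairs, toy_vectors)
print(f"Spearman rho = {rho:.3f} over {n_scored} pairs")
```

With real data, the pair list would be read from the downloaded file and the vectors from the model under evaluation. The file layout is not described here, so the loader is left out.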

For more information, and to download the dataset, visit:
http://www.cl.cam.ac.uk/~fh295/simlex.html

Please cite the following paper if you use the dataset in your research:
Hill, F., Reichart, R., Korhonen, A. SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation. 2014. Preprint published on arXiv. arXiv:1408.3456.


Linguistic Field(s): Cognitive Science; Computational Linguistics; Psycholinguistics; Semantics

Page Updated: 19-Aug-2014