LINGUIST List 30.4391

Tue Nov 19 2019

Diss: English; Hungarian; Spanish; Computational Linguistics; Semantics; Text/Corpus Linguistics: András Dobó: ''A comprehensive analysis of the parameters in the creation and comparison of feature vectors in distributional semantic models for multiple languages''

Editor for this issue: Sarah Robinson <srobinsonlinguistlist.org>



Date: 17-Nov-2019
From: András Dobó <doboinf.u-szeged,hu>
Subject: A comprehensive analysis of the parameters in the creation and comparison of feature vectors in distributional semantic models for multiple languages

Institution: University of Szeged
Program: Doctoral School of Computer Science
Dissertation Status: Completed
Degree Date: 2019

Author: András Dobó

Dissertation Title: A comprehensive analysis of the parameters in the creation and comparison of feature vectors in distributional semantic models for multiple languages

Dissertation URL: http://doktori.bibl.u-szeged.hu/10120/

Linguistic Field(s): Computational Linguistics
                            Semantics
                            Text/Corpus Linguistics

Subject Language(s): English (eng)
                            Hungarian (hun)
                            Spanish (spa)

Dissertation Director:
János Csirik

Dissertation Abstract:

Measuring the semantic similarity and relatedness of words is important for many natural language processing tasks. Distributional semantic models designed for this task have many parameters, such as vector similarity measures, weighting schemes and dimensionality reduction techniques, yet no truly comprehensive study has evaluated these parameters simultaneously while also analysing how the findings differ across multiple languages.
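To make two of these parameter types concrete, here is a minimal sketch. It is not the dissertation's implementation. It applies a weighting scheme (raw counts vs. positive pointwise mutual information, PPMI) and a vector similarity measure (cosine vs. a weighted Jaccard measure) to a hypothetical toy co-occurrence matrix. The counts, the particular schemes and measures chosen, and the function names are illustrative assumptions. Dimensionality reduction is left out so the sketch needs only the standard library.

```python
import math

# Hypothetical toy co-occurrence counts: word -> {context word: count}.
# The numbers are invented purely for illustration.
COUNTS = {
    "cat": {"pet": 4, "fur": 3, "engine": 0},
    "dog": {"pet": 5, "fur": 2, "engine": 0},
    "car": {"pet": 0, "fur": 0, "engine": 6},
}


def raw(counts):
    """Weighting scheme 1: keep the raw counts (as floats)."""
    return {w: {c: float(n) for c, n in row.items()} for w, row in counts.items()}


def ppmi(counts):
    """Weighting scheme 2: positive pointwise mutual information."""
    total = sum(sum(row.values()) for row in counts.values())
    word_sums = {w: sum(row.values()) for w, row in counts.items()}
    ctx_sums = {}
    for row in counts.values():
        for c, n in row.items():
            ctx_sums[c] = ctx_sums.get(c, 0) + n
    weighted = {}
    for w, row in counts.items():
        weighted[w] = {}
        for c, n in row.items():
            if n == 0:
                weighted[w][c] = 0.0  # PMI is undefined for zero counts; PPMI uses 0
                continue
            pmi = math.log((n * total) / (word_sums[w] * ctx_sums[c]))
            weighted[w][c] = max(pmi, 0.0)  # clip negative associations to 0
    return weighted


def cosine(u, v):
    """Similarity measure 1: cosine of the angle between two sparse vectors."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def jaccard(u, v):
    """Similarity measure 2: weighted Jaccard for non-negative vectors."""
    keys = set(u) | set(v)
    num = sum(min(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    den = sum(max(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


WEIGHTINGS = {"raw": raw, "ppmi": ppmi}
SIMILARITIES = {"cosine": cosine, "jaccard": jaccard}

if __name__ == "__main__":
    # Every (weighting, similarity) pair is one point in the parameter space.
    for wname, weight in WEIGHTINGS.items():
        vectors = weight(COUNTS)
        for sname, sim in SIMILARITIES.items():
            print(wname, sname,
                  round(sim(vectors["cat"], vectors["dog"]), 3),
                  round(sim(vectors["cat"], vectors["car"]), 3))
```

Each combination yields a different similarity score for the same word pair, which is why the choice of parameters matters. A realistic model would also add a dimensionality reduction step, such as truncated SVD, between weighting and comparison.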


We address this gap with a systematic study that searches for the best configuration in the creation and comparison of feature vectors in distributional semantic models for English, Spanish and Hungarian separately, and then compares our findings across these languages.


In our extensive analysis we test a large number of possible settings for every parameter, including more than a thousand novel variants for some of them. As a result, we found configurations that significantly outperform conventional ones and achieve state-of-the-art results.
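Word similarity models are commonly scored by the Spearman rank correlation between the model's similarity values and human similarity ratings. Under that assumption, searching for the best configuration can be sketched as a grid search over parameter combinations. The sketch below uses a hypothetical two-by-two grid: raw vs. log1p weighting, and cosine vs. Dice similarity. The counts and "human" ratings are invented. This is not the dissertation's setup or evaluation data, which cover far more parameters and variants.

```python
import itertools
import math

# Hypothetical toy data: co-occurrence counts and invented human ratings (0-10).
COUNTS = {
    "cat": {"pet": 4, "fur": 3, "road": 1},
    "dog": {"pet": 5, "fur": 2, "road": 1},
    "car": {"pet": 0, "fur": 0, "road": 6},
    "truck": {"pet": 0, "fur": 1, "road": 5},
}
GOLD = [("cat", "dog", 9.0), ("car", "truck", 8.5),
        ("cat", "car", 1.5), ("dog", "truck", 2.0)]


def rank(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = rank(xs), rank(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


# Parameter 1: how each raw count is weighted (illustrative choices).
WEIGHTINGS = {"raw": float, "log": math.log1p}


def cosine(u, v):
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def dice(u, v):
    """Weighted Dice coefficient for non-negative vectors."""
    keys = set(u) | set(v)
    num = 2 * sum(min(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    den = sum(u.values()) + sum(v.values())
    return num / den if den else 0.0


# Parameter 2: how two weighted vectors are compared.
SIMILARITIES = {"cosine": cosine, "dice": dice}


def evaluate(weighting, similarity):
    """Score one configuration by Spearman's rho against the gold ratings."""
    weight = WEIGHTINGS[weighting]
    vectors = {w: {c: weight(n) for c, n in row.items()} for w, row in COUNTS.items()}
    predicted = [SIMILARITIES[similarity](vectors[a], vectors[b]) for a, b, _ in GOLD]
    return spearman(predicted, [g for _, _, g in GOLD])


# Exhaustively try every combination and keep the best-scoring one.
results = {cfg: evaluate(*cfg) for cfg in itertools.product(WEIGHTINGS, SIMILARITIES)}
best = max(results, key=results.get)

if __name__ == "__main__":
    for cfg, rho in sorted(results.items(), key=lambda kv: -kv[1]):
        print(cfg, round(rho, 3))
    print("best:", best)
```

An exhaustive grid grows multiplicatively with each added parameter. With thousands of variants for some parameters, as described above, the choice of search strategy and evaluation budget becomes a practical concern in its own right.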




Page Updated: 19-Nov-2019