|Full Title:||Current Issues in Distributional Semantics|
|Short Title:||SemDis 2013|
|Location:||Sables d'Olonne, France|
|Start Date:||21-Jun-2013 - 21-Jun-2013|
SemDis 2013: Current Issues in Distributional Semantics
Workshop associated with the 20th TALN conference
June 21, 2013
Sables d’Olonne, France
Over the last two decades, significant progress has been made in the automatic extraction of semantic knowledge from large-scale text corpora. Most work relies on Harris’ distributional hypothesis of meaning, which states that words that appear within the same contexts tend to be semantically related. This principle has inspired a substantial amount of research, mainly for English but also for other languages, and several survey articles have recently helped to consolidate the concepts and procedures used for distributional computations (Sahlgren, 2006; Turney and Pantel, 2010; Baroni and Lenci, 2010). In recent years, the distributional semantic approach has benefited from the availability of massive amounts of textual data and increased computational power, allowing these methods to be applied on a large scale. Still, a number of research questions remain open regarding the construction, evaluation, and application of the semantic information induced by these methods.
Regarding the construction of distributional semantic resources, the nature of the corpus is a key issue, and its impact on the results requires further investigation. Today’s trend is to use massive corpora, a move away from Harris’ initial hypothesis, which was based on the analysis of small, well-defined, and specialized corpora. A second important issue relates to the modeling of semantic compositionality within a distributional framework, such that not only individual words but also larger phrases can be taken into account (Mitchell and Lapata, 2008; Baroni and Zamparelli, 2010; Grefenstette and Sadrzadeh, 2011).
Regarding the evaluation of distributional models, two issues stand out. First, relations between words tend to be very diverse, so we need a better understanding of the nature of the semantic relations (synonymous, associative, analogous, etc.) induced by these models, and of the impact of the distributional parameters on the induced relations (Sahlgren, 2006; Peirsman and Geeraerts, 2009). Second, large corpora generate resources so large that they are very difficult to explore and grasp. Manipulating graphs within visualization systems suited to exploring these resources can improve our knowledge of their content and structure.
Finally, distributional resources are useful for a large number of applications, such as information retrieval, summarization, and text segmentation. Distributional features have also been incorporated into a wide range of NLP tasks, such as named entity classification and paraphrasing (Kotlerman et al., 2010; Jonnalagadda et al., 2012). Linguists could equally benefit from these distributional approaches, as they provide a means to conduct large-scale studies of the semantic relations that may be discovered from large corpora.
|Linguistic Subfield:||Semantics; Computational Linguistics|