LINGUIST List 32.3020

Fri Sep 24 2021

FYI: SemEval 2022 Shared Task 1 - CoDWoE: Comparing Dictionaries and Word Embeddings

Editor for this issue: Everett Green <everett@linguistlist.org>



Date: 23-Sep-2021
From: Timothee Mickus <tmickus@atilf.fr>
Subject: SemEval 2022 Shared Task 1 - CoDWoE: Comparing Dictionaries and Word Embeddings

[Apologies for cross-posting]

Do you work with text generation or word embeddings? We invite everyone to explore whether a word embedding can be transformed into a short informative text (a word definition, or gloss) and vice versa.

CoDWoE, Task 1 at SemEval 2022, aims to compare two types of semantic descriptions: dictionary definitions and word embedding representations. Are these two types of representation equivalent? Can we generate one from the other? To study these questions, we propose two subtracks: a definition modeling track, where participants generate definitions from vectors, and a reverse dictionary track, where participants generate vectors from definitions. Both tracks are available for English, Spanish, French, Italian and Russian.

These two tracks display a number of interesting characteristics. They are useful for explainable AI, since they involve converting human-readable data into machine-readable data and back. They also have theoretical significance: definitions and word embeddings are both representations of meaning, so the tasks involve converting between distinct non-formal semantic representations. From a practical point of view, the ability to infer word embeddings from dictionary resources, or dictionaries from large unannotated corpora, would prove a boon for many under-resourced languages.

Here are the key dates participants should keep in mind:
September 3, 2021: Training data & development data made available
January 10, 2022: Evaluation data made available & evaluation start
January 31, 2022: Evaluation end
February 23, 2022: Paper submission due
March 31, 2022: Notification to authors

To get started:
register on the CodaLab competition: https://competitions.codalab.org/competitions/34022
join the Discord server: https://discord.gg/y8g6qXakNs
join the Google group: send an email to semeval2022-dictionaries-and-word-embeddings+subscribe@googlegroups.com
download the competition data and starter code: https://git.atilf.fr/tmickus/codwoe/-/tree/master/


Best regards,

CoDWoE organizers:
Timothee Mickus, ATILF, University of Lorraine / CNRS
Kees van Deemter, Utrecht University
Mathieu Constant, ATILF, University of Lorraine / CNRS
Denis Paperno, Utrecht University

Linguistic Field(s): Computational Linguistics; Semantics

Subject Language(s): English (eng)
                            French (fra)
                            Italian (ita)
                            Russian (rus)
                            Spanish (spa)


Page Updated: 24-Sep-2021