LINGUIST List 32.3020
Fri Sep 24 2021
FYI: SemEval 2022 Shared Task 1 - CoDWoE: Comparing Dictionaries and Word Embeddings
Editor for this issue: Everett Green <everett@linguistlist.org>
Date: 23-Sep-2021
From: Timothee Mickus <tmickus@atilf.fr>
Subject: SemEval 2022 Shared Task 1 - CoDWoE: Comparing Dictionaries and Word Embeddings
[Apologies for cross-posting]
Do you work with text generation or word embeddings? We invite everyone to explore whether a word embedding can be transformed into a short informative text (a word definition, or gloss) and vice versa.
CoDWoE, Task 1 at SemEval 2022, aims to compare two types of semantic descriptions: dictionary definitions and word embedding representations. Are these two types of representation equivalent? Can we generate one from the other? To study these questions, we propose two tracks: a definition modeling track, where participants have to generate definitions from vectors, and a reverse dictionary track, where participants have to generate vectors from definitions. Both tracks are available for English, Spanish, French, Italian and Russian.
These two tracks display a number of interesting characteristics. They are useful for explainable AI, since they involve converting human-readable data into machine-readable data and back. They also have theoretical significance: definitions and word embeddings are both representations of meaning, so the tasks involve converting between distinct non-formal semantic representations. From a practical point of view, the ability to infer word embeddings from dictionary resources, or dictionaries from large unannotated corpora, would prove a boon for many under-resourced languages.
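For readers who want a concrete picture of the two directions described above, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration: the 3-dimensional vectors, the example glosses, and the function names `gloss_to_vector` and `vector_to_gloss` are invented and are not taken from the CoDWoE data or starter code. The reverse dictionary direction is approximated by averaging the vectors of the gloss tokens, and the definition modeling direction is reduced to retrieval among candidate glosses rather than actual text generation.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only;
# they are NOT taken from the CoDWoE datasets.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "feline": [0.8, 0.2, 0.1],
    "animal": [0.7, 0.3, 0.2],
    "car": [0.0, 0.9, 0.4],
    "vehicle": [0.1, 0.8, 0.5],
    "small": [0.4, 0.4, 0.4],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def gloss_to_vector(gloss, embeddings):
    """Reverse dictionary direction (hypothetical baseline):
    average the vectors of the gloss tokens found in the vocabulary."""
    vectors = [embeddings[tok] for tok in gloss.lower().split() if tok in embeddings]
    if not vectors:
        raise ValueError("no token of the gloss is in the toy vocabulary")
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def vector_to_gloss(vector, candidate_glosses, embeddings):
    """Definition modeling direction, reduced to retrieval:
    return the candidate gloss whose averaged vector is closest to `vector`."""
    return max(
        candidate_glosses,
        key=lambda gloss: cosine(vector, gloss_to_vector(gloss, embeddings)),
    )


if __name__ == "__main__":
    predicted = gloss_to_vector("small feline animal", TOY_EMBEDDINGS)
    print("similarity to 'cat':", cosine(predicted, TOY_EMBEDDINGS["cat"]))
    print("similarity to 'car':", cosine(predicted, TOY_EMBEDDINGS["car"]))
    print(vector_to_gloss(
        TOY_EMBEDDINGS["cat"],
        ["small feline animal", "small vehicle"],
        TOY_EMBEDDINGS,
    ))
```

In this toy setting the vector built from the gloss "small feline animal" lies closer to the vector for "cat" than to the one for "car". An actual submission would presumably replace both functions with trained models and use the evaluation defined by the organizers.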
Here are the key dates participants should keep in mind:
September 3, 2021: Training data & development data made available
January 10, 2022: Evaluation data made available & evaluation start
January 31, 2022: Evaluation end
February 23, 2022: Paper submission due
March 31, 2022: Notification to authors
To get started:
register on the codalab competition:
https://competitions.codalab.org/competitions/34022
join the discord server:
https://discord.gg/y8g6qXakNs
join the google group: send an email to semeval2022-dictionaries-and-word-embeddings+subscribe@googlegroups.com
download the competition data and starter code:
https://git.atilf.fr/tmickus/codwoe/-/tree/master/
Best regards,
CoDWoE organizers:
Timothee Mickus, ATILF, University of Lorraine / CNRS
Kees van Deemter, Utrecht University
Mathieu Constant, ATILF, University of Lorraine / CNRS
Denis Paperno, Utrecht University
Linguistic Field(s): Computational Linguistics; Semantics
Subject Language(s):
English (eng) French (fra) Italian (ita) Russian (rus) Spanish (spa)
Page Updated: 24-Sep-2021