LINGUIST List 31.1243
Thu Apr 02 2020
Calls: Computational Linguistics/Spain
Editor for this issue: Lauren Perkins <laurenlinguistlist.org>
Luis Espinosa-Anke <espinosa-ankel
CAPITEL-EVAL 2020
Full Title: CAPITEL-EVAL 2020
Short Title: CAPITEL
Date: 22-Sep-2020 - 25-Sep-2020
Location: Málaga, Spain
Contact Person: Jordi Porta
Web Site: https://sites.google.com/view/capitel2020
Linguistic Field(s): Computational Linguistics
Call Deadline: 17-May-2020
Within the framework of the PlanTL, the Royal Spanish Academy (RAE) and the Secretariat of State for Digital Advancement (SEAD) of the Ministry of Economy signed an agreement to develop a linguistically annotated corpus of Spanish news articles, aimed at expanding the language resource infrastructure for Spanish. The corpus is named CAPITEL (Corpus del Plan de Impulso a las Tecnologías del Lenguaje) and is composed of contemporary news articles obtained through agreements with a number of news media providers. CAPITEL has three levels of linguistic annotation: morphosyntactic (with lemmas and Universal Dependencies-style POS tags and features), syntactic (following Universal Dependencies v2), and named entities.
The linguistic annotation of a subset of the CAPITEL corpus has been revised using a machine-annotation-followed-by-human-revision procedure. Manual revision was carried out by a team of graduate linguists using the Annotation Guidelines created specifically for CAPITEL. The revised annotations comprise about 1 million words for the named entity layer and roughly 250,000 words for the syntactic layer. Given the size of the corpus and the nature of the annotations, we propose two IberLEF sub-tasks under the more general umbrella task of CAPITEL at IberLEF 2020, which will use the revised subset of the CAPITEL corpus in two challenges, namely:
(1) Named Entity Recognition and Classification and
(2) Universal Dependency Parsing.
Because of the ever-evolving nature of the NLP field and its associated shared task competitions, we deem it relevant to propose new challenges for the Spanish language to determine whether recent developments can push the boundaries of the current state of the art.
Call for Participation:
Sub-task 1: Named Entity Recognition and Classification in Spanish News Articles
Information extraction tasks, formalized in the late 1980s, are designed to evaluate systems that capture pieces of information present in free text, with the goal of enabling better and faster access to information and content. One important class of such information is named entities (NEs), which, roughly speaking, are textual elements corresponding to names of people, places, organizations and others. Three processes can be applied to NEs: recognition (or identification), categorization (assigning a type according to a predefined set of semantic categories), and linking (disambiguating the reference).
The aim of this sub-task is to challenge participants to apply their systems or solutions to the problem of identifying and classifying NEs in Spanish news articles. This two-stage process is referred to as NERC (Named Entity Recognition and Classification).
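As a rough illustration of the two stages, recognition yields entity boundaries and classification yields their types, so a system output can be compared with a reference as a set of typed spans. The sketch below is not the official evaluation script: it assumes a BIO tagging scheme and exact-match span F1, neither of which is stated in this call, and the labels PER, LOC and ORG are hypothetical placeholders for the categories defined in the Annotation Guidelines.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans.

    Assumption: tags look like "B-PER", "I-PER", "O". A stray "I-" tag
    with no matching open span is leniently treated as starting a span.
    """
    spans: Set[Span] = set()
    start, label = None, None
    # The trailing "O" is a sentinel that flushes a span ending the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def span_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match F1: a span counts only if boundaries and type both agree."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision = tp / len(p)
    recall = tp / len(g)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the second entity is found but given the wrong type.
gold_tags = ["B-PER", "I-PER", "O", "B-LOC"]
pred_tags = ["B-PER", "I-PER", "O", "B-ORG"]
print(span_f1(gold_tags, pred_tags))
```

Under this strict matching, an entity with correct boundaries but the wrong category counts as both a false positive and a false negative, which is why recognition and classification errors are penalized together.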
Sub-task 2: Universal Dependency Parsing of Spanish News Articles
Dependency-based syntactic parsing has become popular in NLP in recent years. One of the reasons for this popularity is the transparent encoding of predicate-argument structures, which is useful in many downstream applications. Another reason is that it is better suited than phrase-structure grammars for languages with free or flexible word order.
Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features and syntactic dependencies) across different human languages. Moreover, the UD initiative is an open community effort with over 200 contributors which has produced more than 100 treebanks in over 70 languages.
The aim of this sub-task is to challenge participants to apply their systems or solutions to the problem of Universal Dependency parsing of Spanish news articles as defined in the Annotation Guidelines for the CAPITEL corpus that will be shared with the participants.
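For orientation, dependency parsers are commonly scored by unlabeled and labeled attachment scores (UAS and LAS): the fraction of tokens whose head, or head plus relation label, matches the reference. The sketch below is only an illustration and not the official evaluation script. It assumes the data is in the 10-column CoNLL-U format used by UD treebanks and that the two files are token-aligned; the helper names and the example sentence are hypothetical.

```python
from typing import List, Tuple


def read_arcs(conllu: str) -> List[Tuple[int, str]]:
    """Return (HEAD, DEPREL) for every syntactic word in a CoNLL-U string."""
    arcs = []
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank lines separate sentences; "#" lines are comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword token ranges (1-2) and empty nodes (1.1)
        arcs.append((int(cols[6]), cols[7]))  # columns 7 and 8: HEAD, DEPREL
    return arcs


def attachment_scores(gold: str, pred: str) -> Tuple[float, float]:
    """Compute (UAS, LAS), assuming identical tokenization in both inputs."""
    g, p = read_arcs(gold), read_arcs(pred)
    if len(g) != len(p):
        raise ValueError("gold and predicted analyses are not token-aligned")
    if not g:
        return 0.0, 0.0
    uas = sum(gh == ph for (gh, _), (ph, _) in zip(g, p)) / len(g)
    las = sum(ga == pa for ga, pa in zip(g, p)) / len(g)
    return uas, las


def row(idx: int, form: str, head: int, deprel: str) -> str:
    """Build one CoNLL-U line, leaving the unused columns as "_"."""
    return "\t".join([str(idx), form, "_", "_", "_", "_", str(head), deprel, "_", "_"])


# Hypothetical example sentence: "María vive en Málaga".
GOLD = "\n".join([
    row(1, "María", 2, "nsubj"),
    row(2, "vive", 0, "root"),
    row(3, "en", 4, "case"),
    row(4, "Málaga", 2, "obl"),
])
# The prediction attaches "en" to the wrong head and mislabels "Málaga".
PRED = "\n".join([
    row(1, "María", 2, "nsubj"),
    row(2, "vive", 0, "root"),
    row(3, "en", 2, "case"),
    row(4, "Málaga", 2, "nmod"),
])
print(attachment_scores(GOLD, PRED))
```

In this example three of the four heads are correct, but only two tokens have both the correct head and the correct label, so LAS is lower than UAS.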
Please fill out the form on Codalab to register and submit results for NERC (https://competitions.codalab.org/competitions/23011) or UD Parsing (https://competitions.codalab.org/competitions/23178).
Schedule:
March 15: Sample set, Evaluation script and Annotation Guidelines released.
March 17: Training set released.
April 1: Development set released.
April 29: Test set released (includes background set).
May 17: System output submissions.
May 21: Results posted and Test set with gold-standard (GS) annotations released.
May 31: Working notes paper submission.
June 15: Notification of acceptance (peer review).
June 30: Camera ready paper submission.
September: IberLEF 2020 Workshop.
Organizers:
David Pérez Fernández, PlanTL - Ministry of Economy, Spain.
Jordi Porta-Zamorano, Centro de Estudios de la RAE, Spain.
José-Luis Sancho-Sánchez, Centro de Estudios de la RAE, Spain.
Rafael-J. Ureña-Ruiz, Centro de Estudios de la RAE, Spain.
Doaa Samy, Instituto de Ingeniería del Conocimiento (PlanTL-GTO), Spain.
Luis Espinosa-Anke, School of Computer Science and Informatics, Cardiff University, UK.
Contact: Jordi Porta-Zamorano (porta
Organizers mailing list: capitel2020org
Task-specific mailing lists:
Page Updated: 02-Apr-2020