LINGUIST List 30.1370

Wed Mar 27 2019

FYI: GermEval 2019 Task 1 - 2nd Call for Participation

Editor for this issue: Everett Green <everett@linguistlist.org>


Date: 26-Mar-2019
From: Steffen Remus <remus@informatik.uni-hamburg.de>
Subject: GermEval 2019 Task 1 - 2nd Call for Participation

GermEval 2019 Task 1 - Shared Task on hierarchical classification of German Blurbs (short texts)

2nd Call for Participation:

We invite interested parties to participate in this shared task. Further information can be found here: https://competitions.codalab.org/competitions/21226.

Hierarchical multi-label classification (HMC) of blurbs is the task of assigning multiple labels to short descriptive texts of books, where each label is part of an underlying hierarchy of categories. The increasing number of available digital documents and the need for more and finer-grained categories call for new, more robust and sophisticated text classification methods. Large datasets often incorporate a categorical hierarchy, which can be used to organize document information at different levels of specificity. Traditional multi-class text classification approaches are thoroughly researched; however, as the amount of available data grows and more specific hierarchies become necessary, these approaches fail to generalize adequately, which increases the need for more robust and sophisticated classification methods.

With this task we aim to foster research within the HMC context. The task focuses on classifying German books into their respective hierarchically structured writing genres using short advertisement texts (blurbs). The data contains further meta information such as author, number of pages, release date, etc.


Tasks:

This shared task consists of two subtasks, described below. You can
participate in one of them or in both.

- Subtask A: The task is to classify German books into one or more of the most general writing genres. It can therefore be considered a multi-label classification task. In total, there are 8 classes that can be assigned to a book: Literatur & Unterhaltung, Ratgeber, Kinderbuch & Jugendbuch, Sachbuch, Ganzheitliches Bewusstsein, Glaube & Ethik, Künste, Architektur & Garten.

- Subtask B: The second task targets hierarchical multi-label classification into multiple writing genres. In addition to the very general writing genres, further genres of different specificity can be assigned to a book. In total, there are 343 different classes that are hierarchically structured on up to 4 levels.
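
As a rough illustration of how the label spaces of the two subtasks relate, the following Python sketch expands a book's genre labels with their ancestors (the Subtask B view) and encodes the top-level genres as a 0/1 vector (the Subtask A view). The eight top-level genres are taken from the list above; the sub-genre names, the PARENT mapping, and both helper functions are hypothetical and chosen only for illustration. Whether the official data format requires ancestor labels to be listed explicitly is an assumption here.

```python
# The eight top-level genres listed for Subtask A.
TOP_LEVEL = [
    "Literatur & Unterhaltung",
    "Ratgeber",
    "Kinderbuch & Jugendbuch",
    "Sachbuch",
    "Ganzheitliches Bewusstsein",
    "Glaube & Ethik",
    "Künste",
    "Architektur & Garten",
]

# Hypothetical child -> parent excerpt of a genre hierarchy.
# The sub-genre names are illustrative, not taken from the task data.
PARENT = {
    "Krimi & Thriller": "Literatur & Unterhaltung",
    "Psychothriller": "Krimi & Thriller",
    "Gesundheit": "Ratgeber",
}


def with_ancestors(labels):
    """Return the given labels together with all of their ancestors."""
    expanded = set()
    for label in labels:
        # Walk up the hierarchy until a root or an already seen label is reached.
        while label is not None and label not in expanded:
            expanded.add(label)
            label = PARENT.get(label)
    return expanded


def multi_hot(labels):
    """Encode the top-level genres contained in `labels` as a 0/1 vector."""
    return [1 if genre in labels else 0 for genre in TOP_LEVEL]


# A book tagged with two specific genres from different branches.
book_labels = with_ancestors({"Psychothriller", "Gesundheit"})
subtask_a_target = multi_hot(book_labels)
```

In this sketch, a book labeled with a specific genre also counts as belonging to every more general genre above it, which is one common way to keep predictions consistent with a hierarchy.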


Data:

The complete dataset for this task consists of 20,784 examples in total. Sample data is provided so that participants can familiarize themselves with the data structure. 14,548 training samples have been released and can be downloaded after registering for the shared task. We accept submissions for the validation set (2,079 samples) and publish a leaderboard via the CodaLab page. The final evaluation of the task will take place in July 2019; for this, the true labels for the validation set will be provided as additional training data. More information can be found on the task's webpage at: https://competitions.codalab.org/competitions/21226
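
The figures above can be put in proportion with a short calculation. The sketch below assumes that all examples not released as training or validation data are held out as the test set; the announcement does not state the test set size explicitly, so the `remaining` value is an inference.

```python
# Dataset sizes as stated in the announcement.
TOTAL = 20_784
TRAIN = 14_548
VALIDATION = 2_079

# Assumption: everything not in train or validation is held out for testing.
remaining = TOTAL - TRAIN - VALIDATION

# Share of each part in percent, rounded to one decimal place.
shares = {
    name: round(100 * count / TOTAL, 1)
    for name, count in [
        ("train", TRAIN),
        ("validation", VALIDATION),
        ("remaining", remaining),
    ]
}

print(remaining)  # 4157
print(shares)     # {'train': 70.0, 'validation': 10.0, 'remaining': 20.0}
```

Under that assumption, the data follows roughly a 70/10/20 split.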


Important Dates:

- Jan 2019: Release of trial data
- Feb 01, 2019: Release of training data (train + validation)
- Jun 01, 2019: Release of test data
- Jul 15, 2019: Final submission of test results
- Jul 31, 2019: Submission of description paper
- Aug 2019: Workshop in Nürnberg/Erlangen, Germany at the Conference on Natural Language Processing KONVENS 2019 (https://dgfs.de/de/cl/konvens.html)


Organizers:

The task is organized by Rami Aly, Steffen Remus and Chris Biemann, Language Technology, Department of Informatics, Universität Hamburg.
https://www.inf.uni-hamburg.de/en/inst/ab/lt/home.html


GermEval:

GermEval is a series of shared task evaluation campaigns that focus on Natural Language Processing for the German language. GermEval has been conducted four times since 2014 in co-location with KONVENS/GSCL conferences.


Linguistic Field(s): Applied Linguistics; Computational Linguistics; Semantics

Subject Language(s): German (deu)


Page Updated: 27-Mar-2019