LINGUIST List 30.180

Sat Jan 12 2019

FYI: Call for Participation: BEA 2019 Shared Task

Editor for this issue: Everett Green <everett@linguistlist.org>



Date: 12-Jan-2019
From: Ekaterina Kochmar <bea.nlp.workshop@gmail.com>
Subject: Call for Participation: BEA 2019 Shared Task

Call for Participation:

BEA 2019 Shared Task:

Grammatical Error Correction

Florence, Italy

August 2, 2019

https://www.cl.cam.ac.uk/research/nl/bea2019st/


Grammatical error correction (GEC) is the task of automatically correcting grammatical errors in text; e.g. [I follows his advices -> I followed his advice]. One of the aims of this shared task is to once again provide a platform where different approaches can be trained and tested under the same conditions.

This shared task introduces data from the Write&Improve corpus, a new error-annotated dataset that represents a much more diverse cross-section of English proficiency levels and domains. Write&Improve is an online platform that helps non-native English students with their writing (https://writeandimprove.com/).

System output will be evaluated on a blind test set using ERRANT (https://github.com/chrisjbryant/errant).
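As a rough illustration of edit-based scoring, the sketch below computes precision, recall and F-beta for a single sentence by matching system edits against gold edits exactly. It assumes beta = 0.5 (the F0.5 weighting ERRANT reports by default, which favours precision over recall) and a hypothetical (start, end, correction) edit tuple. ERRANT itself extracts and classifies edits automatically and accumulates counts over the whole test set, so this is a simplified sketch, not ERRANT's implementation.

```python
from typing import Set, Tuple

# Hypothetical edit representation: (start_token, end_token, correction).
# Real ERRANT edits also carry an error type and an annotator id.
Edit = Tuple[int, int, str]


def f_beta_score(hyp: Set[Edit], gold: Set[Edit], beta: float = 0.5) -> Tuple[float, float, float]:
    """Return (precision, recall, F-beta), counting only exact edit matches."""
    tp = len(hyp & gold)  # system edits that also appear in the gold standard
    fp = len(hyp - gold)  # system edits absent from the gold standard
    fn = len(gold - hyp)  # gold edits the system missed
    # With no proposed (or no gold) edits, treat precision (or recall) as perfect.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    denom = beta ** 2 * precision + recall
    f = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f


# "I follows his advices" -> "I followed his advice" (tokens 0..3)
gold = {(1, 2, "followed"), (3, 4, "advice")}
print(f_beta_score({(1, 2, "follow"), (3, 4, "advice")}, gold))  # (0.5, 0.5, 0.5)
```

Here a system that outputs "I follow his advice" recovers one of the two gold edits and proposes one wrong edit, giving P = R = F0.5 = 0.5. Dropping the wrong edit raises precision to 1.0 and F0.5 to about 0.83 even though recall stays at 0.5, which is why an F0.5-based score rewards conservative systems.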

In addition to the learner data, we will provide an annotated development and test set extracted from the LOCNESS corpus, a collection of essays written by native English students, compiled by the Centre for English Corpus Linguistics at the University of Louvain.


Tracks:

There are three tracks in the BEA 2019 shared task. Each track restricts the amount of annotated data a system may use. We place no restrictions on the amount of unannotated data that can be used (e.g. for language modelling).


Restricted:

In the restricted setting, participants may only use the following annotated datasets: FCE-train, Lang-8 Corpus of Learner English, NUCLE and Write&Improve.

Note that we restrict participants to the preprocessed Lang-8 Corpus of Learner English rather than the raw, multilingual Lang-8 Learner Corpus because participants would otherwise need to filter the raw corpus themselves.


Unrestricted:

In the unrestricted setting, participants may use any and all datasets, including those in the restricted setting.


Unsupervised (or minimally supervised):

In the unsupervised setting, participants may not use any annotated training data. Since current state-of-the-art systems rely on as much annotated training data as possible to reach the best performance, the goal of this track is to encourage research into systems that do not depend on it. This track should be of particular interest to researchers working on low-resource languages. However, since we also expect this track to be challenging, we will allow participants to use the W&I development set to develop their systems.


Participation:

To participate in the BEA 2019 Shared Task, teams must submit their system output by Friday, March 29, 2019 at 23:59 GMT. There is no explicit registration procedure. Further details about the submission process will be provided soon.


Important Dates:

January 25, 2019: New training data released
March 25, 2019: New test data released
March 29, 2019: System output submission deadline
April 12, 2019: System results announced
May 3, 2019: System paper submission deadline
May 17, 2019: Review deadline
May 24, 2019: Notification of acceptance
June 7, 2019: Camera-ready submission deadline
August 2, 2019: BEA-2019 Workshop (Florence, Italy)

Organisers:

Christopher Bryant, University of Cambridge
Mariano Felice, University of Cambridge
Øistein Andersen, University of Cambridge
Ted Briscoe, University of Cambridge

Contact:

Questions about the shared task can be sent to bea2019st@gmail.com.


Linguistic Field(s): Computational Linguistics


Page Updated: 12-Jan-2019