

FYI: New Falko German Learner Corpus Release


Author: Marc Reznicek

Linguistic Field(s): Text/Corpus Linguistics; Language Acquisition

Subject Language(s): German

FYI Body: The error-annotated German learner corpus Falko has released a new
subcorpus, FALKOESSAYL2WHIGV2.0, comprising 195 argumentative essays by
advanced learners of German (117,189 tokens).

For each text, two full-text target hypotheses (a minimal morphosyntactic
normalization and an extended semantic-pragmatic version) have been manually
annotated.

Each representation has been POS-tagged and lemmatized (TreeTagger &
RFTagger). RFTagger morphological annotation has been integrated as well.

On this basis, tags have been added that mark differences between the learner
text (with its POS and lemma annotations) and the respective target
hypotheses (with their POS and lemma annotations).
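
As a rough illustration of what such difference tags involve, the sketch below aligns a learner token sequence with a target-hypothesis token sequence and labels each mismatch. It is a minimal sketch only, not the procedure used to build the corpus: the function name diff_tags, the example sentences, and the tag labels CHA, DEL and INS are assumptions made for illustration, and the alignment relies on Python's standard difflib module.

```python
from difflib import SequenceMatcher


def diff_tags(learner, target):
    """Return (learner_tokens, target_tokens, tag) triples for each mismatch."""
    # Illustrative labels only (an assumption, not the corpus tagset).
    names = {"replace": "CHA", "delete": "DEL", "insert": "INS"}
    matcher = SequenceMatcher(a=learner, b=target, autojunk=False)
    rows = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue  # identical stretches receive no difference tag
        rows.append((learner[i1:i2], target[j1:j2], names[op]))
    return rows


# Hypothetical learner sentence and a hand-written target hypothesis.
learner = "ich habe gestern in Kino gegangen".split()
target = "ich bin gestern ins Kino gegangen".split()
for row in diff_tags(learner, target):
    print(row)
```

The same comparison could be run over the POS or lemma sequences instead of the word forms, since the function only needs two lists of strings.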

The corpus is freely available at the following link:

HTTP://KORPLING.GERMAN.HU-BERLIN.DE/FALKO-SUCHE

The annotation guidelines can be found here:
HTTP://WWW.LINGUISTIK.HU-BERLIN.DE/INSTITUT/PROFESSUREN/KORPUSLINGUISTIK/FORSCHUNG/FALKO/FALKO-HANDBUCHV2.0.PDF