LINGUIST List 25.5201

Fri Dec 19 2014

FYI: New Release of the TüBa-D/Z German Treebank

Editor for this issue: Uliana Kazagasheva <uliana@linguistlist.org>


Date: 19-Dec-2014
From: Marie Hinrichs <marie.hinrichs@uni-tuebingen.de>
Subject: New Release of the TüBa-D/Z German Treebank

The Department of Linguistics of the University of Tübingen (Germany) is pleased to announce a new minor release of its referentially and syntactically annotated German corpus, the Tübingen Treebank of Written German (TüBa-D/Z): Release 9.1.

The TüBa-D/Z treebank is a manually annotated German newspaper corpus based on data taken from the daily issues of the newspaper 'die tageszeitung'. It currently comprises 85,358 sentences (1,569,916 words; 3,444 newspaper articles).

This minor release includes 17,910 manual annotations of a selected set of lemmas (30 nouns, 79 verbs) with their corresponding senses in the German wordnet GermaNet. The goal is to provide a gold standard for word sense disambiguation. Please note that no new sentences have been added between release 9.0 and release 9.1. Only the formats that support word sense annotation are part of this minor release (Negra Export 3 and 4, CoNLL 2011/2012, Export XML). The other formats remain unchanged and can be obtained from release 9.0.

The syntactic annotation scheme of the TüBa-D/Z distinguishes four levels of syntactic constituency (lexical, phrasal, clausal, topological fields). The treebank contains the following annotation layers:
- inflectional morphology
- lemmas
- syntactic constituency
- grammatical functions
- (complex) named entities including semantic classification
- anaphora and coreference relations
- discourse connectives (explicit and implicit, partial coverage)
- GermaNet word senses
- dependency relations (automatically created)
- chunk annotation (automatically created)

The license for the TüBa-D/Z is granted free of charge for scientific use. For more information, please visit the website at:
http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html

Best Regards,

Erhard W. Hinrichs
Heike Telljohann
Marie Hinrichs
------------
Dept. of Computational Linguistics
University of Tübingen
Wilhelmstr. 19
72074 Tübingen
Germany


Linguistic Field(s): Computational Linguistics; Discourse Analysis; Morphology; Syntax; Text/Corpus Linguistics

Subject Language(s): German (deu)
Language Family(ies): Germanic

Page Updated: 19-Dec-2014