LINGUIST List 26.3634

Thu Aug 13 2015

FYI: Release 10.0 of the TüBa-D/Z German Treebank

Editor for this issue: Ashley Parker <ashleylinguistlist.org>


Date: 13-Aug-2015
From: Marie Hinrichs <marie.hinrichsuni-tuebingen.de>
Subject: Release 10.0 of the TüBa-D/Z German Treebank

The Department of Linguistics of the University of Tübingen (Germany) is pleased to announce Release 10.0 of the TüBa-D/Z, a referentially and syntactically annotated German corpus.

The TüBa-D/Z treebank is a manually annotated German newspaper corpus based on data taken from the daily issues of 'die tageszeitung.' It currently comprises 3,644 newspaper articles (95,595 sentences; 1,787,801 tokens).

The syntactic annotation scheme of the TüBa-D/Z distinguishes four levels of syntactic constituency (lexical, phrasal, clausal, topological fields) and contains the following annotation layers:

- inflectional morphology
- lemmas
- syntactic constituency
- grammatical functions
- (complex) named entities including semantic classification
- anaphora and coreference relations
- discourse connectives (explicit and implicit, partial coverage)
- GermaNet word senses
- dependency relations (automatically created)
- chunk annotation (automatically created)

New in this release:

- An additional 200 articles (10,237 sentences; 217,885 tokens) have been annotated.
- STYLEBOOK: The annotation stylebook has been updated and can be found on the webpage.
- Also included (since minor Release 9.1) are 17,910 manual word sense annotations for a selected set of lemmas (30 nouns, 79 verbs), each linked to its corresponding sense in the German wordnet GermaNet. The goal is to provide a gold standard for word sense disambiguation.

The license for the TüBa-D/Z is granted free of charge for scientific use. For more information, please visit the website at:
http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html

Best Regards,
Erhard W. Hinrichs
Heike Telljohann
Marie Hinrichs
--
Dept. of Computational Linguistics
University of Tübingen
Wilhelmstr. 19
72074 Tübingen
Germany

Linguistic Field(s): Computational Linguistics
Discourse Analysis
Morphology
Syntax
Text/Corpus Linguistics

Subject Language(s): German (deu)

Language Family(ies): Germanic

Page Updated: 13-Aug-2015