LINGUIST List 20.4320
Tue Dec 15 2009
FYI: New Release of TüBa-D/Z, Version 5.0
Editor for this issue: Danielle St. Jean
<danielle linguistlist.org>
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.
Directory
1. Kathrin Beck, New Release of TüBa-D/Z, Version 5.0
Message 1: New Release of TüBa-D/Z, Version 5.0
Date: 30-Nov-2009
From: Kathrin Beck <kathrin.beck uni-tuebingen.de>
Subject: New Release of TüBa-D/Z, Version 5.0
The Department of Linguistics at the University of Tübingen, Germany, is happy to announce a new release (version 5.0) of the 'Tübingen Treebank of Written German' (TüBa-D/Z). TüBa-D/Z is a manually annotated German newspaper corpus based on data taken from daily issues of the newspaper 'die tageszeitung'. Apart from syntactic annotation, it also includes coreference annotation at the NP level.

It currently comprises:
- 794,079 tokens
- 45,200 sentences
- 2,213 newspaper articles

The syntactic annotation scheme of the TüBa-D/Z distinguishes four levels of syntactic constituency:
- the lexical level
- the phrasal level
- the level of topological fields
- the clausal level

In addition to constituent structure, annotated trees contain edge labels with grammatical functions.

All words are annotated with:
- inflectional morphology at the lexical level
- POS tags

All newspaper articles of the treebank have been enriched with anaphoric and coreference relations referring to nominal and pronominal antecedents. Linking relations include:
- coreferential (two NPs refer to the same extralinguistic referent)
- anaphoric/cataphoric (a definite pronoun refers to a contextual antecedent)
- other relations (split-antecedent, instance)

Inherent reflexive pronouns and expletive pronouns are also marked.

The treebank is available in three formats:
- NEGRA export format
- XML format
- Penn Treebank format

Joint syntactic and referential annotation is available in the Export and ExportXML formats.

What is new in the fifth release:
- about 9,000 additional sentences
- about 500 more articles with referential annotation
- cleaner versions of the trees published in the fourth release
- the entire referential annotation has been checked and revised

The license for TüBa-D/Z is granted free of charge for academic use.

For more information, please refer to:
http://www.sfs.uni-tuebingen.de/en/de_tuebadz.shtml
http://www.sfs.uni-tuebingen.de/de_tuebadz.shtml

With best regards,
Prof. Dr. Erhard W. Hinrichs
Kathrin Beck
Heike Telljohann
Yannick Versley

Linguistic Field(s): Computational Linguistics; Discourse Analysis; Syntax; Text/Corpus Linguistics