LINGUIST List 20.4320

Tue Dec 15 2009

FYI: New Release of TüBa-D/Z, Version 5.0

Editor for this issue: Danielle St. Jean <danielle@linguistlist.org>


Directory
        1.    Kathrin Beck, New Release of TüBa-D/Z, Version 5.0

Message 1: New Release of TüBa-D/Z, Version 5.0
Date: 30-Nov-2009
From: Kathrin Beck <kathrin.beck@uni-tuebingen.de>
Subject: New Release of TüBa-D/Z, Version 5.0

The Department of Linguistics at the University of Tübingen, Germany is
happy to announce a new release (version 5.0) of the 'Tübingen Treebank of
Written German' (TüBa-D/Z).

The TüBa-D/Z treebank is a manually annotated German newspaper corpus
based on data taken from daily issues of the newspaper 'die tageszeitung'.
In addition to syntactic annotation, it includes coreference annotation at
the NP level.

It currently comprises:
- 794,079 tokens
- 45,200 sentences
- 2,213 newspaper articles

The syntactic annotation scheme of the TüBa-D/Z distinguishes four levels
of syntactic constituency:
- the lexical level
- the phrasal level
- the level of topological fields
- the clausal level

In addition to constituent structure, annotated trees contain edge labels
with grammatical functions.

All words are annotated with:
- inflectional morphology at the lexical level
- POS tags

All newspaper articles of the treebank have been enriched with anaphoric
and coreference relations referring to nominal and pronominal antecedents.
Linking relations include:
- coreferential (two NPs refer to the same extralinguistic referent)
- anaphoric/cataphoric (a definite pronoun refers to a contextual
antecedent)
- other relations (split-antecedent, instance)

Inherent reflexive pronouns and expletive pronouns are also marked.

The treebank is available in 3 different formats:
- NEGRA export format
- XML format
- Penn Treebank format

Joint syntactic and referential annotation is available in the Export
and ExportXML formats.

What is new in the fifth release:
- about 9,000 additional sentences
- about 500 more articles with referential annotation
- cleaner versions of the trees published in the fourth release
- the entire referential annotation has been checked and revised

The license for TüBa-D/Z is granted free of charge for academic use. For
more information, please refer to:
http://www.sfs.uni-tuebingen.de/en/de_tuebadz.shtml
http://www.sfs.uni-tuebingen.de/de_tuebadz.shtml

With best regards,

Prof. Dr. Erhard W. Hinrichs
Kathrin Beck
Heike Telljohann
Yannick Versley

Linguistic Field(s): Computational Linguistics; Discourse Analysis; Syntax; Text/Corpus Linguistics
