LINGUIST List 17.2061

Fri Jul 14 2006

FYI: German Treebank - Now with Anaphora

Editor for this issue: Kevin Burrows <kevin@linguistlist.org>


Directory
        1.    Yannick Versley, German Treebank - Now with Anaphora


Message 1: German Treebank - Now with Anaphora
Date: 14-Jul-2006
From: Yannick Versley <versley@sfs.uni-tuebingen.de>
Subject: German Treebank - Now with Anaphora


The Division of Computational Linguistics at the Seminar fuer
Sprachwissenschaft of the University of Tuebingen (Germany) is happy to
announce the release of a referentially and syntactically annotated German corpus:

- The Tuebingen Treebank of Written German (TueBa-D/Z) - third release

The TueBa-D/Z treebank is a manually annotated German newspaper
corpus based on data taken from the daily issues of 'die tageszeitung'.
It currently comprises approximately 27 000 sentences (ca. 470 000 words).

The syntactic annotation scheme of the TueBa-D/Z distinguishes four levels
of syntactic constituency: the lexical level, the phrasal level,
the level of topological fields, and the clausal level.
In addition to constituent structure, the trees carry edge labels
between nodes that encode grammatical functions.
At the lexical level, words are also annotated with inflectional morphology;
this annotation currently covers about 80% of the sentences.

The treebank is available in 3 different formats:
- NEGRA export format
- XML format
- Penn Treebank format
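In the Penn Treebank format, each tree is written as a nested bracketing: a node label follows each opening parenthesis, and words sit at the leaves. The Python sketch below reads one such bracketed string into a small tree structure. It assumes plain PTB-style bracketing (optionally wrapped in an unlabeled outer pair of parentheses) and does not handle escaped brackets inside words or any TueBa-D/Z-specific way of encoding edge labels in this format. The example sentence is hypothetical; its labels are modeled on TueBa-D/Z conventions only to illustrate the four levels (STTS tags NE and VVFIN at the lexical level, NX and VXFIN at the phrasal level, the fields VF and LK, and the clause node SIMPX), so the treebank's stylebook should be consulted for the actual label inventory.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Tree:
    label: str  # node category, e.g. a POS tag, phrase, field, or clause label
    children: List[Union["Tree", str]] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return the words under this node, left to right."""
        words: List[str] = []
        for child in self.children:
            if isinstance(child, Tree):
                words.extend(child.leaves())
            else:
                words.append(child)
        return words


def parse(text: str) -> Tree:
    """Parse one PTB-style bracketed tree (no escaped brackets inside words)."""
    # Pad brackets with spaces so that split() yields "(", ")", labels and words.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        if tokens[pos] == "(":
            label = ""  # unlabeled outer bracket, as in "( (S ...) )"
        else:
            label = tokens[pos]
            pos += 1
        node = Tree(label)
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse_node())
            else:
                node.children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume the closing ")"
        return node

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the tree")
    return tree


# Hypothetical one-clause example; labels are illustrative only.
example = parse("(SIMPX (VF (NX (NE Peter))) (LK (VXFIN (VVFIN schläft))))")
print(example.leaves())  # ['Peter', 'schläft']
```

For real work, an existing reader such as NLTK's `Tree.fromstring` may be preferable; the hand-written version only shows how the bracketing maps onto a tree.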

Currently, about 23 500 sentences of the treebank (about 1 100 articles) have
been enriched with anaphoric and coreference relations referring to nominal
and pronominal antecedents.

The annotated linking relations are coreferential (two NPs refer to the same
extralinguistic referent), anaphoric/cataphoric (a definite pronoun refers to
a contextual antecedent), and other relations (split antecedent, instance).
In addition, expletive pronouns are marked.
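Pairwise coreferential and anaphoric links together induce coreference chains: if a pronoun is linked to an NP and a later NP is linked to that pronoun, all three refer to the same entity. The Python sketch below groups markables into chains with a union-find structure, a standard technique chosen here for illustration rather than anything prescribed by the treebank. It assumes the links have already been extracted as hypothetical (markable, antecedent, relation) triples with made-up markable IDs and relation names; this is not the PALinkA or NEGRA encoding. Split-antecedent and instance links are left out on the assumption that they do not express identity of reference.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Illustrative relation names treated as identity of reference (assumption,
# not the treebank's official tag set).
IDENTITY_RELATIONS = {"coreferential", "anaphoric", "cataphoric"}


def build_chains(links: Iterable[Tuple[str, str, str]]) -> List[Set[str]]:
    """Group markables into coreference chains.

    Each link is a hypothetical (markable_id, antecedent_id, relation) triple.
    Links whose relation is not in IDENTITY_RELATIONS are ignored.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    for markable, antecedent, relation in links:
        if relation in IDENTITY_RELATIONS:
            union(markable, antecedent)

    chains: Dict[str, Set[str]] = defaultdict(set)
    for markable in list(parent):
        chains[find(markable)].add(markable)
    return list(chains.values())


links = [
    ("m2", "m1", "anaphoric"),      # pronoun m2 refers back to m1
    ("m3", "m2", "coreferential"),  # m3 linked to m2, hence also to m1
    ("m5", "m4", "coreferential"),
    ("m7", "m6", "instance"),       # not merged: not an identity relation here
]
print(sorted(sorted(chain) for chain in build_chains(links)))
# [['m1', 'm2', 'm3'], ['m4', 'm5']]
```

Union-find makes the transitive closure of the links explicit, so a chain is recovered even when no markable is linked directly to the first mention.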

The referential annotation is available either as a stand-alone version in
the PALinkA format or as a unified representation of syntactic and
referential information in the NEGRA export and XML formats.

What is new in the third release:

- about 5 000 additional sentences
- referential annotation
- revised, cleaned-up versions of the trees published in the first and second releases

The license for TueBa-D/Z is granted free of charge for scientific use.
For more information, please refer to:
http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml

With best regards,

Erhard W. Hinrichs
Sandra Kübler
Heike Zinsmeister
Karin Naumann
Holger Wunsch
Yannick Versley

Linguistic Field(s): Computational Linguistics; Syntax; Text/Corpus Linguistics
