LINGUIST List 25.4307

Wed Oct 29 2014

Media: Treebank of Early New High German

Editor for this issue: Malgorzata Cavar <gosia@linguistlist.org>


Date: 26-Sep-2014
From: Ulrike Demske <udemske@uni-potsdam.de>
Subject: Treebank of Early New High German

We recently released the Mercurius Treebank of Early New High German (1350-1650) to the public. It is hosted by the INESS treebanking infrastructure (http://iness.uib.no) and is available under the CC-BY license.

The Mercurius Treebank is a syntactically annotated corpus of the early newspapers 'Mercurius' and 'Annus Christi', published in 1667 and 1597, respectively. It comprises a total of 170,000 tokens and 8,400 syntactically annotated sentences. It was annotated using a hybrid annotation scheme based on TIGER, which combines dependency and constituency information. Each text segment was independently annotated by two annotators using the @nnotate tool (Brants and Plaehn, 2000).

The Mercurius Treebank can be found on the INESS website (http://iness.uib.no) by navigating to 'Treebank selection' -> German -> deu-mercurius-con, and then selecting 'Sentence Overview' after accepting the license agreement. It may be necessary to create an OpenIdP account first.

The treebank is searchable with an extended TIGERSearch syntax (Meurer, 2012). For example, the search query [word="Kirche"] will return all sentences that contain the word "Kirche", while the search query [cat="PP"] will return all sentences that contain a prepositional phrase node.
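For readers who want to generate such queries programmatically, the following is a minimal Python sketch that builds single-node constraints of the form shown above. The helper name node_query is hypothetical and is not part of INESS or TIGERSearch; the sketch only formats a query string to be pasted into the INESS search field, and it assumes that attribute values are enclosed in double quotes.

```python
def node_query(feature: str, value: str) -> str:
    """Return a single-node constraint such as [word="Kirche"].

    Hypothetical helper for illustration only: it formats a string
    and does not contact INESS or validate the feature name.
    """
    # Quoting rules for embedded double quotes are not covered here,
    # so such values are rejected rather than guessed at.
    if '"' in value:
        raise ValueError("value must not contain a double quote")
    return f'[{feature}="{value}"]'


if __name__ == "__main__":
    # The two example queries from the announcement.
    print(node_query("word", "Kirche"))
    print(node_query("cat", "PP"))
```

The resulting strings correspond to the two example queries in the announcement.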

The present corpus was compiled from 2003 to 2005 at Saarland University as a pilot for a much larger project that aims to establish a syntactically annotated reference corpus for the Early New High German period (1350-1650) as a whole (http://www.uni-potsdam.de/guvdds/projekte/aktproj.html).

Linguistic Field(s): Historical Linguistics

Subject Language(s): German (deu)

Page Updated: 29-Oct-2014