LINGUIST List 25.4307
Wed Oct 29 2014
Media: Treebank of Early New High German
Editor for this issue: Malgorzata Cavar <gosia@linguistlist.org>
Ulrike Demske <udemske
We have recently released the Mercurius Treebank of Early New High German (1350-1650) to the public. It is hosted by the INESS treebanking infrastructure (http://iness.uib.no) and is distributed under the CC-BY license.
The Mercurius Treebank is a syntactically annotated corpus of the early newspapers 'Mercurius' and 'Annus Christi', published in 1667 and 1597, respectively. It comprises a total of 170,000 tokens and 8,400 syntactically annotated sentences. It was annotated using a hybrid annotation scheme based on TIGER, which encodes both constituency and dependency information. Each text segment was annotated independently by two annotators using the Annotate tool (Brants and Plaehn, 2000).
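To make the idea of a hybrid annotation more concrete, here is a toy Python sketch of a TIGER-style tree: phrase nodes carry a category (the constituency information), and every edge carries a grammatical-function label such as SB (subject) or HD (head) (the dependency-like information). The class names, the invented example sentence, and the STTS-style part-of-speech tags are illustrative assumptions following general TIGER conventions; they are not taken from the Mercurius data or its guidelines, and this is not the storage format used by Annotate or INESS.

```python
from dataclasses import dataclass, field


@dataclass
class Terminal:
    word: str
    pos: str  # part-of-speech tag (STTS-style tags assumed for illustration)


@dataclass
class Node:
    cat: str  # phrase category, e.g. "S", "NP", "PP"
    # Each daughter is stored together with the grammatical-function
    # label of the edge leading to it, e.g. ("SB", <NP node>).
    children: list = field(default_factory=list)

    def add(self, label, child):
        """Attach a daughter under a labelled edge; returns self for chaining."""
        self.children.append((label, child))
        return self


def terminals(node):
    """Yield the words under a node, following the order of its daughters."""
    for _, child in node.children:
        if isinstance(child, Terminal):
            yield child.word
        else:
            yield from terminals(child)


def functions(node):
    """List (edge label, daughter category or word) pairs for a node."""
    return [(label, c.cat if isinstance(c, Node) else c.word)
            for label, c in node.children]


# Hypothetical sentence, invented for this sketch (not from the corpus).
np = Node("NP").add("NK", Terminal("der", "ART")).add("NK", Terminal("König", "NN"))
pp = (Node("PP").add("AC", Terminal("in", "APPR"))
      .add("NK", Terminal("die", "ART"))
      .add("NK", Terminal("Stadt", "NN")))
s = Node("S").add("SB", np).add("HD", Terminal("kam", "VVFIN")).add("MO", pp)

print(" ".join(terminals(s)))  # der König kam in die Stadt
print(functions(s))            # [('SB', 'NP'), ('HD', 'kam'), ('MO', 'PP')]
```

Real TIGER annotations also allow crossing branches, so a fuller representation would record each terminal's position in the sentence instead of relying on daughter order, as this sketch does.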
The Mercurius Treebank can be found on the INESS website (http://iness.uib.no) by navigating to 'Treebank selection' -> German -> deu-mercurius-con and then selecting 'Sentence Overview' after accepting the license agreement. It may be necessary to create an OpenIdP account first.
The treebank can be searched using an extended TIGERSearch syntax (Meurer, 2012). For example, the query [word="Kirche"] returns all sentences that contain the word "Kirche", while the query [cat="PP"] returns all sentences that contain a prepositional phrase (PP) node.
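As a rough illustration of what single-node queries like these match, the following Python sketch searches a toy in-memory corpus for sentences containing at least one node whose attribute equals a given value. It handles only the simple [attr="value"] form; the data structures, function names, and sample sentences are hypothetical and say nothing about how INESS implements its query engine. Full TIGERSearch syntax also supports relations between nodes (such as dominance and precedence), which this sketch omits.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    attrs: dict  # e.g. {"word": "Kirche", "pos": "NN"} or {"cat": "PP"}
    children: list = field(default_factory=list)


# Hypothetical parser for a single node description like [word="Kirche"].
QUERY = re.compile(r'^\[(\w+)="([^"]*)"\]$')


def walk(node):
    """Yield a node and all of its descendants (pre-order)."""
    yield node
    for child in node.children:
        yield from walk(child)


def matches(query, root):
    """True if any node in the sentence has attr == value."""
    m = QUERY.match(query)
    if m is None:
        raise ValueError(f"unsupported query: {query}")
    attr, value = m.groups()
    return any(n.attrs.get(attr) == value for n in walk(root))


def search(query, corpus):
    """Return the ids of all sentences that match the query."""
    return [sid for sid, root in corpus.items() if matches(query, root)]


def t(word, pos):
    """Helper for building terminal nodes in the toy corpus."""
    return Node({"word": word, "pos": pos})


# Invented two-sentence corpus, for illustration only.
corpus = {
    "s1": Node({"cat": "S"}, [
        Node({"cat": "NP"}, [t("die", "ART"), t("Kirche", "NN")]),
        t("brannte", "VVFIN"),
    ]),
    "s2": Node({"cat": "S"}, [
        t("er", "PPER"),
        t("kam", "VVFIN"),
        Node({"cat": "PP"}, [t("in", "APPR"), t("die", "ART"), t("Kirche", "NN")]),
    ]),
}

print(search('[word="Kirche"]', corpus))  # ['s1', 's2']
print(search('[cat="PP"]', corpus))       # ['s2']
```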
The corpus was compiled from 2003 to 2005 at Saarland University as a pilot for a much larger project aimed at establishing a syntactically annotated reference corpus for the Early New High German period (1350-1650) as a whole.
Linguistic Field(s): Historical Linguistics
Subject Language(s): German (deu)
Page Updated: 29-Oct-2014