LINGUIST List 23.2276
Fri May 11 2012
FYI: GerManC Corpus is Now Available
Editor for this issue: Brent Miller
<brent linguistlist.org>
Date: 10-May-2012
From: Richard Whitt <jasonwhitt mindspring.com>
Subject: GerManC Corpus is Now Available
The complete GerManC Corpus, a representative corpus of Early Modern German from 1650 to 1800, is now publicly available at the Oxford Text Archive: http://www.ota.ox.ac.uk/desc/2544

Following the model of the ARCHER corpus and given the aim of representativeness, the GerManC corpus consists of text samples of about 2000 words from eight genres: drama, newspapers, sermons and personal letters (to represent orally oriented registers), and narrative prose (fiction or non-fiction), scholarly (i.e. humanities), scientific and legal texts (to represent more print-oriented registers). To facilitate tracing historical developments, the whole period was divided into fifty-year sections (1650-1700, 1700-1750 and 1750-1800), and an equal number of texts from each genre was selected for each of these sub-periods. The complete corpus thus consists of 360 samples, comprising approximately 800,000 words.

Appendix 1 in the download package contains a list of the files in the corpus, with full documentation in an Excel spreadsheet.

Project Team: Martin Durrell (PI), Paul Bennett (Co-Investigator), Silke Scheible (RA), Richard J. Whitt (RA), and Astrid Ensslin (RA, Newspaper Corpus).
Linguistic Field(s): Computational Linguistics; Historical Linguistics; Text/Corpus Linguistics
Page Updated: 11-May-2012
While the LINGUIST List makes every effort to ensure the linguistic relevance of sites listed
on its pages, it cannot vouch for their contents.