

LINGUIST List 23.2276

Fri May 11 2012

FYI: GerManC Corpus is Now Available

Editor for this issue: Brent Miller <brent@linguistlist.org>


Date: 10-May-2012
From: Richard Whitt <jasonwhitt@mindspring.com>
Subject: GerManC Corpus is Now Available

The complete GerManC Corpus, a representative corpus of Early
Modern German from 1650 to 1800, is now publicly available at the
Oxford Text Archive:
http://www.ota.ox.ac.uk/desc/2544

Following the model of the ARCHER corpus, and with the aim of
representativeness, the GerManC corpus consists of text samples of
about 2,000 words from eight genres: drama, newspapers, sermons
and personal letters (representing orally oriented registers), and
narrative prose (fiction or non-fiction), scholarly (i.e. humanities),
scientific and legal texts (representing more print-oriented registers).
To facilitate tracing historical developments, the whole period was
divided into fifty-year sections (1650-1700, 1700-1750 and
1750-1800), and an equal number of texts from each genre was
selected for each of these sub-periods.

The complete corpus thus consists of 360 samples, comprising
approximately 800,000 words. Appendix 1 in the download package
lists the files in the corpus, with full documentation, in an Excel
spreadsheet.
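The sampling design can be checked with a little arithmetic: eight
genres crossed with three fifty-year sub-periods gives 24 cells, so an
equal allocation of 360 samples works out to 15 samples per genre per
sub-period (45 per genre, 120 per sub-period). The per-cell figure is
derived here rather than stated in the announcement, and the short
genre labels and the allocate() helper below are hypothetical
illustrations, not part of the GerManC distribution.

```python
from itertools import product

# Genre and sub-period labels follow the announcement; the exact
# strings are illustrative, not identifiers used in the corpus files.
GENRES = [
    "drama", "newspapers", "sermons", "personal letters",   # orally oriented
    "narrative prose", "scholarly", "scientific", "legal",  # print-oriented
]
PERIODS = ["1650-1700", "1700-1750", "1750-1800"]
TOTAL_SAMPLES = 360


def allocate(total, genres, periods):
    """Hypothetical helper: split `total` samples equally over every
    (genre, period) cell, as the announcement describes."""
    cells = list(product(genres, periods))
    per_cell, remainder = divmod(total, len(cells))
    if remainder:
        raise ValueError("total does not divide evenly across cells")
    return {cell: per_cell for cell in cells}


design = allocate(TOTAL_SAMPLES, GENRES, PERIODS)
print(len(design))                       # 24 cells
print(design[("sermons", "1700-1750")])  # 15 samples in this cell
```

Because the allocation is balanced, any genre can be compared across
the three sub-periods on an equal number of samples.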

Project Team: Martin Durrell (PI), Paul Bennett (Co-Investigator), Silke
Scheible (RA), Richard J. Whitt (RA), and Astrid Ensslin (RA,
Newspaper Corpus).

Linguistic Field(s): Computational Linguistics; Historical Linguistics; Text/Corpus Linguistics




Page Updated: 11-May-2012

Supported in part by the National Science Foundation