FYI: GerManC Corpus is Now Available
| Author: |
Richard Whitt
|
| Linguistic Field(s): |
Computational Linguistics; Historical Linguistics; Text/Corpus Linguistics |
| FYI Body: |
The complete GerManC Corpus, a representative corpus of Early Modern German from 1650 to 1800, is now publicly available at the Oxford Text Archive: http://www.ota.ox.ac.uk/desc/2544

Following the model of the ARCHER corpus, and with the aim of representativeness, the GerManC corpus consists of text samples of about 2,000 words each from eight genres: drama, newspapers, sermons and personal letters (representing orally oriented registers), and narrative prose (fiction or non-fiction), scholarly (i.e. humanities), scientific and legal texts (representing more print-oriented registers). To facilitate tracing historical developments, the whole period was divided into fifty-year sub-periods (1650-1700, 1700-1750 and 1750-1800), and an equal number of texts from each genre was selected for each sub-period. The complete corpus thus consists of 360 samples, comprising approximately 800,000 words.

Appendix 1 in the download package contains a list of the files in the corpus, with full documentation, in an Excel spreadsheet.

Project Team: Martin Durrell (PI), Paul Bennett (Co-Investigator), Silke Scheible (RA), Richard J. Whitt (RA), and Astrid Ensslin (RA, Newspaper Corpus). |
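The sampling design described in the announcement can be checked with a little arithmetic. The sketch below is only an illustration built from the figures quoted above (eight genres, three sub-periods, 360 samples, about 2,000 words per sample). The variable names are hypothetical and are not part of the corpus distribution.

```python
# Figures taken from the announcement; names are illustrative only.
GENRES = [
    "drama", "newspapers", "sermons", "personal letters",
    "narrative prose", "scholarly", "scientific", "legal",
]
PERIODS = ["1650-1700", "1700-1750", "1750-1800"]
TOTAL_SAMPLES = 360
NOMINAL_SAMPLE_WORDS = 2000  # the announcement says "about 2000 words"

# Each genre/sub-period combination is one cell of the design.
cells = len(GENRES) * len(PERIODS)

# An equal number of texts per genre and sub-period means the total
# must divide evenly across the cells.
samples_per_cell, remainder = divmod(TOTAL_SAMPLES, cells)

# Nominal size if every sample were exactly 2,000 words long.
nominal_total_words = TOTAL_SAMPLES * NOMINAL_SAMPLE_WORDS

print(f"{cells} cells, {samples_per_cell} samples per cell")
print(f"nominal total: {nominal_total_words:,} words")
```

Under these assumptions the design has 24 cells with 15 samples each, and a nominal size of 720,000 words. The stated total of approximately 800,000 words would then imply that samples average somewhat more than 2,000 words (roughly 2,200), which is consistent with the sample length being given as approximate.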

