FYI: The GerManC Corpus is Now Available
| Author: |
Richard J. Whitt
|
| Linguistic Field(s): |
Computational Linguistics
Historical Linguistics
Text/Corpus Linguistics |
| FYI Body: |
The GerManC Corpus, a multi-genre representative corpus of Early Modern German from 1650 to 1800, is now publicly available for download at http://www.ota.ox.ac.uk/desc/2544.

Following the model of the ARCHER corpus and given the aim of representativeness, the GerManC corpus consists of text samples of about 2000 words from eight genres: drama, newspapers, sermons and personal letters (to represent orally oriented registers) and narrative prose (fiction or non-fiction), scholarly (i.e. humanities), scientific and legal texts (to represent more print-oriented registers). To facilitate tracing historical developments, the whole period was divided into fifty-year sections (in this case 1650-1700, 1700-1750 and 1750-1800), and an equal number of texts from each genre was selected for each of these sub-periods. The complete corpus thus consists of 360 samples, comprising approximately 800,000 words.

Appendix 1 in the download package contains a list of the files in the corpus, with full documentation, in an Excel spreadsheet. In addition to plain text, the corpus is also available in TEI Lite P5 XML, GATE XML, and GATE column formats.

Project website: http://www.llc.manchester.ac.uk/research/projects/germanc/

Project Team: Martin Durrell (PI), Paul Bennett (Co-Investigator), Silke Scheible (RA), Richard J. Whitt (RA), and Astrid Ensslin (RA, newspaper corpus) |

