
FYI: GerManC Corpus is Now Available


Author: Richard Whitt

Linguistic Field(s): Computational Linguistics
Historical Linguistics
Text/Corpus Linguistics

FYI Body: The complete GerManC Corpus, a representative corpus of Early
Modern German from 1650 to 1800, is now publicly available at the
Oxford Text Archive:
http://www.ota.ox.ac.uk/desc/2544

Following the model of the ARCHER corpus and given the aim of
representativeness, the GerManC corpus consists of text samples of
about 2000 words from eight genres: drama, newspapers, sermons
and personal letters (to represent orally oriented registers) and
narrative prose (fiction or non-fiction), scholarly (i.e. humanities),
scientific and legal texts (to represent more print-oriented registers). In
order to facilitate tracing historical developments, the whole period was
divided into fifty-year sections (in this case 1650-1700, 1700-1750 and
1750-1800), and an equal number of texts from each genre was
selected for each of these sub-periods.

The complete corpus thus consists of 360 samples, comprising
approximately 800,000 words. Appendix 1 in the download package
contains a list of the files in the corpus with full documentation in an
Excel spreadsheet.
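
As a quick check on the design described above, the following minimal
sketch enumerates the genre-by-period cells and divides the 360 samples
evenly among them. The variable names are illustrative only, and the
even split per cell is an assumption based on the statement that an
equal number of texts from each genre was selected for each sub-period;
it does not account for any further stratification the corpus may use.

```python
from itertools import product

# The eight genres and three sub-periods named in the announcement.
GENRES = [
    "drama", "newspapers", "sermons", "personal letters",
    "narrative prose", "scholarly", "scientific", "legal",
]
PERIODS = ["1650-1700", "1700-1750", "1750-1800"]
TOTAL_SAMPLES = 360  # figure stated in the announcement

# Every (genre, period) combination is one cell of the sampling grid.
cells = list(product(GENRES, PERIODS))

# Assumption: samples are spread evenly over the cells.
samples_per_cell, remainder = divmod(TOTAL_SAMPLES, len(cells))
samples_per_genre = samples_per_cell * len(PERIODS)

print(f"{len(cells)} cells, {samples_per_cell} samples per cell, "
      f"{samples_per_genre} samples per genre, remainder {remainder}")
```

Under that assumption the grid has 24 cells with 15 samples each, i.e.
45 samples per genre across the whole period.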

Project Team: Martin Durrell (PI), Paul Bennett (Co-Investigator), Silke
Scheible (RA), Richard J. Whitt (RA), and Astrid Ensslin (RA,
Newspaper Corpus).
