LINGUIST List 11.1567

Mon Jul 17 2000

Sum: German Word Lists

Editor for this issue: Lydia Grebenyova


  1. Stefan Thomas Gries, German word lists

Message 1: German word lists

Date: Wed, 12 Jul 2000 14:19:08 +0200
From: Stefan Thomas Gries
Subject: German word lists

For Query: Linguist 11.1459

Dear colleagues

Recently I posted a query about where to download German word lists. I would
like to thank the following people (in alphabetical order) for their kind
responses:

Anna Braasch
Damon Allen Davison
Pius ten Hacken
Agnes Muehlmeyer-Mentzel
Noemi Preissner
Markus Schulze

In what follows I provide a list of the sites that were pointed out to me
with some additional comments:
These word lists were taken from seven corpora covering the domains electronic
data processing, geography, law, medicine, sports, linguistics and economics,
plus a representative German corpus (the LIMAS corpus). Each of these corpora
contains roughly 1,000,000 word forms. Available for download are:
o frequency lists of morphemes, allomorphs and word forms for the individual
corpora;
o so-called "n-domain-lists" of morphemes, allomorphs and word forms
(n-domain-list: a list of items that occurred in n of the domain-specific
corpora mentioned above), e.g. the 2-domain-list of medicine and law contains
all morphemes / allomorphs / word forms that occurred in both corpora,
together with their respective frequency information.
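
To make the notion of an n-domain-list concrete, here is a minimal Python sketch, not the site's own procedure. It assumes each domain corpus is available as a mapping from word form to frequency; the function name n_domain_list and the toy counts are invented purely for illustration.

```python
from collections import Counter


def n_domain_list(freq_lists, domains):
    """Return the items attested in every corpus named in `domains`,
    each mapped to its frequency in those corpora."""
    # Items shared by all selected domain corpora.
    shared = set.intersection(*(set(freq_lists[d]) for d in domains))
    return {
        item: {d: freq_lists[d][item] for d in domains}
        for item in sorted(shared)
    }


# Toy frequency lists (hypothetical counts, not taken from the real corpora).
freq_lists = {
    "medicine": Counter({"Befund": 12, "Fall": 30, "Patient": 45}),
    "law": Counter({"Fall": 80, "Urteil": 25, "Befund": 3}),
}

# The 2-domain-list of medicine and law: word forms found in both corpora.
two_domain = n_domain_list(freq_lists, ["medicine", "law"])
for word, freqs in two_domain.items():
    print(word, freqs)
```

The same function would apply unchanged to morpheme or allomorph lists, since it only relies on items and their counts.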
A useful collection of lists for French, English and German (large word
lists and smaller stop lists)
They offer not only a list of word forms, but also a morphological analysis
module. In addition, word formation rules can be applied to recognise newly
coined compounds and derivations, which is not a trivial advantage in

Finally, Agnes Muehlmeyer was so kind as to let me have a 360,000-word word
list (generated on the basis of the German weekly newspaper Die Zeit, 1986).

Apart from the above-mentioned sites directly concerned with word lists, I
was also directed to some sites with slightly different though related

Once again, thanks to all contributors.

S t e f a n T h . G r i e s
--------------------------------------------------------------------------
B u e r o / O f f i c e :
Syddansk Universitet
Institut for Erhvervssproglig Informatik og Kommunikation
Grundtvigs Allé 150
6400 Sonderborg