
FYI: 40474 Split Compounds from GermaNet Available

Author: Verena Henrich

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics

Subject Language(s): German

FYI Body: We are happy to announce the availability of 40474 German nominal compounds from GermaNet release 8.0 that have been split into their constituent parts, i.e., modifier and head. This dataset has been constructed semi-automatically and all compound splits have been manually post-corrected.

The list of split compounds is freely available for download at

For many applications, it is helpful to know the constituents of a compound, since its semantic interpretation is usually based on the meanings of its parts. Compound splitting is a challenging task for German because compounding, a very productive word formation process in the language, is not always simple string concatenation: it often involves intervening linking elements or the elision of word-final characters in the modifier constituent of a compound.
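
To illustrate why simple concatenation is not enough, here is a minimal Python sketch of a lexicon-based splitter. It is not the procedure used to build the dataset, which was constructed semi-automatically and manually post-corrected. The function name split_compound, the toy lexicon, and the short list of linking elements are assumptions made for this example only, and the only elision modelled is a dropped final -e in the modifier.

```python
# Illustrative assumption: a small set of common German linking elements.
# The empty string stands for plain concatenation.
LINKING_ELEMENTS = ("", "s", "es", "n", "en", "er", "e")


def split_compound(compound, lexicon):
    """Return candidate (modifier, head) splits of a two-part compound.

    `lexicon` is a set of lowercase simplex nouns (hypothetical toy data).
    """
    word = compound.lower()
    candidates = []
    for i in range(1, len(word)):
        left, head = word[:i], word[i:]
        if head not in lexicon:
            continue
        # Case 1: modifier, optionally followed by a linking element.
        for link in LINKING_ELEMENTS:
            if link and not left.endswith(link):
                continue
            stem = left[: len(left) - len(link)] if link else left
            pair = (stem, head)
            if stem in lexicon and pair not in candidates:
                candidates.append(pair)
        # Case 2: elision of a word-final -e in the modifier.
        pair = (left + "e", head)
        if left + "e" in lexicon and pair not in candidates:
            candidates.append(pair)
    return candidates


# Toy lexicon, for illustration only.
lexicon = {"haus", "tür", "arbeit", "amt", "schule", "buch", "hund", "hütte"}

print(split_compound("Haustür", lexicon))     # plain concatenation
print(split_compound("Arbeitsamt", lexicon))  # linking element -s-
print(split_compound("Schulbuch", lexicon))   # elided final -e of Schule
print(split_compound("Hundehütte", lexicon))  # linking element -e-
```

A sketch like this overgenerates on a realistic lexicon and cannot choose between competing analyses, which is one reason manually post-corrected splits are valuable.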

For more information about GermaNet, please consult the project website:
