LINGUIST List 5.463

Wed 20 Apr 1994

FYI: Morphological Analyzer for German

Editor for this issue: <>


Directory

  1. Markku Norberg, Announcement

Message 1: Announcement

Date: Tue, 12 Apr 94 17:34:03 +0
From: Markku Norberg <marnorbe@ling.helsinki.fi>
Subject: Announcement

 * * * THE WINNER! * GERTWOL(TM) * * *
Awarded Best Morphological Analyzer for German, at the 1. Morpholympics.

GERTWOL, Lingsoft's German morphological analyzer, was declared the
overall winner at the first ever Morpholympics, held at the University
of Erlangen-Nürnberg, Germany, in early March. The 1. Morpholympics,
where different systems for automatic recognition of German word forms
were publicly tested, was organised by the GLDV. The main areas in
which GERTWOL was judged the winner were:
 * an extensive lexicon providing excellent text coverage,
 * a unique compounding mechanism to handle compound words correctly
 and the ability to recognize new compound words,
 * an excellent theoretical foundation based on Prof. Kimmo Koskenniemi's
 Two-Level Model for morphology.

Further information on GERTWOL can be obtained from Eugene Young
(eyoung@ling.Helsinki.fi) or Markku Norberg (marnorbe@ling.helsinki.fi).

 1. Morpholympics
 ****************

The goal of the 1. Morpholympics was an objective, theory-independent
comparison of existing systems for automatic word form
recognition. The following aspects were evaluated: linguistic
motivation, technical design, data coverage and speed. For practical
purposes, one natural language, German, was chosen as the main test
language of the 1. Morpholympics.

Amongst the eight teams participating in the 1. Morpholympics, Lingsoft
was the only non-German team. The jury consisted of five independent
judges, all professors of German from various German universities.
After two days of extensive presentations and tests, Lingsoft's
GERTWOL was ultimately named the WINNER.

 Product Information
 *******************

GERTWOL is a computer program for morphologically analyzing German
text. The theoretical foundation of the GERTWOL program is the
Two-Level Model developed by Professor Kimmo Koskenniemi at the
Research Unit for Computational Linguistics at the University of
Helsinki.

The present lexicon contains almost 100,000 base forms for which it
recognizes and analyzes the inflected forms. GERTWOL also has an
extensive ability to recognize new German compounds and a preliminary
facility for derivational morphology. In addition, the conversion of
infinitives and participles to nouns is handled.
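
The idea of recognizing a new compound from known base forms can be
illustrated with a small sketch. This is NOT GERTWOL's mechanism, which
is built on two-level rules and is not described in this announcement;
it is a plain recursive dictionary lookup. The toy lexicon, the single
linking element "s", and the function name split_compound are all
assumptions made for illustration.

```python
# Hypothetical toy lexicon of lower-cased base forms (an assumption for
# this sketch; the real GERTWOL lexicon is not reproduced here).
LEXICON = {"haus", "tür", "schlüssel", "arbeit", "zimmer", "wetter", "bericht"}

# Linking elements allowed between compound parts in this sketch.
LINKERS = ("s",)


def split_compound(word):
    """Return every segmentation of `word` into lexicon entries ([] if none)."""
    word = word.lower()

    def segment(start):
        # Reaching the end of the word means the segmentation is complete.
        if start == len(word):
            return [[]]
        results = []
        for end in range(start + 1, len(word) + 1):
            part = word[start:end]
            if part not in LEXICON:
                continue
            # Continue directly after the recognized part ...
            for rest in segment(end):
                results.append([part] + rest)
            # ... or skip a linking element when more material follows.
            for linker in LINKERS:
                nxt = end + len(linker)
                if word.startswith(linker, end) and nxt < len(word):
                    for rest in segment(nxt):
                        results.append([part] + rest)
        return results

    return segment(0)


if __name__ == "__main__":
    for w in ("Haustürschlüssel", "Arbeitszimmer", "Wetterbericht"):
        print(w, split_compound(w))
```

A production analyzer would also have to handle inflected parts, further
linking elements, and ambiguous segmentations, none of which this sketch
attempts.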

The GERTWOL lexicon is based on Collins German-English Dictionary, 2nd
edition, Copyright 1991 HarperCollins Publishers. Substantial
revisions and additions have been made to the original lexicon, as the
lexicon has been tested on text corpora consisting of newspaper text,
legal documents, weather reports, literary material, and business
reports from Swiss banks. According to our present estimates, GERTWOL
is able to analyze more than 99% of correctly spelled (98% of
unrestricted) German text, and it has been evaluated on over 30 million
words of unrestricted newspaper text.
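
A coverage figure of this kind is typically the share of running words
that receive at least one analysis. The announcement does not state the
exact measurement procedure, so the following is only a sketch under
that assumption; the function name coverage and the stand-in recognizer
are hypothetical.

```python
def coverage(tokens, is_recognized):
    """Return the fraction of tokens for which is_recognized(token) is true."""
    if not tokens:
        raise ValueError("cannot compute coverage of an empty text")
    hits = sum(1 for token in tokens if is_recognized(token))
    return hits / len(tokens)


if __name__ == "__main__":
    # Stand-in recognizer: a tiny set of known word forms (an assumption;
    # a real test would call the morphological analyzer instead).
    known_forms = {"das", "haus", "ist", "alt"}
    text = ["das", "haus", "ist", "xyzzy"]
    print(f"{coverage(text, known_forms.__contains__):.0%}")
```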

The lexicon is available in the following versions, which differ in
character set:
 - 8 bit ISO-Latin
 - Swiss ISO-Latin 1
 - 7 bit ASCII

The theoretical foundation used to produce GERTWOL has been
successfully implemented for a large number of languages and is
widely recognised as the only method applicable to any language at
reasonable speeds: up to 1000 words per second with large
dictionaries on mainframes and UNIX hosts.

If you have any questions or would like more information, please
contact: Lingsoft, Inc., Museokatu 18 A 3, FIN-00100 Helsinki,
FINLAND. Tel: 358 0 499 556 / Fax: 358 0 440 602.
Trademark names used throughout this document are trademarks of their
respective owners. Copyright Lingsoft, Inc.[4.94]
