LINGUIST List 4.862

Tue 19 Oct 1993

Sum: Parsing Finnish

Editor for this issue: <>


Directory

  1. Sara Elo, Parsing Finnish

Message 1: Parsing Finnish

Date: Mon, 18 Oct 93 22:01:56
From: Sara Elo <elo@media.mit.edu>
Subject: Parsing Finnish


Hello,
A few weeks ago I posted a query asking for:
1) references and published papers on parsing Finnish
2) a Finnish dictionary or encyclopedia on CD-ROM (or another medium)

Respondents almost unanimously pointed to Kimmo Koskenniemi's two-level
morphological model and the morphological analyzer based on it. Relevant
papers from the Department of General Linguistics at the University of
Helsinki follow:
===============================================================================
@BOOK{koskenniemi83,
 AUTHOR = "Koskenniemi, Kimmo",
 YEAR = "1983",
 TITLE = "Two-level morphology: a general
computational model for word-form recognition and production",
 SERIES = "Publication",
 NUMBER = "11",
 ADDRESS = "Helsinki",
 PUBLISHER = "University of Helsinki Department of General
Linguistics" }

@INPROCEEDINGS{koskenniemi83b,
 AUTHOR = "Koskenniemi, Kimmo",
 YEAR = "1983",
 TITLE = "Two-level morphology for morphological analysis",
 BOOKTITLE = "IJCAI-83",
 PUBLISHER = "International Joint Conference on Artificial
Intelligence",
 PAGES = "683--685" }

@INPROCEEDINGS{koskenniemi84,
 AUTHOR = "Koskenniemi, Kimmo",
 YEAR = "1984",
 TITLE = "A general computational model for word-form recognition
and production",
 BOOKTITLE = "Proceedings of Coling '84",
 PUBLISHER = "Association for Computational Linguistics",
 PAGES = "178--181" }

@MISC{koskenniemi85,
 AUTHOR = "Koskenniemi, Kimmo",
 YEAR = "1985",
 TITLE = "A general two-level computational model for word-form
recognition and production",
 HOWPUBLISHED = "In Karlsson~1985, 1--18" }

@MISC{koskenniemi85b,
 AUTHOR = "Koskenniemi, Kimmo",
 YEAR = "1985",
 TITLE = "An application of the two-level model to {F}innish",
 HOWPUBLISHED = "In Karlsson~1985, 19--42" }

@BOOK{karlsson85,
 EDITOR = "Karlsson, Fred",
 YEAR = "1985",
 TITLE = "Computational morphosyntax: a report on research
1981--1984",
 SERIES = "Publication",
 NUMBER = "13",
 ADDRESS = "Helsinki",
 PUBLISHER = "University of Helsinki Department of General
Linguistics" }

@ARTICLE{karlsson-koskenniemi85,
 AUTHOR = "Karlsson, Fred and Kimmo Koskenniemi",
 YEAR = "1985",
 TITLE = "A process model of morphology and lexicon",
 JOURNAL = "Folia Linguistica",
 VOLUME = "14",
 NUMBER = "",
 PAGES = "207--231" }

@ARTICLE{karttunen83,
 AUTHOR = "Karttunen, Lauri",
 YEAR = "1983",
 TITLE = "{\ac KIMMO}: a general morphological processor",
 JOURNAL = "Texas Linguistic Forum",
 VOLUME = "22",
 NUMBER = "",
 PAGES = "163--186" }
==============================================================================

Lingsoft Inc. of Helsinki, Finland, sells FINTWOL, a morphological analyzer
for Finnish, among other products. A summary of their products, all based on
the Koskenniemi model, is given below:
=============================================================================
 0. spell-checking and hyphenation
 1. morphological analysis and generation
 2. stemming for information retrieval
 3. part-of-speech tagging
 ( >99% correct, <5% ambiguity)
 4. NP extraction for text indexing and retrieval
 ( >98% recall, >95% precision)
 5. surface syntactic analysis
 6. grammar checker

Language   Tools available (numbers refer to the list above)
English    1, 2, 3, 4, 5
German     1 (end of May); 0, 2 (end of summer 93)
Swedish    0, 1, 2, 3
Russian    0, 1, 2
Finnish    0, 1, 2, 3, 6
Danish     1; 0, 2, 3 (end of year 93)
Swahili    1, 2

All the lexicons have between 40,000 and 80,000 roots. The programs
are written in C and have been ported to various platforms. All the
tools run at between 600 and 1000 words per second on a SPARCstation 2.

In the near future we will have tools for French, Estonian, Italian and
Norwegian as well.

contact eyoungling.Helsinki.FI
==============================================================================

PC-KIMMO is a microcomputer version of the KIMMO morphological analyzer,
available via ftp. To contact the developers:

 Academic Computing Department
 PC-KIMMO project
 7500 W. Camp Wisdom Road
 Dallas, TX 75236
 U.S.A.

 phone: 214/709-3346, -2418
 fax: 214/709-24333
 email: Evan.Antworth@sil.org

REFERENCES (other than the ones mentioned above)

Antworth, Evan L. 1990. PC-KIMMO: a two-level processor for
 morphological analysis. Occasional Publications in Academic
 Computing No. 16. Dallas, TX: Summer Institute of Linguistics.
 ISBN 0-88312-639-7, 273 pages, paperbound.

____. 1993. Glossing text with the PC-KIMMO morphological parser.
 Computers and the Humanities 26:475-484.

Miles, Nathan L. 1991. Automatic generation of two-level FSM
 tables. M.A. thesis, Ohio State University. [Description of the
 KGEN rule compiler.]

Sproat, Richard. 1991. Review of "PC-KIMMO: a two-level
 processor for morphological analysis" by Evan L. Antworth.
 Computational Linguistics 17.2:229-231.
===============================================================================

As you can tell, morphological analysis is well covered in the literature.
The only answer to my question about an existing encyclopedia or dictionary
was the above-mentioned lexicon marketed by Lingsoft, Inc.

Thank you to all who responded,
Sara Elo.