LINGUIST List 2.248

Monday, 27 May 1991

FYI: PC-Kimmo Morphological Processor



Directory

  1. Evan Antworth, PC-KIMMO News

Message 1: PC-KIMMO News

Date: Mon, 20 May 91 8:29:51 CDT
From: Evan Antworth <evantxsilutafll.uta.edu>
Subject: PC-KIMMO News

 PC-KIMMO News
 =============

 May 20, 1991

This announcement describes recent developments related to PC-KIMMO (an 
implementation for personal computers of Kimmo Koskenniemi's two-level model 
of word production and recognition).

(1) PC-KIMMO version 1.0.5 update

(2) KGEN - a rule compiler (table generator) for PC-KIMMO

(3) KTEXT - a text-processing application using the PC-KIMMO parser

(4) recent articles related to PC-KIMMO

The software described below is made freely available to the academic 
community for non-commercial use and redistribution. We invite your feedback 
on these programs. Please note that the software is packaged in compressed 
archives: Zip files for MS-DOS and Stuffit files for Macintosh. In addition, 
if you obtain the files by e-mail, they will arrive in encoded form: 
uu-encoding for MS-DOS and Binhex format for Macintosh. Utility programs for 
handling archives and encoded files are available from computer bulletin 
boards or from your university computing center. (Hint for MS-DOS users: when 
you unzip a file, use the -d option to preserve the subdirectories.) Finally, 
it is possible that the files may not yet be available in some of the places 
listed below. Just wait a few days and try again.


(1) PC-KIMMO 1.0.5 update

PC-KIMMO version 1.0.5 has been available since the end of February. It fixes 
a problem with loading very large lexicons (more than 100 sublexicons). Thanks 
to Elizabeth Hinkelman and her colleagues for finding this bug. This version 
also fixes a couple of things that caused crashes on the Macintosh. There are no 
functional changes in version 1.0.5. If you want to upgrade to version 1.0.5, 
you can obtain it as follows:

 1. Obtain it via anonymous FTP from the following sources. (I am advised 
that it is best to use the symbolic names rather than the numeric addresses. 
Also, the directory structure is subject to change.)

 MS-DOS version:
 msdos.archive.umich.edu [141.211.165.34]
 msdos/linguistics/pckim105.zip

 Macintosh version:
 mac.archive.umich.edu [141.211.165.34]
 mac/etc/linguistics/pckim105.sit

 2. Request it from us via e-mail. Be *sure* to specify which version you want 
(DOS, Mac, UNIX).

 3. Send a diskette and a self-addressed, stamped diskette mailer to the 
address below. Be *sure* to specify which version you want (DOS, Mac, UNIX) 
and the disk format.


(2) KGEN

KGEN, a rule compiler for PC-KIMMO, is now available for beta testing. KGEN 
was written by Nathan Miles of Ohio State University. All rights and 
responsibilities pertaining to the program presently belong to Nathan Miles 
(not to the Summer Institute of Linguistics). He can be reached by e-mail at 
miles@cis.ohio-state.edu. Nathan has done a great job at developing this 
program and he deserves our thanks.

KGEN takes a two-level rule like this:

 y:i => @:C___+:0

and translates it into a finite state table like this:

      @ y + @
      C i 0 @
  1:  2 0 1 1
  2:  2 3 2 1
  3.  0 0 1 0

KGEN accepts as input a file of two-level rules and produces as output a file 
of state tables that is identical in format to PC-KIMMO's rules file. Anything 
that KGEN does not correctly handle can be easily fixed by hand in its output 
file. Everyone who uses PC-KIMMO (or who doesn't use it because they don't 
want to write tables by hand) is welcome to try out KGEN. But what we really 
need are some beta testers who can compare KGEN's output to tables they have 
written by hand. Let us know if you are willing to beta test KGEN for us.
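
As a rough illustration of what such a state table does, here is a minimal 
Python sketch that runs the table shown above over a sequence of 
lexical:surface pairs. It is not part of PC-KIMMO or KGEN. It assumes that 
the column headers read "@ y + @" over "C i 0 @" (with "@" as the wildcard), 
that a colon after a state number marks a final state and a period a 
non-final one, and that C is a consonant set. The names TABLE, CONSONANTS, 
column_for, and accepts are hypothetical.

```python
# Sketch only: simulates the state table for  y:i => @:C___+:0.
# Assumed columns, in order: @:C, y:i, +:0, @:@  ("@" = any symbol).
CONSONANTS = set("bcdfghjklmnpqrstvwxz")  # hypothetical definition of subset C

# state -> (is_final, next state for each of the four columns); 0 means failure
TABLE = {
    1: (True,  [2, 0, 1, 1]),
    2: (True,  [2, 3, 2, 1]),
    3: (False, [0, 0, 1, 0]),
}

def column_for(lex, surf):
    """Pick the most specific column that matches a lexical:surface pair."""
    if (lex, surf) == ("y", "i"):
        return 1
    if (lex, surf) == ("+", "0"):
        return 2
    if surf in CONSONANTS:
        return 0   # @:C, any lexical character realized as a consonant
    return 3       # @:@, everything else

def accepts(pairs):
    """Return True if the pair sequence is allowed by the table."""
    state = 1
    for lex, surf in pairs:
        state = TABLE[state][1][column_for(lex, surf)]
        if state == 0:
            return False
    return TABLE[state][0]

# y:i after a consonant and before +:0 is allowed
good = [("s", "s"), ("p", "p"), ("y", "i"), ("+", "0"), ("s", "s")]
# y:i that is not followed by +:0 is rejected
bad = [("s", "s"), ("p", "p"), ("y", "i"), ("s", "s")]
print(accepts(good), accepts(bad))
```

State 3 is non-final because, once y:i has been seen, the table still 
requires +:0 before the input may end.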

Presently KGEN runs only under MS-DOS and UNIX, but we hope to get it compiled 
for the Macintosh soon (any Think C experts out there?). You can obtain KGEN 
as follows. 

 1. The MS-DOS version of KGEN is available via anonymous FTP from SIMTEL20:

 wsmr-simtel20.army.mil [192.88.110.20]
 pd1:<msdos.linguistics>kgen02.zip

SIMTEL20 can also be accessed using LISTSERV commands from BITNET via 
LISTSERV@NDSUVM1, LISTSERV@RPIECS and in Europe from EARN TRICKLE servers 
(for example, FRMOP11 in France). You can also obtain files from SIMTEL20 by 
e-mail. Send this line as the only message to listserv@vm1.nodak.edu (1 = one) 
(this may not work outside the U.S.):

 /PDGET MAIL PD1:<MSDOS.LINGUISTICS>KGEN02.ZIP UUENCODE

The MS-DOS version of KGEN is also available by anonymous FTP from:

 msdos.archive.umich.edu [141.211.165.34] (symbolic name recommended)
 msdos/linguistics/kgen02.zip

 2. The UNIX version (consisting of the source files which you must compile 
on your own machine) is available by anonymous FTP from the machine TUT:

 cis.ohio-state.edu [128.146.8.60]
 pub/kgen/kgen03.tar.Z

 3. Request KGEN from us via e-mail. Be *sure* to specify which version you 
want (DOS, UNIX).

 4. If all else fails, send a diskette and a self-addressed, stamped diskette 
mailer to the address below. Be *sure* to specify which version you want (DOS,
UNIX) and the disk format.


(3) KTEXT

KTEXT is a new text-processing application that uses the PC-KIMMO parser. It 
accepts as input a text in orthographic form, tokenizes it into words, strips 
off and saves punctuation, capitalization, white space, and formatting codes, 
parses each word, and outputs the result to a quasi-database file with a 
record for each word. Its output data structures are suitable for further 
processing by other programs, such as a text interlinearizer, a syntactic 
parser, or a machine translation system. 
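
The processing steps just described can be pictured with a small Python 
sketch. This is not KTEXT's code or its actual output format. The record 
fields (space, lead, word, cap, trail, analyses) and the functions 
fake_parse, tokenize, and rebuild are hypothetical, and the parser is 
replaced by a stub that returns each word unanalyzed.

```python
import re

def fake_parse(word):
    """Stand-in for the PC-KIMMO parser (assumption: returns a list of analyses)."""
    return [word]

def case_of(word):
    """Classify capitalization so it can be stripped and later restored."""
    if len(word) > 1 and word.isupper():
        return "upper"
    if word[:1].isupper():
        return "initial"
    return "none"

def tokenize(text, parse=fake_parse):
    """Split text into one record per word, saving what was stripped off."""
    records = []
    for m in re.finditer(r"(\s*)(\S+)", text):
        space, chunk = m.groups()
        # Peel leading and trailing punctuation away from the word itself.
        lead, word, trail = re.match(r"(\W*)(.*?)(\W*)$", chunk).groups()
        records.append({
            "space": space,            # white space before the token
            "lead": lead,              # punctuation before the word
            "word": word.lower(),      # decapitalized form given to the parser
            "cap": case_of(word),      # saved capitalization
            "trail": trail,            # punctuation after the word
            "analyses": parse(word.lower()) if word else [],
        })
    return records

def rebuild(records):
    """Reassemble the text from the records (handles simple capitalization only)."""
    restore = {"upper": str.upper, "initial": str.capitalize, "none": str}
    return "".join(
        r["space"] + r["lead"] + restore[r["cap"]](r["word"]) + r["trail"]
        for r in records
    )

text = "The spies, they ran.\nOK?"
records = tokenize(text)
print([r["word"] for r in records])
print(rebuild(records) == text)
```

Saving the stripped material in each record is what lets a later program, 
such as an interlinearizer, put the original text back together. The sketch 
drops trailing white space and does not restore mixed-case words.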

KTEXT is a beta test release that is distributed and supported by the Summer 
Institute of Linguistics. It is available for MS-DOS, Macintosh, and UNIX. You 
can obtain it as follows.

 1. The MS-DOS version of KTEXT is available from SIMTEL20 (see above for how 
to access SIMTEL20 by FTP or e-mail) as:

 pd1:<msdos.linguistics>ktext093.zip

It is also available via anonymous FTP from:

 msdos.archive.umich.edu [141.211.165.34] (symbolic name recommended)
 msdos/linguistics/ktext093.zip

 2. The Macintosh version of KTEXT is available via anonymous FTP from:

 mac.archive.umich.edu [141.211.165.34] (symbolic name recommended)
 mac/etc/linguistics/ktext094.sit

It is also available via anonymous FTP from:

 sumex-aim.stanford.edu [36.44.0.6]
 /info-mac/app/ktext094.hqx

You can also obtain files from SUMEX-AIM by e-mail. Send this line as the only 
message to listserv@ricevm1.rice.edu (1 = one) (this may not work outside the 
U.S.):

 $MACARCH GET /info-mac/app/ktext094.hqx

 3. Request KTEXT from us via e-mail. Be *sure* to specify which version you 
want (DOS, UNIX).

 4. If all else fails, send a diskette and a self-addressed, stamped diskette 
mailer to the address below. Be *sure* to specify which version you want (DOS,
UNIX) and the disk format.

 5. To obtain the UNIX sources, please contact us at the address below.


(4) Recent articles related to PC-KIMMO:

Antworth, Evan L. 1991. Introduction to two-level phonology. Notes on 
 Linguistics, 53:4-18. Dallas, TX: Summer Institute of Linguistics.

Antworth, Evan L. 1991. Glossing text with the PC-KIMMO morphological parser. 
 (Manuscript submitted for publication)

Simons, Gary F. 1991. A two-level processor for morphological analysis. Notes 
 on Linguistics, 53:19-27. Dallas, TX: Summer Institute of Linguistics.

Vanni, Michelle. 1990. Abstract of "PC-KIMMO: a two-level processor for 
 morphological analysis." Georgetown Journal of Languages & Linguistics 
 1.4:498-500.


Special requests for any of the software or articles described above and/or 
requests for more information should be sent to:

Evan Antworth
Academic Computing Department
Summer Institute of Linguistics
7500 W. Camp Wisdom Road
Dallas, TX 75236
U.S.A.

Internet: evan@txsil.sil.org <-------- new address as of May 1991
UUCP: ...!uunet!convex!txsil!evan
phone: 214/709-2418
fax: 214/709-3387

From: txsil!evantxsilutafll.uta.edu (Evan Antworth)
To: linguist
Subject: new linguistics directory on SIMTEL20
Date: Mon, 20 May 91 8:55:15 CDT

There is a new directory on SIMTEL20 called PD1:<MSDOS.LINGUISTICS>. Two
programs that previously were in the education subdirectory have now
been moved to this new linguistics subdirectory; these are fonol400.zip
and pckimmo.zip. The directory also contains a couple of new programs
related to PC-KIMMO. I hope that others will submit programs useful to 
linguists to this new directory.

(Files can be downloaded from SIMTEL20 by anonymous FTP from
wsmr-simtel20.army.mil [192.88.110.20]).


Evan Antworth
evan@txsil.sil.org <------- new address as of May 1991
