LINGUIST List 4.587

Wed 28 Jul 1993

FYI: Software Available: Deriving FSMs from Corpora

Editor for this issue: <>


  1. James Tauber, Software: Deriving FSMs from corpora

Message 1: Software: Deriving FSMs from corpora

Date: Thu, 29 Jul 1993 00:35:40
From: James Tauber <>
Subject: Software: Deriving FSMs from corpora

The software FiST, which I mentioned in Linguist 4.430, is now available as
C source code by anonymous ftp.

What is FiST?

 FiST stands for Finite State Tools and refers to a set of programs
 for implementing Finite State Machines (a.k.a. Finite State Automata).

What can it do?

 At the moment, it can tokenize a corpus by character or by word (a word
 is often a morphological tag) and then produce a minimal deterministic
 finite-state table.
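
 The pipeline just described (tokenize, then reduce to a minimal
 deterministic table) can be sketched roughly as follows. This is not
 FiST's C source: it is an illustrative Python sketch, and every name in
 it (tokenize, build_trie, minimize, accepts) is hypothetical. It assumes
 the corpus is a finite list of lines, so the automaton is acyclic and
 equivalent states can be merged bottom-up.

```python
def tokenize(line, by="word"):
    """Split one corpus line into tokens: single characters or whitespace-separated words."""
    return list(line) if by == "char" else line.split()


def build_trie(sequences):
    """Build a prefix-tree acceptor. Returns (transitions, accepting); state 0 is the start."""
    transitions = [{}]
    accepting = set()
    for seq in sequences:
        state = 0
        for tok in seq:
            if tok not in transitions[state]:
                transitions.append({})
                transitions[state][tok] = len(transitions) - 1
            state = transitions[state][tok]
        accepting.add(state)
    return transitions, accepting


def minimize(transitions, accepting):
    """Merge states with identical futures (assumption: the input is an acyclic trie)."""
    registry = {}  # (is_accepting, outgoing arcs) -> merged state number
    table = []     # merged transition table, one dict per state
    final = set()

    def visit(state):
        # Children are numbered first, so equal signatures mean equal right languages.
        arcs = tuple(sorted((tok, visit(nxt)) for tok, nxt in transitions[state].items()))
        signature = (state in accepting, arcs)
        if signature not in registry:
            registry[signature] = len(table)
            table.append(dict(arcs))
            if state in accepting:
                final.add(registry[signature])
        return registry[signature]

    start = visit(0)
    return table, final, start


def accepts(table, final, start, seq):
    """Run the table over a token sequence and report whether it ends in a final state."""
    state = start
    for tok in seq:
        state = table[state].get(tok)
        if state is None:
            return False
    return state in final


corpus = ["walking", "talking", "walked", "talked"]
sequences = [tokenize(line, by="char") for line in corpus]
trie, trie_final = build_trie(sequences)
table, final, start = minimize(trie, trie_final)
print(len(trie), "trie states ->", len(table), "minimal states")
```

 On this four-word corpus the prefix tree has 19 states and the merged
 table has 9, because the shared endings are stored only once. The
 recursion in minimize is fine for a sketch but would need to be made
 iterative for very long token sequences.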

 Later I will work on adding a recognizer, a transducer (to produce
 output), weighting (for probabilistic recognition) and grouping
 tools for RTN production.

What can it be used for?

 On the simplest level, it can be used for teaching theoretical computer
 science, for graphotactics (recognizing valid letter sequences) and for
 complex pattern searching. It could also be used for data compression
 on lists with high structural repetition and where ordering within the
 list is unimportant.
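
 To illustrate the compression point, here is a small Python sketch
 (again not part of FiST; the table and the name expand are made up for
 the example). A hand-written finite-state table for four words is
 expanded back into the list it encodes. The table stores a set, so the
 words come back in a fixed alphabetical order rather than in whatever
 order the original list had, which is why this only suits lists where
 ordering does not matter.

```python
# Hypothetical table for the words walking, talking, walked, talked.
# Each row maps an input symbol to the next state; state 8 is the only final state.
TABLE = {
    0: {"w": 1, "t": 1},
    1: {"a": 2},
    2: {"l": 3},
    3: {"k": 4},
    4: {"i": 5, "e": 7},
    5: {"n": 6},
    6: {"g": 8},
    7: {"d": 8},
    8: {},
}
START = 0
ACCEPTING = {8}


def expand(table, accepting, state, prefix=""):
    """Yield every string the table accepts (assumes the table has no cycles)."""
    if state in accepting:
        yield prefix
    for symbol in sorted(table[state]):
        yield from expand(table, accepting, table[state][symbol], prefix + symbol)


words = list(expand(TABLE, ACCEPTING, START))
stored_transitions = sum(len(row) for row in TABLE.values())
listed_characters = sum(len(word) for word in words)
print(words)
print(stored_transitions, "transitions encode", listed_characters, "characters")
```

 Here 10 transitions stand in for 26 characters of word list. The saving
 comes entirely from the shared stem and endings, so a list with little
 structural repetition would gain much less.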

 With the later developments mentioned above (which will only take a month
 or two once the basics are done), the software could be used for
 two-level morphological analysis (à la KIMMO), probabilistic
 recognition (with applications to source criticism of ancient texts)
 and inducing grammars from tagged corpora.

Does it cost anything?

 The software is free but the copyright remains mine. See the GNU
 General Public License included (in the file COPYING) for details.

Where can it be obtained?

 It is presently available by anonymous ftp from
 in the directory /pub/jtauber.

 Finger for the latest release details.

 Mail me at the same address if you cannot ftp.

I must point out that this program is a very early beta version. I have a
lot to add and, no doubt, a lot to fix. PLEASE flame me if I've done anything
wrong or just plain silly. If you would like to see anything added, just let
me know. If you actually use this program (or later versions) for anything, I'd
love to hear from you.

James Tauber

snail-mail: 72 Central Road
 Rossmoyne WA 6148