LINGUIST List 2.701

Thu 24 Oct 1991

Misc: Computational: Shoebox, Speech Database, Prolog

Editor for this issue: <>


  1. Evan Antworth, SHOEBOX for MS-DOS by FTP
  2. Ron Cole, A Multi-language Speech Database: An Informal Survey
  3. Stavros Macrakis, 2.693 Spanish MT, Prolog Course

Message 1: SHOEBOX for MS-DOS by FTP

Date: Mon, 21 Oct 91 9:34:35 CDT
From: Evan Antworth <>
Subject: SHOEBOX for MS-DOS by FTP
SHOEBOX for MS-DOS is now available by FTP from SIMTEL20 and various
other archives. See below on availability.
SHOEBOX is a database management program, designed expressly to
meet the needs of the field linguist. Using SHOEBOX, the linguist
can easily enter, edit, and analyze lexical, textual, anthropological
and other types of data. All data are maintained in ASCII text files.
For example, with SHOEBOX, one can:
+ Maintain a simple dictionary, or a more complex lexicon,
+ Interlinearize text, where new words are automatically entered
 into the dictionary,
+ Do grammatical filing and analysis of text data,
+ Enter and file cultural notes,
+ Maintain nonlinguistic types of databases, such as address lists
 or library catalogs.
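Interlinearization with automatic dictionary entry can be pictured with a small sketch. This is not SHOEBOX's code. The function names `interlinearize` and `format_interlinear`, the placeholder gloss `***`, and the whole-word lookup (no morpheme parsing) are hypothetical choices made for illustration.

```python
def interlinearize(text: str, dictionary: dict[str, str]) -> list[tuple[str, str]]:
    """Pair each word with its gloss, entering unknown words into `dictionary`.

    Hypothetical sketch: lookup is by lowercased whole word, and a new entry
    gets the placeholder gloss "***" for the linguist to fill in later.
    """
    rows = []
    for word in text.split():
        key = word.lower()
        if key not in dictionary:
            dictionary[key] = "***"  # new word: add it to the dictionary
        rows.append((word, dictionary[key]))
    return rows


def format_interlinear(rows: list[tuple[str, str]]) -> str:
    """Lay out words above their glosses, padding each column to equal width."""
    widths = [max(len(word), len(gloss)) for word, gloss in rows]
    top = " ".join(word.ljust(n) for (word, _), n in zip(rows, widths))
    bottom = " ".join(gloss.ljust(n) for (_, gloss), n in zip(rows, widths))
    return top.rstrip() + "\n" + bottom.rstrip()


lexicon = {"the": "DEF", "dog": "dog"}
rows = interlinearize("The dog barked", lexicon)
print(format_interlinear(rows))
# The dog barked
# DEF dog ***
print("barked" in lexicon)  # True: the new word was entered automatically
```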
SHOEBOX contains numerous features that aid in accomplishing the
various tasks that are often a part of linguistic data storage and
analysis. These include:
+ A text editor for the entry and editing of data,
+ The ability to conduct very rapid searches; any data record can be
 accessed nearly instantaneously for editing or review,
+ A rigorous select option that allows the user to view only those
 records that conform to certain criteria,
+ The ability to specify a special sort ordering, taking into account
 groupings of digraphs and characters from the IBM extended
 character set,
+ A flash card function to aid in language learning,
+ Functions to number and interlinearize text.
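A sort order that treats digraphs as single letters can be sketched as a longest-match tokenizer plus a rank table. The alphabet below (traditional Spanish ordering, with "ch" after "c", "ll" after "l", and "ñ" after "n") is a hypothetical example, not a SHOEBOX setting, and the sketch works on Unicode strings rather than the IBM extended (code page 437) bytes a DOS program would have handled.

```python
# Hypothetical alphabet (traditional Spanish order): "ch" and "ll" count as
# single letters sorted after "c" and "l", and "ñ" follows "n".
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
            "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w",
            "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
LONGEST = max(len(letter) for letter in ALPHABET)


def tokenize(word: str) -> list[str]:
    """Split a word into sort units, preferring the longest match ("ch" over "c")."""
    word = word.lower()
    units, i = [], 0
    while i < len(word):
        for size in range(LONGEST, 0, -1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in RANK:
                units.append(chunk)
                i += size
                break
        else:
            units.append(word[i])  # character not in the alphabet
            i += 1
    return units


def sort_key(word: str) -> list[tuple[int, str]]:
    """Rank each unit; characters outside the alphabet sort after all letters."""
    return [(RANK.get(unit, len(ALPHABET)), unit) for unit in tokenize(word)]


words = ["oso", "llama", "chico", "ñandú", "luz", "cosa", "nube"]
print(sorted(words, key=sort_key))
# ['cosa', 'chico', 'luz', 'llama', 'nube', 'ñandú', 'oso']
```

A plain code-point sort would put "chico" before "cosa", "llama" before "luz", and "ñandú" after "oso"; the rank table is what overrides that.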
SHOEBOX allows up to seven databases to be concurrently loaded, each
accessible by the touch of a single key. Because each database can
reference the other databases, information can be easily retrieved
and integrated from any location.
SHOEBOX was written by John Wimbish of the Summer Institute of Linguistics.
Version 12a is now being offered to the academic community as 'freeware'.
SHOEBOX is available by anonymous FTP (and mail server) from SIMTEL20: (
It is also available from various other archives that mirror SIMTEL20
including these: (
 /mirrors/msdos/linguistics/ (
 /pub/PC/simtel-20/linguistics/ (
Evan Antworth | Internet:
Academic Computing Department | UUCP: ...!uunet!convex!txsil!evan
Summer Institute of Linguistics | phone: 214/709-2418
7500 W. Camp Wisdom Road | fax: 214/709-3387
Dallas, TX 75236 |

Message 2: A Multi-language Speech Database: An Informal Survey

Date: Tue, 22 Oct 91 17:40:50 -0700
From: Ron Cole <>
Subject: A Multi-language Speech Database: An Informal Survey
This posting is being made to gauge the level of potential interest in
a speech database that we are collecting and developing. We believe
that the database will be of great value to the speech and language
research communities, but we would like to substantiate this through an
informal survey.
The Center for Spoken Language Understanding at the Oregon Graduate
Institute is collecting a large, 10-language database of digitized speech
recorded over the telephone. The goal of the initial database collection
effort is to obtain fluent speech samples from at least 100 native
speakers in each of 10 languages. The languages in the database are:
English, French, German, Japanese, Korean, Mandarin Chinese, Persian (Farsi),
Spanish, Tamil and Vietnamese.
The recording protocol was designed to obtain (a) short descriptions
from the speakers about themselves and their environment (domain-specific
vocabularies); (b) language names, digits, days of the week (well-defined,
restricted vocabularies); and (c) samples of elicited free speech
(unrestricted vocabularies). Each speaker provides about 5 minutes of
speech. We will verify each utterance in the database and assign
it a time-aligned phonetic transcription.
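The stated targets give a rough lower bound on corpus size. The duration follows directly from the figures above; the storage estimate additionally assumes single-channel 8 kHz audio with 16-bit samples, which the posting does not state.

```python
# Figures from the posting.
LANGUAGES = 10
SPEAKERS_PER_LANGUAGE = 100  # "at least 100", so this is a lower bound
MINUTES_PER_SPEAKER = 5      # "about 5 minutes"

# Assumed recording format (not given in the posting): mono, 8 kHz, 16-bit.
SAMPLE_RATE_HZ = 8_000
BYTES_PER_SAMPLE = 2

seconds = LANGUAGES * SPEAKERS_PER_LANGUAGE * MINUTES_PER_SPEAKER * 60
size_bytes = seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE

print(f"{seconds / 3600:.1f} hours of speech")    # 83.3 hours of speech
print(f"{size_bytes / 1e9:.1f} GB uncompressed")  # 4.8 GB uncompressed
```

Both numbers scale linearly with the speaker count, so they grow if more than 100 speakers per language are recorded.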
When database development is completed in mid-1992, we intend
to make the database available to interested researchers at nominal
cost. The database package will also include speech tools to display
and interactively modify the speech files, and signal processing
functions that compute different parameters of the speech waveforms.
This software can be used on any UNIX(tm) machine running the X window
system. The National Science Foundation is funding the development of
the speech software tools.
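As an illustration of the kind of waveform parameters such functions compute, here is a sketch of two common frame-based measures, short-time energy and zero-crossing rate. These are not taken from the OGI tools; the function names, the 200-sample frame (25 ms at an assumed 8 kHz), and the 80-sample hop (10 ms) are assumptions.

```python
import math


def frames(signal: list[float], frame_len: int, hop: int):
    """Yield successive frames of `frame_len` samples, `hop` samples apart."""
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield signal[start:start + frame_len]


def short_time_energy(signal: list[float], frame_len: int = 200, hop: int = 80) -> list[float]:
    """Sum of squared samples in each frame."""
    return [sum(x * x for x in frame) for frame in frames(signal, frame_len, hop)]


def zero_crossing_rate(signal: list[float], frame_len: int = 200, hop: int = 80) -> list[float]:
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    rates = []
    for frame in frames(signal, frame_len, hop):
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        rates.append(crossings / (frame_len - 1))
    return rates


# Example: 0.1 s of a 1 kHz tone sampled at an assumed 8 kHz.
rate_hz = 8_000
tone = [math.sin(2 * math.pi * 1_000 * n / rate_hz + 0.1) for n in range(800)]
energy = short_time_energy(tone)
zcr = zero_crossing_rate(tone)
print(len(energy))          # 8 frames
print(round(energy[0], 6))  # 100.0 (half of 200 unit-amplitude samples)
print(round(zcr[0], 3))     # 0.246 (about 2 crossings per 8-sample cycle)
```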
Portions of our database are currently being used for research on
automatic language identification. While we believe that this
database will satisfy a long-standing need for a public-domain database
for automatic language identification research, the presence of
different types of vocabularies in this database makes it eminently
suitable for other research areas in speech and natural language.
If you think this database would be helpful in your research, or if you
are interested in obtaining it, please contact me at:
Ronald A. Cole
Center for Spoken Language Understanding
Oregon Graduate Institute of Science and Technology
19600 NW Von Neumann Drive
Beaverton, OR 97006-1999
Vmail: (503) 690-1159

Message 3: 2.693 Spanish MT, Prolog Course

Date: Wed, 23 Oct 91 11:28:25 EDT
From: Stavros Macrakis <>
Subject: 2.693 Spanish MT, Prolog Course
ALS's ad for their Prolog course translates a Prolog fragment as "Any
linguist who does not know Prolog is unhappy." This implies that
Prolog is a truly declarative language. Although Prolog is an
important, interesting, and useful programming language, alas, it
really isn't that declarative (although it is more declarative than
many other languages). What the Prolog fragment really says is:
In order to establish property unhappy(X), first establish
linguist(X). Then, if you cannot establish know(X,prolog), consider
that unhappy(X) has been established.
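This procedural reading is negation as failure. Assuming the ad's clause had the usual shape `unhappy(X) :- linguist(X), \+ know(X, prolog).` (the ad's exact text is not reproduced in this issue), the following Python sketch mimics how Prolog evaluates it over a small hypothetical fact base. The names `LINGUISTS`, `KNOWS`, and the query functions are illustrative only.

```python
# Hypothetical fact base standing in for linguist/1 and know/2.
LINGUISTS = {"ann", "bob"}
KNOWS = {("ann", "prolog")}


def know(x: str, lang: str) -> bool:
    # Closed-world reading: only what is in the database is provable.
    return (x, lang) in KNOWS


def unhappy(x: str) -> bool:
    # unhappy(X) :- linguist(X), \+ know(X, prolog).
    # First establish linguist(X) ...
    if x not in LINGUISTS:
        return False
    # ... then succeed only if the attempt to prove know(X, prolog) fails.
    return not know(x, "prolog")


def query_unhappy() -> list[str]:
    # ?- unhappy(X).  linguist(X) binds X first, then the negation tests that person.
    return [x for x in sorted(LINGUISTS) if unhappy(x)]


def query_unhappy_reordered() -> list[str]:
    # Same goals, swapped: unhappy(X) :- \+ know(X, prolog), linguist(X).
    # With X unbound, \+ know(X, prolog) fails as soon as ANYONE knows Prolog,
    # so the query returns no answers at all.
    if any(lang == "prolog" for _, lang in KNOWS):
        return []
    return sorted(LINGUISTS)


print(unhappy("bob"), unhappy("ann"), unhappy("carol"))  # True False False
print(query_unhappy())            # ['bob']
print(query_unhappy_reordered())  # []
```

The two clauses are logically equivalent as conjunctions, yet they give different answers to the same query, which is one concrete sense in which Prolog is not purely declarative.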
It goes without saying that all terms are uninterpreted, but that's, I
guess, a reasonable amount of advertising puffery. Then again, Drew
McDermott's classic article ``Artificial Intelligence and Natural
Stupidity'' points out how easy it is to be seduced by suggestive
predicate names.
Still, it may well be worthwhile learning Prolog...!