Editor for this issue: <>
SHOEBOX for MS-DOS is now available by FTP from SIMTEL20 and various other archives. See below on availability.

SHOEBOX is a database management program designed expressly to meet the needs of the field linguist. Using SHOEBOX, the linguist can easily enter, edit, and analyze lexical, textual, anthropological, and other types of data. All data are maintained in ASCII text files. For example, with SHOEBOX, one can:

+ Maintain a simple dictionary, or a more complex lexicon,
+ Interlinearize text, where new words are automatically entered into the dictionary,
+ Do grammatical filing and analysis of text data,
+ Enter and file cultural notes,
+ Maintain nonlinguistic types of databases, such as address lists or library catalogs.

SHOEBOX contains numerous features that aid in accomplishing the various tasks that are often a part of linguistic data storage and analysis. These include:

+ A text editor for the entry and editing of data,
+ The ability to conduct very rapid searches; any data record can be accessed nearly instantaneously for editing or review,
+ A rigorous select option that allows the user to view only those records that conform to certain criteria,
+ The ability to specify a special sort ordering, taking into account groupings of digraphs and characters from the IBM extended character set,
+ A flash card function to aid in language learning,
+ Functions to number and interlinearize text.

SHOEBOX allows up to seven databases to be loaded concurrently, each accessible at the touch of a single key. Because each database can reference the other databases, information can be easily retrieved and integrated from any location.

SHOEBOX was written by John Wimbish of the Summer Institute of Linguistics. Version 12a is now being offered to the academic community as 'freeware'.
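The special sort ordering mentioned above, in which digraphs are grouped as single units, can be illustrated with a short Python sketch. It does not show how SHOEBOX itself implements sorting; the alphabet, the digraphs "ch" and "ng", the sample words, and the names ORDER and sort_key are all assumptions made for illustration.

```python
# Hypothetical sort order in which "ch" and "ng" count as single letters.
# This is an illustrative alphabet, not one taken from SHOEBOX.
ORDER = ["a", "b", "c", "ch", "d", "e", "g", "h", "i",
         "k", "l", "m", "n", "ng", "o", "s", "t", "u"]

RANK = {unit: i for i, unit in enumerate(ORDER)}
MAX_LEN = max(len(unit) for unit in ORDER)


def sort_key(word):
    """Split a word into sort units (longest match first) and return their ranks."""
    w = word.lower()
    key = []
    i = 0
    while i < len(w):
        for size in range(MAX_LEN, 0, -1):
            unit = w[i:i + size]
            if unit in RANK:
                key.append(RANK[unit])
                i += len(unit)
                break
        else:
            # Characters outside the declared order sort after every known unit.
            key.append(len(ORDER) + ord(w[i]))
            i += 1
    return key


words = ["cuna", "chato", "dato", "nota", "ngoma", "canto"]
print(sorted(words, key=sort_key))
```

With this ordering, every word beginning with "c" precedes every word beginning with "ch", and "nota" precedes "ngoma", which a plain character-by-character sort would not do.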
SHOEBOX is available by anonymous FTP (and mail server) from SIMTEL20:

    wsmr-simtel20.army.mil (192.88.110.20)
    pd1:<msdos.linguistics>sh12a.zip

It is also available from various other archives that mirror SIMTEL20, including these:

    wuarchive.wustl.edu (128.252.135.4)
    /mirrors/msdos/linguistics/sh12a.zip

    rana.cc.deakin.oz.au (128.184.1.4)
    /pub/PC/simtel-20/linguistics/sh12a.zip

    nic.funet.fi (128.214.6.100)
    /pub/msdos/science/linguistics/sh12a.lzh

Evan Antworth                   | Internet: evan@sil.org
Academic Computing Department   | UUCP: ...!uunet!convex!txsil!evan
Summer Institute of Linguistics | phone: 214/709-2418
7500 W. Camp Wisdom Road        | fax: 214/709-3387
Dallas, TX 75236                |
This posting is being made to gauge the level of potential interest in a speech database that we are collecting and developing. We believe that the database will be of great value to the speech and language research communities, but we would like to substantiate this through an informal survey.

The Center for Spoken Language Understanding at the Oregon Graduate Institute is collecting a large, 10-language database of digitized speech recorded over the telephone. The goal of the initial database collection effort is to obtain fluent speech samples from at least 100 native speakers in each of 10 languages. The languages in the database are: English, French, German, Japanese, Korean, Mandarin Chinese, Persian (Farsi), Spanish, Tamil and Vietnamese.

The recording protocol was designed to obtain (a) short descriptions from the speakers about themselves and their environment (domain-specific vocabularies); (b) language names, digits, days of the week (well-defined, restricted vocabularies); and (c) samples of elicited free speech (unrestricted vocabularies). Each speaker provides about 5 minutes of speech. We will verify each utterance in the database and assign it a time-aligned phonetic transcription.

When database development is completed in mid-1992, we intend to make the database available to interested researchers at nominal cost. The database package will also include speech tools to display and interactively modify the speech files, and signal processing functions that compute different parameters of the speech waveforms. This software can be used on any UNIX(tm) machine running the X window system. The National Science Foundation is funding the development of the speech software tools.

Portions of our database are currently being used for research on automatic language identification.
While we believe that this database will satisfy a long-standing need for a public-domain database for automatic language identification research, the presence of different types of vocabularies in this database makes it eminently suitable for other research areas in speech and natural language.

If you think this database would be helpful in your research, or if you are interested in obtaining it, please contact me at:

    Ronald A. Cole
    Director
    Center for Spoken Language Understanding
    Oregon Graduate Institute of Science and Technology
    19600 NW Von Neumann Drive
    Beaverton, OR 97006-1999 USA
    Vmail: (503) 690-1159
    Internet: cole@cse.ogi.edu
ALS's ad for their Prolog course translates a Prolog fragment as "Any linguist who does not know Prolog is unhappy." This implies that Prolog is a truly declarative language. Although Prolog is an important, interesting, and useful programming language, alas, it really isn't that declarative (although it is more declarative than many other languages). What the Prolog fragment really says is:

    In order to establish property unhappy(X), first establish linguist(X). Then, if you cannot establish know(X,prolog), consider that unhappy(X) has been established.

It goes without saying that all terms are uninterpreted, but that is, I guess, a reasonable amount of advertising puffery. Then again, Drew McDermott's classic article ``Artificial Intelligence and Natural Stupidity'' points out how easy it is to be seduced by suggestive predicate names.

Still, it may well be worthwhile learning Prolog...!

-s
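The procedural reading given above can be sketched in a few lines of Python. This is only an illustration of that reading (usually called negation as failure), not a translation of the ad's actual fragment, which is not reproduced here; the fact base and the names FACTS, established, and unhappy are hypothetical.

```python
# Hypothetical fact base. Nothing here comes from the ad itself.
FACTS = {
    ("linguist", "ann"),
    ("linguist", "bob"),
    ("know", "ann", "prolog"),
}


def established(*goal):
    """A goal counts as established only if it appears in the fact base."""
    return goal in FACTS


def unhappy(x):
    # Step 1: establish linguist(x).
    # Step 2: if know(x, prolog) cannot be established, treat unhappy(x)
    # as established. Failure to prove is taken as falsity.
    return established("linguist", x) and not established("know", x, "prolog")


for person in ("ann", "bob", "carol"):
    print(person, unhappy(person))
```

Here bob comes out unhappy merely because the fact base is silent about what he knows, and carol is not unhappy only because she cannot be shown to be a linguist. Neither outcome matches the declarative English gloss.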