Included is a summary of all the responses I received to my query about speech analysis software for IBM PCs. Many thanks to everyone who replied. Your recommendations are most valuable.

Ping Lin

-----
From: registry@dfki.uni-sb.de (The Software Registry)

NATURAL LANGUAGE SOFTWARE REGISTRY

The Natural Language Software Registry is a catalogue of software implementing core natural language processing techniques, whether available on a commercial or noncommercial basis. The current version includes

+ speech signal processors, such as the Computerized Speech Lab (Kay Elemetrics)
+ morphological analyzers, such as PC-KIMMO (Summer Institute of Linguistics)
+ parsers, such as Alveytools (University of Edinburgh)
+ knowledge representation systems, such as Rhet (University of Rochester)
+ multicomponent systems, such as ELU (ISSCO), PENMAN (ISI), Pundit (UNISYS), SNePS (SUNY Buffalo)
+ applications programs (misc.)

This document is available on-line via anonymous ftp to ftp.dfki.uni-sb.de (directory: registry), by email to registry@dfki.uni-sb.de, and by physical mail to the address below. If you have developed a piece of software for natural language processing that other researchers might find useful, you can include it by returning the description form below. If you are interested in the preliminary draft of the Registry, do not hesitate to drop us an email message and we will be happy to send it to you.

-----
From: GA3662@SIUCVMB.SIU.EDU

For about a year I have been using a package called CSRE (Canadian Speech Research Environment), developed by Don Jamieson and others at Western Ontario. It has the advantage of being cheap (around $400 US) and quite easy to use (as compared with, say, ILS). There is a new version out which I haven't got yet (they are still working out a few bugs in it that are specific to the hardware I have). I run it on a 386 with a math coprocessor. Pure waveform editing is instantaneous, but it takes a while (a minute or so) to calculate a spectrogram or a pitch track. I think the new version will be even faster. I use a Data Translation DT2801A A/D board, but they are discouraging this (it's old technology) and recommending an Ariel board instead. These are a little expensive by Southern Illinois University standards (I think around 2-3 thousand dollars). I have been very happy with the program, once I got some input/output problems solved.

Ray Kent wrote a paper assessing all the packages on the market. It appeared in the Journal of Speech and Hearing Research in 1990. I have a page proof, so I don't have the exact reference. I believe he was planning to update it, so you might write to him. He is at the Department of Communicative Disorders, University of Wisconsin at Madison, Madison, WI 53706.

-----
From: lepetit@ux1.cso.uiuc.edu

Call Professor Philippe MARTIN, Director of the Experimental Phonetics Laboratory, New College, on Huron Street at the University of Toronto. He has some very powerful software running on IBM

-----
From: cshih@gandalf.rutgers.edu (Chilin Shih)

I haven't seen the new version of ILS. The old version I saw was very difficult to use. For example, it requires you to specify the beginning frame and ending frame for any display. I find that too time-consuming.

Kay is convenient if you want real-time display. For general-purpose phonetic experiments, I find its customized packaging annoying (again, this may be an old version by now). What that means is that the setup is very convenient if you want to do just what they want you to do, but it's next to impossible if what you want is not one of the customized packages. Their new, more powerful, and therefore more expensive models may be more versatile.

I've never used Hypersignal.

In my opinion, both ILS and Kay are too expensive for what they deliver. (For Kay, you can easily spend 20K upwards.) For that price, the $1500 Waves+ marketed by Entropic (in Washington, D.C.) is definitely the best value. (That's an old price; it could have doubled by now for a new/better version.) But Waves+ runs on UNIX. The speech analysis programs are super. There is hardly any pitch tracking error for a male speaker, for example. I have fed it recordings with unbearably loud background noise and the pitch track still comes out right. There are numerous powerful functions for displaying, editing, and labeling tasks. For example, I am doing an experiment now, and I have customized the display in such a way that with one mouse click I get the waveform, the pitch tracks, and a labeling window displayed, and the speech played. I drop markers with a click at the locations I am interested in, and the time is logged automatically. After finishing one sentence, one click gets me the whole thing for the next sentence on my list. If I want a pitch value, in the version I have I have to write it down in the label window. In the new version I believe you'll get it automatically too.

If you don't already have a Kay, I would definitely recommend a UNIX workstation (could be less than $6000 these days, a bargain compared to Kay) with Waves+. On the low end, there is something called the CECIL Box that costs a few hundred dollars (my impression is that it's 2 or 3 hundred) and offers lots of the speech analysis programs: formant analysis, pitch analysis, ...

I have heard people complaining about various things after they buy a system. Some systems only allow you to record 3 or 4 seconds at a time: Signalyze for Mac, for example (a 5000 system). This is not important if your experiments are small and your sentences are short, but one person was devastated because he intended to look into conversational speech. If you are buying a system costing several thousand dollars, the best bet is to go through every procedure of the experiment you want to run at the demo or at the dealer's place, or to ask about each step on the phone. If the person can't answer the question, insist that you talk to someone who knows.

-----
From: Charles Read <CREAD@macc.wisc.edu>

You may wish to read our review: Read, Buder, & Kent. Speech analysis systems: an evaluation. Journal of Speech and Hearing Research, 1992, 35, 314-332. Of course, even this review is somewhat out of date. On PCs, I personally recommend CSpeech and CSRE.

-----
From: Jyrki Tuomainen <jyrtuoma@hcc.utu.fi>

Re: your query on speech analysis software, I'd advise you to consult an article in JSHR by Read, Buder & Kent, which is a thorough (and fair, IMO) review of the most common systems on the market. The complete ref is: Read, C., Buder, E. & Kent, R. (1992) Speech Analysis Systems: An Evaluation. Journal of Speech and Hearing Research, 35, 314-332.

As for my own experience and recommendations, the only thing that is clear to me is that if you don't have technical staff available and/or you don't know how to deal with the technical details, I would suggest that you settle for one of the complete packages like CSL (from Kay) or Speech Station (Sensimetrics). The drawback with them is that they are somewhat more expensive than e.g. CSRE, but you're sure to get a package that does what it's supposed to.

-----
From: AL0017P@prime1.huddersfield.ac.uk

I spent some months working on a research project connected with automatic speech recognition. We used the Loughborough Sound Images Speech Workstation, which runs on a PC AT (or 286/386 based compatible) with 640k RAM, EGA/VGA graphics, Microsoft Mouse (or compatible), hard disk (40 MB recommended), RAM disk (required for stereo recording or fast sample rates), and DOS version 3.0 or greater.

The LSI Speech Workstation can display the signal in a variety of ways, including black and white or full-colour spectrograms, waveforms, spectral slices (a cross-section through a spectrogram, displayed horizontally across the screen)... All of them are reasonably fast, especially on a 386 PC. A wide range of bandwidths is available for the spectrogram and the spectral slice, and the waveform can be scaled. Several of these can be displayed at the same time by splitting the screen. The screen can also be split to accommodate parts of two separate recordings.

The analog card supplied with the Speech Workstation has two input channels, each of which can be connected to either a microphone or a line output. Two markers are available, which allow you to perform a number of operations on the signal, including cutting and pasting, copying etc. It is possible to play only marked sections of the signal on the screen. I can't remember how long individual recordings could be, but I think it was something like 3 or 5 minutes maximum.

Anyway, I quite enjoyed working with the Speech Workstation. I don't know whether Loughborough Sound Images have agents in the USA, but the best way to find out is probably to contact them direct (they'll also send you further details):

Loughborough Sound Images Limited
The Technology Centre
Epinal Way
Loughborough
ENGLAND LE11 0QE
Telephone: (0509) 231 843
Telex: 34 1409 LUFBRA G
Fax: (0509) 262 433