Thu 21 Jan 1993

Sum: Speech analysis software for IBM PCs

Date: Tue, 19 Jan 1993 21:36:03
From: Ping Lin
Subject: Speech analysis software for IBM PCs

Included is a summary of all the responses I received to my query
about speech analysis software for IBM PCs. Many thanks to everyone
who replied. Your recommendations are most valuable.

Ping Lin

For about a year I have been using a package called CSRE (Canadian
Speech Research Environment), developed by Don Jamieson and others
at Western Ontario. It has the advantage of being cheap (around
$400 US) and quite easy to use (as compared with, say, ILS). There
is a new version out, which I haven't got yet (they are still working
out a few bugs in it that are specific to the hardware I have).
I run it on a 386 with math coprocessor. Pure waveform editing is
instantaneous, but it takes a while (a minute or so) to calculate
a spectrogram or a pitch track. I think the new version will be
even faster. I use a Data Translation DT2801A A/D board, but
they are discouraging this (it's old technology) and recommending
an Ariel board instead. These are a little expensive by Southern
Illinois University standards (I think around 2-3 thousand dollars).
I have been very happy with the program, once I got some input/output
problems solved.
Ray Kent wrote a paper assessing all the packages on the market.
It appeared in the Journal of Speech and Hearing Research in 1990.
I have a page proof, so I don't have the exact reference. I believe
he was planning to update it, so you might write him. He is at
the Department of Communicative Disorders, University of Wisconsin
at Madison, Madison, WI 53706.


Call Professor Philippe MARTIN, Director of the Experimental Phonetics
Laboratory, New College, on Huron Street at the University of Toronto. He has
some very powerful software running on IBM

 From: (Chilin Shih)

I haven't seen the new version of ILS. The old version
I saw was very difficult to use. For example, it requires
you to specify a beginning frame and an ending frame for any
display. I find that too time-consuming.

Kay is convenient if you want real-time display.
For general-purpose phonetic experiments, I find (again,
this may be an old version by now) its customized packaging
annoying. What that means is that the setup is
very convenient if you want to do just what they want
you to do, but it's next to impossible if what you
want is not one of the customized packages. Their
new, more powerful, and therefore more expensive models
may be more versatile.

I've never used Hypersignal.

In my opinion, both ILS and Kay are too expensive for
what they deliver. (For Kay, you can easily spend
20K upwards.)

For the price, the $1500 Waves+ (that's an old price; it could
have doubled by now for a new/better version), marketed by Entropic
(in Washington, D.C.), is definitely the best value. But Waves+ runs
on UNIX. The speech analysis programs are super. There is hardly any
pitch tracking error for a male speaker, for example. I have fed it
recordings with unbearably loud background noise and the pitch track
still comes out right. There are numerous powerful functions for
displaying, editing, and labeling tasks. For example, I am doing an
experiment now, and I have customized the display in such a way that
with one mouse click I get the waveform, the pitch tracks, and a
labeling window displayed, and the speech played. I drop markers
with a click at the locations I am interested in, and the times are
logged automatically. After finishing one sentence, one click gets
me the whole thing for the next sentence on my list. If I want a
pitch value, in the version I have I have to write it down in the
label window. In the new version I believe you'll get that
automatically too.

If you don't already have a Kay, I would definitely
recommend a UNIX workstation (could be less than
$6000 these days, a bargain compared to Kay)
with Waves+.

On the low end, there is something called
the CECIL Box that costs a few hundred dollars (my impression
is that it's 2 or 3 hundred) and offers
many of the speech analysis programs: formant analysis,
pitch analysis, ...

I have heard people complaining about various things after they
buy a system. Some systems only allow you to record
3 or 4 seconds at a time: Signalyze for Mac, for example
(a $5000 system). This is not important if your experiments
are small and your sentences are short. One person was devastated
because he intended to look into conversational speech.
If you are buying a system costing several thousand dollars,
the best bet is to go through every procedure of the
experiment you want to run at the demo or at the dealer's place,
or to ask about each step on the phone. If the person
can't answer the question, insist that you
talk to someone who knows.
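
One way to check a recording limit before buying is to work out how
much audio fits in whatever buffer a system records into. The sketch
below is only an illustration: the function name is hypothetical, and
the buffer sizes, the 10 kHz sample rate, and the 16-bit sample width
are assumptions chosen for the example, not figures reported by any
respondent or vendor.

```python
def max_recording_seconds(buffer_bytes, sample_rate_hz,
                          bytes_per_sample=2, channels=1):
    """Return how many seconds of uncompressed audio fit in a buffer.

    All arguments are illustrative assumptions; real systems may also
    reserve part of the buffer for headers or program data.
    """
    if buffer_bytes < 0:
        raise ValueError("buffer_bytes must not be negative")
    if sample_rate_hz <= 0 or bytes_per_sample <= 0 or channels <= 0:
        raise ValueError("rate, sample width and channels must be positive")
    # Uncompressed audio needs this many bytes for each second recorded.
    bytes_per_second = sample_rate_hz * bytes_per_sample * channels
    return buffer_bytes / bytes_per_second


if __name__ == "__main__":
    # Hypothetical buffer sizes: a small memory block versus disk space.
    for label, size in [("64 KB", 64 * 1024), ("40 MB", 40 * 1024 * 1024)]:
        seconds = max_recording_seconds(size, sample_rate_hz=10_000)
        print(f"{label}: {seconds:.1f} s at 10 kHz, 16-bit, mono")
```

Under these assumptions a small in-memory buffer holds only a few
seconds, while recording to disk allows many minutes, so it is worth
asking the dealer where the recording is stored and at what sample
rate.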

 From: Charles Read

You may wish to read our review:
 Read, Buder, & Kent. Speech analysis systems: an evaluation.
 Journal of Speech and Hearing Research, 1992, 35, 314-332.
Of course, even this review is somewhat out of date.
 On PCs, I personally recommend CSpeech and CSRE.

 From: Jyrki Tuomainen

Re: your query on speech analysis software, I'd advise you to consult
an article in JSHR by Read, Buder & Kent, which is a thorough (and fair, IMO)
review of the most common systems on the market.
The complete ref is:
Read, C., Buder, E. & Kent, R. (1992) Speech Analysis Systems: An Evaluation.
Journal of Speech and Hearing Research, 35, 314-332.

As for my own experience and recommendations, the only thing that
is clear to me is that if you don't have technical staff available and/or
you don't know how to deal with the technical details, I would suggest
that you settle for one of the complete packages like CSL (from Kay) or
Speech Station (Sensimetrics). The drawback with them is that they
are somewhat more expensive than e.g. CSRE, but you're sure to get
a package that does what it's supposed to.


I spent some months working on a research project connected with
automatic speech recognition. We used the Loughborough Sound Images
Speech Workstation, which runs on a PC AT (or 286/386 based
compatible), with 640k RAM, EGA/VGA graphics, Microsoft Mouse (or
compatible), Hard disk (40 MB recommended), RAM disk (required for
stereo recording or fast sample rates), and DOS version 3.0 or greater.
The LSI Speech Workstation can display the signal in a variety of ways,
including black and white or full-colour spectrograms, waveforms,
spectral slices (a cross-section through a spectrogram, displayed
horizontally across the screen)... All of them are reasonably fast,
especially on a 386 PC. A wide range of bandwidths is available for the
spectrogram and the spectral slice, and the waveform can be scaled.
Several of these can be displayed at the same time by splitting the
screen. The screen can also be split to accommodate parts of two
separate recordings. The analog card supplied with the Speech
Workstation has two input channels each of which can be connected to
either a microphone or line output. Two markers are available, which
allow you to perform a number of operations on the signal, including
cutting and pasting, copying etc. It is possible to play only marked
sections of the signal on the screen. I can't remember how long
individual recordings could be, but I think it was something like 3 or
5 minutes maximum. Anyway, I quite enjoyed working with the Speech
Workstation. I don't know whether Loughborough Sound Images have
agents in the USA, but the best way to find out is probably to contact
them direct (they'll also send you further details):
Loughborough Sound Images Limited
The Technology Centre
Epinal Way
LE11 0QE
Telephone: (0509) 231 843 Telex: 34 1409 LUFBRA G
Fax: (0509) 262 433
