LINGUIST List 8.986

Wed Jul 2 1997

Review: WinSAL-V/Speechlab-- Reprise

Editor for this issue: Andrew Carnie <carnielinguistlist.org>


What follows is another discussion note contributed to our Book Discussion Forum. We expect these discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in. If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for discussion." (This means that the publisher has sent us a review copy.) Then contact Andrew Carnie at carnielinguistlist.org

Message 1: Re: Review of software

Date: Wed, 02 Jul 1997 14:40:37 -0500
From: Geoffrey S. Nathan <geoffnsiu.edu>
Subject: Re: Review of software

*** ***
Editor's note: 
The editors and moderators of the LINGUIST list would like to thank 
Prof. Nathan for agreeing to do a second review of this software 
package. Through no fault of his own, the last review of this software
was of a sample version rather than of the full package. Prof. Nathan
kindly offered to do a second review of the full version. Our thanks
once again.
*** ***


WinSAL-V
Media Enterprise-Ingolf Franke, Manager
Technologie-Zentrum Trier
Gottbillstrasse 34a, D-54294 Trier


Reviewed by Geoffrey S. Nathan <geoffnsiu.edu>

A number of months ago I wrote a review in this forum of a speech analysis
package and phonetics teaching multimedia program called WinSAL-V and
Speechlab. Due to problems with the installation of the programs (which
are distributed on CD-ROM) it turns out I did not actually test the
full-scale speech analysis program (which I erroneously called Speechlab),
but rather a toy demonstration version thereof.
	Since that time I have been able to reevaluate WinSAL-V, and must
substantially revise my conclusions. WinSAL is a completely configured
speech package, with considerable flexibility in display options. In
addition to the standard waveform editing screen there are screens
available which display spectrograms, energy envelopes and fundamental
frequency, as well as FFT, cepstrum and other measures. Each of these is
independently configurable, and all can be displayed simultaneously
(although the spectrogram tends to get a bit squashed if you display more
than two windows at a time).
 Recording can be done at a choice of sampling frequencies: 11025, 22050
and 44100 Hz, and can be 8 or 16 bit. The program uses the standard
Windows sound interface, and consequently will work on any current
multimedia-configured computer with a sound board.
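 As a rough illustration of the recording parameters just listed (this is
not part of WinSAL), the sketch below uses Python's standard `wave` module
to read the sampling rate and bit depth from a WAV file. The function names
`describe_wav` and `make_tone` are hypothetical, and the tone generator
exists only to give the example something to read.

```python
import io
import math
import struct
import wave


def describe_wav(source):
    """Return (sample_rate_hz, bits_per_sample, channels, duration_s)."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8      # sample width is reported in bytes
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, bits, channels, duration


def make_tone(rate=22050, freq_hz=440.0, seconds=0.5):
    """Build an in-memory 16-bit mono WAV holding a sine tone (test data only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)                # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        n_frames = int(rate * seconds)
        frames = b"".join(
            struct.pack(
                "<h",
                int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / rate)),
            )
            for i in range(n_frames)
        )
        wav.writeframes(frames)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(describe_wav(make_tone()))
```

 The same `describe_wav` call accepts a path to a file on disk in place of
the in-memory buffer.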
 The spectrogram permits a choice of filter-types (rectangle, triangle,
Hann, Hamming, Papoulis), a choice of bandwidth (128, 256, 512 and 1024
points), varying levels of precision (arbitrarily ranging from 0 to 99),
frequency displays up to a maximum of 20 kHz and adjustable limits of
display in dB.
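 To give a feel for what these settings trade off, here is a minimal sketch
in Python. It assumes that the "points" figure is the analysis window
length in samples; the documentation may define it differently. Under that
assumption the spacing between frequency bins is the sampling rate divided
by the window length, and the window spans that many samples in time. The
Hann and Hamming formulas are the standard textbook ones, not taken from
WinSAL, and all function names are hypothetical.

```python
import math


def bin_spacing_hz(rate_hz, n_points):
    """Spacing between adjacent FFT bins for an n_points analysis window."""
    return rate_hz / n_points


def window_duration_ms(rate_hz, n_points):
    """Length of the analysis window in milliseconds."""
    return 1000.0 * n_points / rate_hz


def hann(n):
    """Standard symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def hamming(n):
    """Standard symmetric Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


if __name__ == "__main__":
    for points in (128, 256, 512, 1024):
        print(
            points,
            round(bin_spacing_hz(22050, points), 2),
            round(window_duration_ms(22050, points), 2),
        )
```

 Shorter windows give better time resolution and coarser frequency
resolution, and longer windows give the reverse.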
 The energy and F0 displays have comparable parameters that can be
adjusted, well beyond my knowledge of the technical details involved. In
addition, there is a set of short-term measures available, such as LPC,
FFT, autocorrelation and even cepstrum. In each of these a waveform is
displayed in an upper window, and placing the cursor in some spot on the
waveform displays a time slice of the relevant measure in a lower, larger
window.
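 For readers unfamiliar with these measures, the following is a minimal
sketch, in Python, of how a short-term autocorrelation can yield a
fundamental frequency estimate for one time slice. It illustrates the
general technique only and makes no claim about how WinSAL computes its F0
or autocorrelation displays. The function name `estimate_f0` and the 75 to
500 Hz search range are assumptions chosen for the example.

```python
import math


def estimate_f0(samples, rate_hz, fmin=75.0, fmax=500.0):
    """Estimate F0 in Hz from the strongest autocorrelation peak.

    Only lags corresponding to frequencies between fmin and fmax are searched.
    Returns None if no positive peak is found.
    """
    min_lag = int(rate_hz / fmax)
    max_lag = min(int(rate_hz / fmin), len(samples) - 1)
    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation of the slice at this lag.
        value = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if value > best_value:
            best_lag, best_value = lag, value
    return rate_hz / best_lag if best_lag else None


if __name__ == "__main__":
    rate = 11025
    # Synthetic 200 Hz sine standing in for a voiced time slice.
    tone = [math.sin(2 * math.pi * 200.0 * i / rate) for i in range(1024)]
    print(estimate_f0(tone, rate))
```

 Because the lag is a whole number of samples, the estimate is only as
fine as the sampling rate allows; real pitch trackers usually interpolate
around the peak.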
 As I mentioned in the earlier review, the spectrograms are displayed on a
heat scale, with higher amplitudes in red against a background that is
technically dark blue (although it looks black on my monitors). The cursor placement
provides an instantaneous display of frequency and amplitude in the lower
left corner of the spectrogram, and there are comparable displays for each
of the other windows.
 The program does permit printing on any Windows-capable printer, but on my
Pentium 166 it took almost four minutes to print a spectrogram (printing
the waveforms and similar line-based displays is much quicker). The black
background is a particular disadvantage on a black and white printer (I
tried both an inkjet and a laser printer), in that most of us are more used
to black on white rather than the reverse.
 Finally, there is a version (which I received as a review copy) that is
called WinSAL-V, for `Video option'. This permits the simultaneous display
of speech and a video of that speech. I couldn't tell from the
documentation if it was possible to create your own set of examples, but on
the CD-ROM there is a set of English and German sounds that includes videos
of speakers making the sounds in question. For example, one could display
the file of a woman saying the German word `Ja', move the cursor to some
specified point on the waveform (say the point at which the vowel begins)
and look at (or even measure) jaw movement--the video of this word is a
lateral view. In this recording, it is interesting to note that the jaw does not
begin to drop until approximately one third of the way into the /a/, and
reaches its maximum extent at about two thirds of the way through the vowel.
 There appear to be two major drawbacks to this program. One is that each
window is independent, and there is no way to synchronize them. This means
that placing the cursor in a particular spot on the oscillogram does not
simultaneously place it on the same spot on a spectrogram (something that
the older programs CSRE and Signalyze both permit).
 The second problem is that it is not possible to measure the length of a
segment simply by placing the cursor at the beginning and end of the
segment and reading off a value on the screen. Through the use of the zoom
facility it is possible to display a single segment, and with judicious use
of a mouse get the left and right edges relatively precise. However, you
cannot simultaneously see the selected segment within the surrounding
larger context. In addition, the numbers displayed at the left and right
edge are the absolute values of the selected segment within the overall
waveform. If one wanted, for example, to measure the length of the /a/ in
`ja', one has to switch to a calculator to subtract 949 from 1338. CSRE,
for example, shows the entire signal (or a zoomed version thereof) but then
pops up a window for each of the left and right edges of the segment you
are interested in. Then separately, each edge can be fine-tuned (say to a
zero crossing, or to the first visible voicing pulse), then the screen
display will give an exact readout of the size of the segment without the
necessity of zooming into it. These are relatively minor problems, and
perhaps a later version of the program could remedy them.
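 The subtraction in question is trivial to script. The sketch below, in
Python, takes the two edge values quoted above and returns the segment
length. The review does not say what unit the program reports, so the
result is in whatever unit the edges are in; the `samples_to_ms` helper
applies only on the assumption that the edges are sample indices. Both
function names are hypothetical.

```python
def segment_length(left_edge, right_edge):
    """Length of a segment given its left and right edge positions.

    The result is in the same unit as the edges (unit not stated in the review).
    """
    if right_edge < left_edge:
        raise ValueError("right edge must not precede left edge")
    return right_edge - left_edge


def samples_to_ms(n_samples, rate_hz):
    """Convert a sample count to milliseconds (assumes edges are in samples)."""
    return 1000.0 * n_samples / rate_hz


if __name__ == "__main__":
    # Edge values for the /a/ in `ja' as quoted in the review.
    print(segment_length(949, 1338))
```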
 Despite these two (relatively) minor caveats, I can highly recommend this
program. It is a cheap alternative to CSL (the complete CSL package costs
over US$5000, and a separate Windows version of CSL that also relies on a
standard PC sound card costs $1000), and while it costs about the same as
CSRE, it uses the standard Windows interface, and works on files that are
in the standard WAV format that most other Windows sound-based programs
use. If you want, you can make a spectrogram of the Windows 95 boot-up
`tinkle'.
 Pricing for WinSAL is approximately $230 (all pricing is in DM) for the
bare-bones program, $260 for the CD-ROM or the video version program alone.
 The complete package is $288, and a version with appropriate video editing
software is $1155. There is a $173 discount with proof of student status.
A demo version of the program can be downloaded from their website
<http://www.media-enterprise.de>, and you can fax your credit card order,
thus obviating currency conversion problems. If you order the non-CD-ROM
version they will e-mail your registration number, thus permitting a
completely electronic purchase of the program.

Geoffrey S. Nathan
Department of Linguistics
Southern Illinois University at Carbondale,
Carbondale, IL, 62901 USA
Phone: +618 453-3421 (Office) FAX +618 453-6527
+618 549-0106 (Home)
