Editor for this issue: Andrew Carnie <carnie@linguistlist.org>
*** *** Editor's note: The editors and moderators of the LINGUIST list would like to thank Prof. Nathan for agreeing to do a second review of this software package. Through no fault of his own, the last review of this software was of a sample version rather than of the full package. Prof. Nathan kindly offered to do a second review of the full version. Our thanks once again. *** ***

WinSAL-V
Media Enterprise - Ingolf Franke, Manager
Technologie-Zentrum Trier
Gottbillstrasse 34a, D-54294 Trier

Reviewed by Geoffrey S. Nathan <geoffn@siu.edu>

A number of months ago I wrote a review in this forum of a speech analysis package and a multimedia phonetics teaching program called WinSAL-V and Speechlab. Due to problems with the installation of the programs (which are distributed on CD-ROM), it turns out that I did not actually test the full-scale speech analysis program (which I erroneously called Speechlab), but rather a toy demonstration version of it. Since that time I have been able to reevaluate WinSAL-V, and must substantially revise my conclusions.

WinSAL is a completely configured speech package, with considerable flexibility in display options. In addition to the standard waveform editing screen, there are screens which display spectrograms, energy envelopes and fundamental frequency, as well as FFT, cepstrum and other measures. Each of these is independently configurable, and all can be displayed simultaneously (although the spectrogram tends to get a bit squashed if you display more than two windows at a time).

Recording can be done at a choice of sampling frequencies (11025, 22050 and 44100 Hz) and at 8 or 16 bits. The program uses the standard Windows sound interface, and consequently will work on any current multimedia-configured computer with a sound board.
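As a rough illustration of what the recording options above mean in practice, here is a minimal Python sketch that computes the uncompressed data rate for each setting and reads the sampling rate and bit depth back from a file. It assumes recordings are stored as uncompressed PCM WAV files and that they are mono; the function names are hypothetical and have nothing to do with WinSAL's own internals.

```python
import io
import wave

# Sampling rates and bit depths named in the review.
SAMPLE_RATES_HZ = (11025, 22050, 44100)
BIT_DEPTHS = (8, 16)


def bytes_per_second(rate_hz, bits, channels=1):
    """Uncompressed PCM data rate. Mono is an assumption, not from the review."""
    return rate_hz * (bits // 8) * channels


def describe_wav(source):
    """Return (sampling rate in Hz, bits per sample, duration in seconds)."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
        duration = wav.getnframes() / rate
    return rate, bits, duration


def make_silent_wav(rate_hz, bits, seconds):
    """Build an in-memory mono WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(bits // 8)
        wav.setframerate(rate_hz)
        # 8-bit PCM is unsigned (silence is 0x80); 16-bit is signed (silence is 0).
        silence = b"\x80" if bits == 8 else b"\x00\x00"
        wav.writeframes(silence * int(rate_hz * seconds))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for rate in SAMPLE_RATES_HZ:
        for bits in BIT_DEPTHS:
            print(rate, "Hz,", bits, "bit:", bytes_per_second(rate, bits), "bytes/s")
    print(describe_wav(make_silent_wav(22050, 16, 0.5)))
```

`describe_wav` also accepts a file name, so the same check can be run on any WAV file on disk.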
The spectrogram permits a choice of filter types (rectangle, triangle, Hann, Hamming, Papoulis), a choice of bandwidth (128, 256, 512 and 1024 points), varying levels of precision (arbitrarily ranging from 0 to 99), frequency displays up to a maximum of 20 kHz and adjustable limits of display in dB. The energy and F0 displays have comparable adjustable parameters, well beyond my knowledge of the technical details involved. In addition, there is a set of short-term measures available, such as LPC, FFT, autocorrelation and even cepstrum. In each of these a waveform is displayed in an upper window, and placing the cursor at some spot on the waveform displays a time slice of the relevant measure in a lower, larger window.

As I mentioned in the earlier review, the spectrograms display in a heat scale with higher amplitudes in red; the background is technically dark blue (although it looks black on my monitors). The cursor placement provides an instantaneous display of frequency and amplitude in the lower left corner of the spectrogram, and there are comparable displays for each of the other windows.

The program does permit printing on any Windows-capable printer, but on my Pentium 166 it took almost four minutes to print a spectrogram (printing the waveforms and similar line-based displays is much quicker). The black background is a particular disadvantage on a black and white printer (I tried both an inkjet and a laser printer), in that most of us are more used to black on white than the reverse.

Finally, there is a version (which I received as a review copy) that is called WinSAL-V, for `Video option'. This permits the simultaneous display of speech and a video of that speech. I couldn't tell from the documentation whether it is possible to create your own set of examples, but on the CD-ROM there is a set of English and German sounds that includes videos of speakers making the sounds in question.
For example, one could display the file of a woman saying the German word `Ja', move the cursor to some specified point on the waveform (say, the point at which the vowel begins) and look at (or even measure) jaw movement--the video of this word is a lateral view. It is interesting to note, for instance, that the jaw does not begin to drop until approximately one third of the way into the /a/, and reaches its maximum extent about two thirds of the way through the vowel.

There appear to be two main drawbacks to this program. One is that each window is independent, and there is no way to synchronize them. This means that placing the cursor at a particular spot on the oscillogram does not simultaneously place it at the same spot on a spectrogram (something that the older programs CSRE and Signalyze both permit).

The second problem is that it is not possible to measure the length of a segment simply by placing the cursor at the beginning and end of the segment and reading off a value on the screen. Through the use of the zoom facility it is possible to display a single segment and, with judicious use of the mouse, to get the left and right edges relatively precise. However, you cannot simultaneously see the selected segment within the surrounding larger context. In addition, the numbers displayed at the left and right edges are the absolute positions of those edges within the overall waveform. If one wanted, for example, to measure the length of the /a/ in `ja', one has to switch to a calculator to subtract 949 from 1338. CSRE, by contrast, shows the entire signal (or a zoomed version thereof) but then pops up a window for each of the left and right edges of the segment you are interested in. Each edge can then be fine-tuned separately (say to a zero crossing, or to the first visible voicing pulse), and the screen display gives an exact readout of the size of the segment without the necessity of zooming into it.
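The subtraction described above, together with the kind of edge adjustment to a zero crossing that CSRE offers, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the review does not give the unit of the 949 and 1338 readouts (so the result is in whatever unit the program displays), and neither function reflects how WinSAL or CSRE is actually implemented.

```python
def segment_length(left_edge, right_edge):
    """Length of a segment from the two absolute edge readouts.

    The result is in the same (unstated) unit as the readouts.
    """
    if right_edge < left_edge:
        raise ValueError("right edge must not precede left edge")
    return right_edge - left_edge


def nearest_zero_crossing(samples, index):
    """Index of the sample nearest to `index` at which the signal changes sign.

    A crossing at i means samples[i - 1] and samples[i] lie on opposite sides
    of zero. Returns `index` unchanged if the signal never crosses zero.
    """
    crossings = [
        i for i in range(1, len(samples))
        if (samples[i - 1] < 0) != (samples[i] < 0)
    ]
    if not crossings:
        return index
    return min(crossings, key=lambda i: abs(i - index))


if __name__ == "__main__":
    # The /a/ in `ja', using the two edge values quoted in the review.
    print(segment_length(949, 1338))

    # A toy signal: snap a roughly placed cursor to the closest sign change.
    toy_signal = [3, 2, 1, -1, -2, -1, 1, 2]
    print(nearest_zero_crossing(toy_signal, 4))
```

A readout of this kind, updated as the cursors move, is all that would be needed to avoid the trip to the calculator.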
These are relatively minor problems, and perhaps a later version of the program could remedy them. Despite these two caveats, I can highly recommend this program. It is a cheap alternative to CSL (the complete CSL package costs over US$5000, and a separate Windows version of CSL that also relies on a standard PC sound card costs $1000), and while it costs about the same as CSRE, it uses the standard Windows interface and works on files in the standard WAV format that most other Windows sound-based programs use. If you want, you can make a spectrogram of the Windows 95 boot-up `tinkle'.

Pricing for WinSAL is approximately $230 (all pricing is in DM) for the bare-bones program, and $260 for the CD-ROM or the video version program alone. The complete package is $288, and a version with appropriate video editing software is $1155. There is a $173 discount with proof of student status. A demo version of the program can be downloaded from their website <http://www.media-enterprise.de>, and you can fax your credit card order, thus obviating currency conversion problems. If you order the non-CD-ROM version they will e-mail your registration number, thus permitting a completely electronic purchase of the program.

Geoffrey S. Nathan
Department of Linguistics
Southern Illinois University at Carbondale,
Carbondale, IL, 62901 USA
Phone: +618 453-3421 (Office)
FAX +618 453-6527
+618 549-0106 (Home)