LINGUIST List 11.1690

Sat Aug 5 2000

Sum: Analog to Digital Sound Software

Editor for this issue: Scott Fults


  1. Debra Spitulnik, Analog to Digital Sound Software

Message 1: Analog to Digital Sound Software

Date: Fri, 04 Aug 2000 17:37:27 -0400
From: Debra Spitulnik
Subject: Analog to Digital Sound Software

For Query: Linguist 11.1611

Below is a summary of the responses that were sent to my query about
analog/digital sound. Thanks to everyone for their very useful
responses. The original query is also below.

Dear Colleagues:

I have 2 questions about analog/digital sound files:

(1) what software and hardware works best for converting analog audio
recordings into digital sound files?

(2) what software works best for doing basic edits to digital sound
files (e.g. clean up background noise; cut and splice)?

I am currently in the process of developing a digital linguistic corpus
for the Bantu language, Bemba, which is spoken in Zambia. I have over
200 hours of analog audio tapes of naturally occurring discourse,
collected in field recordings and recorded from radio broadcasts. Most
of the material is on 90 minute analog audio tapes. Individual
recording events run 5 minutes-60 minutes, and I wish to store them as
separate files in the digital format.

Thank you for any advice on this.

Debra Spitulnik
Department of Anthropology tel: 404-727-3651
Emory University fax: 404-727-2860
1557 Pierce Drive email:
Atlanta, GA 30322 USA


FROM: Richard Wright, Assistant Professor
University of Washington
Department of Linguistics
Box 354340
Seattle, WA 98195-4340

There are two basic digitizing-editing tools that I know of. If you
have a PC you can use a program called CoolEdit, a reliable and cheap
program. I use a Mac (better built-in sound in general, more stylish
:) so the program I use for digitizing and editing is SoundEdit
(Macromedia), a $300 program that is really nice for editing and
digitizing (though not much in the way of spectral analysis). Just for
info: 22.05 kHz is a fine sampling rate unless you want to
play out over standard CD players (as opposed to computers). You may
even want to sample at 11.025 kHz if you're not going to use the files
for acoustic analysis of stops and fricatives.
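
The reasoning behind these rates is the Nyquist limit: a recording
can only represent frequencies up to half its sampling rate. A
minimal Python sketch of that arithmetic follows; the function name
is made up for illustration and is not part of any tool mentioned
here.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent (half the rate)."""
    return sample_rate_hz / 2.0


if __name__ == "__main__":
    # The three rates that come up in this thread.
    for rate in (11025, 22050, 44100):
        print(f"{rate / 1000:g} kHz sampling -> "
              f"content up to {nyquist_hz(rate) / 1000:g} kHz")
```

At 11.025 kHz nothing above roughly 5.5 kHz survives, which is why
that rate is a poor fit for fricatives, whose energy often extends
well above that.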

Check out the net; there are likely to be a host of programs that are
virtually free.


FROM: Guido Vanden Wyngaerd
Vrijheidslaan 17
B-1081 Brussel
tel +32 2 412 43 49
fax & voice mail +32 70 70 27 44
mobile +32 476 40 72 82

I use CoolEdit2000, which is convenient, useful, and inexpensive.
Consult, from which you can download a trial


Michael Stevens sent me a very useful report that he prepared on the
digitisation of sound files. He did not want me to quote passages from
it. He suggested Cool Edit 2000 for editing and MP3 for storage of
data.

Michael Stevens
ELT Reference
Cambridge University Press
The Edinburgh Building
Cambridge CB2 2RU
United Kingdom
ph. +44(0)1223.325925
fax +44(0)1223.325850


FROM: Nick Thieberger
Department of Linguistics and Applied Linguistics
University of Melbourne
Phone 03 9388 9594 (home)
Postal address: 27 John St, Brunswick East 3057

Digitising sound files is no big deal nowadays (especially on a mac:)
). The issue is how to link sound and text files for retrieval. has comprehensive links to annotation

I have been using SoundIndex from Lacito to do the linking. It is easy
to use and provides a textual index of the links between the audio and
text/transcript file. This has the advantage of not requiring you to
segment the audio file into separate files. And they also provide a
Perl script to export the index as XML. (There is a review if you go to
the Computer assisted language worker, (go to
the second edition from this site))


FROM: Julian Lloyd
Swahili Multimedia Project
Department of Linguistics and Southern African Languages
University of Cape Town
e-mail: or

Re sound processing programmes - I like Goldwave - it's a free
download (shareware, but they just keep reminding you that you haven't

It has good processing and clean-up - I use it to prepare .WAV files
for my multimedia programmes.

Your tasks will be very expensive in terms of memory space, by the way -
you'll have to record in a fairly high format for quality, but save in
something like 8-bit mono, or the files will run to tens of Mbytes!
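
To put rough numbers on this, here is a small Python sketch of the
size of uncompressed audio. The two formats compared are
illustrative assumptions (the message does not name a sampling rate),
and the helper name is hypothetical.

```python
def pcm_bytes(seconds: float, sample_rate_hz: int, bits: int, channels: int = 1) -> int:
    """Size of raw uncompressed PCM audio in bytes (WAV header ignored)."""
    return int(seconds * sample_rate_hz * (bits // 8) * channels)


# Assumed example formats; neither is prescribed by the thread.
FORMATS = {
    "16-bit 44.1 kHz mono": (44100, 16),
    "8-bit 11.025 kHz mono": (11025, 8),
}

if __name__ == "__main__":
    for name, (rate, bits) in FORMATS.items():
        per_minute = pcm_bytes(60, rate, bits)
        corpus = pcm_bytes(200 * 3600, rate, bits)  # the 200 hours from the query
        print(f"{name}: {per_minute / 1e6:.1f} MB per minute, "
              f"{corpus / 1e9:.1f} GB for 200 hours")
```

Under these assumptions the high-quality format comes to about 5.3 MB
per minute (about 63.5 GB for 200 hours), and the 8-bit format to
about 0.7 MB per minute (about 7.9 GB).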


FROM: taimi metzler
umevoice inc.
novato, ca

Analog Conversion: unfortunately, i'm not sure. i don't do much of
that. however, i'm sure others will offer that information. if you
don't get any responses, write to barbara fox at the university of
colorado ( she has done a fair amount of
conversion of conversation tapes and should be able to point you in
the right direction.

What software/hardware for editing and processing? really, it's hard
to decide. the options are quite limited in number, but fairly equal
in quality. The only firm choice is NOT macintosh--not because of the
hardware, but because there's no really decent sound processing
software available.

Unix: the best option used to be entropic waves (cost: U$8000), but
entropic was recently purchased by microsoft and MS took waves off the
market. Gail Ayer is a wonderful linguist who has worked for
entropic--i don't know her email, but if you can locate her, you may
be able to get more information about the availability of entropic
products and esp. speech-wave processing products. good luck.

barring entropic waves, the best speech processing software i've
seen is a shareware program on Windows called CoolEdit2000. it
has some very nice processing routines which can be made into
batch processes for multiple files for cleaning up, it's very
easy to use, and i've been pleased with the quality of processing
so far.


FROM: Toby Paff tobypaffPrinceton.EDU
UNIX systems
Computing and Information Technology
Princeton University

Noticed your query about dealing with analogue sound. A colleague of
mine has been very interested in a similar topic so I am sending along
his advice:

".. here're some places to start. According to most of the things
I've seen on the web, "Spin Doctor, version 4" is the way to go with
analog. It has the hiss/pop cleanup feature. It's boxed with "EZ-CD
Creator Deluxe" by Adaptec. Egghead has it for $79.99, Mfr part

 I bought a new Plextor 12/4/32. They're still hard to come by
and are usually on back order. I waited about 25 days for mine. I
don't know if you can burn sound CDs at 12X, but I'll see. You can get
pretty good deals on the older 8X burners now, though. (And if it
turns out 12X is only good for data, and sound still is 1X-what's the
difference? But I'd recommend a burner that allows the firmware can be
 The most helpful place I found was the CD-R FAQ:


From: "Bernard Kripkee"

I can't tell you what software works best for your purposes, but I can
tell you what I have been using. SIL (Summer Institute of
Linguistics) offers a package of freeware called Speech Analyzer that
can do a fine job of basic edits, at least on relatively brief
segments of sound. You can download it at no charge from their
website. It runs under Windows.

My Toshiba Satellite laptop has an input jack for a tape recorder.
One could simply connect the recorder to the jack and use the basic
Windows audio utilities to capture a file. The file can then be
edited with the SIL Speech Analyzer.


FROM: Anonymous

The questions I'd ask are: why are you producing the corpus? Is it
for instrumental acoustic analysis? Discourse strategies? Language
learning? Is the size of the resulting files important? Will
investigators use your equipment, or their own?

(Some file formats are Mac and PC compatible, others aren't; some
formats are Internet-friendly--compressed and "loss-y", but small; you
can digitize at very high fidelity, but this leads to very large
files, and need only be done if acoustic analysis is in the offing.)

If I had a limited budget, I'd worry less about the final stages
(digitizing) than the first stage: getting as clean and high-quality a
recording as I possibly can. But I'm exposed to phonetic analysis.

Many people are not realistic about their needs, and want everything
preserved in one format--the highest quality possible--regardless of
purpose or application. BTW, you probably don't want to keep the
50-60 minute interviews (etc.) in one large file unless there's a very
good reason for it: the files will probably be too big, unless you're
going to compress them. But compression rules out some possible uses.


I usually use something called CoolEdit96 to digitize sound. It is no
longer available (except as a partially crippled bit of shareware,
unless you "find" a registration code), but there's a newer edition out
that does more than any linguist needs. Sonic Foundry's SoundForge (I
believe that is the name) digitizes quite nicely. Both of these are
primarily for musicians, and include nifty ways of modifying sound
(all-in-one production/post-production studios). And, of course, the
Kay Elemetrics software accompanying the Computerized Speech Lab is
sort of the gold standard for some people (but you can't digitize large
chunks of sound with it). PCQuirer should also do the job, and has a more
restricted palette of tools--one suited just to linguistic needs. But
"linguistic needs" in the area of noise reduction tend to be low- and
high-pass filters, while CoolEdit and SoundForge are much more
sophisticated and can filter out more random background noise.


Not all of us have $9000 to spend on Computerized Speech Lab. A $100 sound
card will work, but may give a little distortion or background hum.
This is bad if acoustic analysis is in the plans for these files.
(Nonetheless my wife, a phonetically-oriented phonologist, used just
such a card in her dissertation work with no problems.) A high
quality sound card--in the $400 range--should be good enough.
Actually, by now, the price is probably down to $300. By "good
enough" I mean that there will be very, very little distortion and
noise, less than needed for any but the most particular kinds of
instrumental analysis. Certainly less than most dedicated hardware
from the 1980s, and probably less than your analog tape player
introduced. Formant structure, voicing, pitch will all be preserved
accurately enough for nearly any need.

A caveat, however: I've seen perfectly good sound cards give really
bad results with some computers. You should check the sound card by
getting the kind of tuner an intermediate level violinist might have,
one that emits a variety of electronically produced fairly pure
pitches, and recording a few different pitches for 3-4 minutes each.
Then do a quick analysis of the sound: look for fluctuating pitch that
isn't the result of the tuner's problems, look for anomalous noise in
the signal (say, a nice 60 Hz band of energy in the spectrogram). One
computer I ran across introduced a loud "tick" into the signal every
few seconds, and a rather loud hum. (Shielding and internal
configurations of the hardware can lead to this, and we lowly mortals
can't easily fix the problem.) If you're not doing phonetic analysis,
try recording something from a tape or from a microphone, and play
back the results: if they're good enough, fine. Only trust your setup
after you've tested it.
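
One way to make the "look for a 60 Hz band" check less dependent on
eyeballing a spectrogram is to measure the energy at that one
frequency directly. The Python sketch below uses the Goertzel
algorithm for this; the function names and the synthetic test signal
are illustrative assumptions, not part of any package named in this
thread. For a real test you would read the samples from your own
test recording instead.

```python
import math


def goertzel_power(samples, sample_rate_hz, target_hz):
    """Normalised power near target_hz, computed with the Goertzel algorithm."""
    n = len(samples)
    k = round(n * target_hz / sample_rate_hz)       # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    return power / (n * n / 4.0)    # a sine of amplitude A gives roughly A**2


def sine(freq_hz, amplitude, sample_rate_hz, seconds):
    """Synthetic pure tone, standing in for a recorded tuner pitch."""
    n = int(sample_rate_hz * seconds)
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n)]


if __name__ == "__main__":
    rate = 8000
    tone = sine(440, 0.5, rate, 1.0)                 # clean "tuner" pitch
    hum = sine(60, 0.05, rate, 1.0)                  # simulated mains hum
    noisy = [t + h for t, h in zip(tone, hum)]
    print("60 Hz power, clean:", goertzel_power(tone, rate, 60))
    print("60 Hz power, noisy:", goertzel_power(noisy, rate, 60))
```

A clean recording should show next to nothing at 60 Hz, while the
simulated hum shows up as a power of roughly 0.05 squared. Where the
mains supply runs at 50 Hz, check 50 Hz instead.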

All things being equal, I would probably do the following for
everything from phonetic analysis to language pedagogy: Get a $400
($300?) sound card in a fairly new PC, and make sure it works. I'd
digitize at 44.1 kHz ("CD quality sound") using CoolEdit or something
like it, in *mono*. This yields *huge* files, and should be done in
no more than 15 minute pieces (ok, 20 minute pieces) that can later be
broken into smaller bits, if needed. I would save these in "WAV"
format (it's a PC format that most Mac programs can handle). All
these can later be transferred to CDs for storage. Acoustic
analysis software typically handles WAV format, as well. If you have
200 hours of recording, though, ponder how much storage will cost;
maybe saving the 44.1 kHz files is unreasonable. Maybe even *producing*
44.1 kHz files is unreasonable. Then, depending upon the final purpose,
I'd further process the files to make them usable. E.g., noise
reduction, compression.
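
Breaking a long capture into smaller pieces later is straightforward
for uncompressed WAV files. As a rough sketch, assuming plain PCM WAV
input, the following uses Python's standard `wave` module; the
function name and the file names in the usage comment are
hypothetical.

```python
import wave


def extract_segment(src_path, dst_path, start_s, end_s):
    """Copy the audio between start_s and end_s (in seconds) to a new WAV file.

    Returns the number of frames written. Times past the end of the
    source file are clamped to its length.
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        start_frame = min(max(0, int(start_s * rate)), total)
        end_frame = min(max(start_frame, int(end_s * rate)), total)
        src.setpos(start_frame)
        frames = src.readframes(end_frame - start_frame)
        with wave.open(dst_path, "wb") as dst:
            # Keep the source's channel count, sample width and rate.
            dst.setnchannels(src.getnchannels())
            dst.setsampwidth(src.getsampwidth())
            dst.setframerate(rate)
            dst.writeframes(frames)
    return end_frame - start_frame


# Hypothetical usage: pull minutes 3 to 8 of one capture into its own file.
# extract_segment("tape01_part1.wav", "tape01_event2.wav", 3 * 60, 8 * 60)
```

Because this only copies frames, it loses no quality; noise reduction
and compression remain separate steps.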

If you want them available over the Internet for acoustic analysis,
I'd edit them to 22.05 kHz (good enough for most purposes; it's just
above the Nyquist rate, i.e. twice the highest frequency, for the upper
ranges of what we humans use in speech).

This halves their size. (It's possible to digitize directly to 22.05
kHz; I find that digitizing at a very high quality and then "dumbing
down" the sampling rate just plain sounds better; but doing this with
200 hours of sound requires a *lot* of patience.) I would then
compress them using WinZip (PC) or StuffIt (Mac) to reduce their size
even further, in a loss-less way. *These* files I'd make available on
the 'Net.
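
For the curious, halving the sampling rate is not just a matter of
dropping every other sample: the signal has to be low-pass filtered
first, or high frequencies fold back into the result as aliasing. A
sound editor's own resampling command does this for you. The Python
sketch below only illustrates the idea; the tap count and cutoff are
arbitrary illustrative choices, the function names are made up, and
pure Python like this would be far too slow for a real corpus.

```python
import math


def lowpass_taps(num_taps=63, cutoff=0.225):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the input rate."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]    # unity gain at 0 Hz


def halve_sample_rate(samples, taps=None):
    """Low-pass filter, then keep every second sample."""
    taps = taps or lowpass_taps()
    half = len(taps) // 2
    out = []
    for centre in range(0, len(samples), 2):    # only compute kept samples
        acc = 0.0
        for j, tap in enumerate(taps):
            i = centre + j - half
            if 0 <= i < len(samples):           # treat out-of-range input as silence
                acc += tap * samples[i]
        out.append(acc)
    return out
```

With these settings a 1 kHz tone passes through essentially
unchanged, while a 15 kHz tone, which the halved rate could not
represent, is strongly attenuated rather than aliased.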

If you want them available for language learning or discourse
strategies (or any other non-phonetic purpose), I'd get RealMedia's
RealEncoder 5.0 or G3 (5.0 was freeware and is "floating" in
cyberspace somewhere still, G3 costs a little $) and, starting with
the high quality files you made, I'd produce radio quality sound
files. They're relatively small and RealAudio is one of the Internet
standards. While WAV files *can* be used--and little ones are
used--for sound on the Web, RealMedia files are much smaller. If you
use RealMedia compression, try a variety of their options before
settling on the one you'll use: if you have a crisp, clean source you
can get by with a smaller ("loss-ier") format than if you begin with a
moderately muddy-sounding source. Ultimately it's always a trade-off
between size and quality: judge the quality you *need*, and the
threshold for unacceptable, and go from there.

Any of these formats can be transferred to CD for distribution to
colleagues, or posted on a Website somewhere--again, the file format
(WAV, RealMedia, 44.1 kHz vs. 22.05 kHz vs. something else) depends on
the final quality *needed*.

I note that you're in the Anthropology dept. If I assume that
phonetics-phonology aren't your goals, but that you are more
interested in discourse strategies and the like, there's a quicker and
easier way: it rules out acoustic analysis, but if this is ok, fine.
RealMedia's RealEncoder 5.0 (or G3) can take analog input and create
RealMedia files directly: it does the digitizing and encoding nearly
simultaneously. This saves time, nerves, patience, and disk space.
*Drawbacks*: the process can be a bit clunky, you have to manually
start the tape player and recording, and there's no possibility of
editing or noise reduction. If you go this route, find a particularly
poor section of tape to experiment with, and make half a dozen short
recordings of that section at various compression levels before
deciding which one is right for your needs. If you can do even 25% of
your corpus this way, it'll save many, many hours of time.

Assuming that RealMedia is ok, you'll still have a large quantity of
data, now on disk. I'd contact your IT people and see about getting
the RealMedia files put on a media server that supports streaming
sound. (They already have one, I'm sure; they'll balk at the volume
of data you have, but will probably cave in fairly quickly, esp. if
you eventually offer to buy them another hard drive, < $300).