Editor for this issue: Scott Fults <scott@linguistlist.org>
For Query: Linguist 11.1611

Below is a summary of the responses that were sent to my query about analog/digital sound. Thanks to everyone for their very useful responses. The original query is also below.

Dear Colleagues:

I have 2 questions about analog/digital sound files:
(1) what software and hardware work best for converting analog audio recordings into digital sound files?
(2) what software works best for doing basic edits to digital sound files (e.g. clean up background noise; cut and splice)?

I am currently in the process of developing a digital linguistic corpus for the Bantu language, Bemba, which is spoken in Zambia. I have over 200 hours of analog audio tapes of naturally occurring discourse, collected in field recordings and recorded from radio broadcasts. Most of the material is on 90 minute analog audio tapes. Individual recording events run 5 minutes-60 minutes, and I wish to store them as separate files in the digital format.

Thank you for any advice on this.

Debra Spitulnik
Department of Anthropology    tel: 404-727-3651
Emory University              fax: 404-727-2860
1557 Pierce Drive             email: dspitul@emory.edu
Atlanta, GA 30322 USA

RESPONSES

FROM: Richard Wright, Assistant Professor
University of Washington
Department of Linguistics
Box 354340
Seattle, WA 98195-4340
rawright@u.washington.edu

There are two basic digitizing-editing tools that I know of. If you have a PC you can use a program called CoolEdit, a reliable and cheap program. I use a Mac (better built in sound in general, more stylish :) so the program I use for digitizing and editing is SoundEdit (Macromedia), a $300 program that is really nice for editing and digitizing (though not much in the way of spectral analysis).

Just for info: sampling at 22.05 kHz is a fine sampling rate unless you want to play out over standard CD players (as opposed to computers). You may even want to sample at 11.025 kHz if you're not going to use the files for acoustic analysis of stops and fricatives. Check out the net; there are likely to be a host of programs that are virtually free.

xxxxxx

FROM: Guido Vanden Wyngaerd
K.U.Brussel
Vrijheidslaan 17
B-1081 Brussel
tel +32 2 412 43 49
fax & voice mail +32 70 70 27 44
mobile +32 476 40 72 82
guido.vandenwyngaerd@tijd.com

I use CoolEdit2000, which is convenient, useful, and inexpensive. Consult www.syntrillium.com, from which you can download a trial version.

xxxxxx

Michael Stevens <mstevens@cup.cam.ac.uk> sent me a very useful report that he prepared on the digitisation of sound files. He did not want me to quote passages from it. He suggested Cool Edit 2000, and MP3 for storage of data.

Michael Stevens
ELT Reference
Cambridge University Press
The Edinburgh Building
Cambridge CB2 2RU
United Kingdom
ph. +44(0)1223.325925
fax +44(0)1223.325850
web http://uk.cambridge.org/elt/reference

xxxx

FROM: Nick Thieberger
Department of Linguistics and Applied Linguistics
University of Melbourne
Phone 03 9388 9594 (home)
Postal address: 27 John St, Brunswick East 3057
http://www.linguistics.unimelb.edu.au/people/thieberger/ntcv.html

Digitising sound files is no big deal nowadays (especially on a Mac :) ). The issue is how to link sound and text files for retrieval. http://www.ldc.upenn.edu has comprehensive links to annotation software. I have been using SoundIndex from Lacito to do the linking. It is easy to use and provides a textual index of the links between the audio and text/transcript file. This has the advantage of not requiring you to segment the audio file into separate files. They also provide a Perl script to export the index as XML. (There is a review in the Computer assisted language worker, http://coombs.anu.edu.au/SpecialProj/ASEDA/CALANG/calang.html; go to the second edition from this site.)

xxxx

FROM: Julian Lloyd
Swahili Multimedia Project
Department of Linguistics and Southern African Languages
University of Cape Town
e-mail: jlloyd@Beattie.uct.ac.za or jlloyd@iafrica.com

Re sound processing programmes - I like Goldwave - it's a free download (shareware, but they just keep reminding you that you haven't registered!). It has good processing and clean-up - I use it to prepare .WAV files for my multimedia programmes. Your tasks will be very expensive in terms of memory space, by the way - you'll have to record in a fairly high format for quality, but save in something like 8-bit mono, or the files will run to tens of Mbytes!

xxxxx

FROM: taimi metzler
umevoice inc.
novato, ca
taimi@umevoice.com

Analog Conversion: unfortunately, i'm not sure. i don't do much of that. however, i'm sure others will offer that information. if you don't get any responses, write to barbara fox at the university of colorado (barbara.fox@colorado.edu). she has done a fair amount of conversion of conversation tapes and should be able to point you in the right direction.

What software/hardware for editing and processing? really, it's hard to decide. the options are quite limited in number, but fairly equal in quality. The only firm choice is NOT macintosh--not because of the hardware, but because there's no really decent sound processing software available.

Unix: the best option used to be entropic waves (cost: US$8000), but entropic was recently purchased by microsoft and MS took waves off the market. Gail Ayer is a wonderful linguist who has worked for entropic--i don't know her email, but if you can locate her, you may be able to get more information about the availability of entropic products and esp. speech-wave processing products. good luck.

barring entropic waves, the best speech processing software i've seen is a shareware program on Windows called CoolEdit2000. it has some very nice processing routines which can be made into batch processes for cleaning up multiple files, it's very easy to use, and i've been pleased with the quality of processing so far.

xxxx

FROM: Toby Paff
tobypaff@Princeton.EDU
UNIX systems
Computing and Information Technology
Princeton University

Noticed your query about dealing with analogue sound. A colleague of mine has been very interested in a similar topic so I am sending along his advice:

".. here're some places to start. According to most of the things I've seen on the web, "Spin Doctor, version 4" is the way to go with analog. It has the hiss/pop cleanup feature. It's boxed with "EZ-CD Creator Deluxe" by Adaptec. Egghead has it for $79.99, Mfr part number: ASW-EZCDCRTR V4 RTL.

I bought a new Plextor 12/4/32. They're still hard to come by and are usually on back order. I waited about 25 days for mine. I don't know if you can burn sound CDs at 12X, but I'll see. You can get pretty good deals on the older 8X burners now, though. (And if it turns out 12X is only good for data, and sound still is 1X-what's the difference? But I'd recommend a burner whose firmware can be upgraded.)

The most helpful place I found was the CD-R FAQ: http://www.fadden.com/cdrfaq/"

xxxxx

From: "Bernard Kripkee" <bernard.kripkee@prodigy.net>

I can't tell you what software works best for your purposes, but I can tell you what I have been using. SIL (Summer Institute of Linguistics) offers a package of freeware called Speech Analyzer that can do a fine job of basic edits, at least on relatively brief segments of sound. You can download it at no charge from their website, www.sil.org. It runs under Windows. My Toshiba Satellite laptop has an input jack for a tape recorder. One could simply connect the recorder to the jack and use the basic Windows audio utilities to capture a file. The file can then be edited with the SIL Speech Analyzer.

xxxxx

FROM: Anonymous

The questions I'd ask are: why are you producing the corpus? Is it for instrumental acoustic analysis? Discourse strategies? Language learning? Is the size of the resulting files important? Will investigators use your equipment, or their own? (Some file formats are Mac and PC compatible, others aren't; some formats are Internet-friendly--compressed and "loss-y", but small; you can digitize at very high fidelity, but this leads to very large files, and need only be done if acoustic analysis is in the offing.)

If I had a limited budget, I'd worry less about the final stages (digitizing) than the first stage: getting as clean and high-quality a recording as I possibly can. But I'm exposed to phonetic analysis. Many people are not realistic about their needs, and want everything preserved in one format--the highest quality possible--regardless of purpose or application.

BTW, you probably don't want to keep the 50-60 minute interviews (etc.) in one large file unless there's a very good reason for it: the files will probably be too big, unless you're going to compress them. But compression rules out some possible uses.

Software: I usually use something called CoolEdit96 to digitize sound.
It is no longer available (except as a partially crippled bit of shareware unless you "find" a registration code), but there's a newer edition out that does more than any linguist needs. Sonic Foundry's SoundForge (I believe that is the name) digitizes quite nicely. Both of these are primarily for musicians, and include nifty ways of modifying sound (all-in-one production/post-production studios). And, of course, the Kay Elemetrics software accompanying Computer Speech Lab is sort of the gold standard for some people (but you can't digitize large chunks of sound with it). PCQuirer should also do the job, and has a more restricted palette of tools--one suited just to linguistic needs. But "linguistic needs" in the area of noise reduction tend to be low- and high-pass filters, while CoolEdit and SoundForge are much more sophisticated and can filter out more random background noise.

Hardware: Not all of us have $9000 to spend on Computer Speech Lab. A $100 sound card will work, but may give a little distortion or background hum. This is bad if acoustic analysis is in the plans for these files. (Nonetheless my wife, a phonetically-oriented phonologist, used just such a card in her dissertation work with no problems.) A high quality sound card--in the $400 range--should be good enough. Actually, by now, the price is probably down to $300. By "good enough" I mean that there will be very, very little distortion and noise, less than would matter for any but the most particular kinds of instrumental analysis. Certainly less than most dedicated hardware from the 1980s, and probably less than your analog tape player introduced. Formant structure, voicing, and pitch will all be preserved accurately enough for nearly any need.

A caveat, however: I've seen perfectly good sound cards give really bad results with some computers.
You should check the sound card by getting the kind of tuner an intermediate level violinist might have, one that emits a variety of electronically produced fairly pure pitches, and recording a few different pitches for 3-4 minutes each. Then do a quick analysis of the sound: look for fluctuating pitch that isn't the result of the tuner's problems, and look for anomalous noise in the signal (say, a nice 60 Hz band of energy in the spectrogram). One computer I ran across introduced a loud "tick" into the signal every few seconds, and a rather loud hum. (Shielding and internal configurations of the hardware can lead to this, and we lowly mortals can't easily fix the problem.) If you're not doing phonetic analysis, try recording something from a tape or from a microphone, and play back the results: if they're good enough, fine. Only trust your setup after you've tested it.

All things being equal, I would probably do the following for everything from phonetic analysis to language pedagogy: Get a $400 ($300?) sound card in a fairly new PC, and make sure it works. I'd digitize at 44.1 kHz ("CD quality sound") using CoolEdit or something like it, in *mono*. This yields *huge* files, and should be done in no more than 15 minute pieces (ok, 20 minute pieces) that can later be broken into smaller bits, if needed. I would save these in "WAV" format (it's a PC format that most Mac programs can handle). All these can later be transferred to CDs for storage. Acoustic analysis software typically handles WAV format, as well. If you have 200 hours of recording, though, ponder how much storage will cost; maybe saving the 44.1 kHz files is unreasonable. Maybe even *producing* 44.1 kHz files is unreasonable.

Then, depending upon the final purpose, I'd further process the files to make them usable. E.g., noise reduction, compression.
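
To put rough numbers on the storage question, here is a minimal Python sketch of the arithmetic for uncompressed mono PCM audio (the data inside a WAV file) at the CD-quality rate of 44,100 samples per second and at lower rates. The 16-bit sample size and the nominal 650 MB CD-R capacity are assumptions made for illustration; neither figure comes from the response, and the function names are hypothetical.

```python
# Rough storage estimate for uncompressed PCM audio (the data inside a WAV file).
# Assumptions (not from the response): 16-bit samples and a nominal 650 MB CD-R.
# The few dozen bytes of WAV header per file are ignored.

CD_BYTES = 650 * 1024 * 1024  # assumed capacity of one data CD-R


def pcm_bytes(duration_s, sample_rate_hz, bits_per_sample=16, channels=1):
    """Bytes of raw PCM audio for a recording of the given length in seconds."""
    bytes_per_sample = bits_per_sample // 8
    return duration_s * sample_rate_hz * bytes_per_sample * channels


def cds_needed(total_bytes, cd_bytes=CD_BYTES):
    """Number of CDs needed to hold total_bytes, rounding up."""
    return -(-total_bytes // cd_bytes)


if __name__ == "__main__":
    corpus_s = 200 * 3600  # the 200-hour corpus from the query
    piece_s = 20 * 60      # one 20-minute piece
    for rate in (44100, 22050, 11025):
        total = pcm_bytes(corpus_s, rate)
        print(f"{rate} Hz mono: {total / 1e9:.1f} GB for the corpus, "
              f"{pcm_bytes(piece_s, rate) / 1e6:.0f} MB per 20-minute piece, "
              f"about {cds_needed(total)} CDs")
```

Under these assumptions the full 200 hours comes to roughly 63.5 GB at 44,100 samples per second, and each halving of the sampling rate halves that figure.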
If you want them available over the Internet for acoustic analysis, I'd edit them to 22.05 kHz (good enough for most purposes; it captures frequencies up to about 11 kHz, the Nyquist limit at that rate, which covers the upper ranges of what we humans use in speech). This halves their size. (It's possible to digitize directly to 22.05 kHz; I find that digitizing at a very high quality and then "dumbing down" the sampling rate just plain sounds better; but doing this with 200 hours of sound requires a *lot* of patience.) I would then compress them using WinZip (PC) or StuffIt (Mac) to reduce their size even further, in a loss-less way. *These* files I'd make available on the 'Net.

If you want them available for language learning or discourse strategies (or any other non-phonetic purpose), I'd get RealMedia's RealEncoder 5.0 or G3 (5.0 was freeware and is "floating" in cyberspace somewhere still, G3 costs a little $) and, starting with the high quality files you made, I'd produce radio quality sound files. They're relatively small and RealAudio is one of the Internet standards. While WAV files *can* be used--and little ones are used--for sound on the Web, RealMedia files are much smaller. If you use RealMedia compression, try a variety of their options before settling on the one you'll use: if you have a crisp, clean source you can get by with a smaller ("loss-ier") format than if you begin with a moderately muddy-sounding source. Ultimately it's always a trade-off between size and quality: judge the quality you *need*, and the threshold for unacceptable, and go from there.

Any of these formats can be transferred to CD for distribution to colleagues, or posted on a Website somewhere; again, the file format (WAV, RealMedia, 44.1 kHz vs. 22.05 kHz vs. something else) depends on the final quality *needed*.

I note that you're in the Anthropology dept.
If I assume that phonetics-phonology aren't your goals, but that you are more interested in discourse strategies and the like, there's a quicker and easier way: it rules out acoustic analysis, but if this is ok, fine. RealMedia's RealEncoder 5.0 (or G3) can take analog input and create RealMedia files directly: it does the digitizing and encoding nearly simultaneously. This saves time, nerves, patience, and disk space. *Drawbacks*: the process can be a bit clunky, you have to manually start the tape player and recording, and there's no possibility of editing or noise reduction. If you go this route, find a particularly poor section of tape to experiment with, and make half a dozen short recordings of that section at various compression levels before deciding which one is right for your needs. If you can do even 25% of your corpus this way, it'll save many, many hours of time.

Assuming that RealMedia is ok, you'll still have a large quantity of data, now on disk. I'd contact your IT people and see about getting the RealMedia files put on a media server that supports streaming sound. (They already have one, I'm sure; they'll balk at the volume of data you have, but will probably cave in fairly quickly, esp. if you eventually offer to buy them another hard drive, under $300.)
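
The sound-card check described earlier in this response (record a fairly pure pitch from a tuner, then look for a band of energy at 60 Hz) can be roughly sketched in Python. This is only a minimal sketch, not a calibrated measurement: it uses the Goertzel algorithm to compare the energy at the hum frequency with the energy at the tuner's pitch. The function names, the 440 Hz pitch, and the synthetic signal standing in for a real recording are all illustrative assumptions; in practice the samples would be read from the recorded WAV file.

```python
import math


def goertzel_power(samples, sample_rate_hz, target_hz):
    """Normalised power at the DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate_hz)  # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2
    return power / (n * n)


def hum_ratio(samples, sample_rate_hz, tone_hz, hum_hz=60.0):
    """Energy at the hum frequency relative to energy at the recorded tone."""
    return (goertzel_power(samples, sample_rate_hz, hum_hz)
            / goertzel_power(samples, sample_rate_hz, tone_hz))


def synth(sample_rate_hz, seconds, tone_hz, hum_amplitude=0.0, hum_hz=60.0):
    """Synthetic stand-in for a recording: a pure tone plus optional hum."""
    n = int(sample_rate_hz * seconds)
    return [math.sin(2 * math.pi * tone_hz * i / sample_rate_hz)
            + hum_amplitude * math.sin(2 * math.pi * hum_hz * i / sample_rate_hz)
            for i in range(n)]


if __name__ == "__main__":
    rate = 8000  # a low rate keeps this pure-Python demo quick
    clean = synth(rate, 1.0, 440.0)
    hummy = synth(rate, 1.0, 440.0, hum_amplitude=0.1)
    print("clean recording, hum ratio:", hum_ratio(clean, rate, 440.0))
    print("humming recording, hum ratio:", hum_ratio(hummy, rate, 440.0))
```

A ratio near zero suggests no measurable hum at that frequency. With the synthetic hum at one tenth of the tone's amplitude the ratio comes out at about 0.01, the square of the amplitude ratio. Where the mains supply runs at 50 Hz rather than 60 Hz, `hum_hz` would be set accordingly.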