LINGUIST List 17.2233
Thu Aug 03 2006
Sum: Sound-File Formats for Speech Recordings
Editor for this issue: Kevin Burrows
<kevin linguistlist.org>
Directory
1. Mario Cal-Varela, Sound-File Formats for Speech Recordings
Message 1: Sound-File Formats for Speech Recordings
Date: 01-Aug-2006
From: Mario Cal-Varela <iamario usc.es>
Subject: Sound-File Formats for Speech Recordings
Query for this summary posted in LINGUIST Issue: 17.2131
Regarding Query: http://linguistlist.org/issues/17/17-2131.html

Dear Linguists:

On July 24 I posted a query to the list regarding the adequacy of different file formats for computerized speech analysis. This was the original text of the query:

I'd like to compare digital speech samples collected from different sources, including online radio and samples I digitized myself from analog sources. I'm especially interested in fundamental frequency and formant position, as well as time-related aspects of segments (specifically VOT and vowel duration). My questions are the following:

What features of the speech signal may be affected (and in what ways) by the format of speech samples (MP3, WAV, streaming audio...)?
Are the results of spectrographic analysis of samples with different file formats and qualities comparable?
Is there any relevant bibliography available on this issue?

First of all, thanks very much to those who responded to my questions and provided very useful and relevant suggestions:

James L. Fidelholtz, Benemérita Universidad Autónoma de Puebla, MÉXICO
Mark J. Jones, University of Cambridge
Dominic Watt, University of Aberdeen
Damien Hall, University of Pennsylvania
Heriberto Avelino, University of California at Berkeley

Here is a quick summary of their comments:

Although measurements of duration and other time-related aspects of the signal do not seem to be affected by file format, the consensus for formant and F0 analysis is that, among the usual formats, only .WAV and .AIFF files are safe bets. The lossy compression algorithms used for MP3, MiniDisc and similar formats alter the signal in many different ways and basically degrade it.

On the other hand, James Fidelholtz comments that, if properly processed, even very noisy speech can yield to acoustic analysis.
For example, he suggests using cepstrum analysis to get the formants and F0, following these steps:

1) Digitize the signal if it is analog, or obtain the already digitized signal if available (= S).
2) Compute a spectrum of the signal [Sp(S)].
3) Compute a cepstrum of Sp(S) (a spectrum of the spectrum; this will give you the fundamental frequency F0 for each discrete sampling point along the spectrum over time).
4) Have the computer consider *only* the points of Sp(S) which are 'near' integral multiples of F0, and plot the result. This will give you the formants, even for extremely noisy speech.

The topic seems to recur on discussion lists, so several respondents suggest searching for terms such as MP3, ATRAC, FORMANT, etc. on Google or in discussion list archives, for example PHONET (http://www.jiscmail.ac.uk/cgi-bin/webadmin?S1=phonet). Mark Jones sends the following, from Linguist: http://listserv.linguistlist.org/cgi-bin/wa?A2=ind0409&L=resource-network-linguistic-diversity&P=629

The IEEE website is also mentioned by several respondents as a possible source of further information (Institute of Electrical and Electronics Engineers, Inc., http://www.ieee.org).

For an example of a major project where digitised speech was used, Damien Hall mentions the Atlas of North American English, which incidentally used only WAV files (more information at: http://www.mouton-online.com/anae.php).

As for bibliography on the topic, there were also a few suggestions:

- http://www.di.unipi.it/~lcioni/papers/2001/CompData.pdf
- Paul Foulkes and Catherine Byrne published an article a couple of years ago in the International Journal of Speech, Language and the Law on changes in formant frequencies (and I think F0) brought about by the signal transmission properties of mobile telephone lines.
- Philip Harrison's work on the comparability and relative (un)reliability of formant frequency measurements made using different software packages (Praat, WaveSurfer/xwaves+, Sensimetrics, SpeechStation, etc.) is possibly also relevant here.
- Some discussion of cepstrum analysis can be found in a chapter by Liljencrants in The Handbook of Phonetic Sciences (Blackwell), ed. by Hardcastle & Laver, and probably also in Acoustic Phonetics, by Kenneth N. Stevens (MIT Press).
- On acoustics in Spanish there are books and articles by, for example, Antonio Quilis or Borzone de Manrique. I'd also add Eugenio Martínez Celdrán.

Once more, thanks very much to the five kind respondents for all the useful information and to the whole Linguist community.

Best regards,

Mario Cal Varela
University of Santiago de Compostela

Linguistic Field(s): Phonetics
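
For readers who want to experiment with the four cepstrum steps summarized above, the following is a minimal Python sketch and not a definitive implementation of James Fidelholtz's procedure. It assumes NumPy is available, works on a single short frame of an already digitized signal, and takes the cepstrum in the usual way as the inverse transform of the log magnitude spectrum. The function names (synthetic_frame, cepstral_f0, harmonic_envelope), the 60-400 Hz search range and the 5000 Hz upper limit are all illustrative assumptions, not part of the original suggestions.

```python
import numpy as np


def synthetic_frame(f0, sample_rate, n_samples):
    """Hypothetical test signal: harmonics of f0 with 1/k amplitudes (step 1 stand-in)."""
    t = np.arange(n_samples) / sample_rate
    n_harmonics = int((sample_rate / 2 - 1) // f0)  # stay below the Nyquist frequency
    return sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, n_harmonics + 1))


def cepstral_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Steps 2-3: magnitude spectrum, then cepstrum; return (F0 estimate in Hz, spectrum)."""
    windowed = frame * np.hamming(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))            # step 2: Sp(S)
    cepstrum = np.fft.irfft(np.log(spectrum + 1e-12))   # step 3: transform of the log spectrum
    # Search for the cepstral peak only at lags (quefrencies) that match plausible F0 values.
    q_lo = int(sample_rate / f0_max)
    q_hi = int(sample_rate / f0_min)
    peak = q_lo + int(np.argmax(cepstrum[q_lo:q_hi]))
    return sample_rate / peak, spectrum


def harmonic_envelope(spectrum, sample_rate, f0, max_freq=5000.0):
    """Step 4: keep only the spectral points nearest to integer multiples of F0."""
    n_fft = 2 * (len(spectrum) - 1)
    max_freq = min(max_freq, sample_rate / 2)
    harmonics = np.arange(f0, max_freq, f0)
    bins = np.rint(harmonics * n_fft / sample_rate).astype(int)
    return harmonics, spectrum[bins]


if __name__ == "__main__":
    rate = 16000
    frame = synthetic_frame(120.0, rate, 1024)
    f0, spectrum = cepstral_f0(frame, rate)
    harmonics, magnitudes = harmonic_envelope(spectrum, rate, f0)
    print(f"estimated F0: {f0:.1f} Hz, {len(harmonics)} harmonic points kept")
```

Peaks in the harmonic magnitudes returned by harmonic_envelope would then be read as rough formant candidates. On real recordings the frame length, window and search range would need tuning, and the results should be checked against an established tool such as Praat.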