LINGUIST List 17.2233

Thu Aug 03 2006

Sum: Sound-File Formats for Speech Recordings

Editor for this issue: Kevin Burrows <kevinlinguistlist.org>

        1.    Mario Cal-Varela, Sound-File Formats for Speech Recordings

Message 1: Sound-File Formats for Speech Recordings
Date: 01-Aug-2006
From: Mario Cal-Varela <iamariousc.es>
Subject: Sound-File Formats for Speech Recordings

Query for this summary posted in LINGUIST Issue: 17.2131

Regarding Query: http://linguistlist.org/issues/17/17-2131.html

Dear Linguists:

Last July 24 I posted a query to the list regarding the adequacy of
different file formats for computerized speech analysis. This was the
original text of the query:

I'd like to compare digital speech samples collected from different
sources, including online radio and samples I digitized myself from
analogue sources. I'm especially interested in fundamental frequency and
formant position, as well as time-related aspects of segments (specifically
VOT and vowel duration). My questions are the following:

What features of the speech signal (and in what ways) may be affected by
the format of speech samples (MP3, WAV, stream audio...)? Are the results
of spectrographic analysis of samples with different file formats and
qualities comparable? Is there any relevant bibliography available on this
topic?

First of all, thanks very much to those people who responded to my
questions and provided very useful and relevant suggestions:

James L. Fidelholtz, Benemérita Universidad Autónoma de Puebla, MÉXICO
Mark J. Jones, University of Cambridge
Dominic Watt, University of Aberdeen
Damien Hall, University of Pennsylvania
Heriberto Avelino, University of California at Berkeley

Here is a quick summary of their comments:

Although measurements of duration and other time-related aspects of the
signal do not seem to be affected by file format, for formant and F0
analysis the consensus is that, among the usual formats, only .WAV and
.AIFF files are safe bets. The lossy compression algorithms used for MP3,
MiniDisc and similar formats alter the signal in many different ways and
basically degrade it.

On the other hand, James Fidelholtz comments that, if properly processed,
even very noisy speech can yield to acoustic analysis. For example, he
suggests using cepstrum analysis to get the formants and F0, following
these steps:
1) digitize the signal (if it is analogue), or obtain the already
digitized signal, if available. (= S)
2) compute a spectrum of the signal. [Sp(S)]
3) compute a cepstrum of Sp(S) (a spectrum of the log spectrum; its peak
gives the fundamental frequency F0 for each analysis frame over time)
4) have the computer consider *only* the points of Sp(S) which are
'near' integral multiples of F0, and plot the result. This will give you the
formants, even for extremely noisy speech.
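
The four steps above can be sketched in a few lines of Python with NumPy.
This is only an illustration under stated assumptions, not James
Fidelholtz's own implementation: the function names, the Hamming window,
the 60-400 Hz search range for F0 and the 20 Hz tolerance around each
harmonic are all hypothetical choices made for the example, and it works on
a single short frame rather than a whole recording.

```python
import numpy as np

def log_spectrum(frame):
    """Step 2: log magnitude spectrum of one Hamming-windowed frame."""
    windowed = frame * np.hamming(len(frame))
    # The small constant avoids log(0) in empty bins (an assumption).
    return np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)

def cepstral_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Step 3: F0 in Hz from the strongest cepstral peak in the search range."""
    cepstrum = np.fft.irfft(log_spectrum(frame))
    q_min = int(sample_rate / f0_max)  # shortest period considered, in samples
    q_max = int(sample_rate / f0_min)  # longest period considered, in samples
    peak = q_min + int(np.argmax(cepstrum[q_min:q_max]))
    return sample_rate / peak

def harmonic_points(frame, sample_rate, f0, tolerance_hz=20.0):
    """Step 4: keep only spectral points near integer multiples of F0."""
    spectrum = log_spectrum(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    points = []
    for k in range(1, int(freqs[-1] // f0) + 1):
        idx = np.flatnonzero(np.abs(freqs - k * f0) <= tolerance_hz)
        if idx.size:
            best = idx[np.argmax(spectrum[idx])]
            points.append((float(freqs[best]), float(spectrum[best])))
    return points  # list of (frequency in Hz, log magnitude)

# Step 1 stand-in: a synthetic "digitized signal" with harmonics of 125 Hz.
sample_rate = 16000
t = np.arange(1024) / sample_rate
frame = sum(np.cos(2 * np.pi * k * 125.0 * t) / k for k in range(1, 64))

f0 = cepstral_f0(frame, sample_rate)
envelope = harmonic_points(frame, sample_rate, f0)
print(f"estimated F0: {f0:.1f} Hz, {len(envelope)} harmonic points kept")
```

Plotting the second element of each pair against the first gives the
harmonic-sampled spectrum of step 4; broad peaks in that curve are the
candidate formants. For real speech the same functions would be applied
frame by frame, and each frame has to be longer than the longest pitch
period being searched for.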

The topic seems to recur on discussion lists, so several respondents
suggest using search terms such as MP3, ATRAC, FORMANT, etc. on Google or
on discussion list search engines, for example on PHONET
(http://www.jiscmail.ac.uk/cgi-bin/webadmin?S1=phonet). Mark Jones sends
the following, from Linguist:

The IEEE website is also mentioned by several respondents as a possible
source of further information (Institute of Electrical and Electronics
Engineers, Inc. http://www.ieee.org).

For an example of a major project where digitised speech was used, Damien
Hall mentions the Atlas of North American English, which incidentally used
only WAV files (more information at: http://www.mouton-online.com/anae.php).

As for bibliography on the topic, there were also a few suggestions:

- http://www.di.unipi.it/~lcioni/papers/2001/CompData.pdf.

- Paul Foulkes and Catherine Byrne published an article a couple of years
ago in the International Journal of Speech, Language and the Law on changes
in formant frequencies (and I think F0) brought about by the signal
transmission properties of mobile telephone lines.

- Philip Harrison's work on the comparability and relative (un)reliability
of formant frequency measurements made using different software packages
(Praat, WaveSurfer/xwaves+, Sensimetrics, SpeechStation, etc.) is possibly
also relevant here.

- Some discussion of cepstrum analysis can be found in a chapter by
Liljencrants in The Handbook of Phonetic Sciences (Blackwell), ed. by
Hardcastle & Laver, and probably also in Acoustic Phonetics, by Kenneth N.
Stevens (MIT Press).

- On acoustics in Spanish there are books and articles by, for example,
Antonio Quilis or Borzone de Manrique. I'd also add Eugenio Martínez Celdrán.

Once more, thanks very much to the five kind respondents for all the useful
information and to the whole Linguist community.

Best regards,
Mario Cal Varela
University of Santiago de Compostela

Linguistic Field(s): Phonetics