LINGUIST List 5.256

Sun 06 Mar 1994

Review: The Oxford Acoustic Phonetic Database on Compact Disk

From: Ian MacKay
Subject: Review of The Oxford Acoustic Phonetic Database on Compact Disk

The Oxford Acoustic Phonetic Database on Compact Disk
By J. B. Pickering and B.S. Rosner
1993 Oxford University Press

Reviewed by Ian MacKay

The title of this work, Acoustic Phonetic Database, might be
construed as suggesting that it consists of an attempt to
present exemplars of the range of sounds of human language.
However, the authors' goal was quite a different one. What
they have compiled is a database of words containing
monophthongs in 7 languages. Therefore, while the database
contains many consonants, its goal is to present native-speaker
exemplars of simple vowels in context. The database will permit
acoustic phonetic research on vowels from a standard set of
recordings, which helps to settle whether different
researchers' conclusions result from differences in technique
or from differences in the data studied. Issues of phonetic
context (both segmental and suprasegmental), register,
dialect, talker age, talker sex, and talker size (which
correlates with vocal tract length) create such a tapestry of
variables that, particularly in the medium of print, they
cannot be satisfactorily dealt with. However, with
a standard reference set such as this, there now exists a
standard to which comparisons can be made, to say nothing of
the availability of a vast database for direct acoustic
descriptive work. Beyond the obvious utility to linguists, the
authors suggest that the databases will also be useful to
engineers working on automatic speech recognition and to
psychologists.

The publication consists of a phonetic database on 2 CD ROMs
and an accompanying manual. The collection and wide
dissemination of a database such as this is made possible by
CD ROM, which permits precise acoustic control and safeguards
the integrity of the data. Distribution by magnetic tape or
vinyl disk, a technique that has been employed in the past for
some phonetic and other demonstration material, has all of the
fidelity and signal-to-noise ratio problems inherent in
analogue materials, as well as problems of wow and the
precision of playback speed.

The 7 languages in fact yield 8 databases, since both an
American and a British dialect of English are included. The
databases include 10 vowels in American English, 11 vowels in
British English, 10 vowels in French, 14 in German, 14 in
Hungarian, 7 in Italian, 10 in Japanese, and 5 in Spanish.
Some choices seem arbitrary: nasalized monophthongs in French
are excluded; [OU] and [Ei] in English are excluded, but [ij]
and [uw], which are typically diphthongal as well, are
included, presumably because they are closer to having a
monophthongal character. The dialects chosen are generally the
most prestigious or best-known: RP British, Castilian Spanish,
Northern German (Hochdeutsch), Northern Italian, Tokyo
Japanese, etc. The choice of languages was surely in part a
matter of practicality, but the attempt has been to include,
among IE languages, representatives of Germanic and Romance
languages, and two non-IE languages as well.

The authors rejected nonsense words in collecting their data.
They determined the phonotactically-permissible environments
for the target vowels in each language (typically: stressed VC
and CV or CVC, and unstressed VC and CV), and then sought
words that furnished that context for the target vowels. The
informants ("speakers") pronounced these words, and the
isolated vowels as well (in some languages, of course, the
isolated vowels are also lexical items). Taking into account
the variety of environments, the American English inventory
includes 694 words; 794 in RP; 566 in French; 740 in
Hochdeutsch; 957 in Budapest Hungarian; 442 in Italian; 479 in
Japanese; and 382 in Spanish. These figures give some
appreciation of the scope of the databases, which are truly

Similar care was given to the informant characteristics. Each
word list was produced by 8 talkers, 4 female, 4 male. They
were roughly matched for stature; exact heights are given.
They were also roughly matched for age in order to avoid
variability due to historical change in progress.
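
To give a rough sense of scale, the word counts and the number of
talkers reported above can be combined. The sketch below is only
an estimate: it assumes (this is not stated in the review) that
every talker produced every word on the list exactly once, and it
leaves out the isolated vowels. The dictionary and function names
are illustrative only.

```python
# Word counts per database, as listed in the review.
WORDS_PER_DATABASE = {
    "American English": 694,
    "British English (RP)": 794,
    "French": 566,
    "German (Hochdeutsch)": 740,
    "Hungarian (Budapest)": 957,
    "Italian": 442,
    "Japanese": 479,
    "Spanish": 382,
}

# Each word list was produced by 8 talkers (4 female, 4 male).
TALKERS_PER_DATABASE = 8


def total_words(counts):
    """Sum the word-list sizes across all databases."""
    return sum(counts.values())


def estimated_tokens(counts, talkers):
    """Estimate recorded word tokens.

    ASSUMPTION: each talker recorded each word exactly once;
    isolated vowels are not counted.
    """
    return total_words(counts) * talkers


if __name__ == "__main__":
    print(total_words(WORDS_PER_DATABASE))        # 5054
    print(estimated_tokens(WORDS_PER_DATABASE,
                           TALKERS_PER_DATABASE))  # 40432
```

Under those assumptions the 8 databases hold 5054 distinct words
and on the order of 40,000 recorded word tokens.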

Details of the recording and digitizing process are provided.
The recordings were digitized at a 10-kHz sampling rate with
12-bit depth.
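
What these two figures imply for analysis can be worked out with
standard textbook relations. The sketch below assumes that the
10-kHz figure is the sampling rate and that the quantization is
ideal and linear; neither detail is spelled out in the review, and
the function names are illustrative only.

```python
import math

BIT_DEPTH = 12            # bits per sample, as stated in the review
SAMPLE_RATE_HZ = 10_000   # ASSUMPTION: "10-kHz" is the sampling rate


def nyquist_hz(sample_rate_hz):
    """Highest frequency representable without aliasing."""
    return sample_rate_hz / 2


def quantization_levels(bits):
    """Number of distinct amplitude values at a given bit depth."""
    return 2 ** bits


def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB.

    ASSUMPTION: ideal linear quantization, so the ratio is 2**bits.
    """
    return 20 * math.log10(2 ** bits)


if __name__ == "__main__":
    print(nyquist_hz(SAMPLE_RATE_HZ))              # 5000.0
    print(quantization_levels(BIT_DEPTH))          # 4096
    print(round(dynamic_range_db(BIT_DEPTH), 1))   # 72.2
```

On these assumptions the recordings carry information up to 5 kHz,
which generally covers the lower formants used in vowel
description, with a theoretical dynamic range of roughly 72 dB.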

Most of the 200-page manual is given over to the listings of
words. For each language database, the vowels are listed by
vowel and context, by alphabetical order, and by the numerical
order of the test words.

The CD ROMs were designed for use in a DOS environment in
conjunction with an analysis program such as CSRE (Canadian
Speech Research Environment). Use with a Macintosh is less
transparent, and involves the use of FileConverter (still, the
CD ROMs mount on a Macintosh and show directory contents
straightforwardly). The Macintosh-converted files can then be
accessed by a waveform editor. The authors suggest the use of
Signalize; a description of Signalize was posted on LINGUIST
in February 1994.

This work represents an attempt to create an accessible
database collected under closely controlled conditions and
usable by those having access to what is now considered quite
pedestrian equipment, namely a PC with a CD ROM player. (One
improvement would have been the inclusion of software that
permits playback without specialized analysis software.) The
end product represents the
accomplishment of an impressive undertaking, and provides a
tool of considerable utility.

NOTE: The reviewer would like to apologize to editors and
subscribers of LINGUIST, as well as to the publisher and
authors, for the delay in posting this review. Obviously,
the advantage of an electronic forum such as LINGUIST is the
timely posting of material.
