LINGUIST List 8.567

Tue Apr 22 1997

Sum: Whispering and singing in tone lgs

Editor for this issue: Ljuba Veselinova


  1. Susan Fischer, Whispering and singing in tone languages

Message 1: Whispering and singing in tone languages

Date: Tue, 22 Apr 1997 12:46:28 +0900
From: Susan Fischer
Subject: Whispering and singing in tone languages

A few weeks ago I posted the following query:

> This question was the result of a late-night idle conversation, and,
> frankly, little of scholarly worth is likely to come out of it, but
> here goes: in a tone language, and here I'm thinking particularly of
> Chinese, is there some kind of compensatory process for conveying
> tones when whispering (or singing), such as substituting stress for a
> higher tone, or does context alone disambiguate potential homonyms?
> Please respond to me privately and I will summarize for the list given
> sufficient responses.

I would like to thank the following people for their thoughtful responses:
Scott McGinnis
Kormi Anipa
Cathryn Donahue
Ralf Grosserhode
Gerald B Mathias
Jonathan Glassow
Geoffrey Sampson
Jerry Packard
Chilin Shih
Dan Moonhawk Alford

Several interesting threads emerge from the responses, extracts
of which are given below each numbered paragraph; unless
otherwise specified, remarks are about Chinese:

1) For some but apparently not all tone languages, even if tones
are neutralized via whispering, other phonetic correlates such
as amplitude, length, and laryngeal activity remain;

"I myself am presently working on a (restricted) tone language
with a tendency to devoice final vowels. While there is
sometimes a little doubt whether a vowel is really totally
devoiced, the tone seems always clear. Talking with some
friends here and trying a bit of voiceless singing, I came to
the following vague ideas: The larynx moves up and down whether
you sing with or without voice. One can hear that, I think,
maybe because of the echo from the pharynx. Also, it might have
an effect on the rest of the articulatory system. Just as certain
sounds influence pitch (depressor consonants, back vowels
vs. high front vowels), tones might influence sounds. Actually,
if I try to pronounce a closed high vowel /i/ with a very low
pitch, I can't help but centralize it. So, tone might leave even
more traces in ordinary segments. Also, even without producing
voice, the glottis is more or less open and more or less
tensed, which is audible."

"pitch is not the only phonetic manifestation of phonological
tone in Mandarin Chinese (and I expect other dialects are
similar). In ordinary spoken Chinese, rises and falls in pitch
are mirrored rather precisely by increases and decreases in
loudness, and a variable equivalent to loudness could be
preserved in whispering, I would think (airflow rate?)."

"When tones are whispered, amplitude variation generally takes
the place of F0 variation (higher amplitude corresponding to
higher F0). As to the independent question of whether context is
enough, generally the more good context you have, the less you
need the tones; but by the same token the clearer the tone
information, the less you need context."

"My hunch is that context would prove enough to disambiguate
most of the time. Beyond that, there are certain phonetic
realia connected to the tonal contours themselves that would
help. T3 (falling-rising) segments are generally of the longest
duration of any of the tones (I'm talking about Mandarin here --
I don't know much about the situation in other
languages/dialects), while T4 (sharp falling) are the shortest

"In short, whispered tones follow contours and duration in the
same way that spoken tonal speech follows contours and

2) In Chinese, at least, voicing activity is generally not
totally absent in whispering; thus, one can often "hear" tone as
pitch changes.

"In practice, most people's whispering speech is only partially
breathy, with some part of the vocal cords closed/vibrating;
therefore there is still F0 information present. I did an
experiment on whispering tones and had a hard time getting
myself to talk totally void of F0. For those samples that are
indeed voiceless--I understand that various people use different
strategies--what I did was to raise or lower my glottis to mimic
the tone shape, higher glottis position for higher F0. The
result on the spectrogram is a very high formant typically
around 6000 or 7000 Hz fitting the desired tone shape."

3) In some classical Chinese songs as well as in Vietnamese
songs, you have to have a match between tone changes in the
lyrics and pitch changes in the music.

"Hi. I have had a reasonable exposure to the Cantonese
pop music from Hong Kong and it's amazing how able they are to
get the lexical tones to fit the desired melody. However, if
there is ever a discrepancy, in my experience it seems that
context disambiguates and secondary devices such as you
mentioned are not implemented to do the job."

"With singing it seems to me that if we are talking about
singing traditional Chinese poetry, the issue would not arise
because Chinese metre is based on tones so that the music and
the words would not be fighting each other, as it were."

"Modern musicians have commented on tone and melody
matching. Y.R. Chao must have said something, and one musician,
Pao Chen Lee, has written a second-year Chinese textbook with a
chapter on tone and melody matching, saying that you have big
problems if the two don't match, and "xiang1 si1" "miss each
other" is likely to be interpreted as "xiang3 si3" "want to die"
if the melody goes down rather than stays high."

4) In the case of singing, sometimes there is no way to
disambiguate. However, the load carried by tone may not be that
great anyway.

"For singing, words in Chinese songs are notoriously hard to
understand, and people don't mind that much--they get the
printed words when they want to sing karaoke."

"In modern speech the information load of tones is indeed
low. Richard [Sproat] had an algorithm converting toneless
pinyin back to characters, and got 95% correct on newspaper
style of writing. Colloquial speech should score even
higher. Don't even imagine doing that for classical Chinese."

5) In some cases, only context provides disambiguation.

"I don't know anything about Chinese, but my mother
tongue is a tone language: Ewe (in Ghana, Togo and Benin, in
West Africa). As a matter of fact, the question of ambiguity
seems to be part and parcel of tone languages not only in such
instances as whispering and singing, but also in cases where
there is much noise in the background where the communication is
taking place and in circumstances where the speaker loses
his/her voice (or the natural tonal characteristics of it) as a
result of illness (flu, etc.). [snipped an amazing number of
ambiguities in Ewe]. My answer to your query is that as far as
Ewe (a typical tone language) is concerned, speakers seem to
rely solely on context in such circumstances as whispering,
singing, etc."

6) This is a problem not only for tone languages but for those,
like Japanese, that have pitch accent.

"I remember wondering about Japanese many times over the years,
but if I ever figured it out, I have forgotten. I know when I
whisper the words for "bridge" and "chopsticks" I imagine I hear
a difference, but if I really do, I can't guess what it would

7) Several articles and a couple of LSA presentations have dealt
with these topics. I'm afraid that I haven't followed up on
these leads, but they are included below for those who wish to
pursue the topic further.

"Dr. Marjorie Chan has done some work on how Cantonese tones
line up (or don't!) with Cantonese songs." (Another reader says
that she presented at the 1990 LSA meeting.)

"My source for the detailed phonetic correlates of tone is
instrumental phonetic work by one of the people who taught me
Chinese, Paul Kratochvíl, some of which is published and some
not. For instance, he demonstrated to my satisfaction at least
that the answer to the longstanding puzzle about whether a
phonological 3rd tone before another 3rd tone becomes quite
identical to a 2nd tone is that it does become similar in pitch
contour but retains its opposite loudness contour."

"One poet in the 13th or 14th century, Jiang1 Kui2, is known in
literary circles for his expertise in musical theory. While all
other contemporary poets fit their poems to existing melodies,
this person is known to have written his own music. His poems
and melodies have been translated into modern musical notation
by Y.R. Chao's daughter Rulan Pian."

8) Other distortions to the signal can cause problems:

"As a tangent to your question, the Cheyenne language has the
most complex set of rules leading to VOWEL DEVOICING of perhaps
any known language. (e.g., 'my spine' = nAhtAhtoonO). I once
quipped that it must have been the Cheyennes who created Plains
Indian Sign Language because you can't shout this stuff across a
gully. The real answer is, however, and pertinent to your
compensation question, all of the whispered vowels re-voice in
(extremely rare) shouted form."

Again, thanks to all those who responded to my query.

Susan Fischer

Susan Fischer
Department of Linguistics
Faculty of Arts and Letters

Tohoku University
Here until August,
Kawauchi Campus

Sendai 980
