LINGUIST List 5.543

Thu 12 May 1994

Qs: Voice Variability in Speech Synthesis

Editor for this issue: <>


  1. Wolfgang Hess, Voice Variability in Speech Synthesis

Message 1: Voice Variability in Speech Synthesis

Date: Wed, 11 May 1994 23:15:41
From: Wolfgang Hess <>
Subject: Voice Variability in Speech Synthesis

Dear Colleagues:

At the last COCOSDA meeting, held in Berlin in September 1993 (during
EUROSPEECH-93), the committee for speech synthesis decided to form
a small working group to gather information about ongoing research
on speaker and voice variability in speech synthesis. I was
asked to organize this group (together with Kim Silverman
and Nick Campbell) and to get it started.
 Well, as so often happens, university terms start right after the
conferences, and with them the workload. Hence I have not been active
in this matter until now. However, I would like to give a report at the
next COCOSDA meeting, which is to take place in September during ICSLP-94
and/or the IEEE/ESCA workshop on speech synthesis just before ICSLP.
I would thus be very grateful for any information you can provide
on this matter. I describe what I am looking for in a little more
detail below.

Recent applications in speech synthesis deal not only with
implementing TTS systems and improving their quality on all levels,
but also with the question of how to bring something of the variety
of human voices into synthetic speech. Some typical questions
and problems in this domain are:

-- How can we synthesize emotional speech (happy, angry, sad, etc.)?
 In which way must the synthetic voice be varied (prosody,
 voice quality), and what can be achieved with the different
 approaches [concatenative synthesis, parametric synthesis by rule
 with a source-filter model (e.g., formant synthesis), articulatory
 synthesis]?

-- How can we synthesize a variety of voices with the same system?
 How can we, for instance, transform a male synthetic voice into a
 female one and vice versa? How can we interpolate between several
 voices (which may be particularly difficult in concatenative
 synthesis, which is based on elements of natural utterances)?
 What are the specific problems in this respect?

-- How can we synthesize a variety of speaking styles (casual, clear,
 formal)? Which reductions, elisions etc. increase the naturalness
 of a "neutral" synthesis system (e.g., a reading machine) and should
 thus be incorporated, and which ones shouldn't because they
 are not appropriate?

-- How can we adapt a synthetic voice to a given natural one (not
 only with respect to the sex of the natural speaker, but also
 to fundamental frequency range, spectral properties etc.) when
 the task is, for whatever reason, to make the synthetic
 voice sound as similar to the natural target voice as possible?

As I said, these are only a few questions, meant to make
it a little clearer what kind of information we hope
to receive. The list is far from complete.

In order to make it easy for you (and hence, I hope, to
increase the number of responses) I am not circulating a
long questionnaire. I only ask you to answer
a few questions if you or your colleagues at your institution
are active in this domain. Please indicate the type of work done
(it will be kept confidential if desired) and the results achieved
so far. If you have publications on this subject, please give
the references (not necessarily in English!). Collecting references
on this subject is most important to us.
 To make things more convenient, you may use the following
preformatted mailer to reply to me. As I am distributing this letter
over several (moderated and unmoderated) mailing lists, please use
this mailer. PLEASE DO NOT RESPOND TO THE LIST (it might be flooded
otherwise, making some people very upset with me!).
 If you are not active in this domain yourself, but know people
who are, please forward this message to them. I apologize to anybody
who might receive this letter more than once via different
mailing lists.

--- Start of Mailer ---------------------------------------------------

mail -s "Voice Variability in Speech Synthesis"

Your name, institution, address (including fax and e-mail) ...

Active in which area of speech (processing and) synthesis? ...

Which principle of speech synthesis do you apply ...
- concatenative synthesis with parts of natural utterances
 using PSOLA or some parametric representation
- synthesis by rule using a parametric representation
 (formant synthesis, LPC, ...)
- synthesis by an articulatory model

What kind of system do you use ...
- text-to-speech
- dialog system (semantic representation to speech or similar)
- other application

Which language(s) is your system able to synthesize?

Which specific research of yours is particularly related to voice and
speaker variability as indicated above? Which questions are
covered at your lab?

MOST IMPORTANT: If you have relevant publications, please give me a
list of references ...

--- End of Mailer -----------------------------------------------------

As time is running short, I would appreciate receiving this information
as soon as possible, but, PLEASE, BEFORE JUNE 15, 1994. I will
then compile the information and a small bibliography and
distribute them to those who respond to this message.

Thank you in advance for your kind cooperation.
Sincerely yours,
Wolfgang Hess
