LINGUIST List 8.824

Tue Jun 3 1997

Sum: Reactions to synthetic speech

Editor for this issue: Ann Dizdar <annlinguistlist.org>


Directory

  1. Bente Henrikka Moxness, Sum: Reactions to synthetic speech

Message 1: Sum: Reactions to synthetic speech

Date: Mon, 2 Jun 1997 14:00:20 +0200 (MET DST)
From: Bente Henrikka Moxness <benmoxalfa.itea.ntnu.no>
Subject: Sum: Reactions to synthetic speech

A number of weeks ago I asked very informally for people's reactions
to synthetic speech (also prerecorded speech) and for studies on
emotional reactions to synthetic speech. I wish to thank those who
responded:

Osamu Fujimura
Margaret Jackman
Randall A. Major
Corey Miller
Johanna Rubba
Stephen P. Spackman

I had hoped for more responses, but I have started to collect
information from friends and colleagues as well. I've realized that I
need a more structured way of gathering info - with the possibility
that this (up until now) rather informal approach to the matter may
suddenly turn into a more formal study. The respondents react both
positively and negatively to synthetic speech; one may be irritated at
the "bluntness" of the machine, the lack of flexibility in the
programs, etc. but still find the synthetic vocal information handy.
From Per Egil Heggtveit at Telenor, Norway, I have received a list of
references on synthetic speech, but none of the studies cover
emotional reactions.

Osamu Fujimura wrote:

>I suggest that you ask the question to Marian Macchi.

I did. She responded as follows:

>Two of the US telephone companies have
>introduced a service called "Reverse Directory Assistance", which
>is available to telephone customers. This is a telephone service whereby
>a customer calls a special number, enters a telephone number using
>the touchtone pad, and hears the name and address of the person to
>whom that telephone number is listed. A speech synthesizer (Orator,
>a text-to-speech synthesizer that we have developed here at Bellcore)
>is used to speak the name and address.
>Before the introduction of this automated service, one of the
>telephone companies offered the service with real human operators.
>Today the complaint rate from customers is no higher than it was
>when the service was offered with real operators.
>
>This is not to say that use of synthetic speech is always acceptable.
>In fact, many applications for synthetic speech are not adopted
>because the speech sounds too robotic.

Margaret Jackman wrote:

>My experience with synthetic voices is with our telephone information
>system. It asks what is the name and address of the person for whom
>we want the phone number. I am always annoyed since I know I will
>usually have to repeat it to a real person later.
>
>I am also annoyed with voice mail systems that go on forever - giving
>me 10 different options, instead of the voice operator who puts me
>through to the person I want.
>
>I suppose the problem isn't the synthetic language - it is generally
>very clear and concise. The problem is that when I get one it
>generally wastes my time, and for that reason, I have a negative
>reaction to them.

Randall A. Major wrote:

>I'm not sure if they've worked on reactions or not, but you should try
>contacting Barbara Grosz at
>groszeecs.harvard.edu
>They've done a lot of work on synthetic speech and she may be able to
>help you. Good luck!

I contacted Barbara Grosz, who wrote:

>sorry, but I have not done any experiments of this sort, though I have
>done some work on speech synthesis. My colleague, Julia Hirschberg,
>at AT&T research may know of some research in this arena, though
>I don't believe she has done any either.

I haven't contacted Julia Hirschberg yet, but I intend to.

Corey Miller wrote:

>You may want to look at an article on the perception of synthetic
>speech by David Pisoni, in Progress in Speech Synthesis,
>van Santen, Sproat, Olive and Hirschberg, Springer, 1997.

I've tried to get a copy of the article through our university
library, but the book is too recent, and I was told no copies are
available yet.

Johanna Rubba wrote:

>My personal reaction to a synthetic voice on the phone is negative. I
>experience offense (because the company involved does not care enough to have a
>real person staffing the phone line; they'd rather downsize and replace
>people with machines); irritation (because I am not going to be able to
>get any questions answered, and am going to be obliged to follow the
>inflexible program set down by the corporation [and these are inevitably
>not well-designed, they waste the customer's time]). I also experience
>irritation because synthetic voices do not sound like real voices,
>meaning I have to put forth extra effort to parse their output, and also
>because I am a perfectionist and don't understand why even relatively
>simple things like normal list intonation (not the weird system used on
>the [non-synthetic, just pre-recorded] directory assistance systems)
>can't be gotten right.
>
>I know enough about computational linguistics to know that achieving
>real-sounding synthetic speech is extremely difficult, esp. if context
>has to be taken into account. Is this an excuse for ugly synthetic
>speech? Only if you think we really need synthetic speech. Do we?
>
>Oh, it's not all negative -- I do experience a low level of curiosity and
>amusement in hearing how much of the sound of real speech the designers
>have managed to capture in the artificial speech, and the particular
>distortions that are found in synthetic speech (my intro ling students
>love it when I mimic synthetic speech for them and point out things like
>stress and intonation. I think some progress has been made in this area,
>but they sure do recognize that flat, syllable-timed, nasal voice!)
>
>I just thought of a good use of synthetic speech that I do like. My word
>processor has an auditory editor that reads my texts back to me. Though
>the speech has some flaws, it's not too terribly bad, and it is a very
>useful function when the eyes are no longer capable of seeing the errors.
>Note that I like this because it's not an interaction; I get to choose
>when I use it, and I don't expect to have a conversation with it.

and finally,
Stephen P. Spackman wrote:

>Myself, I *like* machines. I use bank machines instead of live tellers
>whenever practical. But (and this doesn't all bear directly on your
>query, but maybe I'm talking to someone who wants to listen...!):
>
>(1) No deception. A machine should announce itself as such - ideally by
>going "boing" or something before it starts to talk. It's extremely
>annoying to find yourself trying to talk *with* a machine thinking it is
>human. When you find out otherwise you feel both stupid and annoyed at
>your wasted effort. Even answering machine messages have this problem.
>
>(2) Machines are not excused from clearing their throats and saying
>hello. Again, "boing" will do and may even be preferable to "ahem" as
>just mentioned. But I once nearly died of fright when a computer behind
>me in a darkened room in a deserted building at 3am suddenly said "your
>printer is out of paper." in an extremely calm, pleasant voice but with
>inadequate warning.
>
>(3) Machines are not excused from boundary markers. One of the things I
>*loathe* about automated directory assistance systems and talking clocks
>is that they use the SAME recorded digits in all positions. This makes
>it extremely hard to copy numbers down and know that you have them
>right, as well as being simply annoying. Even just having separate
>final/nonfinal digits would be an improvement. This is actually *less*
>of a problem with synthesised speech, partly because synthesis systems
>are more likely to do contour, and partly because they sound uniformly
>bad rather than atrociously edited!
>
>(4) Machines are not excused from rephrasing. A computer reading phone
>numbers should say, "seven two _six_, one _three_ zero _three_", but if
>asked to repeat itself should use "seven twenty-six, thirteen oh three".
>
>(5) Speech *recognition* systems, at present, fail *consistently* for
>some speakers. The statistics on successfully completed transactions may
>be looking great, while some customers are effectively faced with
>termination of service!
>
>What's specifically wrong with synthetic speech? Total absence of
>pragmatic markers at every level, poor pitch contours, lack of
>interactive adaptation with interlocutor at every level, poorly modelled
>interaction between adjacent segments (which decreases noise immunity
>rather than increasing it, no matter what one's engineering intuitions
>might say :-).
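
Point (4) above, rephrasing a phone number when asked to repeat it, can be
sketched in a few lines of Python. This is only a rough illustration and not
something any respondent supplied: the function names, the hyphen-separated
input format and the pairing rule are all assumptions, and the stress that is
marked with underscores in the example above is not modelled.

```python
# Hypothetical sketch: two ways of reading out a phone number.
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


def digit_by_digit(group):
    """First reading: every digit spoken on its own."""
    return " ".join(DIGITS[int(d)] for d in group)


def read_pair(pair):
    """Read a two-digit string as one number, using 'oh' for a leading zero."""
    tens, ones = int(pair[0]), int(pair[1])
    if tens == 0:
        return "oh " + ("oh" if ones == 0 else DIGITS[ones])
    if tens == 1:
        return TEENS[ones]
    if ones == 0:
        return TENS[tens]
    return TENS[tens] + "-" + DIGITS[ones]


def in_pairs(group):
    """Repeat reading: a leading odd digit alone, then the rest in pairs."""
    words = []
    if len(group) % 2:
        words.append(DIGITS[int(group[0])])
        group = group[1:]
    words += [read_pair(group[i:i + 2]) for i in range(0, len(group), 2)]
    return " ".join(words)


def speak_number(number, repeat=False):
    """Render a number such as '726-1303'; rephrase it when repeat is True."""
    reader = in_pairs if repeat else digit_by_digit
    return ", ".join(reader(group) for group in number.split("-"))


print(speak_number("726-1303"))
print(speak_number("726-1303", repeat=True))
```

The comma between groups stands in for the boundary marker asked for in
point (3); a real system would presumably realise it as a pause or a change
in pitch contour rather than as punctuation.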

Thanks again to all respondents!

Bente

#########################################################################
Bente Henrikka Moxness
Research Assistant
Dept. of Linguistics
NTNU (Norwegian University of Science and Technology)
7055 Dragvoll
Norway
Tel: +47 73 59 15 16
Fax: +47 73 59 61 19
e-mail: benmoxalfa.itea.ntnu.no
#########################################################################