Editor for this issue: Ann Dizdar <ann@linguistlist.org>
A number of weeks ago I asked, very informally, for people's reactions to synthetic speech (and also to prerecorded speech) and for studies on emotional reactions to synthetic speech. I wish to thank those who responded: Osamu Fujimura, Margaret Jackman, Randall A. Major, Corey Miller, Johanna Rubba and Stephen P. Spackman.

I had hoped for more responses, but I have also started to collect information from friends and colleagues. I've realized that I need a more structured way of gathering information, since this (up until now) rather informal approach to the matter may suddenly turn into a more formal study.

The respondents react both positively and negatively to synthetic speech: one may be irritated at the "bluntness" of the machine, the lack of flexibility in the programs, etc., but still find the synthetic vocal information handy.

From Per Egil Heggtveit at Telenor, Norway, I have received a list of references on synthetic speech, but none of the studies cover emotional reactions.

Osamu Fujimura wrote:

>I suggest that you ask the question to Marian Macchi.

I did. She responded as follows:

>Two of the US telephone companies have introduced a service called "Reverse Directory Assistance", which is available to telephone customers. This is a telephone service whereby a customer calls a special number, enters a telephone number using the touchtone pad, and hears the name and address of the person to whom that telephone number is listed. A speech synthesizer (Orator, a text-to-speech synthesizer that we have developed here at Bellcore) is used to speak the name and address.
>Before the introduction of this automated service, one of the telephone companies offered the service with real human operators. Today the complaint rate from customers is no higher than it was when the service was offered with real operators.
>
>This is not to say that use of synthetic speech is always acceptable.
>In fact, many applications for synthetic speech are not adopted because the speech sounds too robotic.

Margaret Jackman wrote:

>My experience with synthetic voices is with our telephone information system. It asks what is the name and address of the person for whom we want the phone number. I am always annoyed since I know I will usually have to repeat it to a real person later.
>
>I am also annoyed with voice mail systems that go on forever, giving me 10 different options, instead of the voice operator who puts me through to the person I want.
>
>I suppose the problem isn't the synthetic language; it is generally very clear and concise. The problem is that when I get one it generally wastes my time, and for that reason, I have a negative reaction to them.

Randall A. Major wrote:

>I'm not sure if they've worked on reactions or not, but you should try contacting Barbara Grosz at
>grosz@eecs.harvard.edu
>They've done a lot of work on synthetic speech and she may be able to help you. Good luck!

I contacted Barbara Grosz, who wrote:

>Sorry, but I have not done any experiments of this sort, though I have done some work on speech synthesis. My colleague, Julia Hirschberg, at AT&T Research may know of some research in this arena, though I don't believe she has done any either.

I haven't contacted Julia Hirschberg yet, but I intend to.

Corey Miller wrote:

>You may want to look at an article on the perception of synthetic speech by David Pisoni, in Progress in Speech Synthesis, van Santen, Sproat, Olive and Hirschberg, Springer, 1997.

I've tried to get a copy of the article through our university library, but the book is so recent that, I was told, no copies are available yet.

Johanna Rubba wrote:

>My personal reaction to a synthetic voice on the phone is negative.
>I experience offense (because the company involved does not care enough to have a real person staffing the phone line; they'd rather downsize and replace people with machines) and irritation (because I am not going to be able to get any questions answered, and am going to be obliged to follow the inflexible program set down by the corporation [and these are inevitably not well-designed; they waste the customer's time]). I also experience irritation because synthetic voices do not sound like real voices, meaning I have to put forth extra effort to parse their output, and also because I am a perfectionist and don't understand why even relatively simple things like normal list intonation (not the weird system used on the [non-synthetic, just pre-recorded] directory assistance systems) can't be gotten right.
>
>I know enough about computational linguistics to know that achieving real-sounding synthetic speech is extremely difficult, esp. if context has to be taken into account. Is this an excuse for ugly synthetic speech? Only if you think we really need synthetic speech. Do we?
>
>Oh, it's not all negative -- I do experience a low level of curiosity and amusement in hearing how much of the sound of real speech the designers have managed to capture in the artificial speech, and the particular distortions that are found in synthetic speech (my intro ling students love it when I mimic synthetic speech for them and point out things like stress and intonation. I think some progress has been made in this area, but they sure do recognize that flat, syllable-timed, nasal voice!)
>
>I just thought of a good use of synthetic speech that I do like. My word processor has an auditory editor that reads my texts back to me. Though the speech has some flaws, it's not too terribly bad, and it is a very useful function when the eyes are no longer capable of seeing the errors.
>Note that I like this because it's not an interaction; I get to choose when I use it, and I don't expect to have a conversation with it.

And finally, Stephen S. Spackman wrote:

>Myself, I *like* machines. I use bank machines instead of live tellers whenever practical. But (and this doesn't all bear directly on your query, but maybe I'm talking to someone who wants to listen...!):
>
>(1) No deception. A machine should announce itself as such, ideally by going "boing" or something before it starts to talk. It's extremely annoying to find yourself trying to talk *with* a machine, thinking it is human. When you find out otherwise you feel both stupid and annoyed at your wasted effort. Even answering machine messages have this problem.
>
>(2) Machines are not excused from clearing their throats and saying hello. Again, "boing" will do and may even be preferable to "ahem" as just mentioned. But I once nearly died of fright when a computer behind me in a darkened room in a deserted building at 3am suddenly said "your printer is out of paper." in an extremely calm, pleasant voice but with inadequate warning.
>
>(3) Machines are not excused from boundary markers. One of the things I *loathe* about automated directory assistance systems and talking clocks is that they use the SAME recorded digits in all positions. This makes it extremely hard to copy numbers down and know that you have them right, as well as being simply annoying. Even just having separate final/nonfinal digits would be an improvement. This is actually *less* of a problem with synthesised speech, partly because synthesis systems are more likely to do contour, and partly because they sound uniformly bad rather than atrociously edited!
>
>(4) Machines are not excused from rephrasing. A computer reading phone numbers should say, "seven two _six_, one _three_ zero _three_", but if asked to repeat itself should use "seven twenty-six, thirteen oh three".
>
>(5) Speech *recognition* systems, at present, fail *consistently* for some speakers. The statistics on successfully completed transactions may be looking great, while some customers are effectively faced with termination of service!
>
>What's specifically wrong with synthetic speech? Total absence of pragmatic markers at every level, poor pitch contours, lack of interactive adaptation with interlocutor at every level, poorly modelled interaction between adjacent segments (which decreases noise immunity rather than increasing it, no matter what one's engineering intuitions might say :-).

Thanks again to all respondents!

Bente

#########################################################################
Bente Henrikka Moxness
Research Assistant
Dept. of Linguistics
NTNU (Norwegian University of Science and Technology)
7055 Dragvoll
Norway
Tel: +47 73 59 15 16
Fax: +47 73 59 61 19
e-mail: benmox@alfa.itea.ntnu.no
#########################################################################