Editor for this issue: Karen Milligan <karen
linguistlist.org>
> I question whether this is a more basic problem. Isn't it possible
> and even likely that getting the words right would be made possible
> more frequently with the ability to detect sentence boundaries and
> emotional content more accurately?
> Liz Coppock

Yes and no. There is certainly evidence that prosodic features can help. But if you are working with a system on a task where the word error rate is 70%, prosodic features are not going to reduce it to 30%.

- Richard Sproat
http://www.research.att.com/~rws/

Re: Linguist 13.2046, Disc: Accuracy in Speech Recognition: Priorities

> Date: Wed, 7 Aug 2002 23:58:39 -0500
> From: Elizabeth Coppock <e-coppock northwestern.edu>
> Subject: Re: 13.2044, Disc: New: Accuracy in Speech Recognition: Priorities
>
> Richard Sproat wrote:
> > How about the more basic problem of getting most of the words right?
>
> I question whether this is a more basic problem. Isn't it possible
> and even likely that getting the words right would be made possible
> more frequently with the ability to detect sentence boundaries and
> emotional content more accurately?

Most people in the speech community are not working on multi-sentential utterances or emotion detection, though some are. For the most part, we are working on single utterances, which correspond (more or less) to sentences, phrases, or even single words, depending on the state of your application.

I agree with Richard about mainly getting the words right, though I would modify that slightly. People like to talk about Word Error Rates, but I take those numbers with a grain of salt. While it's obviously good to get all the words right, you usually don't need to do that. You do need to get all of the semantic assignments/slots/frames/keys right. If my recognizer hears my "What is the temperature of the port engine?" as "What an temperature for port engine?", it doesn't really matter so much as long as my parser can figure it out (which it can).

And for most speech apps, for a given state you can write a grammar that looks for a word, phrase, sentence or even multiple sentences. If you "get the words right" (or nearly right) and you've written your grammar properly, sentence boundaries won't matter so much.

Now the intel community may have a different perspective, of course.... :-)

- Kurt Godden, Ph.D.
Principal Member of the Engineering Staff
Advanced Technology Labs
Lockheed Martin Corporation
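
The contrast drawn in this message between word error rate and getting the slots right can be made concrete with a small sketch. It is an illustration only, not the parser or grammar the author describes: the slot names, the word lists ATTRIBUTES and ENGINES, and the helper functions are all hypothetical. Word error rate is computed in the usual way, as the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference.

```python
import re

def normalize(text):
    """Lowercase and drop punctuation so only words are compared."""
    return re.sub(r"[^a-z ]", "", text.lower())

def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical slot vocabulary for one application state (an assumption,
# not taken from the original message).
ATTRIBUTES = ("temperature", "pressure")
ENGINES = ("port", "starboard")

def extract_slots(text):
    """Keyword-spotting stand-in for a grammar: fill two slots if present."""
    words = normalize(text).split()
    return {
        "attribute": next((w for w in words if w in ATTRIBUTES), None),
        "engine": next((w for w in words if w in ENGINES), None),
    }

spoken = "What is the temperature of the port engine?"
heard = "What an temperature for port engine?"

print(word_error_rate(normalize(spoken), normalize(heard)))  # 0.5
print(extract_slots(spoken) == extract_slots(heard))         # True
```

On the example from the message, half of the reference words are misrecognized (two substitutions and two deletions out of eight words), yet both strings fill the same two slots, which is the sense in which the word error rate alone can overstate the damage.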

August 8, 2002

Re Linguist 13.2044

> The NYT article that Karen S. Chung pointed us to is a pretty good
> example of the kind of reporting that anyone who works on speech
> technology (or at least anyone who is honest) should cringe at.

For me, the most important thing that could develop from speech recognition technology would be a method for identifying _hostility_ in spoken English in a fashion that would be objective and reliable. Standard practice in American English is for people to use hostile language with an associated hostile intonation -- and then deny their action by saying, "But all I _said_ was..." followed by the words they spoke, but with a nonhostile intonation.

Because no mechanism for objective identification of hostility in spoken English exists at the moment, establishing verbal abuse as a criminal act is still impossible even when there is a taped record of the speech used and even when substantial harm has been done; abusive language can only be introduced in court within the context of some other "recognized" criminal act.

If someone on the list is doing work on this topic that I'm unaware of, I'd like very much to know about it.

Suzette Haden Elgin