Editor for this issue: Steve Moran <steve@linguistlist.org>
A couple of weeks ago I posted this query (Linguist 14.751):

I have been approached by a local law firm for assistance in a court case, but do not have the expertise required. They came to me because I do conversation analysis, but this is really something different, although it does relate to recorded conversation.

The situation is as follows: police witnesses are making claims about the identity of speakers for individual turns on an audio tape being used as evidence. The law firm feels that the assignment of speakers to turns is being done in an arbitrary fashion, and doubts its accuracy. It is, of course, crucial to the case to know who said what. They would like an expert witness who could say why the accuracy is questionable. While I know from personal experience that it can be difficult to identify the speaker of certain turns in multi-party conversation, they want actual scientific explanations for why impressionistic identity assignment might be a problem, and how one can accurately assign identity. (I have not yet listened to their tapes, but will have an opportunity to do so.)

If anyone has any experience with such matters, or knows of any published material relating to it, or has any ideas about how to go about doing this, please contact me. If you think you might be able to help, but you're still not really sure what I'm asking, or need more detail, please contact me with clarification questions.

***********************************************************

The responses I received fall into several categories. Some people offered their services, and I thank them. However, the law firm has no desire to pay for experts coming in from overseas. I have not included those offers in the summary. Michael Erard suggested posting to forensic-linguistics@jiscmail.ac.uk, and Carsten Otto forwarded my post to that list.
A large group of responses related to who to contact, and what web sites or journals to look at. A few of those contained specific references. Another group of responses discussed the issue, suggesting how to do voice recognition or outlining some of the issues involved in it. The messages in those two groups are given below, divided into the two categories. I wish to thank all the people who took the time to write those responses.

For those who asked for an update about the case, here it is. I turned all the posts I received over to a former student, Bronwen Innes, whose PhD topic was in the area of language and the law, and who was happy to act as expert witness. She is following up various of the sources and references listed below. The law firm is providing her with extracts from the tape, so that she can comment on the quality of the recording and the difficulty of identifying speakers, both in general and on this tape. They do not want to give her the relevant sections of the tape, where their client is speaking. Apparently they don't want to take the chance of her saying that the police were accurate in this instance. She's hoping that perhaps after she testifies the judge will order the tapes to be given over to her for more expert analysis, but that may never happen.

Responses relating to who to contact or where to look for help.

You might try to get in touch with Diana Eades (applied linguistics at the University of Hawaii). She is an Australian who has done lots of work of the type you describe.

Barbara Horvath

****************************************************

I would suggest you try to contact Prof. Diana Eades, who is Australian and used to be at University of New England, Armidale, NSW 2551 with this e-mail: deades@metz.une.edu. However, for a while she was at the U of Hawaii at this e-mail: eades@hawaii.edu. I haven't been in direct touch with her for some time, so I can't tell you her present location and affiliation. However, I would bet that any of the Australian linguists who do forensic linguistics, like John Gibbons, would know her whereabouts. The last e-mail I have for him is: john.gibbons@linguistics.usyd.edu.au. He may also be able to connect you with linguists in Australia (closer than Chicago!) who can help with this problem.

Then there are two UK linguists, John Baldwin and Peter French, who wrote a whole book called "Forensic Phonetics" (1990: Pinter), who should be able to advise you.

Judith Levi

****************************************************

I can't answer your question directly, but I can give you some leads. I would go first to a forensic phonetics lab. I don't know who in New Zealand does that kind of work, but others would. You might contact Peter French in the UK at jpf@jpfrench.demon.co.uk. You may also want to contact Ton Broeders in the Netherlands at t.broeders@nfi.minjus.nl. Ton works for the government there. If you ask him whether he knows who is doing good work in this field in New Zealand for a client other than the gov't, he will tell you if he knows.

Larry Solan

****************************************************

Try this website: http://www.owlinvestigations.com

It may be too late or too expensive, but this company does what you want.

Pete Unseth

****************************************************

Yes, it's known to be a very hard problem, and (not surprisingly, since the human auditory apparatus is good at normalizing away speaker variations) something which humans are not particularly good at - it's one task that computers are typically better at than humans! I know that James and Janet Baker have done some consulting of this type. They can be reached at Janet_Baker@email.com and jim@sandboxscribe.com

-- Jonathan Young

****************************************************

I know nothing definitive about it but we have all heard about "voice print" identification. I googled about 460 hits on it, saw some courses being taught and news articles about verifying the voice of Osama on one of his tapes, but didn't quickly find anyone offering expertise as a service. This might be the direction to go in, however.

William Bliss

****************************************************

I would start with professional literature specifically on forensic analysis of wiretap data, before looking to the broader acoustics literature. There is a forensics professional organization in NZ; maybe they can help locate appropriate journals and experts: Australian and New Zealand Forensic Science Society (ANZFSS) http://www.nifs.com.au/ANZFSS/ANZFSS.html

Steve Lowe

****************************************************

I asked my wife, who did her MSc dissertation in Edinburgh on forensic linguistics. She writes: Try Peter French (forensic linguistics - can't remember where he is) or Hermann Künzel at the Bundeskriminalamt, Wiesbaden (he's done work on speaker identification using signal analysis and pattern recognition - though I only have a paper from 10 years ago). Or else try the Society of Forensic linguistics.

Bernard Payne

****************************************************

While I am not an expert in the field of forensic speaker identification, either, you might find something useful on the website of the "International Association for Forensic Phonetics": http://www.iafp.net/

The Journal "Forensic Linguistics" has some articles on the topic (http://www.builder.bham.ac.uk/forensiclinguistics/welcome.asp), e.g.:

* Schiller, N. O. & Köster, O. (1998). The ability of expert witnesses to identify voices: A comparison between trained and untrained listeners. Forensic Linguistics. The International Journal of Speech, Language and the Law, 5, 1-9.
* Künzel, H. J. (1994).
  On the problem of speaker identification by victims and witnesses. In: Forensic Linguistics 1, 45-59.

For a more general, short intro to the subject of forensic linguistics, you might want to check http://www.csa.com/hottopics/linglaw/overview.html

-- Caren Brinckmann

****************************************************

Responses discussing issues in or how to do voice identification.

I have faced your situation several times. I don't know the final answers but here are my thoughts anyway.

An outsider can't know the names of the speakers. Best thing to do is to mark a transcript UM1, UM2, UF1, UF2, etc. (UM means unidentified male, etc.) if there is adequate reason to suspect that the UMs and UFs are different. It's often possible to identify male from female voices, although not always. Children's voices are a serious problem of course.

Even with this caution, you'll need to have some auditory or acoustic phonetic evidence to support your separation of UM1 from UM2, etc. Consistent differences in vowel production are one such way. Idiosyncratic word usage, grammatical structures, etc. are another. Speech affectations, such as lisps, laryngealization, creaky voice, etc. help too. Of course, if the speakers happen to name each other at some point, you can justify using the assigned names as well. Warning: I had a case one time where three of the four speakers were named John.

Sound spectrography can help a lot here, if you have expertise in it and access to it. If not, it is not unusual to request someone to do it.

Police transcripts are notoriously bad in such matters, often conveniently agreeing with the police's theory of the case. Fortunately, the person who makes the police transcript should be subject to the same questioning that you will face. You will need to call on your field's expertise to trump it when and if it is different from yours. Are you skilled in phonetics?
Consistently different pitch, pace and intonation contours can help too, as can different accents, if any exist. Things get more problematic when the tape contains heightened emotions. Voices tend to get higher, blurring even female/male contrasts sometimes.

Roger Shuy

****************************************************

I would agree with the law firm that witnesses' judgments may not be 100% accurate. I think the best way to go is to use speech recognition technology to make objective and scientific judgments based on acoustic analysis of the voices of the participants in that conversation. Although the technology may not be able to identify idiosyncratic properties of everyone on the globe, I believe it can make accurate identification from a small number of voices, which is the case in your situation.

Ali Farghaly, Ph.D.

****************************************************

I did my doctoral dissertation on turn-taking, within a conversation analysis framework. I defended it successfully this past December, in Mexico City. The hard job was transcribing the tapes of conversations, especially one where I had 12 people around the table, and initially they were not aware of the portable tape recorder. Even though the participants were all people I knew (they were part of the family, so I could recognize their voices), there were constant interruptions, and the sound was not always the best, since the conditions of recording were not optimal.

There are many factors that could make it difficult to identify the speakers: number of speakers, interruptions, overlap of two or more conversations, speakers with similar voices, distortion of sound, noise, quality of tape (if it is non-professional, which I assume is the case). Audio-cassettes are not as reliable as mini-discs. I worked with transcribing both, and the difference in sound quality was significant. I recall one particular recording, on mini-disc.
There were places where two conversations overlapped, but on the mini-disc I was able to "separate" the two conversations: to concentrate on one conversation and transcribe, then to rewind and concentrate on the other conversation and transcribe. But the same did not happen with audio-tapes, especially those recorded with portable tape-recorders.

I don't know what kind of machine was used for the recording, but I know for a fact that the kind of tape-recorder used affects the sound quality dramatically. A lot of my data was part of a project on studying the Spanish of Mexico City, and different recording machines were used: mini-disc, digital (with lapel microphones), and audio, with just the integrated microphone on the portable tape-recorder. The mini-disc was the best, and the audio the worst.

Gina Musselman

****************************************************

I noticed your query on the Linguist List regarding speaker identification. I'm not sure if this will help, but I have attached a paper that is in press at the Journal of Experimental Psychology: Human Perception & Performance (it should be out soon). The paper examines the phenomenon of change detection in the auditory domain. The paper describes a couple of experiments in which participants heard a list of words over a set of headphones. Halfway through the list a different voice began to present the words. Only about 40% of the participants noticed that the voice presenting the list changed to a different person...not anywhere near as accurate as a layperson might expect ("They're two different voices, how can you not tell the difference?").

Mike Vitevitch

****************************************************

If I understand things correctly, the conversation involves a number of speakers, and the concern of the law firm is that utterances may be incorrectly attributed to the various speakers?
If the actual number and identities of all the possible speakers are known, then the assignment of conversational turns to particular speakers can probably be done with a high level of success using expert auditory and acoustic analysis. In optimal circumstances, lay listeners may appear to be equally successful, but their success and reliability will be strongly influenced by factors such as:

* familiarity with the speakers' voices before this conversation was recorded (high familiarity should mean the witnesses/lay listeners are fairly accurate in their attribution of turns to particular speakers; low familiarity with the voices makes them much less reliable witnesses and their testimony should be regarded as far less accurate than independent expert analysis)
* listeners always operate with expectations about the structure of conversation, as I'm sure you know, and parties involved in a case are rarely able to separate their strong expectations about who will say what from what is objectively present in a conversation. The typical example is police officers transcribing conversations inaccurately because of their expectations of what will be said or who will say specific things. Of course, defence parties will commit similar mistakes.
There are a number of books on forensic speech analysis that might be looked at, though the main point which you would quickly be able to refer to is just that listeners are often very unreliable in their identification of speakers, due to the multitude of influencing factors: expectation/memory/bias/available information/etc.

Rose, Philip (2002): Forensic Speaker Identification
Hollien, Harry (2002): Forensic Voice Identification
Hollien, Harry (1996?): The Acoustics of Crime

There is basic information about forensic speech analysis on my website (see below); also at the website of Helen Fraser at University of New England (Armidale, Australia), which should be found by Google search if you enter 'Helen Fraser forensic phonetics'

- Dr Duncan Markham
http://www.interfaceanalysts.com/forensic.html

****************************************************

I testified in a similar case in Liverpool, UK two years ago. The person accused by the police was "identified" by means of impressionistic criteria by the "president" of the English Society of Forensic Phoneticians. Acoustic analysis revealed that the person in question would have had to stretch his neck some 3 cm to produce signals that supposedly were his. He was acquitted. You can access my CV which demonstrates competence in this area including consulting for the FBI and the NTSB (on the EgyptAir crash - which involved voice identification).

Philip Lieberman

****************************************************

Oh for heaven's sake, doesn't anyone remember the scientific method? If you want to show that speaker identification is unreliable then _test_it_. Find some environment that is similar to the one where the conversations in question took place -- a restaurant, legal office, streetcorner, whatever. Bring a video camera and record some conversation.
Ask some experts to try to distinguish the speakers by audio alone, using whatever methods they are using on the evidence tapes, then look at the video to see how they did.

1) The recordings must be candid, if that is legally/ethically possible.
2) Try to get the same sound quality or better than in the evidence tapes.
3) Try to get the same _style_ or better. By good style I mean long sentences and people seldom talking at once.

In other words, make the test as realistic as you can, and always give the opposition the benefit of the doubt so that they will have no excuse if their experts fail.

Ben Thompson

****************************************************

The controversy about forensic use of "voiceprints" has been around for at least 25 years. I have not followed the controversy lately, but in the 70's and 80's the legal resolution was complicated by the fact that the proponents of the methodology (mostly ex-police officers rather than speech scientists, but including some scientists and clinicians) had their own professional society. Although the vast majority of members of scientific bodies such as the Acoustical Society of America thought the technique was not reliable under forensic conditions, very few members of ASA qualified as expert witnesses, because they were not trained in the particular techniques nor were they members of the particular professional society. The few who were (I believe Peter Ladefoged was in this group) took it on as almost a mission to try to impose scientific standards and to testify against overuse of the methodology.

Ben's suggestion of the use of the scientific method is excellent, and is certainly the right approach to the broader controversy. However, it is probably not a fit for this particular case for several reasons:

1) If the case is already at trial, there is probably not time.
2) Determining turn-taking is a very different, and much easier, task than "voice identification", or even "voice verification."
You are distinguishing between only two voices and you have known samples of each voice under the identical recording conditions (from unambiguous parts of the conversation). This is easier even than voice verification, in which you are making a binary accept/reject decision of one person against the rest of the population. On the other hand, the task can still be arbitrarily hard and error-prone under noisy recording conditions.

I am fairly certain that a careful scientific study will show that it is very easy to tell apart at least some pairs of voices (under reasonable recording conditions). If this is true, then the average error rate for a random pair of voices will not be relevant for a trial in which there is a particular pair of voices in a particular recording condition. That is, it seems that no conclusion could be drawn except by studying the actual recordings.

Assuming that New Zealand courts have a "reasonable doubt" criterion, one way to show that the turn taking discrimination is unreliable on this particular set of recordings is to have several experts independently label the turn taking. To show reasonable doubt, it would not be necessary to test the experts used by the other side, but merely to have several other experts each independently do the task. If the experts are not in substantial agreement, it would cast doubt on anyone's ability to do the task on this particular set of recordings. Thus the evidence would be doubt about these particular recordings, rather than trying to prove that discrimination of turn taking is impractical in general. Of course if the new experts all agree with the other side's experts, then you might have to accept that the turn taking determination is reliable on these recordings.

Jim
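
****************************************************

The agreement check described in the last response can be put in numbers. What follows is only a minimal sketch, not a method proposed by any respondent: it assumes each expert has labelled the same sequence of turns, and it uses Fleiss' kappa as one common chance-corrected agreement measure. The function name fleiss_kappa, the speaker labels "S1"/"S2" and the example data are all hypothetical.

```python
from collections import Counter


def fleiss_kappa(labels_by_expert):
    """Chance-corrected agreement between experts labelling the same turns.

    labels_by_expert: one list per expert; entry i is the (hypothetical)
    speaker label that expert assigned to turn i.
    Returns 1.0 for perfect agreement; values near 0 mean agreement is
    no better than chance.
    """
    n_experts = len(labels_by_expert)
    if n_experts < 2:
        raise ValueError("need at least two experts")
    n_turns = len(labels_by_expert[0])
    if n_turns == 0 or any(len(labels) != n_turns for labels in labels_by_expert):
        raise ValueError("every expert must label the same non-empty set of turns")

    label_totals = Counter()   # how often each speaker label was used overall
    observed = 0.0             # running sum of per-turn agreement

    for labels_for_turn in zip(*labels_by_expert):
        counts = Counter(labels_for_turn)
        label_totals.update(counts)
        # Fraction of expert pairs that chose the same speaker for this turn.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        observed += agreeing_pairs / (n_experts * (n_experts - 1))

    p_observed = observed / n_turns
    total_labels = n_experts * n_turns
    # Agreement expected if experts assigned labels at random with the
    # same overall label frequencies.
    p_expected = sum((n / total_labels) ** 2 for n in label_totals.values())

    if p_expected == 1.0:
        return 1.0  # every expert used a single label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: three experts, four turns, two candidate speakers.
expert_a = ["S1", "S2", "S1", "S2"]
expert_b = ["S1", "S2", "S2", "S1"]
expert_c = ["S1", "S1", "S1", "S2"]
kappa = fleiss_kappa([expert_a, expert_b, expert_c])
print(f"Fleiss' kappa: {kappa:.3f}")
```

A value close to 1 would support the claim that turns can be attributed reliably on the recording in question; a value close to 0 would suggest the attributions are little better than chance. Where to draw the line for "substantial agreement" is a judgment call that the sketch does not settle.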