LINGUIST List 15.2363
Tue Aug 24 2004
Sum: Speech Corpus for Neural Network Training
Editor for this issue: Megan Zdrojkowsky <megan@linguistlist.org>
Directory
Scott Drellishak, Speech Corpus for Neural Network Training
Message 1: Speech Corpus for Neural Network Training
Date: Mon, 23 Aug 2004 21:18:19 -0400 (EDT)
From: Scott Drellishak <sfd@u.washington.edu>
Subject: Speech Corpus for Neural Network Training
A few weeks ago, I posted a request for information about a particular
kind of speech corpus to both the Linguist List and the
Corpora-List (Linguist 15.1895). This is the (somewhat belated) summary.
I described the corpora we are seeking as follows:
"We are looking for a corpus that contains samples of many speakers
producing many vowels (preferably in a less reduced register) that
also contains human-validated pitch and formant (F1, F2, and F3)
tracks and, if possible, bandwidth information. A corpus that
contains more than just vowels is fine, since we can discard sections
of the samples that do not suit our needs."
I received five replies:
1) John Lawler suggested MICASE (Michigan Corpus of Academic
Spoken English), which is available here:
http://www.lsa.umich.edu/eli/micase/index.htm
2) Lesley Carmichael suggested I post my request to the
Corpora-List.
3) Jane Edwards pointed me to the Switchboard Transcription
Project:
http://www.icsi.berkeley.edu/real/stp/index.html
4) Susana Sotillo wrote, "At a recent conference (CALICO) I
saw a demonstration of the Speechcalator (Allen Blackwell
and associates). Why don't you write him at
Carnegie-Mellon."
5) Linda Bawcom offered an hour and a half of taped
conversation that she used in her MA research.
Many thanks to everyone who replied.
Scott Drellishak
University of Washington
Seattle, WA