LINGUIST List 15.2363

Tue Aug 24 2004

Sum: Speech Corpus for Neural Network Training

Editor for this issue: Megan Zdrojkowsky <megan@linguistlist.org>

Directory

Scott Drellishak, Speech Corpus for Neural Network Training

Message 1: Speech Corpus for Neural Network Training

Date: Mon, 23 Aug 2004 21:18:19 -0400 (EDT)
From: Scott Drellishak <sfd@u.washington.edu>
Subject: Speech Corpus for Neural Network Training

A few weeks ago, I posted a request for information about speech corpora of a particular kind to both the Linguist List and the Corpora-List (Linguist 15.1895). This is the (somewhat belated) summary.

I described the corpora we are seeking as follows:

"We are looking for a corpus that contains samples of many speakers producing many vowels (preferably in a less reduced register) that also contains human-validated pitch and formant (F1, F2, and F3) tracks and, if possible, bandwidth information. A corpus that contains more than just vowels is fine, since we can discard sections of the samples that do not suit our needs."

I received five replies:

1) John Lawler suggested MICASE (Michigan Corpus of Academic Spoken English), which is available here:

http://www.lsa.umich.edu/eli/micase/index.htm

2) Lesley Carmichael suggested I post my request to the Corpora-List.

3) Jane Edwards pointed me at the Switchboard Transcription Project:

http://www.icsi.berkeley.edu/real/stp/index.html

4) Susana Sotillo wrote, "At a recent conference (CALICO) I saw a demonstration of the Speechcalator (Allen Blackwell and associates). Why don't you write him at Carnegie-Mellon."

5) Linda Bawcom offered an hour and a half of taped conversation that she used in her MA research.

Many thanks to everyone who replied.

Scott Drellishak
University of Washington
Seattle, WA