LINGUIST List 15.2363

Tue Aug 24 2004

Sum: Speech Corpus for Neural Network Training

Editor for this issue: Megan Zdrojkowsky


  1. Scott Drellishak, Speech Corpus for Neural Network Training

Message 1: Speech Corpus for Neural Network Training

Date: Mon, 23 Aug 2004 21:18:19 -0400 (EDT)
From: Scott Drellishak
Subject: Speech Corpus for Neural Network Training

A few weeks ago, I posted a request for information about speech
corpora of a particular kind to both the Linguist List and the
Corpora-List (Linguist 15.1895). This is the (somewhat belated) summary.

I described the corpora we are seeking as follows:

''We are looking for a corpus that contains samples of many speakers
producing many vowels (preferably in a less reduced register) that
also contains human-validated pitch and formant (F1, F2, and F3)
tracks and, if possible, bandwidth information. A corpus that
contains more than just vowels is fine, since we can discard sections
of the samples that do not suit our needs.''

I received five replies:

1) John Lawler suggested MICASE (Michigan Corpus of Academic
 Spoken English), which is available here:

2) Lesley Carmichael suggested I post my request to the

3) Jane Edwards pointed me at the Switchboard Transcription

4) Susana Sotillo wrote, ''At a recent conference (CALICO) I
 saw a demonstration of the Speechcalator (Allen Blackwell
 and associates). Why don't you write him at Carnegie-

5) Linda Bawcom offered an hour and a half of taped
 conversation that she used in her MA research.

Many thanks to everyone who replied.

Scott Drellishak
University of Washington
Seattle, WA 