LINGUIST List 15.1484

Tue May 11 2004

Qs: English Speech Corpora

Editor for this issue: Steve Moran

We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate. In addition to posting a summary, we'd like to remind people that it is usually a good idea to personally thank those individuals who have taken the trouble to respond to the query. To post to LINGUIST, use our convenient web form at


  1. Ingo Plag, speech corpora

Message 1: speech corpora

Date: Mon, 10 May 2004 12:09:19 -0700
From: Ingo Plag
Subject: speech corpora

Dear Linguist Listers,

I have two queries concerning English speech corpora.

1. I am looking for a speech corpus (language: English) that is
part-of-speech tagged and has sound files, transcriptions and
part-of-speech tags aligned. Furthermore, it needs to be of
considerable size (> 100,000 word tokens, if possible). Can anyone
point me towards pertinent corpora?

So far I have found only one corpus that meets all the criteria
mentioned above, the Boston University Radio News Corpus.

2. Despite hours of effort and the help of experienced
colleagues, I have not managed to open the example files of the BU
Radio News Corpus properly, whether I used PRAAT,
Wavesurfer, or Transcriber. All three programs can open the sound file
(.sph) without problems, but none of them can access the
files with the transcription or the part-of-speech tags and align
this information with the sound wave. Can anyone help? Which
program(s) can do the job?

Any help will be greatly appreciated.

Many thanks in advance!

Best regards,
Ingo Plag

Prof. Dr. Ingo Plag
English Linguistics
Fachbereich 3
Universitaet-Gesamthochschule Siegen
Adolf-Reichwein-Str. 2
D-57068 Siegen
tel. 0271-740-2560
tel. 0271-740-2349 (secretary)
fax 0271-740-3246
tel.: 06422-2817 (home)

office: room AR-K 103