LINGUIST List 8.78

Wed Jan 22 1997

FYI: Teaching of lx list, New speech corpora

Editor for this issue: Ljuba Veselinova <>


  1. Suzanne Fleischman, teaching of linguistics list (fwd)
  2. Terri Lander, two new speech corpora from CSLU

Message 1: teaching of linguistics list (fwd)

Date: Thu, 16 Jan 1997 20:51:18 -0800
From: Suzanne Fleischman <>
Subject: teaching of linguistics list (fwd)

I would like to announce the formation of a list concerned with the
teaching of linguistics. This list, "teach-ling," is a forum for the
exchange of ideas, materials, solutions to common problems, syllabi,
activities, etc. A particular emphasis of this list will be on
supporting the use of active learning methods in teaching. Members
are welcome to post and respond to queries about problems and concerns
in teaching their linguistics courses, setting up syllabi, and
selecting texts and other materials.

Teach-ling is an unmoderated open list. To subscribe, just send a
message to saying

Message 2: two new speech corpora from CSLU

Date: Thu, 16 Jan 97 10:30 PST
From: Terri Lander <>
Subject: two new speech corpora from CSLU

The Center for Spoken Language Understanding at the Oregon
Graduate Institute of Science and Technology is releasing two
new telephone speech corpora: 22 Language and Alphadigit. As
always, the CSLU corpora are available at no cost to
universities and other not-for-profit organizations. Companies
may obtain speech corpora and other benefits through membership
in CSLU's industrial affiliates program (see

First Corpus:

Release 1.0 of the 22 Language Speech Corpus is a collection of
telephone-quality speech from over 2000 speakers in 22 different
languages. Collection, annotation and distribution of this
corpus are supported in part by a grant from NSF and DARPA. The
languages are: Arabic, Portuguese, Cantonese, Czech, English,
Farsi, French, German, Hindi, Hungarian, Indonesian, Italian,
Japanese, Korean, Mandarin, Polish, Russian, Spanish, Swahili,
Swedish, Tamil, and Vietnamese. The speech consists of utterances
ranging in length from three seconds to one minute, produced in
response to prompts recorded in each language. Each utterance in
this corpus has been verified by two native speakers, with
differences between the transcribers resolved, to determine (among
other things) the gender, dialect, accent, and responsiveness of
the caller. In addition, callers in each
language were asked to speak in English for 20 seconds. The
current release contains speech from 100 callers in each
language. More information is available at:

Second Corpus:

The Alphadigit Corpus is a collection of about 78,000 examples
from 3,031 talkers saying six-character strings of letters and
digits over the telephone. A total of about 75 hours (2.3 GB) of
speech is included in Release 1.0. Each file has an orthographic
transcription. More information is available at:

To order either of these corpora, or any other CSLU corpus,
fill out the online order form:

Feel free to contact me if you have any questions or comments.

- Mike Noel
