LINGUIST List 8.78

Wed Jan 22 1997

FYI: Teaching of lx list, New speech corpora

Editor for this issue: Ljuba Veselinova <ljuba@linguistlist.org>

Directory

Suzanne Fleischman, teaching of linguistics list (fwd)

Terri Lander, two new speech corpora from CSLU

Message 1: teaching of linguistics list (fwd)

Date: Thu, 16 Jan 1997 20:51:18 -0800
From: Suzanne Fleischman <suzanne@garnet.berkeley.edu>
Subject: teaching of linguistics list (fwd)

I would like to announce the formation of a list concerned with the teaching of linguistics. This list, "teach-ling," is a forum for the exchange of ideas, materials, solutions to common problems, syllabi, activities, etc. A particular emphasis of this list will be supporting the use of active learning methods in teaching. Members are welcome to post and respond to queries about problems and concerns in teaching their linguistics courses, setting up syllabi, and selecting texts and other materials.

Teach-ling is an unmoderated, open list. To subscribe, send a message to listproc@lists.nyu.edu saying:

subscribe teach-ling YOURADDRESS YOURFIRSTNAME YOURLASTNAME

Message 2: two new speech corpora from CSLU

Date: Thu, 16 Jan 97 10:30 PST
From: Terri Lander <tlander@cse.ogi.edu>
Subject: two new speech corpora from CSLU

The Center for Spoken Language Understanding at the Oregon Graduate Institute of Science and Technology is releasing two new telephone speech corpora: 22 Language and Alphadigit. As always, the CSLU corpora are available at no cost to universities and other not-for-profit organizations. Companies may obtain speech corpora and other benefits through membership in CSLU's industrial affiliates program (see http://www.cse.ogi.edu/CSLU/memberships/memberships.html).

First Corpus:

Release 1.0 of the 22 Language Speech Corpus is a collection of telephone-quality speech from over 2000 speakers in 22 different languages. Collection, annotation and distribution of this corpus are supported in part by a grant from NSF and DARPA. The languages are: Arabic, Portuguese, Cantonese, Czech, English, Farsi, French, German, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Mandarin, Polish, Russian, Spanish, Swahili, Swedish, Tamil, and Vietnamese. The utterances range in length from three seconds to one minute and were produced in response to prompts recorded in each language. Each utterance in this corpus has been checked by two native speakers, with any disagreements between them resolved, to determine (among other things) the gender, dialect, accent, and responsiveness of the caller. In addition, callers in each language were asked to speak in English for 20 seconds. The current release contains speech from 100 callers in each language. More information is available at:

http://www.cse.ogi.edu/CSLU/corpora/22lang/

Second Corpus:

The Alphadigit Corpus is a collection of about 78,000 examples from 3,031 talkers saying six-character strings of letters and digits over the telephone. Release 1.0 includes a total of about 75 hours (2.3GB) of speech. Each file has an orthographic transcription. More information is available at:

http://www.cse.ogi.edu/CSLU/corpora/alphadigit/

To order either of these corpora, or any other CSLU corpus, fill out the online order form:

http://www.cse.ogi.edu/CSLU/corpora/orderform.html

Feel free to contact me if you have any questions or comments.

- Mike Noel, noel@cse.ogi.edu