
FYI: Teaching of lx list, New speech corpora


Author: Suzanne Fleischman

FYI Body: I would like to announce the formation of a list concerned with the
teaching of linguistics. This list, "teach-ling," is a forum for the
exchange of ideas, materials, solutions to common problems, syllabi,
activities, etc. A particular emphasis of this list will be
supporting the use of active learning methods in teaching. Members
are welcome to post and respond to queries about problems and concerns
in teaching their linguistics courses, setting up syllabi, and
selecting texts and other materials.

Teach-ling is an unmoderated open list. To subscribe, just send a
message to listproc@lists.nyu.edu saying

subscribe teach-ling YOURADDRESS YOURFIRSTNAME YOURLASTNAME
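
For anyone who prefers to script the request, the subscription message above could be assembled roughly as follows. This is only a sketch: the function name build_subscribe_message is hypothetical, and placing the command in the message body (rather than the subject line) is an assumption, since the announcement says only to send a message "saying" the command. The code builds the message but does not send it.

```python
from email.message import EmailMessage

# Both values are taken from the announcement above.
LISTPROC_ADDRESS = "listproc@lists.nyu.edu"
LIST_NAME = "teach-ling"


def build_subscribe_message(address: str, first_name: str, last_name: str) -> EmailMessage:
    """Build (but do not send) a subscription request for teach-ling.

    Hypothetical helper: the command is placed in the body, which is an
    assumption about how the list processor reads requests.
    """
    msg = EmailMessage()
    msg["From"] = address
    msg["To"] = LISTPROC_ADDRESS
    # Single command line, in the order given in the announcement:
    # subscribe teach-ling YOURADDRESS YOURFIRSTNAME YOURLASTNAME
    msg.set_content(f"subscribe {LIST_NAME} {address} {first_name} {last_name}")
    return msg


if __name__ == "__main__":
    # Example values only; substitute your own address and name.
    print(build_subscribe_message("jane@example.edu", "Jane", "Doe"))
```

Sending the resulting message is left to whatever mail client or SMTP setup you already use.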




The Center for Spoken Language Understanding at the Oregon
Graduate Institute of Science and Technology is releasing two
new telephone speech corpora: 22 Language and Alphadigit. As
always, the CSLU corpora are available at no cost to
universities and other not-for-profit organizations. Companies
may obtain speech corpora and other benefits through membership
in CSLU's industrial affiliates program (see

http://www.cse.ogi.edu/CSLU/memberships/memberships.html).

First Corpus:

Release 1.0 of the 22 Language Speech Corpus is a collection of
telephone-quality speech from over 2000 speakers in 22 different
languages. Collection, annotation and distribution of this
corpus is supported in part by a grant from NSF and DARPA. The
languages are: Arabic, Portuguese, Cantonese, Czech, English,
Farsi, French, German, Hindi, Hungarian, Indonesian, Italian,
Japanese, Korean, Mandarin, Polish, Russian, Spanish, Swahili,
Swedish, Tamil, and Vietnamese. The speech includes utterances
ranging in length from three seconds to one minute,
produced in response to prompts recorded in each language. Each
utterance in this corpus has been verified by two native
speakers, with differences among the transcribers resolved, to
determine (among other things) the gender, dialect, accent, and
responsiveness of the caller. In addition, callers in each
language were asked to speak in English for 20 seconds. The
current release contains speech from 100 callers in each
language. More information is available at:

http://www.cse.ogi.edu/CSLU/corpora/22lang/

Second Corpus:

The Alphadigit Corpus is a collection of about 78,000 examples
from 3,031 talkers saying six-digit strings of letters and digits
over the telephone. A total of about 75 hours (2.3 GB) of speech
is included in Release 1.0. Each file has an orthographic
transcription. More information is available at:

http://www.cse.ogi.edu/CSLU/corpora/alphadigit/

To order either of these corpora or any other CSLU corpora, you
can fill out the online order form:

http://www.cse.ogi.edu/CSLU/corpora/orderform.html

Feel free to contact me if you have any questions or comments.

- Mike Noel
noel@cse.ogi.edu

