LINGUIST List 9.1502

Tue Oct 27 1998

FYI: European Lang. Resources, Int'l Summer School

Editor for this issue: Karen Milligan <karen@linguistlist.org>


Directory

  1. Valérie Mapelli, European Language Resources Association News
  2. Ian Roberts, Thermi International Summer School in Linguistics

Message 1: European Language Resources Association News

Date: Mon, 26 Oct 1998 09:04:54 +0100
From: Valérie Mapelli <mapelli@elda.fr>
Subject: European Language Resources Association News



___________________________________________________________
				ELRA
		European Language Resources Association
			 ELRA News 
___________________________________________________________


		 *** ELRA NEW RESOURCES ***


We are happy to announce new speech resources available via ELRA:

1) ELRA-S0052 FIXED0IT - Italian Fixed Network Speech (SpeechDat(M)) Corpus
- DB1
2) ELRA-S0053 FIXED0IT - Italian Fixed Network Speech (SpeechDat(M)) Corpus
- DB2
3) ELRA-S0054 Chilean Spanish FDB-250
4) ELRA-S0055 Russian SpeechDat-like FDB-1000
5) ELRA-S0056 Slovenian SpeechDat(II) FDB-1000
6) ELRA-S0057 Shanghai Mandarin FDB-1000
7) ELRA-S0058 RVG1 (Regional Variants of German 1, Part 1)


Below is a description of each resource:


1) ELRA-S0052 FIXED0IT - Italian Fixed Network Speech (SpeechDat(M)) Corpus
 DB1 Phonetically rich sentences & application oriented utterances

The Italian Fixed Network Speech Corpus version 1.0 was recorded within the
scope of the SpeechDat(M) project (LRE-63314), funded by the European
Commission. Recordings were made over a primary rate ISDN interface,
yielding an 8 kHz, 8-bit, A-law coded signal. The data files are
formatted according to the conventions of the European SAM project. The speech data are
compressed with the GNU gzip program. All software needed to use the corpus
is provided on the CDs.
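
As a rough illustration of what the signal format described above means in practice, the following Python sketch undoes the gzip compression and expands 8-bit A-law codes to 16-bit linear PCM using the standard ITU-T G.711 expansion. The function names are hypothetical, and the sketch assumes that the gzipped payload holds raw A-law samples with no embedded header. It is not the software supplied on the CDs.

```python
import gzip


def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit A-law code to a signed 16-bit linear value (G.711)."""
    code ^= 0x55                      # undo the even-bit inversion used on the wire
    magnitude = (code & 0x0F) << 4    # 4-bit mantissa, shifted into bits 4..7
    segment = (code & 0x70) >> 4      # 3-bit segment number (exponent)
    if segment == 0:
        magnitude += 8                # linear region near zero
    else:
        magnitude += 0x108            # add the implicit leading bit and half-step
        if segment > 1:
            magnitude <<= segment - 1
    # After the XOR, a set top bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def decode_gzipped_alaw(blob: bytes) -> list[int]:
    """Decompress a gzip blob and decode every byte as one A-law sample.

    Assumption: the blob contains only raw A-law samples.
    """
    return [alaw_to_linear(byte) for byte in gzip.decompress(blob)]
```

At 8 kHz and one byte per sample, each second of decompressed speech occupies 8,000 bytes, so `len(decode_gzipped_alaw(blob)) / 8000` gives the duration of a file in seconds.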

The corpus contains the speech of about 1000 speakers (about 500 male and
500 female) and was designed to support the creation of voice-driven
teleservices. The callers spoke at least 39 items, comprising:
	isolated and connected digits,
	natural numbers,
	money amounts,
	spelled words,
	time and date phrases,
	yes/no questions,
	city names,
	common application words,
	application words in phrases,
	phonetically rich sentences.
Most items are read, some are spontaneously spoken.

The recordings come with extensive and standardised documentation. All
speech is carefully transcribed at the orthographic level; in addition, a
number of clearly audible non-speech events are included in the
transcription. Moreover, age and regional background of the speakers are
provided. A pronunciation dictionary is added, containing all words that
occur in the corpus, with a corresponding SAMPA broad-class phonemic
transcription.

Validation and premastering of the CD-ROMs were performed by the Speech
Processing Expertise Centre (SPEX), Leidschendam, The Netherlands.

Price for ELRA members:
	for research use: 11,000 ECU
	for commercial use: 14,000 ECU

Price for non members:
	for research use: 20,000 ECU
	for commercial use: 20,000 ECU
____________________________________________

2) ELRA-S0053 FIXED0IT - Italian Fixed Network Speech (SpeechDat(M)) Corpus
 DB2 Phonetically rich sentences sub-set

See ELRA-S0052 for a description. DB2 is a sub-set of DB1; it contains only
the phonetically rich sentence items.

Price for ELRA members:
	for research use: 8,800 ECU
	for commercial use: 14,000 ECU

Price for non members:
	for research use: 14,000 ECU
	for commercial use: 20,000 ECU
____________________________________________

3) ELRA-S0054 Chilean Spanish FDB-250

This speech database gathers Spanish data as spoken in Chile. All
participants are native speakers. The corpus consists of read speech,
including digits and application words for teleservices, recorded through
an ISDN card. The whole database consists of 6.45 hours of speech, with 24
utterances per speaker. There are 250 speakers in total (68 male, 80
female, 102 untagged). Leaving aside the 102 untagged speakers, the age
classes are as follows: 15 speakers are under 16 years old, 72 are
between 16 and 30, 44 are between 31 and 45, and 14 are between 46 and
60.

The callers spoke 74 different items in total:
	isolated digits,
	yes/no,
	common application words.

The data is provided with orthographic transliteration for all 6,000
utterances including 4 categories of non-speech acoustic events. A phonetic
lexicon with canonical transcription in SAMPA is also included.

The speech files are stored as sequences of 8-bit, 8 kHz A-law samples.
Data are stored in a SAM file format.
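
The figures quoted for this database can be cross-checked with a little arithmetic. The sketch below, with hypothetical names throughout, derives the utterance count, the approximate size of the raw sample data and the mean utterance length from the numbers in the description. It assumes that the 6.45 hours refer to the stored signal, and it ignores label files and any file headers.

```python
SPEAKERS = 250
UTTERANCES_PER_SPEAKER = 24
HOURS_OF_SPEECH = 6.45
SAMPLE_RATE_HZ = 8000      # 8 kHz sampling
BYTES_PER_SAMPLE = 1       # 8-bit A-law


def corpus_summary() -> dict:
    """Derive simple totals from the figures in the announcement."""
    total_utterances = SPEAKERS * UTTERANCES_PER_SPEAKER
    total_seconds = round(HOURS_OF_SPEECH * 3600)
    raw_bytes = total_seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE
    return {
        "total_utterances": total_utterances,
        "total_seconds": total_seconds,
        "raw_bytes": raw_bytes,
        "mean_utterance_seconds": total_seconds / total_utterances,
    }


if __name__ == "__main__":
    for name, value in corpus_summary().items():
        print(f"{name}: {value}")
```

The product of 250 speakers and 24 utterances agrees with the 6,000 utterances mentioned in the description, and the raw samples come to roughly 186 MB, or a little under 4 seconds per utterance on average.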

Price for ELRA members: 5,000 ECU
Price for non members: 7,500 ECU
____________________________________________

4) ELRA-S0055 Russian SpeechDat-like FDB-1000

This speech database gathers Russian data. The corpus consists of read and
spontaneous speech, recorded through an ISDN card, and was validated and
accepted according to the SpeechDat(II) database exchange format. The whole
database consists of 72 hours of speech, with approx. 49 prompted
utterances per speaker. A total of 1000 speakers was recorded (500 male,
500 female). These are native speakers from 5 regions, mainly from Moscow
and St. Petersburg (803 speakers). The speakers' age classes are as
follows: 16 speakers are under 16 years old, 340 are between 16 and 30,
345 are between 31 and 45, 255 are between 46 and 60, and 44 are over
60.

The callers spoke the following items:
	isolated and connected digits,
	natural numbers,
	money amounts,
	spelled words,
	time and date phrases,
	yes/no,
	city names,
	common application words,
	application words in phrases,
	phonetically rich sentences.

The data is provided with orthographic transliteration for all 48,812
utterances including 4 categories of non-speech acoustic events. A phonetic
lexicon with canonical pronunciation is also provided.

The speech files are stored as sequences of 8-bit, 8 kHz A-law samples. The
data is stored in a SAM file format (4 CD-ROMs).

Price for ELRA members: 14,000 ECU
Price for non members: 20,000 ECU
____________________________________________

5) ELRA-S0056 Slovenian SpeechDat(II) FDB-1000

The Slovenian SpeechDat(II) FDB-1000 consists of read and spontaneous
speech, recorded through an ISDN card, and was validated and accepted
according to the SpeechDat(II) database exchange format. The corpus
includes about 1000 speakers (about 500 male and 500 female) who called
over the Slovenian fixed network. All are native speakers of Slovenian from
all dialect regions of Slovenia.

The callers spoke the following items:
	isolated and connected digits,
	natural numbers,
	money amounts,
	spelled words,
	time and date phrases,
	yes/no,
	city names,
	common application words,
	application words in phrases,
	phonetically rich sentences.

The speech files are stored as sequences of 8-bit, 8 kHz A-law samples. The
data is stored in a SAM file format (CD-ROMs). A phonetic lexicon with
canonical transcriptions in SAMPA is also provided.

Price for ELRA members: 14,000 ECU
Price for non members: 20,000 ECU
____________________________________________

6) ELRA-S0057 Shanghai Mandarin FDB-1000

This acoustic database gathers Mandarin data, as spoken in Shanghai as a
first or second Chinese dialect/language. The corpus consists of read
speech, including digits and application words for teleservices, recorded
through an ISDN card. Each speaker was prompted for a total of 70
utterances. About 1000 speakers were recorded (500 male, 500 female).

The callers spoke the following items:
	isolated digits,
	yes/no,
	city names,
	common application words and phrases.

The data is provided with Chinese characters and English translation,
canonical Pinyin transcription including tone markers, and several
categories of non-speech events.

The speech files are stored as sequences of 8-bit, 8 kHz A-law samples.
Signal and annotation files are stored separately.

Price for ELRA members: 10,000 ECU
Price for non members: 15,000 ECU
____________________________________________

7) ELRA-S0058 RVG1 (Regional Variants of German 1, Part 1)

The corpus consists of single digits, connected digits, phone numbers,
phonetically balanced sentences, computer command phrases and spontaneous
speech. Each speaker has read a subcorpus of 85 items:
	11 single digits (0-9, with the two pronunciations of 2 (zwei, zwo)),
	19 connected digits (10-19, 20-100 in steps of ten),
	12 computer command phrases,
	30 phonetically balanced sentences,
	5 6-digit phone numbers,
	5 7-digit phone numbers,
	2 phone numbers with area code,
	1 minute spontaneous speech (monologue).

The speaker was seated in front of a standard IBM-compatible PC. The
background noise was limited to the usual noise of an office environment,
e.g. door slams, background crosstalk, phones ringing, paper rustling, PC
noise, etc. The speaker's head was 2-4 feet from the screen and 1-2 feet
from the desktop microphones. The speaker was not forced into a special
position. The speaker wore a Sennheiser HD 410 and was free to use the
keyboard or the mouse in front of them. The three desktop microphones
were: Sennheiser MD 441 U, Telex (Soundblaster) and Talk Back (AT&T).
Speakers were selected to reflect the demographic density of the
German-speaking areas of Europe (including Austria and Switzerland).

The recorded sound samples are stored in NIST SPHERE format. The resolution
is 16 bits. The sampling frequency is 22,050 Hz, except for speakers 001 to
036, who were recorded at 11,025 Hz. Each microphone channel is stored
in a separate file. A transliteration of the spontaneous speech in
Verbmobil format is also provided.
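
Because two sampling frequencies occur in the corpus, it may be worth reading the rate from each file rather than assuming it. The following Python sketch parses a SPHERE header under the usual layout: an ASCII block that starts with `NIST_1A`, gives the header size on the second line, lists fields as `name -type value`, and ends with `end_head`. The function name and the example header are invented for illustration; the actual RVG1 headers are not shown in this announcement and may contain other fields.

```python
def parse_sphere_header(raw: bytes) -> dict:
    """Parse the ASCII header at the start of a NIST SPHERE file.

    `raw` must contain at least the complete header block.
    """
    lines = raw.decode("ascii", errors="replace").split("\n")
    if lines[0].strip() != "NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    fields = {"header_size": int(lines[1])}   # size of the header in bytes
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):   # skip blanks and comments
            continue
        name, kind, value = line.split(None, 2)
        if kind == "-i":
            fields[name] = int(value)
        elif kind == "-r":
            fields[name] = float(value)
        else:                                  # "-sN": string of N characters
            fields[name] = value
    return fields


# Invented example header (hypothetical values, padded to 1024 bytes).
EXAMPLE_HEADER = (
    b"NIST_1A\n   1024\n"
    b"channel_count -i 1\n"
    b"sample_rate -i 22050\n"
    b"sample_n_bytes -i 2\n"
    b"sample_byte_format -s2 01\n"
    b"end_head\n"
).ljust(1024, b" ")
```

With a real file, one would read the first bytes of the file, call `parse_sphere_header`, and then skip `header_size` bytes to reach the samples. For the invented example, `parse_sphere_header(EXAMPLE_HEADER)["sample_rate"]` is 22050.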

RVG1, Part 1 contains 197 speakers recorded through 2 microphones.
(RVG1, Part 2, with 303 speakers recorded through 2 microphones will be
available from the beginning of 1999.)

Price for ELRA members:
	for research use: 4,949 ECU
	for commercial use: 8,198 ECU

Price for non members:
	for research use: 5,838 ECU
	for commercial use: 9,898 ECU

=====================================
For further information, please contact :

 ELRA/ELDA		Tel : +33 01 43 13 33 33
 55-57 rue Brillat-Savarin	Fax : +33 01 43 13 33 30
 F-75013 Paris, France	E-mail : mapelli@elda.fr

or visit our Web site:

 http://www.icp.grenet.fr/ELRA/home.html
=====================================







Message 2: Thermi International Summer School in Linguistics

Date: Tue, 27 Oct 1998 08:56:29 -0800
From: Ian Roberts <ian.roberts@po.uni-stuttgart.de>
Subject: Thermi International Summer School in Linguistics


*************************************************************************


1999 THERMI INTERNATIONAL SUMMER SCHOOL IN LINGUISTICS

5-30 July 1999

Outline

The overall goal of the Thermi International Summer School in
Linguistics is to promote current research in the major areas of
theoretical linguistics and make it available to young researchers from
around the world. To that end, a number of advanced courses on selected
topics are organised under the auspices of GLOW. Previous GLOW summer
schools were held in Salzburg, Austria, in 1979, 1982 and 1985, in
Girona, Catalonia/Spain, in 1990, 1992, 1994 and 1996, and in Olomouc,
Czech Republic, in 1994, 1995, 1996 and 1997.

Organisation and venue

The 1999 Summer School is organised by GLOW and the University of
Patras. It will be held at a conference centre in Thermi, a small
village on the coast 12km/7 miles from Mytilene, Island of Lesbos,
Greece. For further information about Lesbos, see
http://www.travel-greece.com/aegean/lesvos.html and
http://www.lesvos.compulink.gr.
The organising committee is:

Ian Roberts (Stuttgart)
Angeliki Ralli (Patras)
Marina Nespor (Ferrara)
Henk van Riemsdijk (Tilburg)

Workshop

There will be a workshop on Tense and Aspect, organised by Tim Stowell
and Sabine Iatridou, on July 17th-18th. Full details and a call for
papers will be published in the Spring 1999 GLOW Newsletter.

Course fees

Course fees are as follows:
A.	Full enrolment (all courses for four weeks): US$400
B.	Partial enrolment (two weeks only): US$250
Please note that there is an additional general registration fee of
US$35.

Grants

Grants are available to partially cover the cost of travel and
accommodation. All grants include a fee waiver. We expect to offer 10
grants for students who are citizens of European Union countries
(whatever their country of residence), funded by the European
Commission, and 10 grants for non-EU citizens, funded by GLOW. To apply
for a grant, please write, enclosing a CV and a statement of your
research interests, to:

Ian Roberts
Institut für Linguistik/Anglistik
Universität Stuttgart
Keplerstraße 17
D-70174 Stuttgart
Germany

or e-mail:

ian.roberts@po.uni-stuttgart.de

by December 31st, 1998. Successful applicants will be notified by March
31st, 1999.

Program

4 week courses (July 5th-30th)

Ian Roberts (Stuttgart)		"Introduction to Syntax"
Sabine Iatridou (MIT)		"Introduction to Semantics"
Giuseppe Longobardi (Trieste) 	"The Syntax and Semantics of DPs"
Harry van der Hulst & Nancy Ritter (Leiden)	"The Syntax of Segments:
 A Head-driven Approach to Phonotactics"


2 week courses: July 5th-16th 

Marina Nespor (Ferrara)		"Phrasal Phonology"
Diamandis Gafos (New York)	"Topics in Prosodic Morphology"
Edwin Williams (Princeton)	"Morphosyntax"
Henk van Riemsdijk (Tilburg)	"Late lexical insertion"
Anna-Maria di Sciullo (UQAM)	"Local asymmetries"
María-Luisa Rivero (Ottawa)	"Balkan and Slavic Comparative Syntax"

2 week courses: July 19th-30th 

van der Hulst??
Angela Ralli (Patras)		"Topics in Comparative Morphology and Dialectal
 Variation"

Anna Roussou (Cyprus)		"Minimalism and Empty Categories"

Andrea Moro (San Raffaele, Milan)	"Dynamic Asymmetry"

Tim Stowell (UCLA)		"The Syntax of Quantification"

Maria-Teresa Guasti (Siena)	"Developmental Psycholinguistics
 within the Principles & Parameters Model"

-------------------------------------------
Registration form

Please complete this form and return it to the address below (by fax or
e-mail to Tsokoglou, by snail-mail to Ralli).

Registration

Title________________ First name ______________ Last name
_______________________

Affiliation
___________________________________________________________________

Address
_____________________________________________________________________

	_____________________________________________________________________


Tel ____________________ Fax _____________________ E-mail
______________________

Please indicate how long you would like to register for: four weeks 
two weeks.

Housing: please tick this box if you would like information about housing
in Thermi.

Payment: The General Registration Fee is US$35/GBP20/DM50/EU30, payable
in any of these currencies. Please make sure that payment is accompanied
by a clear indication of the participant's name. Bear in mind that all
charges made by banks in effecting payment must be covered by the
participant. Bank Transfer to Ergobank S.S.A., Kavetsou 16, 81100
Mytilene, Greece in the name of Ralli Angela, account number
119/01832-00010/39. Tel: +30-251-47692. Fax: +30-251-47696. Swiftcode:
ERGOGRAA.
A copy of the transfer should be sent as proof of payment.


Secretariat: 
Angeliki Tsokoglou: angelath.forthnet.gr. Tel/Fax : 30-1-8659749.
Snail-mail correspondence should be addressed to:
Angela Ralli, Linguistics Section, Dept of Philology (for Thermi Summer
School), University of Patras, 26500 Rio, Patras, Greece. Fax:
+30-61-996-195.
e-mail: aralli@atlas.uoa.gr