Editor for this issue: Karen Milligan <karen@linguistlist.org>
___________________________________________________________

ELRA
European Language Resources Association

ELRA News
___________________________________________________________

*** ELRA NEW RESOURCES ***

We are happy to announce new speech resources available via ELRA:

1) ELRA-S0052 FIXED0IT - Italian Fixed Network Speech (SpeechDat(M)) Corpus - DB1
2) ELRA-S0053 FIXED0IT - Italian Fixed Network Speech (SpeechDat(M)) Corpus - DB2
3) ELRA-S0054 Chilean Spanish FDB-250
4) ELRA-S0055 Russian SpeechDat-like FDB-1000
5) ELRA-S0056 Slovenian SpeechDat(II) FDB-1000
6) ELRA-S0057 Shanghai Mandarin FDB-1000
7) ELRA-S0058 RVG1 (Regional Variants of German 1, Part 1)

A description of each resource follows:

1) ELRA-S0052 FIXED0IT - Italian Fixed Network Speech (SpeechDat(M)) Corpus DB1
Phonetically rich sentences & application oriented utterances

The Italian Fixed Network Speech Corpus version 1.0 was recorded within the scope of the SpeechDat(M) project (LRE-63314), funded by the European Commission. Recording was done using a primary rate ISDN interface, yielding an 8 kHz, 8 bits per sample, A-law coded signal. The data files follow the format defined by the European SAM project. The speech data are compressed with the GNU gzip program. All software needed to use the corpus is provided on the CDs.

The corpus contains the speech of about 1000 speakers (about 500 male and 500 female) and was designed to support the creation of voice-driven teleservices. The callers spoke at least 39 items, comprising: isolated and connected digits, natural numbers, money amounts, spelled words, time and date phrases, yes/no questions, city names, common application words, application words in phrases, and phonetically rich sentences. Most items are read; some are spontaneously spoken.

The recordings come with extensive and standardised documentation.
All speech is carefully transcribed at the orthographic level; in addition, a number of clearly audible non-speech events are included in the transcription. Moreover, the age and regional background of the speakers are provided. A pronunciation dictionary is added, containing all words that occur in the corpus, with a corresponding SAMPA broad-class phonemic transcription.

Validation and premastering of the CD-ROMs were performed by the Speech Processing Expertise Centre (SPEX), Leidschendam, The Netherlands.

Price for ELRA members:
  for research use:    11,000 ECU
  for commercial use:  14,000 ECU
Price for non members:
  for research use:    20,000 ECU
  for commercial use:  20,000 ECU
____________________________________________

2) ELRA-S0053 FIXED0IT - Italian Fixed Network Speech (SpeechDat(M)) Corpus DB2
Phonetically rich sentences sub-set

See ELRA-S0052 for description. DB2 is a sub-set of DB1; it contains only the phonetically rich sentence items.

Price for ELRA members:
  for research use:     8,800 ECU
  for commercial use:  14,000 ECU
Price for non members:
  for research use:    14,000 ECU
  for commercial use:  20,000 ECU
____________________________________________

3) ELRA-S0054 Chilean Spanish FDB-250

This speech database gathers Spanish data as spoken in Chile. All participants are native speakers. The corpus consists of read speech, including digits and application words for teleservices, recorded through an ISDN card.

The whole database consists of 6.45 hours of speech, with 24 utterances per speaker. There is a total of 250 speakers (68 male, 80 female, 102 untagged). Leaving aside the 102 untagged speakers, the age classes are divided as follows: 15 speakers are under 16 years old, 72 speakers are aged 16 to 30, 44 speakers are aged 31 to 45, and 14 speakers are aged 46 to 60.

The callers spoke 74 different items in total: isolated digits, yes/no, common application words.
The data is provided with orthographic transliteration for all 6,000 utterances, including 4 categories of non-speech acoustic events. A phonetic lexicon with canonical transcription in SAMPA is also included. The speech files are stored as sequences of 8-bit, 8 kHz A-law samples. Data are stored in a SAM file format.

Price for ELRA members: 5,000 ECU
Price for non members:  7,500 ECU
____________________________________________

4) ELRA-S0055 Russian SpeechDat-like FDB-1000

This speech database gathers Russian data. The corpus consists of read and spontaneous speech, recorded through an ISDN card, and was validated and accepted according to the SpeechDat(II) database exchange format.

The whole database consists of 72 hours of speech, with approx. 49 prompted utterances per speaker. A total of 1000 speakers was recorded (500 male, 500 female). These are native speakers from 5 regions, mainly from Moscow and St. Petersburg (803 speakers). The speakers' age classes are divided as follows: 16 speakers are under 16 years old, 340 speakers are aged 16 to 30, 345 speakers are aged 31 to 45, 255 speakers are aged 46 to 60, and 44 speakers are over 60.

The callers spoke the following items: isolated and connected digits, natural numbers, money amounts, spelled words, time and date phrases, yes/no, city names, common application words, application words in phrases, and phonetically rich sentences.

The data is provided with orthographic transliteration for all 48,812 utterances, including 4 categories of non-speech acoustic events. A phonetic lexicon with canonical pronunciation is also provided. The speech files are stored as sequences of 8-bit, 8 kHz A-law samples. The data is stored in a SAM file format (4 CD-ROMs).
Price for ELRA members: 14,000 ECU
Price for non members:  20,000 ECU
____________________________________________

5) ELRA-S0056 Slovenian SpeechDat(II) FDB-1000

The Slovenian SpeechDat(II) FDB-1000 consists of read and spontaneous speech, recorded through an ISDN card, and was validated and accepted according to the SpeechDat(II) database exchange format.

The corpus includes about 1000 speakers (about 500 male and 500 female) who called over the Slovenian fixed network. All are native speakers of Slovenian, drawn from all dialect regions of Slovenia.

The callers spoke the following items: isolated and connected digits, natural numbers, money amounts, spelled words, time and date phrases, yes/no, city names, common application words, application words in phrases, and phonetically rich sentences.

The speech files are stored as sequences of 8-bit, 8 kHz A-law samples. The data is stored in a SAM file format (CD-ROMs). A phonetic lexicon with canonical transcriptions in SAMPA is also provided.

Price for ELRA members: 14,000 ECU
Price for non members:  20,000 ECU
____________________________________________

6) ELRA-S0057 Shanghai Mandarin FDB-1000

This acoustic database gathers Mandarin data, as spoken in Shanghai as a first or second Chinese dialect/language. The corpus consists of read speech, including digits and application words for teleservices, recorded through an ISDN card.

Each speaker was prompted for a total of 70 utterances. About 1000 speakers were recorded (500 male, 500 female). The callers spoke the following items: isolated digits, yes/no, city names, common application words and phrases.

The data is provided with Chinese characters and English translation, canonical Pinyin transcription including tone markers, and several categories of non-speech events. The speech files are stored as sequences of 8-bit, 8 kHz A-law samples. Signal and annotation files are stored separately.
Price for ELRA members: 10,000 ECU
Price for non members:  15,000 ECU
____________________________________________

7) ELRA-S0058 RVG1 (Regional Variants of German 1, Part 1)

The corpus consists of single digits, connected digits, phone numbers, phonetically balanced sentences, computer command phrases and spontaneous speech. Each speaker has read a subcorpus of 85 items:

  11 single digits (0-9, with the two pronunciations of 2: zwei, zwo)
  19 connected digits (10-19, 20-100 in steps of ten)
  12 computer command phrases
  30 phonetically balanced sentences
   5 six-digit phone numbers
   5 seven-digit phone numbers
   2 phone numbers with area code
   1 minute of spontaneous speech (monologue)

The speaker was placed in front of a standard IBM-compatible PC. The background noise was limited to the usual noise of an office environment, e.g. door slams, background crosstalk, phones ringing, paper rustling, PC noise, etc. The speaker's head is 2-4 feet from the screen and 1-2 feet from the desktop microphones. The speaker is not forced into a special position. The speaker is wearing a Sennheiser HD 410 and is free to use the keyboard or the mouse in front of him. The three desktop microphones are: Sennheiser MD 441 U, Telex (Soundblaster) and Talk Back (AT&T).

Speakers were selected to reflect the demographic density of the German-speaking areas in Europe (including Austria and Switzerland).

The recorded sound samples are stored in NIST SPHERE format. The resolution is 16 bits. The sampling frequency is 22,050 Hz, except for speakers 001 to 036, who were recorded at 11,025 Hz. Each microphone channel is stored in a separate file. A transliteration of the spontaneous speech according to the Verbmobil format is also provided.

RVG1, Part 1 contains 197 speakers recorded through 2 microphones. (RVG1, Part 2, with 303 speakers recorded through 2 microphones, will be available from the beginning of 1999.)
Price for ELRA members:
  for research use:    4,949 ECU
  for commercial use:  8,198 ECU
Price for non members:
  for research use:    5,838 ECU
  for commercial use:  9,898 ECU

=====================================
For further information, please contact:

ELRA/ELDA                      Tel :    +33 01 43 13 33 33
55-57 rue Brillat-Savarin      Fax :    +33 01 43 13 33 30
F-75013 Paris, France          E-mail : mapelli@elda.fr

or visit our Web site:
http://www.icp.grenet.fr/ELRA/home.html
=====================================
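Several of the corpora above are described as sequences of 8-bit, 8 kHz A-law samples. For readers who have not worked with that coding, the following is a minimal sketch of expanding such samples to 16-bit linear values, assuming standard ITU-T G.711 A-law and raw, headerless sample bytes. The function names are hypothetical and are not part of any ELRA or SpeechDat software; the SAM label files and the gzip compression mentioned for some corpora are not handled here.

```python
def alaw_to_linear(code):
    """Expand one 8-bit A-law code word to a signed 16-bit linear value.

    Assumes standard G.711 A-law coding (an assumption, not taken from
    the corpus documentation).
    """
    code ^= 0x55                    # undo the even-bit inversion used by G.711
    magnitude = (code & 0x0F) << 4  # 4-bit mantissa
    segment = (code & 0x70) >> 4    # 3-bit segment number
    if segment == 0:
        magnitude += 8
    else:
        magnitude += 0x108
        magnitude <<= segment - 1
    # In A-law, a set sign bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def decode_alaw(raw):
    """Decode a bytes object of A-law samples into a list of integers."""
    return [alaw_to_linear(b) for b in raw]


def duration_seconds(raw, sample_rate=8000):
    """Duration of a raw A-law signal: one byte per sample."""
    return len(raw) / sample_rate
```

For example, `decode_alaw(open(path, "rb").read())` would return the linear samples of one raw signal file, where `path` is a placeholder for a file name chosen by the reader.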
*************************************************************************

1999 THERMI INTERNATIONAL SUMMER SCHOOL IN LINGUISTICS
5-30 July 1999

Outline

The overall goal of the Thermi International Summer School in Linguistics is to promote current research in the major areas of theoretical linguistics and make it available to young researchers from around the world. To that end, a number of advanced courses on selected topics are organised under the auspices of GLOW. Previous GLOW summer schools were held in Salzburg, Austria, in 1979, 1982 and 1985, in Girona, Catalonia/Spain, in 1990, 1992, 1994 and 1996, and in Olomouc, Czech Republic, in 1994, 1995, 1996 and 1997.

Organisation and venue

The 1999 Summer School is organised by GLOW and the University of Patras. It will be held at a conference centre in Thermi, a small village on the coast 12 km (7 miles) from Mytilene, Island of Lesbos, Greece. For further information about Lesbos, see http://www.travel-greece.com/aegean/lesvos.html and http://www.lesvos.compulink.gr.

The organising committee is:
  Ian Roberts (Stuttgart)
  Angeliki Ralli (Patras)
  Marina Nespor (Ferrara)
  Henk van Riemsdijk (Tilburg)

Workshop

There will be a workshop on Tense and Aspect, organised by Tim Stowell and Sabine Iatridou, on July 17th-18th. Full details and a call for papers will be published in the Spring 1999 GLOW Newsletter.

Course fees

Course fees are as follows:
  A. Full enrolment (all courses for four weeks): US$400
  B. Partial enrolment (two weeks only): US$250
Please note that there is an additional general registration fee of US$35.

Grants

Grants are available to partially cover the cost of travel and accommodation. All grants include a fee waiver. We expect to offer 10 grants for students who are citizens of European Union countries (whatever their country of residence), funded by the European Commission, and 10 grants for non-EU citizens, funded by GLOW.
To apply for a grant, please write, enclosing a CV and a statement of your research interests, to:

  Ian Roberts
  Institut für Linguistik/Anglistik
  Universität Stuttgart
  Keplerstraße 17
  D-70174 Stuttgart
  Germany

or e-mail: ian.roberts@po.uni-stuttgart.de

by December 31st, 1998. Successful applicants will be notified by March 31st, 1999.

Program

4 week courses (July 5th-30th)
  Ian Roberts (Stuttgart) "Introduction to Syntax"
  Sabine Iatridou (MIT) "Introduction to Semantics"
  Giuseppe Longobardi (Trieste) "The Syntax and Semantics of DPs"
  Harry van der Hulst & Nancy Ritter (Leiden) "The Syntax of Segments: A Head-driven Approach to Phonotactics"

2 week courses: July 5th-16th
  Marina Nespor (Ferrara) "Phrasal Phonology"
  Diamandis Gafos (New York) "Topics in Prosodic Morphology"
  Edwin Williams (Princeton) "Morphosyntax"
  Henk van Riemsdijk (Tilburg) "Late lexical insertion"
  Anna-Maria di Sciullo (UQAM) "Local asymmetries"
  María-Luisa Rivero (Ottawa) "Balkan and Slavic Comparative Syntax"

2 week courses: July 19th-30th
  van der Hulst??
  Angela Ralli (Patras) "Topics in Comparative Morphology and Dialectal Variation"
  Anna Roussou (Cyprus) "Minimalism and Empty Categories"
  Andrea Moro (San Raffaele, Milan) "Dynamic Asymmetry"
  Tim Stowell (UCLA) "The Syntax of Quantification"
  Maria-Teresa Guasti (Siena) "Developmental Psycholinguistics within the Principles & Parameters Model"

-------------------------------------------------------------------------

Registration form

Please complete this form and return it to the address below (by fax or e-mail to Tsokoglou, by snail-mail to Ralli).
Registration

Title ________________ First name ______________ Last name _______________________
Affiliation ___________________________________________________________________
Address _____________________________________________________________________
        _____________________________________________________________________
Tel ____________________ Fax _____________________ E-mail ______________________

Please indicate how long you would like to register for: four weeks [ ]  two weeks [ ]

Housing: please tick this box if you would like information about housing in Thermi [ ]

Payment: The General Registration Fee is US$35/GBP20/DM50/EU30, payable in any of these currencies. Please make sure that payment is accompanied by a clear indication of the participant's name. Bear in mind that all charges made by banks in effecting payment must be covered by the participant.

Bank Transfer to Ergobank S.S.A., Kavetsou 16, 81100 Mytilene, Greece, in the name of Ralli Angela, account number 119/01832-00010/39. Tel: +30-251-47692. Fax: +30-251-47696. Swiftcode: ERGOGRAA. A copy of the transfer should be sent as proof of payment.

Secretariat: Angeliki Tsokoglou: angel@ath.forthnet.gr. Tel/Fax : 30-1-8659749.

Snail-mail correspondence should be addressed to: Angela Ralli, Linguistics Section, Dept of Philology (for Thermi Summer School), University of Patras, 26500 Rio, Patras, Greece. Fax: +30-61-996-195. e-mail: aralli@atlas.uoa.gr