Editor for this issue: Martin Jacobsen <marty@linguistlist.org>
EUROPEAN LANGUAGE RESOURCES ASSOCIATION
ELRA News
=====================================

*** ELRA NEW RESOURCES - Part 1 ***

The ELRA catalogue has been updated with the following resources.

********************************************
* ELRA-S0050 Russian speech database (STC) *
********************************************

The STC Russian speech database was recorded in 1996-1998. The main purpose of the database is to investigate individual speaker variability and to validate speaker recognition algorithms. The database was recorded through a 16-bit Vibra-16 Creative Labs sound card with an 11,025 Hz sampling rate.

The database contains Russian read speech from 89 different speakers (54 male, 35 female), including 70 speakers with 15 sessions or more, 10 speakers with 10 sessions or more and 9 speakers with fewer than 10 sessions. The speakers were recorded in Saint-Petersburg and are aged 18 to 62. All are native speakers.

The corpus consists of 5 sentences. Each speaker reads each sentence carefully but fluently 15 times on different dates over a period of 1-3 months. The corpus contains a total of 6,889 utterances and consists of 2 volumes, with a total size of 700 MB of uncompressed data.

The signal of each utterance is stored as a separate file (approx. 126 KB). The total size of the data for one speaker is approximately 9,500 KB. The average utterance duration is about 5 sec. A separate file gives information about the speakers (age and gender). The orthographic and phonetic transcriptions of the corpus are given in separate files, which contain the prompted sentences and their transcription in IPA. The signal files are raw files without any header, 16 bits per sample, linear, 11,025 Hz sampling frequency.
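
The raw signal format described above (no header, 16-bit linear samples at 11,025 Hz) can be decoded in a few lines of Python. The following is a minimal sketch and not part of the ELRA distribution. It assumes mono recordings and little-endian byte order, neither of which is stated in the description, and the function names are purely illustrative.

```python
import struct

SAMPLE_RATE_HZ = 11_025   # stated in the database description
BYTES_PER_SAMPLE = 2      # 16-bit linear samples, stated in the description


def duration_seconds(num_bytes: int) -> float:
    """Duration of a headerless recording of the given size (assumes mono)."""
    return num_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def decode_raw_pcm(data: bytes) -> list[int]:
    """Decode headerless bytes into signed 16-bit sample values.

    Assumption: little-endian byte order. This is not stated in the
    description and should be checked against the actual files.
    """
    # Ignore a trailing odd byte, if any, so the length is a whole number of samples.
    usable = len(data) - (len(data) % BYTES_PER_SAMPLE)
    count = usable // BYTES_PER_SAMPLE
    return list(struct.unpack(f"<{count}h", data[:usable]))
```

Under these assumptions, a file of 126 KB (taken here as 126 x 1024 bytes) gives `duration_seconds(126 * 1024)` of a little under 6 seconds, which is of the same order as the stated average utterance duration of about 5 sec.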
The recording conditions were as follows:
- Microphone: dynamic omnidirectional high-quality microphone, distance to mouth 5-10 cm
- Environment: office room
- Sampling rate: 11,025 Hz
- Resolution: 16 bit
- Sound board: Creative Labs Vibra-16

Means of delivery: CD-ROM

Price for ELRA members:
  for research use:     400 ECU
  for commercial use: 2,000 ECU
Price for non members:
  for research use:     800 ECU
  for commercial use: 4,000 ECU

*********************************************
* ELRA-S0051 German SpeechDat(II) FDB 1000 *
*********************************************

The German SpeechDat(II) FDB 1000 consists of 988 calls over the German fixed network, stored on 4 CD-ROMs in the final SpeechDat(II) database exchange format.

The speech databases made within the SpeechDat(II) project were validated by SPEX, the Netherlands, to assess their compliance with the SpeechDat format and content specifications.

The following items were recorded:
  1 isolated digit (read or prompted)
  1 sequence of 10 isolated digits
  4 connected digits
    4-6 digit number to identify the prompt sheet
    ca. 10 digit telephone number (read)
    14-16 digit credit card number (read, 150 different credit card numbers were found)
    6 digit PIN code (read)
  1 natural number (read)
  1 money amount (read)
  3 spelled words (1 spontaneous name spelling, 2 read)
  1 time of day (spontaneous)
  1 time phrase (read)
  1 date (spontaneous)
  1 date (read)
  1 relative date (read)
  2 yes/no questions (spontaneous, not prompted)
  3/6 common application words (read)
All application words are recorded more than 80 times. These are:
  1 application word phrase
  9 phonetically rich sentences (read)
  4 phonetically rich words (read)
  5 directory assistance names (1 spontaneous name (e.g. forename), 1 spontaneous city name, 1 read city name (from a list of the 500 most frequent), 1 read company/agency name (from a list of the 500 most frequent), 1 read proper name, fore- and surname (from a list of 150 SDB names))
Price for research use (in ECU)              Members    Non members
German SpeechDat(II) FDB-1000                 15,000         25,000
German SpeechDat(II) FDB-1000
  + German SpeechDat(M) DB1 or DB2            20,000         30,000

Price for commercial use (in ECU)            Members    Non members
German SpeechDat(II) FDB-1000                 18,000         25,000
German SpeechDat(II) FDB-1000
  + German SpeechDat(M) DB1 or DB2            25,000         35,000

SPECIAL OFFERS:
1) Price of German SpeechDat(II) FDB-1000 for ELRA members who have already purchased German SpeechDat(M) DB1 (ELRA-S0018):
   Before 30.06.1998: 10,000 ECU
   Between 30.06.1998 and 31.12.1998: 11,000 ECU
2) If SpeechDat(II) FDB-1000 is purchased in the same calendar year as DB1 or DB2, the package price will be:
   for research use: 20,000 ECU for ELRA members and 30,000 ECU for non members;
   for commercial use: 25,000 ECU for ELRA members and 35,000 ECU for non members.

********************************************
For more information, please contact:
ELRA/ELDA
55-57 rue Brillat Savarin
75013 PARIS
Tel: +33 1 43 13 33 33
Fax: +33 1 43 13 33 30
E-mail: info-elra@calva.net
http://www.icp.grenet.fr/ELRA/home.html
********************************************
EUROPEAN LANGUAGE RESOURCES ASSOCIATION
ELRA News
=====================================

*** ELRA NEW RESOURCES - Part 2 ***

The ELRA catalogue has been updated with the following resources.

*************************************************
* ELRA-L0030 Bulgarian Morphological Dictionary *
*************************************************

This dictionary contains 67,500 entries divided into 242 inflectional types (including proper nouns), with morphosyntactic information for each entry, and comes with a morphological engine (MS-DOS and Windows 95/NT) for morphological analysis and generation. The data may be used for morphological analysis and synthesis.

Structure of entries: Local linguistic variant
File format: ASCII; lowercase letters
Standard in use: ISO
Character set: 8-bit ASCII
ASCII codes alphabetically: 160-191
Medium: Floppy disk

Price for research use:
  ELRA members:     45 ECU
  Non members:     100 ECU
Price for commercial use:
  ELRA members:  6,000 ECU
  Non members:  12,000 ECU

****************************************************************
* ELRA-M0014 Bilingual dictionaries (Translation Experts Ltd.) *
****************************************************************

Bilingual dictionaries for demonstration and commercial use, containing local linguistic variant, local spelling variant, word frequency, usage (familiar, old, slang, etc.) and semantic features. The level of information in each entry varies depending on the word or phrase and on the dictionary. However, all of the above are present in varying degrees in the dictionaries.

These dictionaries may be of particular interest for spell-checking, thesaurus, hyphenation and translation of natural languages. A Level 2 translation engine, also available via ELRA, provides exact translations of input words and phrases, based on the vocabulary stored in a compressed translation file. Both input and output are in LOCAL-UCS format.
Each pair of languages may be purchased as different sets or subsets, corresponding to the indicated number of entries. All pairs consist of English to and from another language. The following groups of languages are available:

GROUP 1 (English <=> Language A): Language A =
  Spanish (25,000, 60,000, 100,000 and 200,000 entries),
  French (40,000, 80,000, 100,000 and 200,000 entries),
  German (40,000, 80,000 and 126,000 entries),
  Italian (20,000 and 40,000 entries),
  Brazilian Portuguese (40,000, 80,000 and 400,000 entries),
  Portuguese (40,000, 80,000, 110,000 and 234,000 entries),
  Dutch (40,000, 80,000 and 110,000 entries).

GROUP 2 (English <=> Language B): Language B =
  Danish (40,000, 80,000 and 110,000 entries),
  Swedish (40,000, 80,000 and 110,000 entries),
  Finnish (30,000 entries),
  Icelandic (40,000, 80,000 and 100,000 entries).

GROUP 3 (English <=> Language C): Language C =
  Russian (40,000, 72,000 and 120,000 entries),
  Russian Business (60,000 entries),
  Russian Aerospace (60,000 entries),
  Russian Automotive (40,000 entries),
  Russian Minerals & Mining (60,000 entries),
  Polish (30,000, 80,000, 124,000 and 150,000 entries),
  Hungarian (30,000, 80,000 and 124,000 entries),
  Czech (40,000 entries),
  Romanian Starter (10,000 entries).

GROUP 4 (English <=> Language D): Language D =
  Croatian (30,000 entries),
  Bosnian (30,000 entries),
  Serbian (Latin or Cyrillic) (30,000 entries).

GROUP 5 (English <=> Language E): Language E =
  Japanese (40,000 entries).

GROUP 6 (English <=> Language F): Language F =
  Greek (60,000 entries).

File format: Text
Standard in use: ISO
Character set: 8-bit ASCII and UNICODE
Means of delivery: CD-ROM, floppy disk or downloaded from the Web.
Related tools: Word Translator(TM), NeuroTran, InterTran(TM), MobileTran(TM).
Please see http://www.tranexp.com for more information.

The price per entry is as follows:

Price for ELRA members:
             For research use    For commercial use
  GROUP 1    0.06 ECU/entry      0.25 ECU/entry
  GROUP 2    0.03 ECU/entry      0.18 ECU/entry
  GROUP 3    0.04 ECU/entry      0.20 ECU/entry
  GROUP 4    0.04 ECU/entry      0.20 ECU/entry
  GROUP 5    0.50 ECU/entry      1.00 ECU/entry
  GROUP 6    0.12 ECU/entry      0.50 ECU/entry

Price for non members:
             For research use    For commercial use
  GROUP 1    0.12 ECU/entry      0.50 ECU/entry
  GROUP 2    0.06 ECU/entry      0.36 ECU/entry
  GROUP 3    0.08 ECU/entry      0.40 ECU/entry
  GROUP 4    0.08 ECU/entry      0.40 ECU/entry
  GROUP 5    1.00 ECU/entry      2.00 ECU/entry
  GROUP 6    0.24 ECU/entry      1.00 ECU/entry

********************************************
For more information, please contact:
ELRA/ELDA
55-57 rue Brillat Savarin
75013 PARIS
Tel: +33 1 43 13 33 33
Fax: +33 1 43 13 33 30
E-mail: info-elra@calva.net
http://www.icp.grenet.fr/ELRA/home.html
********************************************
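
As a worked illustration of the per-entry pricing for ELRA-M0014 above, the sketch below multiplies the listed rate by the number of entries in a chosen dictionary set. It assumes that the total price is simply rate times entries, with no minimum charge or volume discount; the announcement does not say either way. The function and table names are illustrative only.

```python
from decimal import Decimal

# ECU per entry, copied from the ELRA-M0014 price table.
# Key: group number. Value: (research rate, commercial rate).
MEMBER_RATES = {
    1: ("0.06", "0.25"), 2: ("0.03", "0.18"), 3: ("0.04", "0.20"),
    4: ("0.04", "0.20"), 5: ("0.50", "1.00"), 6: ("0.12", "0.50"),
}
NON_MEMBER_RATES = {
    1: ("0.12", "0.50"), 2: ("0.06", "0.36"), 3: ("0.08", "0.40"),
    4: ("0.08", "0.40"), 5: ("1.00", "2.00"), 6: ("0.24", "1.00"),
}


def dictionary_price(group: int, entries: int, member: bool, commercial: bool) -> Decimal:
    """Price in ECU for one dictionary set.

    Assumption: price = per-entry rate x number of entries, nothing else.
    """
    rates = MEMBER_RATES if member else NON_MEMBER_RATES
    research_rate, commercial_rate = rates[group]
    rate = Decimal(commercial_rate if commercial else research_rate)
    return rate * entries


# Spanish, 25,000 entries (GROUP 1), ELRA member, research use.
print(dictionary_price(1, 25_000, member=True, commercial=False))   # 1500.00
# Japanese, 40,000 entries (GROUP 5), non member, commercial use.
print(dictionary_price(5, 40_000, member=False, commercial=True))   # 80000.00
```

Decimal arithmetic is used here instead of floats so that rates such as 0.06 ECU/entry multiply out to exact amounts.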