LINGUIST List 9.891

Tue Jun 16 1998

FYI: ELRA News

Editor for this issue: Martin Jacobsen <marty@linguistlist.org>


Directory

  1. Valerie Mapelli, ELRA News - New resources 1/2
  2. Valerie Mapelli, ELRA News - New resources 2/2

Message 1: ELRA News - New resources 1/2

Date: Thu, 11 Jun 1998 17:16:52 +0200 (MET DST)
From: Valerie Mapelli <info-elra@calva.net>
Subject: ELRA News - New resources 1/2

 
 EUROPEAN LANGUAGE RESOURCES ASSOCIATION
 ELRA News 
 =====================================


 *** ELRA NEW RESOURCES - Part 1 ***


The ELRA catalogue has been updated with the following resources.


********************************************
* ELRA-S0050 Russian speech database (STC) *
********************************************

The STC Russian speech database was recorded in 1996-1998. The main
purpose of the database is to investigate individual speaker
variability and to validate speaker recognition algorithms. The
database was recorded through a 16-bit Vibra-16 Creative Labs sound
card with an 11,025 Hz sampling rate.

The database contains Russian read speech from 89 different speakers (54
male, 35 female): 70 speakers with 15 or more sessions, 10 speakers
with 10 to 14 sessions and 9 speakers with fewer than 10 sessions. The
speakers were recorded in Saint-Petersburg, are aged 18 to 62, and are
all native speakers.

The corpus consists of 5 sentences. Each speaker reads each sentence,
carefully but fluently, 15 times on different dates over a period of
1-3 months. The corpus contains a total of 6,889 utterances on 2
volumes, with a total size of 700 MB of uncompressed data. The signal
of each utterance is stored as a separate file (approx. 126 KB). The
total size of the data for one speaker is approximately 9,500 KB. The
average utterance duration is about 5 sec.
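The file sizes above follow from the recording format (16-bit samples at 11,025 Hz). A quick check in Python, assuming mono recordings and that "KB" means 1,024 bytes (neither is stated explicitly); the helper names are hypothetical. Under these assumptions a 126 KB file holds roughly 5.9 s of audio, and 5 sentences x 15 repetitions x 126 KB gives about 9,450 KB per speaker, close to the stated 9,500 KB.

```python
# Recording format as stated: 16-bit linear samples at 11,025 Hz.
# Assumptions (not stated in the description): mono audio, 1 KB = 1,024 bytes.
SAMPLE_RATE_HZ = 11_025
BYTES_PER_SAMPLE = 2
CHANNELS = 1
BYTES_PER_KB = 1024


def utterance_seconds(size_kb: float) -> float:
    """Duration of a headerless raw signal file of the given size."""
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
    return size_kb * BYTES_PER_KB / bytes_per_second


def speaker_kb(sentences: int = 5, repetitions: int = 15, utterance_kb: float = 126) -> float:
    """Approximate data volume for one speaker with a complete set of repetitions."""
    return sentences * repetitions * utterance_kb


print(f"{utterance_seconds(126):.2f} s per utterance")  # 5.85 s per utterance
print(f"{speaker_kb():,.0f} KB per speaker")            # 9,450 KB per speaker
```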

A separate file gives each speaker's age and gender. The orthographic
and phonetic transcriptions of the corpus are given in separate files,
which contain the prompted sentences and their transcription in IPA.
The signal files are raw files without any header: 16 bits per sample,
linear, 11,025 Hz sampling frequency.
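Because the signal files carry no header, any tool reading them has to be told the format explicitly. A minimal Python sketch that decodes one file's bytes and wraps them in a standard WAV container using the standard-library `array` and `wave` modules. The byte order of the raw samples is not stated in the description; little-endian (the native byte order of x86 PCs) is assumed as the default and should be checked against the data. Mono audio is also assumed, and the function names are hypothetical.

```python
import array
import io
import sys
import wave

SAMPLE_RATE_HZ = 11_025   # as stated in the description
SAMPLE_WIDTH_BYTES = 2    # 16 bits per sample, linear


def decode_raw(data: bytes, byteorder: str = "little") -> array.array:
    """Decode headerless 16-bit signed PCM into integer samples.

    byteorder ("little" or "big") is an assumption; the description does not state it.
    """
    if len(data) % SAMPLE_WIDTH_BYTES:
        raise ValueError("file length is not a multiple of 2 bytes")
    samples = array.array("h")  # signed 16-bit integers on common platforms
    samples.frombytes(data)     # interpreted in the machine's native byte order
    if byteorder != sys.byteorder:
        samples.byteswap()      # convert from the file's byte order to native
    return samples


def raw_to_wav(data: bytes, byteorder: str = "little") -> bytes:
    """Wrap one raw signal file in a mono WAV container."""
    samples = decode_raw(data, byteorder)
    if sys.byteorder != "little":
        samples.byteswap()      # WAV stores PCM samples little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)     # mono is assumed
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(samples.tobytes())
    return buf.getvalue()
```

For example, passing the bytes of one utterance file to `raw_to_wav` returns data that can be written to a `.wav` file and opened in an ordinary audio editor, which is a quick way to confirm the assumed byte order: the wrong order sounds like loud noise.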

The recording conditions were as follows:

- Microphone: dynamic omnidirectional high-quality microphone,
distance to mouth 5-10 cm
- Environment: office room
- Sampling rate: 11,025 Hz
- Resolution: 16 Bit
- Sound board: Creative Labs Vibra-16

Means of delivery: CD-ROM

Price for ELRA members:
	for research use: 400 ECU
	for commercial use: 2,000 ECU

Price for non-members:
	for research use: 800 ECU
	for commercial use: 4,000 ECU


*********************************************
* ELRA-S0051 German SpeechDat(II) FDB 1000 *
*********************************************

The German SpeechDat(II) FDB 1000 consists of 988 calls over the
German fixed network, stored on 4 CD-ROMs in the final SpeechDat(II)
database exchange format. The speech databases made within the
SpeechDat(II) project were validated by SPEX, the Netherlands, to
assess their compliance with the SpeechDat format and content
specifications.

The following items were recorded:
 1 isolated digit (read or prompted)
 1 sequence of 10 isolated digits
 4 connected digit strings:
   4-6 digit number identifying the prompt sheet
   ca. 10 digit telephone number (read)
   14-16 digit credit card number (read; 150 different credit card
   numbers were found)
   6 digit PIN code (read)
 1 natural number (read)
 1 money amount (read)
 3 spelled words (1 spontaneous name spelling, 2 read)
 1 time of day (spontaneous)
 1 time phrase (read)
 1 date (spontaneous)
 1 date (read)
 1 relative date (read)
 2 yes/no questions (spontaneous, not prompted)
 3/6 common application words (read; every application word is
 recorded more than 80 times)
 1 application word phrase
 9 phonetically rich sentences (read)
 4 phonetically rich words (read)
 5 directory assistance names (1 spontaneous name (e.g. forename), 1
 spontaneous city name, 1 read city name (from a list of the 500 most
 frequent), 1 read company/agency name (from a list of the 500 most
 frequent), 1 read proper name, fore- and surname (from a list of 150
 SDB names))

 Price for research use (in ECU)          Members   Non members
 German SpeechDat(II) FDB-1000             15,000        25,000
 German SpeechDat(II) FDB-1000
   + German SpeechDat(M) DB1 or DB2        20,000        30,000

 Price for commercial use (in ECU)        Members   Non members
 German SpeechDat(II) FDB-1000             18,000        25,000
 German SpeechDat(II) FDB-1000
   + German SpeechDat(M) DB1 or DB2        25,000        35,000

SPECIAL OFFERS:

1) Price of German SpeechDat(II) FDB-1000 for ELRA members who have
already purchased German SpeechDat(M) DB1 (ELRA-S0018):

 Before 30.06.1998:				10,000 ECU
 Between 30.06.1998 and 31.12.1998:		11,000 ECU

2) If SpeechDat(II) FDB-1000 is purchased in the same calendar year
as DB1 or DB2, the package price will be:
 for research use: 20,000 ECU for ELRA members and 30,000 ECU for non 
 members;
 for commercial use: 25,000 ECU for ELRA members and 35,000 ECU for 
 non members.


 ********************************************
 For more information, please contact:
 ELRA/ELDA
 55-57 rue Brillat Savarin
 75013 PARIS
 Tel: +33 1 43 13 33 33
 Fax: +33 1 43 13 33 30
 E-mail: info-elra@calva.net
 http://www.icp.grenet.fr/ELRA/home.html
 ********************************************

Message 2: ELRA News - New resources 2/2

Date: Thu, 11 Jun 1998 17:17:33 +0200 (MET DST)
From: Valerie Mapelli <info-elra@calva.net>
Subject: ELRA News - New resources 2/2


 
 EUROPEAN LANGUAGE RESOURCES ASSOCIATION
 ELRA News 
 =====================================


 *** ELRA NEW RESOURCES - Part 2 ***


The ELRA catalogue has been updated with the following resources.


*************************************************
* ELRA-L0030 Bulgarian Morphological Dictionary *
*************************************************

This dictionary contains 67,500 entries divided into 242 inflectional
types (including proper nouns), morphosyntactic information for each
entry, and a morphological engine (MS-DOS and Windows 95/NT). The data
and the engine may be used for morphological analysis and generation
(synthesis).
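Assigning every entry to an inflectional type is what lets one engine work in both directions. A minimal Python sketch of the idea, assuming a hypothetical representation in which an inflectional type is a table of endings attached to a stem; the type name `F-a`, the tags and the two sample entries are illustrative only and do not reflect the dictionary's actual file format or its 242 types.

```python
# Hypothetical inflectional type: feminine nouns ending in -а.
# Each type maps a grammatical tag to the ending that follows the stem.
PARADIGMS = {
    "F-a": {"sg": "а", "sg.def": "ата", "pl": "и", "pl.def": "ите"},
}

# Hypothetical entries: lemma -> inflectional type.
LEXICON = {"жена": "F-a", "книга": "F-a"}


def generate(lemma: str) -> dict:
    """Generation: produce every inflected form of a lemma."""
    endings = PARADIGMS[LEXICON[lemma]]
    stem = lemma[: len(lemma) - len(endings["sg"])]  # strip the citation-form ending
    return {tag: stem + ending for tag, ending in endings.items()}


def analyze(form: str) -> list:
    """Analysis: find every (lemma, tag) pair that yields the given form."""
    return [
        (lemma, tag)
        for lemma in LEXICON
        for tag, candidate in generate(lemma).items()
        if candidate == form
    ]


print(generate("жена"))    # {'sg': 'жена', 'sg.def': 'жената', 'pl': 'жени', 'pl.def': 'жените'}
print(analyze("книгите"))  # [('книга', 'pl.def')]
```

Here `analyze` simply regenerates every paradigm, which is fine for two entries; an engine covering 67,500 entries would more plausibly index forms or endings so that analysis does not scan the whole lexicon.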

Structure of entries: Local linguistic variant
File format: ASCII; lowercase letters
Standard in use: ISO
Character set: 8-bit ASCII; letters in alphabetical order at codes 160-191
Medium: Floppy disk
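The character-set line implies that the 32 codes 160-191 hold the (lowercase) letters in alphabetical order. A minimal Python decoder, assuming (this is not stated above) that those codes map one-to-one, in order, onto the Unicode Cyrillic lowercase block from U+0430 (а) to U+044F (я), and that bytes below 128 are plain ASCII; any other byte value is rejected rather than guessed. `decode_entry` is a hypothetical name.

```python
FIRST_LETTER_CODE = 160       # stated range: 160-191
LAST_LETTER_CODE = 191
CYRILLIC_SMALL_A = 0x0430     # assumption: code 160 corresponds to "а"


def decode_entry(data: bytes) -> str:
    """Decode one dictionary line into a Unicode string (assumed mapping)."""
    chars = []
    for byte in data:
        if byte < 128:
            chars.append(chr(byte))  # plain 7-bit ASCII
        elif FIRST_LETTER_CODE <= byte <= LAST_LETTER_CODE:
            chars.append(chr(CYRILLIC_SMALL_A + byte - FIRST_LETTER_CODE))
        else:
            raise ValueError(f"byte {byte} is outside the documented ranges")
    return "".join(chars)


print(decode_entry(bytes([166, 165, 173, 160])))  # жена
```

Raising an error on bytes outside the documented ranges makes a wrong assumption about the encoding show up immediately instead of silently producing garbled entries.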


Price for Research use:
	ELRA members: 45 ECU
	Non members: 100 ECU

Price for Commercial use:
	ELRA members: 6,000 ECU
	Non members: 12,000 ECU



****************************************************************
* ELRA-M0014 Bilingual dictionaries (Translation Experts Ltd.) *
****************************************************************

Bilingual dictionaries for demonstration and commercial use, containing
local linguistic variant, local spelling variant, word frequency, usage
(familiar, old, slang, etc.) and semantic features. The level of
information in each entry varies depending on the word/phrase and on
the dictionary, but all of the above are present to varying degrees in
the dictionaries. These dictionaries may be of particular interest for
spell-checking, thesaurus, hyphenation and natural-language translation
applications. A Level 2 translation engine, also available via ELRA,
takes words and phrases in LOCAL-UCS format as input and returns exact
translations, also in LOCAL-UCS format, based on the vocabulary stored
in a compressed translation file.

Each language pair may be purchased as different sets or subsets,
corresponding to the indicated numbers of entries. Every pair consists
of English and one other language, in both directions (English to and
from that language). The following groups of languages are available:

GROUP 1 (English <=> Language A):
Language A = 
 Spanish (25,000, 60,000, 100,000 and 200,000 entries), 
 French (40,000, 80,000, 100,000 and 200,000 entries), 
 German (40,000, 80,000 and 126,000 entries), 
 Italian (20,000 and 40,000 entries), 
 Brazilian Portuguese (40,000, 80,000 and 400,000 entries), 
 Portuguese (40,000, 80,000, 110,000 and 234,000 entries), 
 Dutch (40,000, 80,000 and 110,000 entries).

GROUP 2 (English <=> Language B):
Language B = 
 Danish (40,000, 80,000 and 110,000 entries), 
 Swedish (40,000, 80,000 and 110,000 entries), 
 Finnish (30,000 entries), 
 Icelandic (40,000, 80,000 and 100,000 entries).

GROUP 3 (English <=> Language C):
Language C = 
 Russian (40,000, 72,000 and 120,000 entries), 
 Russian Business (60,000 entries), 
 Russian Aerospace (60,000 entries), 
 Russian Automotive (40,000 entries), 
 Russian Minerals & Mining (60,000 entries), 
 Polish (30,000, 80,000, 124,000 and 150,000 entries), 
 Hungarian (30,000, 80,000 and 124,000 entries), 
 Czech (40,000 entries), 
 Romanian Starter (10,000 entries).

GROUP 4 (English <=> Language D):
Language D = 
 Croatian (30,000 entries), 
 Bosnian (30,000 entries), 
 Serbian (Latin or Cyrillic) (30,000 entries).

GROUP 5 (English <=> Language E):
Language E = 
 Japanese (40,000 entries).

GROUP 6 (English <=> Language F):
Language F = 
 Greek (60,000 entries).

File format: Text
Standard in use: ISO
Character set: 8-bit ASCII and UNICODE
Means of delivery: CD-ROM, floppy disk or downloaded from the Web.
Related tools: Word Translator(TM), NeuroTran, InterTran(TM),
MobileTran(TM).

Please see http://www.tranexp.com for more information.

The price per entry is as follows:

Price for ELRA members (ECU per entry):
           For research use   For commercial use
GROUP 1          0.06                0.25
GROUP 2          0.03                0.18
GROUP 3          0.04                0.20
GROUP 4          0.04                0.20
GROUP 5          0.50                1.00
GROUP 6          0.12                0.50

Price for non-members (ECU per entry):
           For research use   For commercial use
GROUP 1          0.12                0.50
GROUP 2          0.06                0.36
GROUP 3          0.08                0.40
GROUP 4          0.08                0.40
GROUP 5          1.00                2.00
GROUP 6          0.24                1.00
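The per-entry rates for the six language groups turn into a set price by multiplication. A minimal Python sketch, assuming that the price of a set is simply the rate times the number of entries in that set, with no rounding, taxes or volume discounts (none are mentioned); `dictionary_price` is a hypothetical helper, and `Decimal` is used so that the ECU amounts come out exact.

```python
from decimal import Decimal

# ECU per entry, copied from the price tables: group -> (research, commercial).
MEMBER_RATES = {1: ("0.06", "0.25"), 2: ("0.03", "0.18"), 3: ("0.04", "0.20"),
                4: ("0.04", "0.20"), 5: ("0.50", "1.00"), 6: ("0.12", "0.50")}
NON_MEMBER_RATES = {1: ("0.12", "0.50"), 2: ("0.06", "0.36"), 3: ("0.08", "0.40"),
                    4: ("0.08", "0.40"), 5: ("1.00", "2.00"), 6: ("0.24", "1.00")}


def dictionary_price(group: int, entries: int, member: bool, use: str) -> Decimal:
    """Price in ECU of one set, assuming price = rate x number of entries."""
    if use not in ("research", "commercial"):
        raise ValueError("use must be 'research' or 'commercial'")
    research, commercial = (MEMBER_RATES if member else NON_MEMBER_RATES)[group]
    return Decimal(research if use == "research" else commercial) * entries


# English <=> Spanish, 25,000 entries, ELRA member, research use (Group 1)
print(dictionary_price(1, 25_000, member=True, use="research"))      # 1500.00
# English <=> Japanese, 40,000 entries, non-member, commercial use (Group 5)
print(dictionary_price(5, 40_000, member=False, use="commercial"))   # 80000.00
```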


