LINGUIST List 8.770

Fri May 23 1997

FYI: Exercises, Lg Resources, Workshop

Editor for this issue: T. Daniel Seely <seelylinguistlist.org>


Directory

  • Marmo Soemarmo, Exercises on the Web
  • Khalid Choukri, ELRA New Language Resources
  • Helmer Strik, modeling pronunciation variation for ASR

    Message 1: Exercises on the Web

    Date: Tue, 20 May 1997 10:53:38 -0700
    From: Marmo Soemarmo <soemarmooak.cats.ohiou.edu>
    Subject: Exercises on the Web


    I put a sample of my exercises on the web. Check them out at:

    http://www.cats.ohiou.edu/~linguist/lexicon/lexicon.htm

    In case you haven't checked out my Language Games, they are at:

    http://ouvaxa.cats.ohiou.edu/~soemarmo/games/menu.htm

    Marmo

    Message 2: ELRA New Language Resources

    Date: Tue, 20 May 1997 20:06:14 +0200 (MET DST)
    From: Khalid Choukri <elracalvanet.calvacom.fr>
    Subject: ELRA New Language Resources


    [ We apologise for the duplicate posting of this announcement ]

    EUROPEAN LANGUAGE RESOURCES ASSOCIATION (ELRA)

    *** NEW CATALOGUE & NEW RESOURCES ***





    The new release of the ELRA catalogue (vol2N1) has grown and currently consists of:

    1) Spoken resources: 37 databases in several languages (recordings from microphone, telephone, continuous speech, isolated words, phonetic dictionaries, etc.).

    2) Written resources:
       * 14 monolingual and multilingual corpora
       * 28 monolingual lexica
       * Around 60 multilingual lexica
       * A linguistic software platform and grammars development platform

    3) Terminological resources: over 360 databases with a wide range of domains and several languages (Catalan, Danish, English, French, German, Italian, Latin, Polish, Portuguese, Spanish, Turkish).

    Since our last news on this electronic list, new resources have been negotiated by ELRA and are now available. These are:

    SPEECH AND RELATED RESOURCES

    ELRA-S0035 Phonolex (BAS/DFKI):

    PHONOLEX consists of a simple list of word forms (666,237 inflected words) with a set of features e.g. orthography (German 'Umlauts' in LaTeX format, capital nouns, old German spelling rules), linguistic information (nouns, verbs, etc.), pronunciation and a list of empirical pronunciations.

    Language: German
    Format: ASCII
    Mark-up: extended SAM-PA (PhonDat-Verbmobil)

    - --------------------------------------------------------------------------

    ELRA-S0036 Speri-Data AG Basic dictionaries (colloquial language):

    These dictionaries contain a daily-life vocabulary. They include phonetic transcriptions with related phoneme lists. The following languages are available:

    Language        Entries
    Danish            8,000
    Dutch            12,000
    English (UK)      8,000
    Finnish          10,000
    French           19,000
    German           13,000
    Italian          23,000
    Norwegian         8,000
    Portuguese        9,000
    Spanish          13,000
    Swedish          10,000

    - --------------------------------------------------------------------------

    ELRA-S0037 Speri-Data AG Technical dictionaries:

    All dictionaries contain phonetic transcriptions, with related phoneme lists. The following dictionaries are available (the label basic dictionary refers to the above ELRA-S0036):

    Domain                      Entries
    Banking French               10,200
    Banking German               10,200
    Banking Italian              10,200
    Banking Spanish              10,200
    Radiology German             42,000 (including basic dictionary)
    Radiology English            16,000
    Medical German              130,000 (including basic dictionary)
    Jurisprudence German         31,000
    Jurisprudence German         55,000 (including basic dictionary)
    Insurance German & English   37,000

    A peculiarity of medical dictionaries in German-speaking countries has to be taken into consideration: doctors in Germany, Austria and Switzerland may use, instead of the original Latin technical term, either the Latin word in German spelling or a German technical term (see examples below). Medical dictionaries therefore have to contain three different terms.

    Technical term     Technical term          Technical term
    in Latin           in German spelling      in German

    Appendicitis       Appendizitis            Blinddarmentzündung
    Eccema             Eczema                  Ekzem
    Diarrhoe           Diarrhö or Diarrhöe     Durchfall, Durchfluss
    Carbunculus        Karbunkel               Geschwür

    - --------------------------------------------------------------------------

    ELRA-S0038 Siemens VoiceMail (American English)

    VoiceMail consists of 17.5 hours of read acoustic speech, divided into 9.5 hours of transliterated speech and 8 hours of non-transliterated speech, recorded over the digital telephone network (ISDN) with 921 speakers from the USA. It contains orthographic transliteration for about 25,000 utterances (of 34,912 utterances in total).

    Language: American English
    Standard in use: headerless, one separate transliteration file comprising all utterances of all speakers
    Sampling rate: 8 kHz
    Speakers: 377 males and 544 females
    Size: 17.5 hours
    Medium: 2 CD-ROM

    WRITTEN RESOURCES - MONOLINGUAL LEXICA

    ELRA-L0021 Dictionary of French verbs - CORA:

    This dictionary contains 25,610 verbs with usage domains, level of language (familiar, popular, literary, Quebec and Swiss terms, etc.), conjugation, auxiliary, verbal adjectives in -able, -ant or -é, encoded syntactical constructions (subject, direct & indirect object, adverb), sample phrases, synonyms, operators enabling semantic-syntactic classification, encoding of derived forms in -age, -ment, -tion, -oir, -ure, deverbal nouns, base words from which verbs can be derived, a scale of usage ranging from 1 to 6, like those used by commercial dictionaries (basic vocabulary, extended, specialised, etc.). Codes enable automatic production of conjugation forms, derived nouns and adjectives and, if necessary, the production of potential forms.

    - --------------------------------------------------------------------------

    ELRA-L0022 Dictionary of words - CORA:

    This dictionary is composed of 126,844 words, with usage domains, grammatical category, gender, number, uncountable, collective, adjectival, nominal, verbal, adverbial derived forms according to the type of words.

    - --------------------------------------------------------------------------

    ELRA-L0023 Dictionary of affixes - CORA:

    4,286 suffixes and prefixes, plus information on their verbal, nominal or adjectival bases or on the verbal basis of Greco-Latin items. This dictionary does not include the suffixes contained in the dictionaries of French verbs (ELRA-L0021) and words (ELRA-L0022), such as -age, -ment, -if, -oir.

    - --------------------------------------------------------------------------

    ELRA-L0024 Dictionary of verb phrases - CORA:

    Dictionary of 3,480 entries based on the model of the dictionary of French verbs (ELRA-L0021).

    - --------------------------------------------------------------------------

    ELRA-L0025 Dictionary of invariable forms and phrases - CORA:

    Dictionary of 4,783 entries based on the model of the dictionary of words (ELRA-L0022).

    - --------------------------------------------------------------------------

    ELRA-L0026 Dictionary of exclamatory stereotyped phrases - CORA:

    Dictionary of 1,901 entries based on the model of the dictionary of invariable forms and phrases (ELRA-L0025).

    - --------------------------------------------------------------------------

    ELRA-L0027 Dictionary of French local authorities - CORA:

    38,965 entries in lower case with accents, checked against the Michelin guide, without localities. A link can be made to the dictionary of words (ELRA-L0022), which contains inhabitants' names and their correspondence with town names.

    - --------------------------------------------------------------------------

    ELRA-L0028 Dictionary of noun phrases and plural-only words - CORA:

    2,138 compound names and 1,397 entries of plural-only words.

    For further information, please contact:

    ELRA/ELDA
    87, Avenue d'Italie
    FR-75013 PARIS
    FRANCE
    Tel: +33 01 45 86 53 00
    Fax: +33 01 45 86 44 88
    E-mail: info-elracalva.net
    WWW: http://www.icp.grenet.fr/ELRA/home.html

    ....................................
    Khalid CHOUKRI
    ELRA /ELDA
    Tel. +33 1 45 86 53 00
    Fax. +33 1 45 86 44 88
    87, Avenue D'ITALIE, 75013 PARIS
    Email: elracalvanet.calvacom.fr
    Web: http://www.icp.grenet.fr/ELRA/home.html
    ....................................

    Message 3: modeling pronunciation variation for ASR

    Date: Wed, 21 May 1997 10:03:58 +0200 (MDT)
    From: Helmer Strik <striklet.kun.NL>
    Subject: modeling pronunciation variation for ASR


    Below is some information on the workshop 'modeling pronunciation variation for automatic speech recognition', which will be held on 4-6 May 1998 in The Netherlands. More information about the workshop is available at http://lands.let.kun.nl/pron-var/.

    Ajo, Helmer



    Advance notice

    ESCA Tutorial and Research Workshop on
    MODELING PRONUNCIATION VARIATION FOR AUTOMATIC SPEECH RECOGNITION
    4-6 May 1998

    at Rolduc, a former monastery in the city of Kerkrade in the south of The Netherlands

    Organized by:
    ESCA, European Speech Communication Association
    COST Telecom Action 249, Continuous Speech Recognition over the Telephone
    A2RT, 'Automatic Acoustic Recognition Technologies', Dept. of Language & Speech, Nijmegen University



    TOPIC OF THE WORKSHOP

    Automatic Speech Recognizers (ASRs) have improved substantially during the last decade. It has now become possible to use ASRs for many practical applications. However, when ASRs are used (and tested) under realistic conditions, the problem of pronunciation variation almost always emerges. This problem has been recognized by several research groups, and more and more effort is now being spent on solving it (see e.g. the steadily growing number of publications on this topic, especially in conference proceedings).

    During this workshop we want to discuss this problem in depth, together with the different ways in which it could be solved. Although part of pronunciation variation is certainly language-dependent (i.e. the phonological and phonetic processes differ between languages), a large part of the variation is language-independent. Furthermore, the techniques that can be used to solve this problem, i.e. to model pronunciation variation for ASR, are usually language-independent.

    WWW-SITE

    Up-to-date information about the workshop is available at http://lands.let.kun.nl/pron-var/.

    SCIENTIFIC COMMITTEE

    Elizabeth Shriberg
    Herve Bourlard
    Li Deng
    Lori Lamel
    Mari Ostendorf
    Patti Price
    Roger Moore
    Rolf Carlson
    Sadaoki Furui
    Steve Young

    CONTACT PERSON

    Helmer Strik
    Dept. of Language and Speech
    P.O. Box 9103
    6500 HD Nijmegen
    The Netherlands

    Tel.nr.: 31-24-3616104
    Fax nr.: 31-24-3615939
    E-mail: Striklet.kun.nl
    URL: http://lands.let.kun.nl/TSpublic/strik