LINGUIST List 17.2368
Tue Aug 22 2006
Software: NEMLAR Arabic Resources in ELRA Catalogue - 08/06
Editor for this issue: Svetlana Aksenova
<svetlana linguistlist.org>
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.
Directory
1. Helene Mazo, NEMLAR Arabic Resources in ELRA Catalogue - 08/06
Message 1: NEMLAR Arabic Resources in ELRA Catalogue - 08/06
Date: 22-Aug-2006
From: Helene Mazo <mazo elda.org>
Subject: NEMLAR Arabic Resources in ELRA Catalogue - 08/06
ELRA - Language Resources Catalogue - Update

We are happy to announce the following Arabic resources, produced within the NEMLAR project (www.nemlar.org). All 3 resources are owned and copyrighted by the Nemlar Consortium. They are available in our catalogue. To view all the Language Resources available, you can visit our on-line catalogue: http://www.elra.info or http://www.elda.org

ELRA-W0042 NEMLAR Written Corpus
This corpus consists of about 500,000 words of Arabic text from 13 different categories. The text is provided in 4 different versions:
- Raw text
- Fully vowelized text
- Text with Arabic lexical analysis
- Text with Arabic POS-tags
The database is distributed on 1 ISO 9660 CD-ROM volume.
For more information, see http://catalog.elda.org:8080/product_info.php?products_id=873&osCsid=2eb47737dba8e4365c4972784a235948

ELRA-S0219 NEMLAR Broadcast News Speech Corpus
The data, provided by ELDA, consists of about 40 hours of Arabic data (mainly Standard Arabic from a number of broadcast companies). Transcriptions follow the Transcriber conventions as used by ELDA and focus on the orthographic, named-entity, and speaker/turn segmentation levels. No phonetic transcription/segmentation is planned. The database is distributed on 1 ISO 9660 DVD-ROM volume.
For more information, see http://catalog.elda.org:8080/product_info.php?products_id=874&osCsid=2eb47737dba8e4365c4972784a235948

ELRA-S0220 NEMLAR Speech Synthesis Corpus
The NEMLAR Speech Synthesis Corpus contains the recordings of 2 native Egyptian speakers (male and female, 35 years old) recorded in a studio over 2 channels (voice + laryngograph). The data collection and transcription were performed by RDI (Egypt). Speech samples are stored as signed integers at 96 kHz, 24 bit, with the least significant byte first (“lohi” or Intel format). The speaker read 2,032 prompted sentences covering approx. 42,000 words in three categories: transcribed speech (20%), written text (50%), and constructed phrases (30%).
The database is provided with orthographic, prosodic and phonetic transcriptions in SAMPA. All transcriptions were segmented at the utterance (sentence/command word) level, annotated at the word level and checked manually. A pronunciation lexicon including 3,589 headwords with phonetics in SAMPA is also available. The database is distributed on 3 ISO 9660 DVD-ROM volumes.
For more information, see http://catalog.elda.org:8080/product_info.php?products_id=875&osCsid=2eb47737dba8e4365c4972784a235948

For more information on the catalogue, please contact Valérie Mapelli mailto:mapelli elda.org
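For readers who want to inspect the audio of the NEMLAR Speech Synthesis Corpus, the sample format described above (96 kHz, 24 bit, signed integers, least significant byte first) can be decoded roughly as sketched below. This is only an illustration, not part of the ELRA distribution: it assumes the bytes passed in are raw, headerless sample data, and the function names `decode_pcm24_le` and `duration_seconds` are hypothetical. The announcement does not say how the two channels are laid out in the files, so the sketch does not attempt to separate them.

```python
SAMPLE_RATE_HZ = 96_000   # sampling rate stated in the announcement
BYTES_PER_SAMPLE = 3      # 24 bit = 3 bytes per sample


def decode_pcm24_le(raw: bytes) -> list[int]:
    """Decode 24-bit signed little-endian ("lohi" / Intel) samples.

    Assumption: `raw` holds headerless sample data only.
    """
    if len(raw) % BYTES_PER_SAMPLE != 0:
        raise ValueError("byte count is not a multiple of 3")
    return [
        # byteorder="little" means the least significant byte comes first
        int.from_bytes(raw[i:i + BYTES_PER_SAMPLE], byteorder="little", signed=True)
        for i in range(0, len(raw), BYTES_PER_SAMPLE)
    ]


def duration_seconds(num_samples: int, channels: int = 1) -> float:
    """Duration of `num_samples` total samples spread over `channels` channels."""
    return num_samples / channels / SAMPLE_RATE_HZ


if __name__ == "__main__":
    # Four hand-made samples: 1, -1, the minimum and the maximum 24-bit values.
    demo = bytes([0x01, 0x00, 0x00,
                  0xFF, 0xFF, 0xFF,
                  0x00, 0x00, 0x80,
                  0xFF, 0xFF, 0x7F])
    print(decode_pcm24_le(demo))
```

Python's standard `wave` module does not convert sample bytes to integers on its own, which is why the bytes are decoded by hand here.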
Linguistic Field(s):
Computational Linguistics
Lexicography
Phonetics
Text/Corpus Linguistics