
Software Details

Title: New LDC Publications
Submitter: LDC Office
Description: Buckwalter Arabic Morphological Analyzer Version 1.0

Voicemail Corpus Part II

1997 HUB5 German Evaluation

The Linguistic Data Consortium (LDC) is pleased to announce the availability of three new publications.

1. The Buckwalter Arabic Morphological Analyzer Version 1.0 was created by Tim Buckwalter at Qamus for POS-tagging Arabic text. The analyzer consists primarily of three Arabic-English lexicon files: prefixes, suffixes, and stems. The lexicons are supplemented by three morphological compatibility tables used for controlling prefix-stem combinations, stem-suffix combinations, and prefix-suffix combinations.
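As a rough illustration of how three lexicons and three compatibility tables can work together, the following Python sketch tries every prefix/stem/suffix split of a word and keeps only the splits whose category pairs appear in all three tables. It is not the analyzer's actual code or data format: the lexicon entries, the category names (P0, PConj, N, V, S0, SDual, SVerb), the table contents, and the function name analyze are all invented for this example.

```python
from itertools import product

# Toy lexicons (hypothetical): surface form -> list of (category, English gloss).
PREFIXES = {"": [("P0", "")], "wa": [("PConj", "and")]}
STEMS = {"kitab": [("N", "book")], "katab": [("V", "write")]}
SUFFIXES = {"": [("S0", "")], "ayn": [("SDual", "two")], "at": [("SVerb", "she")]}

# Toy compatibility tables (hypothetical): sets of allowed category pairs.
PREFIX_STEM = {("P0", "N"), ("P0", "V"), ("PConj", "N"), ("PConj", "V")}
STEM_SUFFIX = {("N", "S0"), ("N", "SDual"), ("V", "S0"), ("V", "SVerb")}
PREFIX_SUFFIX = {("P0", "S0"), ("P0", "SDual"), ("P0", "SVerb"),
                 ("PConj", "S0"), ("PConj", "SVerb")}


def analyze(word):
    """Return (prefix, stem, suffix, gloss) for every licensed segmentation."""
    results = []
    n = len(word)
    for i in range(n + 1):          # end of the prefix
        for j in range(i, n + 1):   # end of the stem
            prefix, stem, suffix = word[:i], word[i:j], word[j:]
            if prefix not in PREFIXES or stem not in STEMS or suffix not in SUFFIXES:
                continue
            # A form may have several lexicon entries, so check each combination.
            for (pc, pg), (sc, sg), (xc, xg) in product(
                PREFIXES[prefix], STEMS[stem], SUFFIXES[suffix]
            ):
                if ((pc, sc) in PREFIX_STEM
                        and (sc, xc) in STEM_SUFFIX
                        and (pc, xc) in PREFIX_SUFFIX):
                    gloss = " ".join(g for g in (pg, sg, xg) if g)
                    results.append((prefix, stem, suffix, gloss))
    return results


print(analyze("wakitab"))
print(analyze("kitabat"))
```

With this toy data, "wakitab" is accepted as wa + kitab, while "kitabat" is rejected because the stem-suffix table does not pair a noun stem with the verbal suffix.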

The LDC is releasing this software under the GNU General Public License:


For information on commercial use, please visit:


The Buckwalter Arabic Morphological Analyzer can be downloaded for free from the above link. If you would like a copy placed on CD-ROM, please note that there is a $100 media charge.

2. The Voicemail Corpus Part II is the second voicemail corpus created by Mukund Padmanabhan, Brian Kingsbury et al. at International Business Machines. This single-disc publication consists of speech and transcript files, separated into training and evaluation data. The training data consists of 2048 voicemail messages and the corresponding transcript files; the evaluation data consists of 50 voicemail messages and 50 transcripts.

For further information, please visit:


Institutions that have membership in the LDC during the 2002 Membership Year will be able to receive this corpus free of charge. As a 'Members Only' publication, the corpus is not available to nonmembers.

3. The 1997 Hub5 Non-English evaluation is part of an ongoing series of periodic evaluations conducted by NIST. These evaluations provide an important contribution to the direction of research efforts and the calibration of technical capabilities. They are intended to be of interest to all researchers working on the general problem of conversational speech recognition.

The Hub5 Non-English evaluation focuses on the task of transcribing conversational telephone speech into text. The 1997 HUB5 German Evaluation is a single-disc publication and contains nine hours of speech data. Transcripts are not included.

For more information, please visit:


Institutions that have membership in the LDC during the 2002 Membership Year will be able to receive this corpus free of charge. Nonmembers may purchase this publication for $1000.

If you need additional information before placing your order, or would like to inquire about membership in the LDC, please send email to <ldc@ldc.upenn.edu> or call (215) 573-1275.

------------------------------------------------------------------
Linguistic Data Consortium
3600 Market Street, Suite 810
Philadelphia, PA 19104-2653
Phone: (215) 573-1275
Fax: (215) 573-2175
Email: ldc@ldc.upenn.edu
WWW: http://www.ldc.upenn.edu

Linguistic Field(s): Computational Linguistics

Language Specialty: Arabic, Standard

LL Issue: 13.2968
Date Posted: 15-Nov-2002
