LINGUIST List 13.2484

Mon Sep 30 2002

FYI: American National Corpus, New Corpora from LDC

Editor for this issue: James Yuells <james@linguistlist.org>


Directory

  • Nancy Ide, ACL papers in the American National Corpus
  • LDC Office, New Corpora from the LDC

    Message 1: ACL papers in the American National Corpus

    Date: Fri, 27 Sep 2002 12:57:41 -0400
    From: Nancy Ide <ide@cs.vassar.edu>
    Subject: ACL papers in the American National Corpus


    The American National Corpus Consortium, with permission from the Association for Computational Linguistics, will include in the American National Corpus a selection of recent papers written by American authors and published in ACL proceedings and anthologies. Any authors who object to having their papers included in the American National Corpus should contact Nancy Ide (ide@cs.vassar.edu) to have their papers removed.

    Note that this applies only to papers whose authors are native speakers of American English.

    =======================================================

    Nancy Ide

    Professor and Chair
    Department of Computer Science, Vassar College
    Poughkeepsie, NY 12604-0520 USA
    Tel: +1 845 437-5988
    Fax: +1 845 437-7498
    ide@cs.vassar.edu

    Chercheur Associe
    Equipe Langue et Dialogue, LORIA/CNRS
    Campus Scientifique - BP 239
    54506 Vandoeuvre-les-Nancy FRANCE
    Tel: +33 (0)3 83 59 20 47
    Fax: +33 (0)3 83 41 30 79
    ide@loria.fr

    =======================================================

    Message 2: New Corpora from the LDC

    Date: Mon, 30 Sep 2002 13:28:31 -0400
    From: LDC Office <ldc@ldc.upenn.edu>
    Subject: New Corpora from the LDC


    * AQUAINT English News Text *

    * 2001 NIST Speaker Recognition Evaluation *

    The Linguistic Data Consortium (LDC) is pleased to announce the availability of two new corpora.

    *

    The AQUAINT English News Text corpus consists of English newswire text drawn from three sources: the Xinhua News Service (People's Republic of China), the New York Times News Service, and the Associated Press Worldstream News Service. It was prepared by the LDC for the AQUAINT Project and will be used in official benchmark evaluations conducted by the National Institute of Standards and Technology (NIST).

    This two-disc publication contains roughly 375 million words, corresponding to about 3 GB of data. The text data are separated into directories by source (apw, nyt, xie); within each source, data files are subdivided by year, and within each year there is one file per date of collection. For further information, please visit:

    http://www.ldc.upenn.edu/Catalog/LDC2002T31.html
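
    As a rough illustration of the source/year/date layout described above, the following Python sketch indexes the data files by source and year. It assumes that each source directory (apw, nyt, xie) contains one subdirectory per year holding the per-date files; the announcement does not spell out the exact nesting or file names, so the function name `index_corpus` and that layout are hypothetical and should be checked against the corpus documentation.

```python
from collections import defaultdict
from pathlib import Path

# Source directory names as given in the announcement.
SOURCES = ("apw", "nyt", "xie")


def index_corpus(root):
    """Map (source, year) -> sorted list of data file paths.

    Assumes root/<source>/<year>/<one file per collection date>;
    this nesting is an assumption, not a documented layout.
    """
    index = defaultdict(list)
    for source in SOURCES:
        source_dir = Path(root) / source
        if not source_dir.is_dir():
            continue  # tolerate a disc that holds only some sources
        for year_dir in sorted(p for p in source_dir.iterdir() if p.is_dir()):
            files = sorted(p for p in year_dir.iterdir() if p.is_file())
            index[(source, year_dir.name)].extend(files)
    return dict(index)
```

    Calling `index_corpus` on the directory where a disc is mounted would give a dictionary whose keys are (source, year) pairs, which makes it easy to count files per year or to process one source at a time.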

    Institutions that have membership in the LDC during the 2002 Membership Year will be able to receive this corpus free of charge. Nonmembers may purchase this publication for $1000.

    *

    The 2001 NIST Speaker Recognition Evaluation is part of an ongoing series of yearly evaluations conducted by NIST. These evaluations provide an important contribution to the direction of research efforts and the calibration of technical capabilities. They are intended to be of interest to all researchers working on the general problem of text-independent speaker recognition.

    The single CD-ROM 2001 NIST Speaker Recognition Evaluation corpus is based entirely on conversational cellular telephone speech collected by the LDC. The files are divided into evaluation and development data. There are a total of 2,350 compressed speech files, all of which are in SPHERE format.
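
    For readers unfamiliar with SPHERE files, the sketch below reads the plain-text header that precedes the audio data. It relies on the general NIST SPHERE convention (a first line `NIST_1A`, a second line giving the header size in bytes, then `name -type value` lines terminated by `end_head`), not on anything specific to this corpus; the function name `parse_sphere_header` is hypothetical, and decoding the compressed samples themselves is not covered.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header of a NIST SPHERE file into a dict.

    `data` must hold at least the full header (reading the first few
    kilobytes of the file is normally enough).
    """
    # The first two lines are fixed-width: magic string and header size.
    first, second = data[:16].split(b"\n")[:2]
    if first.strip() != b"NIST_1A":
        raise ValueError("not a SPHERE file: missing NIST_1A magic line")
    header_size = int(second.strip())

    header = data[:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in header.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)  # name, type flag, value
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, type_flag, value = parts
        if type_flag == "-i":
            fields[name] = int(value)
        elif type_flag == "-r":
            fields[name] = float(value)
        else:  # string fields use flags such as -s4
            fields[name] = value
    return fields
```

    Fields such as the sample rate or channel count, when present in a given file, can then be looked up by name in the returned dictionary.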

    For further information, including a link to the 2001 NIST Speaker Recognition Evaluation website, please visit:

    http://www.ldc.upenn.edu/Catalog/LDC2002S34.html

    Institutions that have membership in the LDC during the 2002 Membership Year will be able to receive this corpus free of charge. Nonmembers may purchase this publication for $400.

    *

    If you need additional information before placing your order, or would like to inquire about membership in the LDC, please send email to <ldc@ldc.upenn.edu> or call (215) 573-1275.

    ------------------------------------------------------------------
    Linguistic Data Consortium      Phone: (215) 573-1275
    3600 Market Street              Fax: (215) 573-2175
    Suite 810                       email: ldc@ldc.upenn.edu
    Philadelphia, PA 19104-2653     www: http://www.ldc.upenn.edu