LINGUIST List 13.2484

Mon Sep 30 2002

FYI: American National Corpus, New Corpora from LDC

Editor for this issue: James Yuells <james@linguistlist.org>


Directory

  1. Nancy Ide, ACL papers in the American National Corpus
  2. LDC Office, New Corpora from the LDC

Message 1: ACL papers in the American National Corpus

Date: Fri, 27 Sep 2002 12:57:41 -0400
From: Nancy Ide <ide@cs.vassar.edu>
Subject: ACL papers in the American National Corpus

The American National Corpus Consortium, with permission from the
Association for Computational Linguistics, will include in the American
National Corpus a selection of recent papers written by American authors
and published in ACL proceedings and anthologies. Any authors who object
to having their papers included in the American National Corpus should
contact Nancy Ide (ide@cs.vassar.edu) to have their papers removed.

Note that this applies only to papers whose authors are native speakers
of American English.

=======================================================

Nancy Ide

Professor and Chair
Department of Computer Science, Vassar College
Poughkeepsie, NY 12604-0520 USA
Tel: +1 845 437-5988 Fax: +1 845 437-7498
ide@cs.vassar.edu

Chercheur Associe
Equipe Langue et Dialogue, LORIA/CNRS
Campus Scientifique - BP 239
54506 Vandoeuvre-les-Nancy FRANCE
Tel: +33 (0)3 83 59 20 47 Fax: +33 (0)3 83 41 30 79
ide@loria.fr

=======================================================

Message 2: New Corpora from the LDC

Date: Mon, 30 Sep 2002 13:28:31 -0400
From: LDC Office <ldc@ldc.upenn.edu>
Subject: New Corpora from the LDC


	 * AQUAINT English News Text *

	 * 2001 NIST Speaker Recognition Evaluation *


The Linguistic Data Consortium (LDC) is pleased to announce the
availability of two new corpora. 

			 *

The AQUAINT English News Text corpus consists of English newswire text
drawn from three sources: the Xinhua News Service (People's Republic of
China), the New York Times News Service, and the Associated Press
Worldstream News Service. It was prepared by the LDC for the AQUAINT
Project, and will be used in official benchmark evaluations conducted by
the National Institute of Standards and Technology (NIST).

This two-disc publication contains roughly 375 million words,
corresponding to about 3 GB of data. The text data are separated into
directories by source (apw, nyt, xie); within each source, data files are
subdivided by year, and within each year there is one file per date of
collection.
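
As a rough illustration of the layout described above, the following
Python sketch counts the data files found for each source and year. It
assumes the discs have been copied to a local directory with the shape
<root>/<source>/<year>/<file>. The function name and the root path are
hypothetical, and the actual file names on the discs are not specified
in this announcement, so the sketch does not depend on them.

```python
from collections import Counter
from pathlib import Path

# Source directory names as given in the announcement.
SOURCES = ("apw", "nyt", "xie")


def files_per_source_year(root):
    """Count files under root/<source>/<year>/ (assumed layout)."""
    counts = Counter()
    for source in SOURCES:
        source_dir = Path(root) / source
        if not source_dir.is_dir():
            # A source may be absent, e.g. if only one disc was copied.
            continue
        year_dirs = sorted(p for p in source_dir.iterdir() if p.is_dir())
        for year_dir in year_dirs:
            # One file per date of collection, per the announcement.
            n_files = sum(1 for f in year_dir.iterdir() if f.is_file())
            counts[(source, year_dir.name)] = n_files
    return counts
```

Calling files_per_source_year on the local copy returns a mapping from
(source, year) pairs to file counts, which gives a quick check that
both discs were copied completely.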
 
For further information, please visit:

http://www.ldc.upenn.edu/Catalog/LDC2002T31.html

Institutions that have membership in the LDC during the 2002
Membership Year will be able to receive this corpus free of charge. 
Nonmembers may purchase this publication for $1000.

			 *

The 2001 NIST Speaker Recognition Evaluation is part of an ongoing
series of yearly evaluations conducted by NIST. These evaluations
provide an important contribution to the direction of research efforts
and the calibration of technical capabilities. They are intended to be
of interest to all researchers working on the general problem of
text-independent speaker recognition.

The single CD-ROM 2001 NIST Speaker Recognition Evaluation corpus is
based entirely on conversational cellular telephone speech collected by
the LDC. The files are divided into evaluation and development data. 
There are a total of 2,350 compressed speech files, all of which are 
in SPHERE format. 

For further information, including a link to the 2001 NIST Speaker
Recognition Evaluation website, please visit:

http://www.ldc.upenn.edu/Catalog/LDC2002S34.html

Institutions that have membership in the LDC during the 2002
Membership Year will be able to receive this corpus free of charge. 
Nonmembers may purchase this publication for $400.

			 *

If you need additional information before placing your order, or 
would like to inquire about membership in the LDC, please send email to
<ldc@ldc.upenn.edu> or call (215) 573-1275.

	
------------------------------------------------------------------
Linguistic Data Consortium      Phone: (215) 573-1275
3600 Market Street              Fax:   (215) 573-2175
Suite 810                       email: ldc@ldc.upenn.edu
Philadelphia, PA 19104-2653     www:   http://www.ldc.upenn.edu