LINGUIST List 8.1552

Wed Oct 29 1997

FYI: North American News Text Corpus from the LDC

Editor for this issue: Brett Churchill <brett@linguistlist.org>


Directory

  1. LDC Office, A New Corpus from the Linguistic Data Consortium

Message 1: A New Corpus from the Linguistic Data Consortium

Date: Wed, 29 Oct 1997 13:36:56 EST
From: LDC Office <ldc@unagi.cis.upenn.edu>
Subject: A New Corpus from the Linguistic Data Consortium



 Announcing a NEW CORPUS from the
 LINGUISTIC DATA CONSORTIUM

		 North American News Text Corpus


The Linguistic Data Consortium (LDC) announces the availability
of a corpus of North American news text. This corpus is a
collection of journalistic text in English from newswire and
newspaper sources in the United States.

The North American News Text corpus is composed of news text
that has been marked using SGML. The text is taken from the
following sources:

Source                     Dates          Approx. #Words
                           Covered        (Millions)
---------------------------------------------------------
Los Angeles Times &        05/94-08/97    52
 Washington Post

New York Times News        07/94-12/96    173
 Syndicate

Reuters News Service       04/94-12/96    85
 (General & Financial)

Wall Street Journal        07/94-12/96    40
---------------------------------------------------------
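
As a rough guide to overall size, the approximate figures in the table can be added up. The sketch below is only an illustration: the variable names are made up for this example, and the counts are the rounded values from the table, so the total and the per-source shares are approximate as well.

```python
# Approximate word counts in millions, copied from the table above.
# The dictionary and variable names are illustrative only.
approx_words_millions = {
    "Los Angeles Times & Washington Post": 52,
    "New York Times News Syndicate": 173,
    "Reuters News Service (General & Financial)": 85,
    "Wall Street Journal": 40,
}

# Sum of the rounded per-source figures.
total = sum(approx_words_millions.values())

# Each source's share of the total, as a percentage rounded to one decimal.
shares = {
    source: round(100 * count / total, 1)
    for source, count in approx_words_millions.items()
}

print(f"Total: about {total} million words")
for source, pct in shares.items():
    print(f"{source}: about {pct}%")
```

On these rounded figures the four source groups together come to about 350 million words, with the New York Times News Syndicate material accounting for roughly half.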

Both the New York Times and the L.A. Times/Washington Post services
include a range of other newspaper sources in their syndicated
newswires. The L.A. Times/Washington Post material includes the
following sources (in lesser amounts) in addition to the two
predominant sources:

 Newsday
 The Baltimore Sun
 The Hartford Courant

The New York Times material will be found to contain the
following sources (in lesser amounts), but N.Y. Times articles
predominate:

 Bloomberg Business News
 The Boston Globe
 Los Angeles Daily News
 Fort Worth Star-Telegram
 Newsweek
 Cox News Service
 The Arizona Republic
 Seattle Post-Intelligencer
 San Francisco Examiner
 Houston Chronicle
 San Francisco Chronicle
 Economist Newspaper Ltd.
 Hearst Newspapers

Both of these newswire services also include small numbers of
articles from a larger set of miscellaneous sources. The ones
listed above appear with some frequency on a daily basis.

Because of restrictions imposed by the copyright holders of the
news text, this corpus is available to 1995, 1996 and 1997 LDC
members only. Members who wish to receive this corpus must
sign the North American News Text user agreement. This
agreement is available on the Linguistic Data Consortium WWW
Home Page at URL

http://www.ldc.upenn.edu/ldc/catalog/index.html.

If you would like to order a copy of this corpus, please email
your request to ldc@unagi.cis.upenn.edu. If you need additional
information before placing your order, or would like to inquire
about membership in the LDC, please send email or call (215)
898-0464.