Editor for this issue: James Yuells <james@linguistlist.org>
The December 2002 issue of the LSA Bulletin is now available at the Linguistic Society of America website: http://www.lsadc.org
* English Gigaword *

The Linguistic Data Consortium (LDC) is pleased to announce the availability of the English Gigaword corpus. English Gigaword is a comprehensive archive of English newswire text data that the LDC has acquired over several years. The newswire texts are drawn from four international sources:

    Agence France Presse English Service
    Associated Press Worldstream English Service
    The New York Times Newswire Service
    The Xinhua News Agency English Service

English Gigaword is the first LDC publication to be distributed on DVD.

Much of the content in this collection has been published previously by the LDC in a variety of other, older corpora, particularly the North American News text corpora (LDC95T21, LDC98T30), the various TDT corpora, and the AQUAINT text corpus (LDC2002T31). In addition to this previously published data, the English Gigaword corpus contains a significant amount of previously unreleased data: specifically, all of the Agence France Presse content, the 1995 and 2001 Xinhua content, and portions of NYT and APW dating from February 2001 forward.

All text data are presented in SGML form, using a very simple, minimal markup structure; all text consists of printable ASCII and whitespace. The text formatting is consistent across all sources. The English Gigaword corpus has been fully validated by a standard SGML parser utility (nsgmls), using a DTD file which is provided as part of this publication.

For further information, including a link to online documentation, please visit:

    http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T05

Institutions that have membership in the LDC during the 2003 Membership Year will be able to receive this corpus free of charge. Nonmembers may license this publication for $2,500.
If you need additional information before placing your order, or would like to inquire about membership in the LDC, please send email to <ldc@ldc.upenn.edu> or call (215) 573-1275.

-------------------------------------------------------------------
Linguistic Data Consortium        Phone: (215) 573-1275
3600 Market Street                Fax:   (215) 573-2175
Suite 810                         email: ldc@ldc.upenn.edu
Philadelphia, PA 19104-2653       www:   http://www.ldc.upenn.edu