LINGUIST List 20.1537

Wed Apr 22 2009

Qs: NYT Corpus - Call for Demos, Prototypes & Research

Editor for this issue: Catherine Adams <catherinlinguistlist.org>


We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate.

In addition to posting a summary, we'd like to remind people that it is usually a good idea to personally thank those individuals who have taken the trouble to respond to the query.

To post to LINGUIST, use our convenient web form at http://linguistlist.org/LL/posttolinguist.html.
Directory
        1.    Evan Sandhaus, NYT Corpus - Call for Demos, Prototypes & Research

Message 1: NYT Corpus - Call for Demos, Prototypes & Research
Date: 20-Apr-2009
From: Evan Sandhaus <sandesnytimes.com>
Subject: NYT Corpus - Call for Demos, Prototypes & Research

Last October, The New York Times Company released The New York Times
Annotated Corpus to the computational linguistics community
(http://corpus.nytimes.com).

This June, I plan to present examples of research on this data as part of
the closing keynote at the 2009 Semantic Technologies Conference. If you
or your colleagues have done work with this data and you'd like us to
consider your research summary, prototype, or demo for mention in our talk,
please write to me at sandhesnytimes.com. Also, I encourage you to share
your work with the community at http://corpus.nytimes.com.

A little background on The New York Times Annotated Corpus:

Available under a noncommercial research license from the Linguistic Data
Consortium (LDC), the corpus spans 20 years of newspapers, from 1987 to
2007 (that’s 7,475 issues, to be exact). This collection includes the text
of 1.8 million articles written at The Times. Of these, more than 1.5
million have been manually annotated by The New York Times Index with
distinct tags for people, places, topics and organizations drawn from a
controlled vocabulary. A further 650,000 articles also include summaries
written by indexers from the New York Times Index. The corpus is provided
as a collection of XML documents in the News Industry Text Format (NITF)
and includes open-source Java tools for parsing documents into
memory-resident objects.
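
For readers who would rather not start from the bundled Java tools, here is
a minimal sketch in Python of reading one NITF-style document into a plain
dictionary. The sample XML is hand-written for illustration and is not taken
from the corpus. The element names used (head/title, identified-content,
person, location, org, block class="full_text") are assumptions about the
layout and should be checked against the documentation shipped with the
corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written NITF-style document (not from the corpus).
# Element and attribute names are assumptions made for this sketch.
SAMPLE = """<nitf>
  <head>
    <title>Example Headline</title>
    <docdata>
      <identified-content>
        <person>Doe, Jane</person>
        <location>New York City</location>
        <org>Example Organization</org>
      </identified-content>
    </docdata>
  </head>
  <body>
    <body.content>
      <block class="full_text">
        <p>First paragraph.</p>
        <p>Second paragraph.</p>
      </block>
    </body.content>
  </body>
</nitf>"""


def parse_article(xml_text):
    """Parse one NITF-style XML string into a plain dictionary."""
    root = ET.fromstring(xml_text)

    # Headline, assumed to live under <head><title>.
    title = root.findtext("head/title", default="")

    # Index tags, grouped by the (assumed) element name that carries them.
    tags = {}
    for kind in ("person", "location", "org"):
        tags[kind] = [el.text.strip() for el in root.iter(kind) if el.text]

    # Body paragraphs, assumed to sit in <block class="full_text">.
    paragraphs = [
        p.text.strip()
        for p in root.findall(".//block[@class='full_text']/p")
        if p.text
    ]
    return {"title": title, "tags": tags, "paragraphs": paragraphs}


article = parse_article(SAMPLE)
print(article["title"])
print(article["tags"])
print(len(article["paragraphs"]), "paragraphs")
```

The same function could be applied to each file in a directory of corpus
documents, with the paths adjusted to wherever the LDC distribution is
unpacked.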

You can read more about the corpus at:

http://open.blogs.nytimes.com/2009/01/12/fatten-up-your-corpus/

All the best,

Evan Sandhaus
--
Semantic Technologist
Research & Development Operations
New York Times Company

Linguistic Field(s): Computational Linguistics
                            Lexicography
                            Semantics
                            Text/Corpus Linguistics

