LINGUIST List 22.1852

Tue Apr 26 2011

Qs: Genre-Specific Corpora

Editor for this issue: Danielle St. Jean <daniellelinguistlist.org>

We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate.

In addition to posting a summary, we'd like to remind people that it is usually a good idea to personally thank those individuals who have taken the trouble to respond to the query.

To post to LINGUIST, use our convenient web form at http://linguistlist.org/LL/posttolinguist.cfm.
1. Marina Santini, Genre-Specific Corpora

Message 1: Genre-Specific Corpora
Date: 26-Apr-2011
From: Marina Santini <MarinaSantini.MSgmail.com>
Subject: Genre-Specific Corpora


I am doing some research in concept extraction from different types of
texts or genres.

I am looking for free research corpora (in English and in any other
language) belonging to the following genres:

1) FAQs (I have already downloaded some small collections, but I
would like to have a more comprehensive range of topics).
2) Chat log transcripts (I have already downloaded the NPS
Collection, 3 Codiac datasets and several smallish Many Eyes datasets)
3) Telephone conversation transcripts (missing)
4) Emails (I have already downloaded the Enron dataset and a couple
of junk mail collections)
5) Twitter post corpora (missing; apparently the Edinburgh Twitter
corpus is no longer available)
6) Corporate weblog corpora (missing)

I will be glad to share all the links and related documentation once I have
collected all the genres in the list.

Thanks in advance for your suggestions.

Best Regards,

Marina Santini
Researcher at Artificial Solutions

Linguistic Field(s): Computational Linguistics
                            Text/Corpus Linguistics


Page Updated: 26-Apr-2011

Supported in part by the National Science Foundation
While the LINGUIST List makes every effort to ensure the linguistic relevance of sites listed on its pages, it cannot vouch for their contents.