Query Details

Query Subject:   Genre-Specific Corpora
Author:   Marina Santini

Linguistic Field(s):  Computational Linguistics; Text/Corpus Linguistics

Query:   Hi,

I am doing some research on concept extraction from different types of
texts, or genres.

I am looking for free research corpora (in English and in any other
language) belonging to the following genres:

1) FAQs (I have already downloaded some small collections, but I
would like to have a more comprehensive range of topics).
2) Chat log transcripts (I have already downloaded the NPS
Collection, 3 Codiac datasets and several smallish Many Eyes
3) Telephone conversation transcripts (missing)
4) Emails (I have already downloaded the Enron dataset and a couple
of junk mail collections)
5) Twitter post corpora (missing; apparently the Edinburgh Twitter
corpus is no longer available)
6) Corporate weblog corpora (missing)

I will be glad to share all the links and related documentation once I
have collected all the genres in the list.

Thanks in advance for your suggestions.

Best Regards,

Marina Santini
Researcher at Artificial Solutions
LL Issue: 22.1852
Date posted: 26-Apr-2011

