LINGUIST List 15.1894

Tue Jun 22 2004

Qs: Basic English Corpus; American First Name Corpus

Editor for this issue: Naomi Fox

We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate. In addition to posting a summary, we'd like to remind people that it is usually a good idea to personally thank those individuals who have taken the trouble to respond to the query. To post to LINGUIST, use our convenient web form at


  1. Sandra Williams, Seeking "readable" English RST corpus
  2. Charlotte Russell, Transcribed corpus of first name

Message 1: Seeking "readable" English RST corpus

Date: Mon, 21 Jun 2004 07:03:12 -0400 (EDT)
From: Sandra Williams
Subject: Seeking "readable" English RST corpus

Dear Linguist List,

I am working on a project involving readability and Natural Language
Generation. Specifically, I am investigating how discourse-level
choices affect reading ease of the generated output. In previous work,
we analysed the RST Discourse Treebank Corpus (purchased from the LDC)
to acquire knowledge about how human authors make discourse-level
choices. The biggest problem was that the corpus contained Wall Street
Journal articles, which are generally not very easy to read, so the
corpus was not very suitable for our purposes.

We are now searching for a corpus that is annotated with discourse
relations, similar to the RST Discourse Treebank Corpus, but
containing texts that are easier to read. The corpus must contain
English texts. The texts in the corpus could be written for children,
or they could be easier texts written for adults. The texts must be
annotated with discourse relations, preferably using RST. Ideally, the
corpus should be machine-readable.

If you know of any such corpus, or similar, that is available for
research purposes, please let me know. I will summarise any useful
answers I receive for the benefit of others on the list.

Many thanks,

Sandra Williams
University of Aberdeen 

Message 2: Transcribed corpus of first name

Date: Mon, 21 Jun 2004 15:22:20 -0400 (EDT)
From: Charlotte Russell
Subject: Transcribed corpus of first name

Looking for a transcribed corpus of American first names for testing
against a commercial application. Can anyone point me in the right
direction?