Editor for this issue: Naomi Fox <fox
linguistlist.org>
Dear Linguist List,

I am working on a project involving readability and Natural Language Generation. Specifically, I am investigating how discourse-level choices affect the reading ease of the generated output. In previous work, we analysed the RST Discourse Treebank Corpus (purchased from the LDC) to acquire knowledge about how human authors make discourse-level choices. The biggest problem was that the corpus contains Wall Street Journal articles, which are not generally very easy to read, so it was not very suitable for our purposes.

We are now searching for a corpus that is annotated with discourse relations, similar to the RST Discourse Treebank Corpus, but containing texts that are easier to read. The corpus must contain English texts. The texts could be written for children, or they could be easier texts written for adults. They must be annotated with discourse relations, preferably using RST. Ideally, the corpus should be machine-readable.

If you know of any such corpus, or a similar one, that is available for research purposes, please let me know. I will summarise any useful answers I receive for the benefit of others on the list.

Many thanks,
Sandra Williams
University of Aberdeen

Looking for a transcribed corpus of American first names for testing against a commercial application. Can anyone point me in the right direction?

Charlotte