LINGUIST List 16.414

Thu Feb 10 2005

Qs: Internet Chat Corpora; Typology Island Constraints

Editor for this issue: Steven Moran

We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate.

In addition to posting a summary, we'd like to remind people that it is usually a good idea to personally thank those individuals who have taken the trouble to respond to the query.


        1.    Stuart McCaul, Internet Chat Corpora
        2.    Inbal Arnon, The Typology of Island Constraints

Message 1: Internet Chat Corpora

Date: 09-Feb-2005
From: Stuart McCaul
Subject: Internet Chat Corpora

My project is to implement an Internet chatroom text-filter based on
Bayesian maths, which will categorise chat-room conversations. The filter
must be trained on pre-categorised text and I am having trouble finding a
categorised corpus of chat-room conversations.

Do you know of such a corpus? Or do you have any other advice for me?

In the event that such a collection of conversations is not available, I
will use the Reuters corpus or similar collection to train my filter.
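
For readers unfamiliar with the approach, a Bayesian text filter of the kind
described in this query is often built as a multinomial naive Bayes
classifier. The following is only a minimal sketch under that assumption, not
the poster's implementation: the category labels, the training lines, and the
function names (tokenize, train, classify) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(labelled_lines):
    """Count documents per category and words per category."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in labelled_lines:
        doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def classify(model, text):
    """Return the category with the highest log posterior for the text."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: share of training lines that carry this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical pre-categorised chat lines, invented for illustration only.
training_lines = [
    ("sport", "did you see the match last night"),
    ("sport", "great goal in the second half"),
    ("music", "that new album is great"),
    ("music", "the band played a long set last night"),
]

model = train(training_lines)
print(classify(model, "what a goal in the match"))
```

A real filter would need far more training text than this, which is why a
pre-categorised chat corpus matters; the same functions would work unchanged
on lines drawn from a newswire collection, though the vocabulary of news text
and chat text may differ considerably.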

I am a 4th year undergraduate student at Trinity College, Dublin, studying
for a bachelor's degree in Information and Communications Technology.

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics

Subject Language(s): English (ENG)

Message 2: The Typology of Island Constraints

Date: 09-Feb-2005
From: Inbal Arnon
Subject: The Typology of Island Constraints

I am part of a research group that is investigating island constraints and
their relation to processing difficulty. We are currently gathering
information about cross-linguistic differences in island constraints and
the effects of finiteness, lexical choice (e.g. bridge verbs), etc. on
extraction difficulty, e.g. contrasts like: ( >/ = `at least as acceptable as')

Which book were they wondering whether or not to read? >/
Which book were they wondering whether or not they should read? >/
Which book were they wondering whether or not he had read?

Which symphony did Schubert die before finishing? >
Which symphony did Schubert die before he finished?

We'd welcome pointers to relevant data from any language or literature
discussing cross-linguistic differences in island phenomena. We're
interested in cross-linguistic variation of all sorts, including subjacency
or superiority effects, violations of the coordinate structure constraint, left
branch condition, etc.

Linguistic Field(s): Syntax
