LINGUIST List 16.414

Thu Feb 10 2005

Qs: Internet Chat Corpora; Typology Island Constraints

Editor for this issue: Steven Moran <stevelinguistlist.org>


We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate.

We'd also like to remind people that, in addition to posting a summary, it is usually a good idea to personally thank those individuals who have taken the trouble to respond to the query.

To post to LINGUIST, use our convenient web form at http://linguistlist.org/LL/posttolinguist.html.

Directory

        1.    Stuart McCaul, Internet Chat Corpora
        2.    Inbal Arnon, The Typology of Island Constraints


Message 1: Internet Chat Corpora

Date: 09-Feb-2005
From: Stuart McCaul <mccaulstcd.ie>
Subject: Internet Chat Corpora


My project is to implement an Internet chatroom text-filter based on
Bayesian maths, which will categorise chat-room conversations. The filter
must be trained on pre-categorised text and I am having trouble finding a
categorised corpus of chat-room conversations.

Do you know of such a corpus? Or do you have any other advice for me?

In the event that such a collection of conversations is not available, I
will use the Reuters corpus or a similar collection to train my filter.

I am a 4th year undergraduate student at Trinity College, Dublin, studying
for a bachelor's degree in Information and Communications Technology.

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics

Subject Language(s): English (ENG)

Message 2: The Typology of Island Constraints

Date: 09-Feb-2005
From: Inbal Arnon <inbalarstanford.edu>
Subject: The Typology of Island Constraints



I am part of a research group that is investigating island constraints and
their relation to processing difficulty. We are currently gathering
information about cross-linguistic differences in island constraints and
the effects of finiteness, lexical choice (e.g. bridge verbs), etc. on
extraction difficulty, e.g. contrasts like the following (>/ = `at least as acceptable as'):

Which book were they wondering whether or not to read? >/
Which book were they wondering whether or not they should read? >/
Which book were they wondering whether or not he had read?

Which symphony did Schubert die before finishing? >
Which symphony did Schubert die before he finished?

We'd welcome pointers to relevant data from any language or literature
discussing cross-linguistic differences in island phenomena. We're
interested in cross-linguistic variation of all sorts, including subjacency
or superiority effects, violations of the coordinate structure constraint, left
branch condition, etc.

Linguistic Field(s): Syntax; Typology
