Mon 19 Aug 1991

Disc: Searches for Data Corpora

  1. Becky Passonneau, Search for Data Corpus
  2. Susan Ervin-Tripp, Re: Responses: Brown corpus, being

Message 1: Search for Data Corpus

Date: Fri, 16 Aug 91 17:29:57 EDT
From: Becky Passonneau <becky%division.cs.columbia.edu@RICEVM1.RICE.EDU>
Subject: Search for Data Corpus

Two collaborators and I are planning to analyze a variety of aspects
of English conversational discourse, and are searching for an
appropriate corpus. Since there seems to be a growing trend to
collect and share such data, I thought I'd ask the linguist
subscribers for suggestions as to where we might find a pre-existing
corpus. If anyone would be interested in a trade, e.g., to enlarge
their own corpus of conversational data, I have a set of transcribed
interviews that turn out not to be ideal for the current study, but
which I have distributed in the past and would certainly share again.

The ideal corpus for my current purposes would have the following
properties:
	naturalistic English discourse (e.g., not a restricted sub-language
		like sports or news journalism)
	oral, i.e., w/ audio or audio/video recordings
	accurately transcribed in some accessible transcription method
	machine readable format
	short, manageable discourses (e.g., 10-20 minutes)
	reasonable number of discourses to generalize from
		(e.g., a half dozen? a dozen?)
	preferably monologic, or if not, then as much like monologue
		as possible, e.g., dominated by one speaker; limited
		amount of 'meta-level' discussion (e.g., clarification
		dialogues, conversational repairs, etc.)
	independently motivated or transparent hierarchical action structure
		underlying the discourse (e.g., for establishing segment
		boundaries, for doing plan inference)
	significant role of temporal information (e.g., because of
		domain or genre or task structure) resulting in
		high frequency of temporal adverbial phrases or
		meaningful shifts and continuations of tense;
		variations in lexical and grammatical aspectual types
	possibly analyzed already w/ respect to prosodic cues;
		segmental structure; anything else that could provide
		a basis for drawing generalizations about
		distributional patterns
Since this is my first posting to 'linguist', I'll use this opportunity
to thank the moderators for their remarkable efforts, and the subscribers
for their interesting discussions.
Becky Passonneau

Message 2: Re: Responses: Brown corpus, being

Date: Sat, 17 Aug 91 13:41:32 -0700
From: Susan Ervin-Tripp <ervin-tr@cogsci.Berkeley.EDU>
Subject: Re: Responses: Brown corpus, being

The examples from Harris in Nevin's letter seem a peculiar argument.
Something is missing.
	An interesting example from Harris's _A Grammar of English on
	Mathematical Principles_:
	 The uncomfortableness of -ing on adjectives leads to
	 occasional elisions of it: in _Don't be horrid. I'm not
	 being horrid_ the retort shows that the first sentence
	 can be taken as reduced from !Don't be being horrid.
Children in role play:
 A: I'm being the mommy
 B: Don't be the mommy, I'm gonna be the mommy.
 A: Well, I'm washing the dishes.
 B: No, don't wash the dishes.
 A: I'm being nice to the baby.
 B: Don't be nice to the baby. I'm the mommy.
This is an invented example, but the A turns at least are consistent
with the genre. The convenience for discussing the use of -ing is that
children often constitute roles by explicitly identifying what they
are doing in this way, with -ing.
Could somebody explain why, given the parallelisms in these examples,
there is some elision in _Don't be horrid_?
