LINGUIST List 9.598

Wed Apr 22 1998

FYI: LDC Corpora, Lang Universals

Editor for this issue: Martin Jacobsen <marty@linguistlist.org>


Directory

  1. LDC Office, New Corpora from the Linguistic Data Consortium
  2. LDC Office, New Corpora from the Linguistic Data Consortium
  3. Don Nilsen, Language Universals: Irony, Language Play, Metaphor, Metonymy

Message 1: New Corpora from the Linguistic Data Consortium

Date: Mon, 20 Apr 1998 16:43:14 EDT
From: LDC Office <ldc@unagi.cis.upenn.edu>
Subject: New Corpora from the Linguistic Data Consortium




		Announcing NEW RELEASES from the
		 Linguistic Data Consortium

1996 Broadcast News Training Speech Data
1996 Broadcast News Dev. and Eval. Data
1996 Broadcast News Transcripts


The 1996 Broadcast News Speech Corpus contains a total of 104 hours of
broadcasts from the ABC, CNN, and CSPAN television networks and the NPR
and PRI radio networks, with corresponding transcripts. The primary
motivation for this collection is to provide training data for the
DARPA "Hub-4" Project on continuous speech recognition in the
broadcast domain. The speech files are available as a 19-disc training
data set, plus one disc of development data and one disc of
evaluation data. The following programs are represented in this corpus:

 ABC Nightline 
 ABC World Nightly News 
 ABC World News Tonight 
 CNN Early Edition 
 CNN Early Prime News 
 CNN Headline News 
 CNN Prime Time News 
 CNN The World Today 
 CSPAN Washington Journal 
 NPR All Things Considered 
 NPR Marketplace 

All recordings in this publication have been transcribed. The
transcripts are manually time-aligned at the phrase level and
annotated to mark boundaries between news stories, speaker turn
boundaries, and speaker gender. The transcripts are released in SGML
format, together with accompanying documentation and an SGML DTD
file, and are available via ftp.

Because of restrictions imposed by the copyright holders of the news
text, these corpora are available to 1997 and 1998 LDC members only.
Members who wish to receive these corpora MUST SIGN BOTH THE USC AND
THE NPR AGREEMENTS. These agreements are available on the Linguistic
Data Consortium WWW Home Page at URL

http://www.ldc.upenn.edu/ldc/catalog/index.html.


If you would like to order a copy of these corpora, please email your
request to <ldc@unagi.cis.upenn.edu>. If you need additional
information before placing your order, or would like to inquire about
membership in the LDC, please send email or call (215) 898-0464.

Further information about the LDC and its available corpora can be
accessed on the Linguistic Data Consortium WWW Home Page at URL:

http://www.ldc.upenn.edu/

Information is also available via ftp at ftp.cis.upenn.edu under
pub/ldc; for ftp access, please use "anonymous" as your login name,
and give your email address when asked for a password.

Message 2: New Corpora from the Linguistic Data Consortium

Date: Mon, 20 Apr 1998 16:42:01 EDT
From: LDC Office <ldc@unagi.cis.upenn.edu>
Subject: New Corpora from the Linguistic Data Consortium


		Announcing a NEW RELEASE from the
 LINGUISTIC DATA CONSORTIUM

			
COMLEX English Syntax Lexicon, Version 3.0


This is a moderately broad-coverage English lexicon (with about 38,000
lemmas) developed at New York University under LDC sponsorship. It
contains detailed information about the syntactic characteristics of
each lexical item and is particularly detailed in its treatment of
subcategorization (complement structures).

In the current dictionary, nouns have 9 possible features and 9
possible complements; adjectives have 7 features and 14 complements;
verbs have 5 features and 92 complements; and adverbs have 11
positional classes and 12 features. The entries for 750 frequent verbs
each contain 100 tags, where a tag consists of a pointer to an instance
of that verb in a corpus and the subcategorization appropriate for
that instance.

This latest version of COMLEX Syntax has been updated to include the
adverb classes. We also added diacritics to foreign words, while
retaining the unaccented versions, and performed various other updates
to correct and supplement our lexical entries. For more details about
this revised version, please contact Adam Meyers at New York
University (meyers@cs.nyu.edu).

This release is accompanied by the COMLEX Syntax Text Corpus, Version
2.0. The Text corpus consists of material from the following sources:

The Brown Corpus, W. Nelson Francis, 1964, Brown University,
Providence

Wall Street Journal Material, Copyright 1989 Dow
Jones, Inc. 

San Jose Mercury News, Copyright 1991 San Jose Mercury News 

Associated Press, Copyright 1988 

Federal Register materials courtesy of IBM; formatted version
copyright 1992, University of Pennsylvania

Computer Library materials copyright owned by Ziff Communications
Company and other parties as their respective interests may appear.

Institutions that are LDC members for the 1998 Membership Year
will be able to receive COMLEX Syntax Lexicon 3.0 at no additional
charge, in the same manner as all other text and speech corpora
published by the LDC. Members who wish to receive this corpus
must sign the COMLEX user agreement. This agreement is available on
the Linguistic Data Consortium WWW Home Page at URL
http://www.ldc.upenn.edu/ldc/catalog/index.html.

Nonmembers can receive a copy of COMLEX Syntax Lexicon 3.0 for a fee
of $1500, for research purposes only. If you would like to order
a copy of this corpus, please email your request to
ldc@unagi.cis.upenn.edu. If you need additional information before
placing your order, or would like to inquire about membership in the
LDC, please send email or call (215) 898-0464.

Further information about the LDC and its available corpora can be
accessed on the Linguistic Data Consortium WWW Home Page at URL
http://www.ldc.upenn.edu/. Information is also available via ftp at
ftp.cis.upenn.edu under pub/ldc; for ftp access, please use
"anonymous" as your login name, and give your email address when asked
for a password.

Message 3: Language Universals: Irony, Language Play, Metaphor, Metonymy

Date: Mon, 20 Apr 1998 10:00:44 -0700 (MST)
From: Don Nilsen <don.nilsen@asu.edu>
Subject: Language Universals: Irony, Language Play, Metaphor, Metonymy

 In response to Arthur Merin's query on "Verbal Irony as a
Language Universal," I have evidence suggesting that it might be, and
even more evidence suggesting that Language Play, Metaphor, and
Metonymy are language universals. I have bibliographies relating to
these areas for anyone out there who is interested in the current
research.

Don L. F. Nilsen 8-)
<don.nilsen@asu.edu> (602) 965-7592; FAX: (602) 965-3451
Executive Secretary
International Society for Humor Studies
English Department
Arizona State University
Tempe, AZ 85287-0302