LINGUIST List 6.519

Thu 06 Apr 1995

Sum: English compound noun corpora

Editor for this issue: <>


Directory

  1. Cecile Fabre, Sum : English Compound Noun Corpora

Message 1: Sum : English Compound Noun Corpora

Date: Tue, 4 Apr 1995 16:21:15
From: Cecile Fabre <Cecile.Fabre@irisa.fr>
Subject: Sum : English Compound Noun Corpora


One month ago I sent a query to obtain English noun compound corpora.
These are the two largest lists I received:

1. a 1-MB word list of compounds from a spellchecker for the NeXT
computer, sent by George Fowler.

2. a list of 9000 binary nominals with judgments on accent placement,
sent by Richard Sproat. It is described in:

Richard Sproat, "English Noun-Phrase Accent Prediction for
 Text-to-Speech." Computer Speech and Language, 1994, 8,
 79-94.

Both files are available by anonymous FTP from the following site:

ftp.irisa.fr under the directory /local/corpus

Other responses mainly advised building my own list
from tagged corpora (Brown Corpus, Penn Treebank, etc.) or by
statistical methods (see Johansson, C., 1994, Catching the Cheshire
Cat, Proc. COLING, Kyoto; http://www.ling.lu.se).
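
As a rough illustration of the tagged-corpus suggestion, the sketch below collects adjacent noun-noun pairs from text that has already been part-of-speech tagged. It is only a minimal sketch, not a method proposed by any respondent: the tag names (NN, NNS, Penn Treebank style), the function name noun_noun_pairs, and the sample sentence are all assumptions made for the example. A real corpus would also call for handling proper nouns, longer noun sequences, and tagging errors.

```python
from collections import Counter

# Assumption: common nouns carry Penn Treebank-style tags NN / NNS.
NOUN_TAGS = {"NN", "NNS"}

def noun_noun_pairs(tagged_tokens):
    """Yield lowercased (noun, noun) pairs for every two adjacent noun tokens."""
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if t1 in NOUN_TAGS and t2 in NOUN_TAGS:
            yield (w1.lower(), w2.lower())

# Hypothetical hand-tagged sample, standing in for a real tagged corpus.
sample = [
    ("The", "DT"), ("speech", "NN"), ("synthesis", "NN"), ("system", "NN"),
    ("reads", "VBZ"), ("noun", "NN"), ("compounds", "NNS"), (".", "."),
]

# Frequency of each candidate binary nominal in the sample.
counts = Counter(noun_noun_pairs(sample))
for pair, freq in counts.most_common():
    print(" ".join(pair), freq)
```

Note that a three-noun sequence such as "speech synthesis system" yields two overlapping pairs here; deciding how such sequences bracket is a separate problem, addressed in several of the references below.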

I also received some bibliographical references on the treatment of
complex nominal sequences, which I reproduce below.

Thanks to: Eric Steven Atwell, Paul Bennett, Pier Marco Bertinetto,
Beatrice Daille, George Fowler, Christer Johansson, Bernie Jones, Mark
Lauer, Judith N. Levi, Philip Resnik, Richard Sproat, Achim Stein,
Wilco Ter Stal, Evelyne Tzoukermann, Nick Youd.

Bibliographical references:

Paul Bennett, A Multilingual Translation-oriented Typology of Compound
Nouns, TAL (Traitement Automatique des Langues), 1993, vol. 34.

Church and Hanks, article in Computational Linguistics 16

Bernie Jones "Predicting Nominal Compounds", MPhil Dissertation,
University of Cambridge Engineering Department

Lauer, Mark (1994) "Conceptual Association for Compound Noun Analysis"
Proceedings of the Student Session of the 32nd Annual Meeting of the
Association for Computational Linguistics, June, Las Cruces, New Mexico

Lauer, Mark and Dras, Mark (1994) "A Probabilistic Model of Compound
Nouns" Proceedings of the 7th Australian Joint Conference on Artificial
Intelligence, November, Armidale, Australia

Levi, Judith N. 1978. THE SYNTAX AND SEMANTICS OF COMPLEX NOMINALS.
NY: Academic Press.
 Includes an appendix of compound forms.

Leonard, Rosemary. 1984. THE INTERPRETATION OF ENGLISH NOUN SEQUENCES
ON THE COMPUTER. Amsterdam: North-Holland
 This study used 2000 noun sequences taken from a corpus of
 300,000 words of English fiction from 1700 to now.

Ryder, Mary Ellen. 1994. ORDERED CHAOS: THE INTERPRETATION OF
ENGLISH NOUN-NOUN COMPOUNDS. Berkeley/Los Angeles/London: University
of California Press.
 Focuses esp. on interpretation of novel pairings.

Rivista di Linguistica, 4, 1, 1992

Wilco G. ter Stal & Paul E. van der Vet, Two-level semantic analysis of
compounds