LINGUIST List 4.314

Tue 27 Apr 1993

Sum: Spanish corpora

Editor for this issue: <>


  1. Gabriel Decio, summary--Spanish corpora

Message 1: summary--Spanish corpora

Date: Mon, 26 Apr 93 23:44:16 ES
From: Gabriel Decio <>
Subject: summary--Spanish corpora

Thanks to all who responded to my query on Spanish corpora available
online. Below is a summary of the responses I got.


********************* Text Corpora List: Addresses ***************************
CORPORA@NORA.HD.UIB.NO for messages to the list
CORPORA-REQUEST@NORA.HD.UIB.NO for messages to list administrator
FILESERV@NORA.HD.UIB.NO for requests to file server (try sending HELP)

I'm looking for online Spanish corpora, preferably newspaper or
magazine articles. I've heard there is a collection at the University
of Miami, but I haven't been able to find it. Can anyone help me out?
BTW, I already know what is available in the Oxford Text Archive.

 Doug McKee E-mail:
 SRA Corp. Phone: (703) 558-7820
 2000 15th St. N Fax: (703) 558-4723
 Arlington, VA 22201

I would like to mention the Catalogue of Projects in Electronic Text (CPET)
at Georgetown University, Washington DC. This catalogue can be accessed
via Telnet to: with username: CPET (you will need
VT-100 keys).

A manual can be fetched from our fileserver (FILESERV@NORA.HD.UIB.NO)
by sending

send info cpet.manual

either as the subject or the only line in the message.

A list of Romance-language projects (as of Feb. 1991, 64 KB) can be
fetched from the file server with the line:

send info roman.projects
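
For anyone who would rather script the request than type it, here is a
minimal sketch in Python. It assumes the file server address is
FILESERV@NORA.HD.UIB.NO; the helper name build_request and the sender
address are made up for illustration. It only builds the message and
does not send it.

```python
from email.message import EmailMessage

# Assumption: the file server address as given in the list header.
FILESERVER = "FILESERV@NORA.HD.UIB.NO"


def build_request(filename, sender):
    """Build a file server request whose body is the single command line."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = FILESERVER
    # The command must be the only line in the message body.
    msg.set_content(f"send info {filename}\n")
    return msg


# Hypothetical sender address, for illustration only.
request = build_request("cpet.manual", "you@example.org")
print(request.get_content())
```

The command could go in the subject line instead of the body; either
should work according to the instructions above. Sending the message
(for example with smtplib) depends on your local mail setup.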

For further information about CPET, contact
Margaret Friedman (


There is a Swedish archive at Gothenburg University containing Spanish
newspaper and magazine articles. Please contact:
 David Mighetto <>


Concerning English corpora, I'd like to mention that I wrote a survey
of electronic corpora and related resources which will be published in
the book "Talking Data: Transcription and coding in discourse research",
Edwards & Lampert, Erlbaum Publishers, due out April 15.

Other surveys are available through:
the ICAME archive (anonymous ftp to, and
CPET (cited in the preceding message).
There is also the Oxford Text Archive, which specializes, however, in
literature and Biblical texts: anonymous ftp to

Hope that helps.


There are some literary works available electronically from Project
Gutenberg. You can get them via anonymous ftp. Just ftp to , and after
connecting, "cd etext/etext92" (or "etext91" or "etext93"). Among their
offerings are works like "Moby Dick" and "Through the Looking Glass". I
think they even have Clinton's Inaugural address.
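
The same browsing can be sketched with Python's standard ftplib. This
is only an illustration: the host is not given here, so it is left as a
parameter, the function name list_etexts is hypothetical, and it is an
assumption that all three directories sit under etext/ at the server
root.

```python
from ftplib import FTP

# Assumption: etext91 and etext93 live beside etext92 under etext/.
ETEXT_DIRS = ["etext/etext91", "etext/etext92", "etext/etext93"]


def list_etexts(host, directories=ETEXT_DIRS):
    """Log in anonymously and return {directory: [file names]}."""
    listing = {}
    with FTP(host) as ftp:
        ftp.login()  # no arguments means an anonymous login
        for directory in directories:
            # Assumption: paths are relative to the server root.
            ftp.cwd("/" + directory)
            listing[directory] = ftp.nlst()
    return listing
```

Call it as list_etexts(host) with whatever host name you have for the
archive; it needs a live connection, so nothing runs until you do.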

I've also been looking for e-texts in Spanish, but without much luck. I
have some newspaper articles, and some interviews that someone was kind
enough to send me once. (I posted a query on Linguist about Spanish corpora a
while back.)


There are zillions of e-texts! Here are a few sources.
1. The Oxford Text Archives: I can send you their catalogue and order
 form. They have *lots* of texts in several languages. They will FTP
 the texts to you free over the internet.
2. Georgetown Catalogue of Projects in Electronic Text (CPET): there was
 a posting on LINGUIST not too long ago ... if you have access to Gopher,
 you can find it under 'North America', 'Washington DC'.
3. Commercial: in catalogues such as MacWarehouse, you can find CD-ROMs
 of text like 'Front Page News'.
4. ACL/DCI: they have a CD-ROM with over a million words of Dow Jones
 or the Wall Street Journal (or both? I forget).
5. The Linguistic Data Consortium (LDC): lots of non-literary e-corpora,
 including transcriptions of spoken data
6. ICAME: they have a CD-ROM of famous e-corpora + tools (concordances
 and stuff) that goes for about $500, and includes the Brown corpus,
 the LOB corpus, the London-Lund corpus, the Helsinki Diachronic
 corpus (see 'corpus' and these entries in the Oxford Companion to
 the English Language).
7. The CHILDES database - caretaker and child language in several diff.

End of text============================================================

|Gabriel A. Decio | XX XXX XXX XXX XX |
|Dept. of English | XX XX XX XX XX XX XX XX |
|Purdue University | XX XX XX XX XX XX XX XX |
|West Lafayette, IN | XXX XXX XXX XXX |