LINGUIST List 9.1330

Fri Sep 25 1998

Sum: Spanish Corpora

Editor for this issue: Martin Jacobsen


  1. Eva Remberger, Spanish Corpora

Message 1: Spanish Corpora

Date: Fri, 25 Sep 1998 15:03:04 +0200 (MET DST)
From: Eva Remberger
Subject: Spanish Corpora

Dear list members,

Here are the results, along with a list of the people who kindly sent
me suggestions and hints in response to the question I posted to the
LINGUIST List on Friday, 18 September.

My question was as follows:
>Dear list members,
>I have been looking for some time for corpora of business
>Spanish. Does anybody know whether there are Spanish newspapers on
>CD-ROM (e.g. all the issues of one year, as is available for the German
>newspaper Sueddeutsche Zeitung)? I tried to contact EL PAIS but never
>received an answer.
>Actually, I would be interested in any kind of corpus of contemporary
>Spanish (mainly European) - to buy or not to buy - but texts on
>'economia' topics would be even better.
>Thank you for an answer. Of course, I will post a message with the


I want to thank:

Andreas Eisele
Antoine Consigny
Valerie Mapelli
Iain Downs
Purificacion Fdez-Nistal
Susana Sotelo Docio 
Eva Easton
Leonel Ruiz Miyares
Jose Luis Sancho
Raphael Salkie
Rene' Schneider

The summary of the results:

Among the commercial sources there is ELRA.

They have a Multilingual Corpus (MLCC) consisting of texts from 6
European financial newspapers (Het Financieele Dagblad, Handelsblatt,
Financial Times, Le Monde, Il Sole 24 Ore, Expansion); the Spanish
subcorpus (Expansion) has about 10 million words (21.10.1991-24.10.91
and 14.5.94-27.12.94). The entire corpus is available via ELRA at the
following prices:

- For ELRA members for research use: 360 ECU 
- For non-members for research use: 750 ECU

- --------------------------------------------------------------------
Another commercial publisher of research material and provider of
newspapers on CD-ROM is NewsBank. They offer Noticias en Espanol on
monthly CD-ROMs:
- ---------------------------------------------------------------------
Yet another commercial service is ProQuest; they seem to have El Norte
and Reforma (Mexico).
- -----------------------------------------------------------------
There should be a CD-ROM edition of the 1994 volume of El Mundo (on
two disks); the text is in ASCII format and classified into categories
(economy, national, etc.). I'm not sure whether it is still available.
- ------------------------------------------------------------------
There is a collection of links to Spanish online newspapers at:
- ----------------------------------------------------------------------
The Language Technology Group has a website with a corpora FAQ
(the tools section is probably the most interesting part):
- ---------------------------------------------------------------------
El Observatorio Espanol de Industrias de la Lengua could be
interesting; it also has some more links: (click on
recursos linguisticos)
- --------------------------------------------------------------------
There are several corpora available at the Department of Romance
Languages of the University of Goeteborg (Banco de Datos de Prensa
Espanola 1977, Banco de Datos de Once Novelas Espanolas 1951-1971, A
Concordance based on the Corpus oral de referencia del Espanol
- ---------------------------------------------------------------------
Professor Barry Ife, at the School of Humanities, King's College
London, is reported to be compiling a large corpus of modern Spanish.
- ------------------------------------------------------------------------
There is a Spanish newspaper corpus consisting of 200 texts from
Latin American newspapers on CD-ROM (TIFF and ASCII versions). The
corpus includes 39,081 tokens and can be purchased from the
Information Science Research Institute / University of Nevada at Las
Vegas:
4505 Maryland Parkway
Las Vegas, Nevada 89154-4201
For information contact ISRI by
Phone: +1 702 895-3338
Fax: +1 702 895-1560
- -------------------------------------------------------------------
At the University of Murcia there is the CUMBRE Corpus: Contact Prof.
Aquilino Sanchez:
- -------------------------------------------------------------
The CRATER corpus consists of morphosyntactically tagged
- ---------------------------------------------------------------
Dr. Purificacion Fdez-Nistal and the Instituto de Terminologia
Bilingue y Traduccion Especializada (ITBYTE) at the Universidad de
Valladolid, Spain, are in the process of building their own corpus.
- ---------------------------------------------------------------
Ing. Leonel Ruiz Miyares (Director of the Applied Linguistics Centre,
Santiago de Cuba) maintains a Spanish corpus of children's vocabulary
(by the way, there is a European Spanish corpus of child language, the
- -------------------------------------------------------------
The Lingua project (EU-funded project on multilingual concordancing:
 but so far they have only English, French, German, Italian, Greek,
and Danish texts - they are considering bringing in Spanish and
- ------------------------------------------------------

		Thanks a lot 		Eva Remberger

			Sprachliche Informationsverarbeitung
Eva Maria Remberger	Philosophische Fakultaet
			Universitaet zu Koeln
			D-50923 Koeln
- ---------------------------------------------------------------
	Visit our web-site at: