LINGUIST List 9.1330

Fri Sep 25 1998

Sum: Spanish Corpora

Editor for this issue: Martin Jacobsen <marty@linguistlist.org>


Directory

  1. Eva Remberger, Spanish Corpora

Message 1: Spanish Corpora

Date: Fri, 25 Sep 1998 15:03:04 +0200 (MET DST)
From: Eva Remberger <eremberg@spinfo.uni-koeln.de>
Subject: Spanish Corpora

Dear list members,

here are the results, together with a list of the people who were kind
enough to send me suggestions and hints in response to my question
posted to the LINGUIST list on Friday, 18 September.

My question was as follows:
>Dear list members,
>
>for some time I have been looking for corpora of business
>Spanish. Does anybody know whether Spanish newspapers are available on
>CD-ROM (e.g. all the issues of one year, as is possible for the German
>newspaper Sueddeutsche Zeitung)? I tried to contact EL PAIS but never
>received an answer.
>
>Actually, I would be interested in any kind of corpus of contemporary
>Spanish (mainly European), whether commercial or free, but corpora
>of economic ('economia') texts would be even better.
>
>Thanks in advance for any answers. Of course, I will post a message
>with the results.

______________________________________________________________________

I want to thank:

Andreas Eisele
Antoine Consigny
Valerie Mapelli
Iain Downs
Purificacion Fdez-Nistal
Susana Sotelo Docio 
Eva Easton
Leonel Ruiz Miyares
José Luis Sancho
M.M.W.Pollmann
Raphael Salkie
Rene' Schneider
______________________________________________________________________

The summary of the results:
_______________________________________________________________________

Among the commercial sources there is ELRA (European Language Resources
Association):
http://www.icp.grenet.fr/ELRA/cata/tabtext.html

ELRA distributes a multilingual corpus (MLCC) consisting of six European
financial newspapers (Het Financieele Dagblad, Handelsblatt, Financial
Times, Le Monde, Il Sole 24 Ore, Expansion); the Spanish subcorpus
(Expansion) contains about 10 million words (21.10.1991-24.10.1991 and
14.5.1994-27.12.1994). The entire corpus is available from ELRA at the
following prices:

- For ELRA members for research use: 360 ECU 
- For non members for research use: 750 ECU

- --------------------------------------------------------------------
Another commercial publisher of research material and a provider of
newspapers on CD-ROM is Newsbanks: They offer Noticias en Espanol on
monthly CD-ROMs:
http://www.newsbank.com/schools/high/spanish.html
- ---------------------------------------------------------------------
Yet another commercial service is ProQuest; they seem to have EL Norte
and Reforma (Mexico)
http://www.umi.com/hp/WhatWeDo.html
------------------------------------------------------------------
There is reportedly a CD-ROM edition of the 1994 volume of El Mundo (on
two disks); the text is in ASCII format and classified by section
(economy, national, etc.). I am not sure whether it is still available.
-------------------------------------------------------------------
There is a collection of links to Spanish online newspapers at:
http://www.newslink.org/euspan.html
---------------------------------------------------------------------
The Language Technology Group (Edinburgh) has a website with corpus
FAQs (the tools section is probably the most interesting part):
http://www.ltg.ed.ac.uk/helpdesk/faq/index.html#Texts0040
--------------------------------------------------------------------
El Observatorio Español de Industrias de la Lengua could also be of
interest; it has further links as well:
http://www.cervantes.es/internet/acad/oeil/mar_oeil.htm (click on
"recursos linguisticos")
-------------------------------------------------------------------
There are several corpora available at the Department of Romance
Languages of the University of Goeteborg (Banco de datos de Prensa
Espanola 1977, Banco de Datos de Once Novelas Espanolas 1951-1971, and
a concordance based on the Corpus oral de referencia del Espanol
contemporaneo): http://rom.gu.se/~romgb/Corpora.html
--------------------------------------------------------------------
Professor Barry Ife, at the School of Humanities, King's College London,
is reported to be compiling a large corpus of modern Spanish:
barry.ife@kcl.ac.uk
-----------------------------------------------------------------------
A Spanish newspaper corpus consisting of 200 texts from Latin American
newspapers is available on CD-ROM (TIFF and ASCII versions). The
corpus contains 39,081 tokens and can be purchased from the
Information Science Research Institute (ISRI), University of Nevada,
Las Vegas
4505 Maryland Parkway
Las Vegas, Nevada 89154-4201
For information contact ISRI by
Phone: +1 702 895-3338
Fax: +1 702 895-1560
E-mail: isri-info@isri.unlv.edu
------------------------------------------------------------------
At the University of Murcia there is the CUMBRE corpus; contact Prof.
Aquilino Sanchez: asanchez@fcu.um.es
------------------------------------------------------------
The CRATER corpus consists of morphosyntactically tagged
telecommunications texts; it is available via FTP at ftp.ling.lancs.ac.uk
--------------------------------------------------------------
Dr. Purificacion Fdez.-Nistal and the Instituto de Terminologia
Bilingue y Traduccion Especializada (ITBYTE) at the Universidad de
Valladolid (Spain) are in the process of building their own corpus.
--------------------------------------------------------------
Ing. Leonel Ruiz Miyares (Director of the Applied Linguistics Centre,
Santiago de Cuba) maintains a corpus of Spanish children's vocabulary
(incidentally, there is also a European Spanish corpus of child
language, the MARIA corpus: http://www.sis.ucm.es/Spanish/)
------------------------------------------------------------
The Lingua project (an EU-funded project on multilingual concordancing)
has so far only English, French, German, Italian, Greek and Danish
texts, but is considering adding Spanish and Portuguese:
http://www.loria.fr/equipes/dialogue/lingua
-----------------------------------------------------

		Thanks a lot 		Eva Remberger


_________________________________________________________________
			Sprachliche Informationsverarbeitung
Eva Maria Remberger	Philosophische Fakultaet
			Universitaet zu Koeln
			Albertus-Magnus-Platz
			D-50923 Koeln
--------------------------------------------------------------
	Visit our web-site at: http://www.spinfo.uni-koeln.de
________________________________________________________________