Fri Sep 25 1998

Sum: Spanish Corpora

    Date: Fri, 25 Sep 1998 15:03:04 +0200 (MET DST)
    From: Eva Remberger <>
    Subject: Spanish Corpora

    Dear list members,

    Here are the results, along with a list of the people who kindly sent me suggestions and hints in response to the question I posted to the LINGUIST List on Friday, 18 September.

    My question was as follows:

    >Dear list members,
    >
    >it's a while I'm looking for Spanish Corpora of business
    >Spanish. Does anybody know if there are Spanish Newspapers on CD-ROM
    >(eg. all the issues of one year as it is possible for the german
    >newspaper Sueddeutsche Zeitung)? I tried to contact EL PAIS but never
    >received an answer.
    >
    >Actually, I would be interested in any kind of Corpus of contemporary
    >Spanish (mainly european), - to buy or not to buy - but 'economia'-
    >arguments would be even greater.
    >
    >Thank you for an answer. Of course, I will post a message with the
    >results.


    I want to thank:

    Andreas Eisele
    Antoine Consigny
    Valerie Mapelli
    Iain Downs
    Purificacion Fdez-Nistal
    Susana Sotelo Docio
    Eva Easton
    Leonel Ruiz Miyares
    José Luis Sancho
    M.M.W. Pollmann
    Raphael Salkie
    Rene' Schneider
    ______________________________________________________________________

    The summary of the results:
    ______________________________________________________________________

    Among the commercial sources there is ELRA. They have a multilingual corpus (MLCC) consisting of 6 European financial newspapers (Het Financieele Dagblad, Handelsblatt, Financial Times, Le Monde, Il Sole 24 Ore, Expansion); the Spanish subcorpus (Expansion) has about 10 million words (21.10.1991-24.10.91 and 14.5.94-27.12.94). The entire corpus is available via ELRA at the following costs:

    - For ELRA members, for research use: 360 ECU
    - For non-members, for research use: 750 ECU

    ----------------------------------------------------------------------
    Another commercial publisher of research material and provider of newspapers on CD-ROM is NewsBank. They offer Noticias en Espanol on monthly CD-ROMs:
    ----------------------------------------------------------------------
    Yet another commercial service is ProQuest; they seem to have El Norte and Reforma (Mexico).
    ----------------------------------------------------------------------
    There is apparently a CD-ROM edition of the 1994 volume of El Mundo (on two disks); the text is in ASCII format and classified by category (economy, national, etc.). I'm not sure whether it is still available.
    ----------------------------------------------------------------------
    There is a collection of links to Spanish online newspapers at:
    ----------------------------------------------------------------------
    There is a website with corpora FAQs from the Language Technology Group (the interesting part is the tools section, I guess):
    ----------------------------------------------------------------------
    El Observatorio Español de Industrias de la Lengua could be interesting; it also has some more links: (click on "recursos linguisticos")
    ----------------------------------------------------------------------
    There are several corpora available at the Department of Romance Languages of the University of Goeteborg (Banco de Datos de Prensa Espanola 1977; Banco de Datos de Once Novelas Espanolas 1951-1971; a concordance based on the Corpus Oral de Referencia del Espanol Contemporaneo).
    ----------------------------------------------------------------------
    Professor Barry Ife, at the School of Humanities, King's College London, is reported to be compiling a large corpus of modern Spanish.
    ----------------------------------------------------------------------
    There is a Spanish newspaper corpus consisting of 200 texts from Latin American newspapers on CD-ROM (in TIFF and ASCII versions). The corpus includes 39.081 tokens and is available (for purchase) from:

    Information Science Research Institute
    University of Nevada at Las Vegas
    4505 Maryland Parkway
    Las Vegas, Nevada 89154-4201

    For information contact ISRI by
    Phone: +1 702 895-3338
    Fax: +1 702 895-1560
    E-mail:
    ----------------------------------------------------------------------
    At the University of Murcia there is the CUMBRE Corpus. Contact Prof. Aquilino Sanchez:
    ----------------------------------------------------------------------
    The CRATER corpus consists of morphosyntactically tagged communication:
    ----------------------------------------------------------------------
    Dr. Purificacion Fdez.-Nistal and the Instituto de Terminologia Bilingue y Traduccion Especializada (ITBYTE) at the Universidad de Valladolid, Spain, are in the process of building their own corpus.
    ----------------------------------------------------------------------
    Ing. Leonel Ruiz Miyares (Director of the Applied Linguistics Centre, Santiago de Cuba) maintains a Spanish corpus of children's vocabulary. (By the way, there is also a European Spanish corpus of child language, the MARIA Corpus: )
    ----------------------------------------------------------------------
    The Lingua project is an EU-funded project on multilingual concordancing. So far they have only English, French, German, Italian, Greek and Danish texts, but they are considering adding Spanish and Portuguese:
    ----------------------------------------------------------------------

    Thanks a lot,
    Eva Remberger

