LINGUIST List 7.960

Mon Jul 1 1996

Sum: Corpus analysis resources for Spanish

Editor for this issue: Ann Dizdar


  1. "J.L. Sancho, INSTITUTO DE LEXICOGRAFIA", Corpus analysis resources for Spanish

Message 1: Corpus analysis resources for Spanish

Date: Mon, 01 Jul 1996 14:00:06 +0200
Subject: Corpus analysis resources for Spanish
Dear all:

	A while back my colleague Maria Paula Santalla and I (Jose
Luis Sancho) posted an enquiry about corpus analysis resources for
Spanish. The following is a summary of what we have been referred
to. We would like to thank the following people for their kind
responses (in no particular order): Max Louwerse, Mike Scott, Carlos
Subirats, Ken Litkowski, Jean Véronis, Yorick Wilks, Sandro
Pedrazzini, John Aberdeen, Ana Martínez, Nuno Miguel Cavalheiro
Marques and Ken Beesley. This list exhausts our 'inbox'; we therefore
beg anyone else who responded and is not mentioned above to forgive
us (or our server) and, in that case, to write to us again. Note that
the enquiry was posted to several lists, so some of the information
quoted below may not have come from this list. We apologize for any
duplication.

##Max Louwerse told us about the Qualrs-lst, on which a lot of
tagging software has been discussed. As for software, he mentioned
NUDIST (Sage Publishers) and Notabene, whose homepage is

You can also email Giovanni Flammia.

##Mike Scott suggested

This accesses WordSmith Tools (Oxford Univ. Press 1996).

##Carlos Subirats pointed to an 'Etiquetador y desambiguizador del
espanol' (a tagger and disambiguator for Spanish), developed by the
Laboratorio de Linguistica Informatica de la Universidad Autonoma de
Barcelona. The address provided is

	Carlos Subirats Ruggeberg
	Universidad Autonoma de Barcelona
	Laboratorio de Linguistica Informatica
	Edificio B
	08193 Bellaterra, Spain

	Fax: (343)-581-16-86
	Tel: (343)-581-22-29

##Ken Litkowski <71520.307CompuServe.COM> directed us to some
dictionary utilities for creating and maintaining lexica. A
description of this software is available at

##Jean Véronis suggested a look at

and contacting Nuria Bel.

##Yorick Wilks pointed to

##Sandro Pedrazzini pointed to a system with which you can not only
create and maintain lexica but also generate various kinds of taggers
and lemmatizers. A description of it can be found at

##John Aberdeen mentioned a fast part-of-speech tagger, based on
Eric Brill's notion of transformation-based error-driven learning.
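
For readers unfamiliar with the approach, here is a minimal sketch of
the idea behind transformation-based error-driven learning. It is not
the tagger mentioned above, nor Brill's own code: the single rule
template ("change tag A to B when the previous tag is C"), the tag
names and the toy data are all assumptions made for illustration. The
learner starts from a crude initial tagging and repeatedly picks the
rule that removes the most errors against a hand-tagged reference.

```python
from typing import List, Tuple

# Hypothetical rule template: (from_tag, to_tag, previous_tag)
Rule = Tuple[str, str, str]


def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    """Change from_tag to to_tag wherever the preceding tag is previous_tag."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        # Conditions are tested against the tags as they stood before this pass.
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def count_errors(tags: List[str], gold: List[str]) -> int:
    """Number of positions where the current tagging disagrees with the reference."""
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(initial: List[str], gold: List[str], tagset: List[str],
                max_rules: int = 10) -> Tuple[List[Rule], List[str]]:
    """Greedily learn an ordered rule list that reduces tagging errors."""
    current = list(initial)
    learned: List[Rule] = []
    for _ in range(max_rules):
        best_rule, best_err = None, count_errors(current, gold)
        for from_tag in tagset:
            for to_tag in tagset:
                if from_tag == to_tag:
                    continue
                for prev_tag in tagset:
                    rule = (from_tag, to_tag, prev_tag)
                    err = count_errors(apply_rule(current, rule), gold)
                    if err < best_err:
                        best_rule, best_err = rule, err
        if best_rule is None:  # no rule improves the tagging any further
            break
        current = apply_rule(current, best_rule)
        learned.append(best_rule)
    return learned, current


# Toy data (invented): a crude initial tagging and its hand-corrected reference.
tagset = ["DET", "NOUN", "VERB"]
initial = ["DET", "VERB", "VERB", "DET", "VERB"]
gold = ["DET", "NOUN", "VERB", "DET", "NOUN"]

learned, final = learn_rules(initial, gold, tagset)
print(learned, final)
```

A real tagger of this kind uses many more rule templates (word
identity, following tags, and so on) and learns from a large tagged
corpus; the sketch only shows the error-driven selection loop.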

##Ana Martínez mentioned MABLe, a 'multilingual letter authoring
tool'.

##Nuno Miguel Cavalheiro Marques brought to our attention two POS
taggers, one using Viterbi tagging with an HMM and the other using
neural networks. You can find a short review of this work

There you can also access an article about POLARIS, a morphological
lexical acquisition and retrieval database system. Contacting
Gabriel Lopes was also suggested.
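
As a rough illustration of what Viterbi tagging with an HMM involves,
here is a minimal sketch. It is not the tagger referred to above: the
tag set, the three-word Spanish example and every probability in it
are invented for the purpose, and unseen events are simply given a
small floor probability rather than being smoothed properly.

```python
import math
from typing import Dict, List


def viterbi(words: List[str], tags: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]],
            floor: float = 1e-6) -> List[str]:
    """Return the most probable tag sequence for a non-empty word list."""

    def lp(p: float) -> float:
        # Log probability, with a floor for events unseen in the toy model.
        return math.log(p if p > 0 else floor)

    # Each column maps a tag to (best log score ending in that tag, previous tag).
    columns = [{t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)), None)
                for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                ((columns[-1][p][0] + lp(trans_p[p].get(t, 0)), p) for p in tags),
                key=lambda pair: pair[0],
            )
            column[t] = (score + lp(emit_p[t].get(word, 0)), prev)
        columns.append(column)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: columns[-1][t][0])]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    return list(reversed(path))


# Toy model (all numbers are made up for illustration).
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
emit_p = {
    "DET": {"la": 0.9},
    "NOUN": {"la": 0.05, "casa": 0.6},
    "VERB": {"casa": 0.2, "brilla": 0.7},
}

predicted = viterbi(["la", "casa", "brilla"], tags, start_p, trans_p, emit_p)
print(predicted)
```

In a trained tagger the three probability tables would be estimated
from a tagged corpus; the decoding step itself stays the same.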

##Ken Beesley noted that the Rank Xerox Research Centre in Grenoble,
France, has developed the following systems for Spanish: tokenization
(word/term division); morphological analysis (for syntax or, in less
detail, for tagging); a part-of-speech "guesser" (for words not found
by the morphological analysis); and tagging (based on an HMM tagger
trained on a corpus). You can experiment with the morphological
analysis and tagger on

Thank you very much again. See you on the net.

Jose Luis Sancho and Maria Paula Santalla