LINGUIST List 8.1133

Mon Aug 4 1997

FYI: AltaVista Synonym Search

Editor for this issue: Martin Jacobsen <martylinguistlist.org>


Directory

  1. Jacques Guy, Computational linguistics: The AltaVista "refine" option

Message 1: Computational linguistics: The AltaVista "refine" option

Date: Mon, 4 Aug 97 12:35:55 EST
From: Jacques Guy <j.guytrl.telstra.com.au>
Subject: Computational linguistics: The AltaVista "refine" option

A colleague of mine told me about the "refine" option offered by the
AltaVista search engine (http://www.altavista.digital.com/) and how
good it was. In a nutshell, the "refine" option returns a list of
synonyms of, and notions related to, the words in your query. Indeed,
it gave extremely sensible responses.

Perversely perhaps, I tried it in French, using "vin" (what else!) as
the keyword. Bingo. This is what it returned:

72% Etait, etre, annees, meme, apres, etaient, derniers
59% Egalement, particulierement, differentes, possibilite
52% Qualite, vins, vin, vignoble, vigne, crus, vignes, vignerons,
vigneron [etc...]

Far from satisfactory. "Eau" and "pain" returned similar nonsense,
featuring "etaient", "etre", "egalement" et alia in prominent
positions. In fact, AltaVista "refine" seems decidedly averse to
foodstuffs in French, "fruit", "poisson" and "sandwich" failing equally
miserably (so did "sable", "mer", "lac").

So I was quite surprised when Italian queries about "wine" returned
sensible synonyms:

60% vino, vini, vigneti, uve
40% quantita, ettari, vitigni
39% sapore, profumo, invecchiamento
[etc.]

"Acqua" and "pane" fared equally well. So I turned my attention to
Spanish. Spanish did quite as badly as French. This is quite puzzling,
for the amount of Spanish data is quite large.

I don't know what inspired me, but I decided to ask for an Italian
sandwich ("panino"). Bingo again!

60% perche, chissa, guardo, cazzo, sembrava, poiche, merda, riposto
 [yes, unbelievable but true]
54% mangiare, specialta, birra, mangia, roba, piatti, gusti, soldi,
bere
33% avevo, scusa, aveva, stavo, rispose, facevano
[etc.]

My colleague and I scratched our collective heads, experimented some
more, and came to the conclusion that the thesauri are built by a
neural net (she is heavily into neural nets). Still, the excellent
behaviour of the English thesaurus was suspect. But no, experimenting
demonstrated that it could not have been a hand-crafted thesaurus.
There are ways of "salting" a neural net and that is probably what
Digital did for English (and perhaps for Italian).

Do take a break and experiment a bit with the AltaVista "refine" option in
your favourite languages (Polish was as nonsensical as French). It is
quite amusing. And perhaps useful: next time someone knocks at your
door with a neural net for sale... (I have seen queries for "Kentucky
fried chicken" return "chicken sexers", "waste burners" and "singing
teachers", courtesy of a neural net).

j.guytrl.telstra.com.au