LINGUIST List 5.1168

Sun 23 Oct 1994

Disc: Comparative Method

Editor for this issue: <>


  1. Roy Iutzi-Mitchell, Comp. Method, rates of lex change in E. Greenlandic
  2. "Larry Trask", Random similarities: clarification
  3. Jacques Guy, Re: 5.1159 Comparative Method yet again

Message 1: Comp. Method, rates of lex change in E. Greenlandic

Date: Fri, 21 Oct 1994 12:08:22
From: <>
Subject: Comp. Method, rates of lex change in E. Greenlandic

Alexis Manaster Ramer:
> The question then is simple: is there
> a language for which the rate of loss from the Swadesh list is more
> than 14% per millennium?

I don't have the reference off the top of my head, but the
counterexample which is at the opposite extreme from Icelandic (for slow
rates of lexical change) is East Greenlandic (Eskimo-Aleut) for
a very high rate of lexical change. Similarly, certain dialects of
Yugcetun (a.k.a. Central Yup'ik Eskimo) have changed, it appears,
at much faster rates than have other dialects. In both cases it
seems that restrictions on the use of morphemes from the names of
the deceased have led to circumlocutions in basic vocabulary.
My own stance on the issue: I think that lexico-statistics is very
valuable for illustrating degrees of similarity in lexicons, but
I am *extremely* skeptical of using lexico-statistical data to
infer glottochronological dates without other (i.e., archaeological or
historical) corroboration. --roy--
=+=- Roy Iutzi-Mitchell -=+=
+=-=+ P.O. Box 1128 I.A.Y.I.A. +=-=+
=+=- Bethel, Alaska 99559 U.S.A. 907-543-3642 -=+=

Message 2: Random similarities: clarification

Date: Sat, 22 Oct 1994 13:32:30
From: "Larry Trask" <>
Subject: Random similarities: clarification

Perhaps I wasn't as clear as I might have been in my original query
about the chance of finding random similarities between arbitrary
languages. Briefly, the "methodology" that I had in mind was this:

(1) You compare any languages or families that take your fancy, and you
 don't need to have any specialist knowledge of them.
(2) You extract words or morphemes from any existing sources whatever,
 regardless of reliability, but mostly from bilingual dictionaries,
 when these exist.
(3) You take any items you find, without knowing or caring anything
 about their histories, even when such information is available.
(4) You accept everything that comes along, and you certainly don't
 confine yourself to the Swadesh word list or to anybody's idea of
 "basic" vocabulary. You are delighted to accept obscure dialect
 words for varieties of mushrooms, and you draw the line only at
 words like `telephone'.
(5) When dealing with a family, you are free to take items found only in
 a single language, and not in the family as a whole.
(6) Having already decided that you are probably looking at a genetic
 relationship, you collect only apparent confirming instances, and
 attempt no serious scrutiny of your results.
(7) You accept as "cognates" any items that strike you as exhibiting a
 significant degree of resemblance in form and meaning, and you
 certainly don't expect to find systematic correspondences in form --
 in fact, trying to find them would defeat the object of the
 exercise.
(8) In identifying your "cognates", you feel free to segment into
 oblivion any inconvenient parts of the items you are comparing, so
 that you can obtain a match for what's left.
(9) You are at liberty to invoke a few metatheses whenever you find this
 convenient for your purposes.

Doubtless this sort of work will be all too familiar to most readers.
But I have a particular reason for raising the issue here. I'm a
specialist in Basque, and Basque, as the only genetic isolate in Europe,
has attracted an enormous amount of attention from people eager to find
some relatives for it somewhere. It's hardly possible to name an Old
World language or family that hasn't been claimed as a relative of
Basque. And, of course, with only a couple of very minor exceptions,
and one major one, ALL of this work has been carried out by precisely
the methodology I've just described.

This would be of little interest to most people, were it not for the
fact that some of this work has acquired a certain veneer of
respectability. Just to cite the best-known case, the ceaseless claims
that Basque is related to some or all of the Caucasian languages have
apparently persuaded a number of respectable linguists that these are
plausible claims backed up by something in the way of evidence. Not so:
they are backed up by large doses of the stuff I have just described,
and by nothing else whatever.

My list of 65 Basque-Hungarian "cognates" was intended as a kind of
reductio ad absurdum of all this stuff. But one of the people who
replied to me has warned me that it is by no means unlikely that the
numerous proponents of this kind of work will take my results seriously,
and proceed to reorganize their view of Proto-Most-Known-Languages
accordingly. Since I have no desire to be cited as providing evidence
for Proto-MKL, I decided I had better find out what kind of results
other people were getting in similar endeavors.

Briefly, almost everyone I have heard from has reported results
comparable to mine, and one or two people have collected lists of
"cognates" from improbable places which are so impressive as to put my
65 matches to shame. I was particularly fascinated by Don Ringe's
recent posting describing his experiment with seven North American
languages; it looks to me as if his results deal a heavy blow to the
frequent assertions that applying such methodology to large numbers of
languages all but eliminates the problem of chance resemblances.
However, Alexander Vovin's negative results seem to show that I was a
little over-enthusiastic in believing that impressive lists of phony
cognates can always be collected: sometimes, it appears, phony cognates
just refuse to turn up.
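
How much the lax criteria matter can be illustrated with a toy
simulation. This is only a sketch under loud assumptions: the
consonant inventory, the sound classes, the CVC word shape, and the
use of neighbouring list slots as a stand-in for "semantic latitude"
are all invented for illustration, and none of it comes from the
experiments reported above. Two lexicons are generated at random, so
every match found is a chance match by construction.

```python
import random

# Hypothetical inventory: consonants grouped into rough sound classes.
# A "lax" comparer treats any two members of a class as matching.
CLASSES = {"p": 0, "b": 0, "m": 0, "t": 1, "d": 1, "n": 1,
           "k": 2, "g": 2, "s": 3, "z": 3, "l": 4, "r": 4}
CONSONANTS = sorted(CLASSES)
VOWELS = "aeiou"


def random_lexicon(n_meanings, rng):
    """One random CVC form per meaning slot."""
    return [rng.choice(CONSONANTS) + rng.choice(VOWELS) + rng.choice(CONSONANTS)
            for _ in range(n_meanings)]


def strict_match(a, b):
    """Both consonants identical (vowels ignored)."""
    return a[0] == b[0] and a[2] == b[2]


def lax_match(a, b):
    """Both consonants merely in the same sound class."""
    return CLASSES[a[0]] == CLASSES[b[0]] and CLASSES[a[2]] == CLASSES[b[2]]


def count_matches(lex_a, lex_b, match, window):
    """Count slots in A matching any form in B within `window` slots.

    window=0 demands the same meaning; a larger window crudely models
    accepting loosely related meanings.
    """
    hits = 0
    for i, form in enumerate(lex_a):
        lo, hi = max(0, i - window), min(len(lex_b), i + window + 1)
        if any(match(form, other) for other in lex_b[lo:hi]):
            hits += 1
    return hits


rng = random.Random(1994)  # fixed seed so the run is repeatable
lang_a = random_lexicon(1000, rng)
lang_b = random_lexicon(1000, rng)

strict = count_matches(lang_a, lang_b, strict_match, window=0)
lax = count_matches(lang_a, lang_b, lax_match, window=2)
print(f"strict criteria: {strict} chance matches; lax criteria: {lax}")
```

Because every strict match is also a lax match, the lax count can
never be lower; with 12 consonants the strict criterion succeeds by
chance about once in 144 comparisons, and loosening form and meaning
together multiplies that many times over.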

My thanks to Steve Seegmiller, Ann Dizdar, Jacques Guy, And Rosta, Tim
Pulju, Brad Coon, Jakob Dempsey, Alice Faber, Alexander Vovin, Don
Ringe, Paul Sidwell, and Alexis Manaster Ramer for replying in one way
or another.

Larry Trask
University of Sussex
Brighton BN1 9QH

Message 3: Re: 5.1159 Comparative Method yet again

Date: Sat, 22 Oct 1994 08:50:33
From: Jacques Guy <>
Subject: Re: 5.1159 Comparative Method yet again

Alexis Manaster Ramer:
> Very funny, Jacques! Thanks for injecting the first bit of INTENTIONAL
> humor into the discussion.
It has to be intentional. When I witness the anti-scientific
manner in which this question is tackled I can only turn into
Monty Python (Romanes eunt domus!). When nowadays you see a job
offered in NLU or machine translation, what do they ask for?
Mathematicians or computer scientists.
> (1) The famous Bergsland reference claimed that some languages show
> a LOWER rate of loss for the Swadesh list (which is a certain list
> of 100 meanings which is claimed to exhibit a 14% per millennium
> rate of loss) in some languages (notably, Icelandic), but not, as
> far as I can recall, any examples with a HIGHER rate.
> So we really must stick to
> the Swadesh list, the only one which has been tested on lost of
> (that is, lots of) languages. The question then is simple: is there
> a language for which the rate of loss from the Swadesh list is more
> than 14% per millennium?

Firstly, even though I will sound like my famous fellow countryman,
Monsieur de La Palisse, allow me to point out that a rate of
loss of 14% per millennium is the same as a rate of retention
of 86% (100-14=86, do we all agree?). Now I do not know where
that figure of 86% comes from, for I remember that Swadesh used
81% in his "Salish Relationships". Perhaps 86% is the 100-item
list, 81% the 200-item one. Never mind, as we shall presently see.

Does anyone out there know about Lees's 1953 "The basis of
glottochronology"? In Language 29/2:113-127. No? Well read it first.
Lees used Swadesh's list on 13 languages, comparing them with
their historically attested ancestors. Here are the figures
which he reported:

Old English vs Modern English: 76.6% per millennium
Plautine Latin vs 1600 Spanish: 79.0%
Plautine Latin vs Moliere's French: 77.6%
Old High German vs Modern German: 85.4%
Egyptian vs Coptic: 76.0%
Koine Greek vs Modern Athenian: 83.6%
Koine Greek vs Modern Cypriot: 82.9%
Classical Chinese vs Modern Mandarin: 79.5%
Old Norse vs Modern Swedish: 85.4%
Classical Latin vs Modern Tuscan: 83.9%
Classical Latin vs Modern Portuguese: 80.6%
Classical Latin vs Modern Rumanian: 76.4%
Classical Latin vs Modern Catalan: 79.3%
May I trust you to count those under 86% retention?
Under 81%?
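
For anyone who would rather not count by hand, a minimal sketch of
the tally. The figures are copied from the list above; the dictionary
name and the helper function are invented here for illustration.

```python
# Retention rates (% per millennium) as listed above from Lees (1953).
LEES_RETENTION = {
    "Old English > Modern English": 76.6,
    "Plautine Latin > 1600 Spanish": 79.0,
    "Plautine Latin > Moliere's French": 77.6,
    "Old High German > Modern German": 85.4,
    "Egyptian > Coptic": 76.0,
    "Koine Greek > Modern Athenian": 83.6,
    "Koine Greek > Modern Cypriot": 82.9,
    "Classical Chinese > Modern Mandarin": 79.5,
    "Old Norse > Modern Swedish": 85.4,
    "Classical Latin > Modern Tuscan": 83.9,
    "Classical Latin > Modern Portuguese": 80.6,
    "Classical Latin > Modern Rumanian": 76.4,
    "Classical Latin > Modern Catalan": 79.3,
}


def count_below(rates, threshold):
    """Number of rates strictly below `threshold` (both in percent)."""
    return sum(1 for r in rates if r < threshold)


rates = list(LEES_RETENTION.values())
below_86 = count_below(rates, 86.0)
below_81 = count_below(rates, 81.0)
print(f"{below_86} of {len(rates)} below 86%; {below_81} below 81%")
# prints: 13 of 13 below 86%; 8 below 81%
```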

As I had shown in a paper presented at the XVth Pacific
Science Conference in 1983, which has proved impossible
to get published, a proper statistical analysis of
Lees's data shows that his sample only goes to prove
that 30.34% of all languages have retentions of
82.24% and above, 30.34% of 78.72% and below, so that
less than 40% fall in the range 78.72% to 82.24%.
But since the majority of retention rates calculated
by Lees are averaged over two millennia and his
sample is grossly biased towards Romance to the
point of having the same language (Northern Iberian)
represented twice for a full millennium, the variance
of the retention rates, if properly calculated,
would be even wider.
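
The spread in Lees's thirteen figures can be checked roughly as
follows. This is a sketch only, and not a reconstruction of the
analysis in the unpublished paper: it assumes that retention rates
are normally distributed and treats the thirteen values as an
unbiased sample, which the paragraph above argues they are not. The
2-point band is an arbitrary choice for illustration.

```python
import statistics

# Lees's retention rates (% per millennium), as listed above.
RATES = [76.6, 79.0, 77.6, 85.4, 76.0, 83.6, 82.9,
         79.5, 85.4, 83.9, 80.6, 76.4, 79.3]

mean = statistics.mean(RATES)
sd = statistics.stdev(RATES)  # sample standard deviation (n - 1)

# Assumption: model the population of languages as Normal(mean, sd).
model = statistics.NormalDist(mu=mean, sigma=sd)


def share_between(low, high):
    """Modelled share of languages with retention in [low, high]."""
    return model.cdf(high) - model.cdf(low)


band = 2.0  # arbitrary half-width in percentage points
share = share_between(mean - band, mean + band)
print(f"mean {mean:.2f}%, sd {sd:.2f}")
print(f"modelled share within {band} points of the mean: {share:.0%}")
```

Even on these generous assumptions, well under half of all languages
would be expected to fall within two points of the average, which is
hardly the behaviour of a constant.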

As for Bergsland and Vogt, they found an upper
estimate of 58% retention per millennium for
Eastern Greenlandic -- 42% LOSS, on the 100-item
Swadesh list. I repeat: 42% loss, not 14%.
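
To see what such a spread does to a date, here is a sketch using the
standard glottochronological formula for two sister languages,
t = ln(c) / (2 ln(r)), with t in millennia, c the proportion of
shared cognates, and r the retention rate per millennium. The value
c = 0.5 is a made-up input chosen purely for illustration, and the
function name is invented here.

```python
import math


def split_time_millennia(shared, retention):
    """Time since split, t = ln(c) / (2 ln(r)), in millennia.

    shared:    proportion of list items still cognate (0 < c <= 1)
    retention: assumed retention rate per millennium (0 < r < 1)
    """
    if not (0 < shared <= 1 and 0 < retention < 1):
        raise ValueError("need 0 < shared <= 1 and 0 < retention < 1")
    return math.log(shared) / (2 * math.log(retention))


SHARED = 0.5  # hypothetical cognate proportion, not from any data set

# 86% and 81% are the "constants" discussed above; 58% is the
# Bergsland and Vogt upper estimate for Eastern Greenlandic.
for r in (0.86, 0.81, 0.58):
    t = split_time_millennia(SHARED, r)
    print(f"retention {r:.0%}: split about {t * 1000:.0f} years ago")
```

The same word-list comparison yields dates that differ by a factor of
more than three depending on which attested retention rate is plugged
in.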

In that same unpublished, unpublishable paper, I
show that data presented by Dyen at the Third
International Conference on Austronesian Linguistics
in 1981 is evidence that Sasak had been 1.2 times
as retentive as Balinese since their putative
split, and that data presented by Blust at the
same conference corroborated those figures.

This discussion about how far back the comparative
method can take you is nonsense through and through.
Short of datable documentary evidence -- such as
lapidary inscriptions, clay tablets, etc. -- there
is no way to date putative ancestors, no way at
all.

This nonsense has been peddled time and again, and
I am afraid will keep being peddled. To those
numerate it only makes linguists look like the
lunatic fringe. You should see the looks I got
at a seminar on natural language understanding
at Monash University a few years ago when I
mentioned that my PhD was in linguistics.
"Ah, one of those clowns!".

Then, Alexander Vovin:

> > I had mentioned Muyuw which had this distinction, worthy of the
> > Guinness Book of Records, of having innovated 20% of its everyday
> > vocabulary in one single generation...
> Well, what might be true of EVERYDAY vocabulary, may not necessarily
> be true of BASIC vocabulary, as it is only a tiny portion of basic vocabulary
> which is used in everyday vocabulary.

I have discussed "basic vocabulary" at length in that paper. What has been
meant by authors (from Swadesh to Hymes via Dyen and others), is vocabulary
resistant to change. But they hardly ever define it so, and when they do,
they shift the meaning from "basic" to "universal" in the typical logical
fallacy of equivocation. And then invoke the need to find *the* vocabulary
which is equally stable for all languages. As soon as we find it, all will
be well. This is another fallacy, called begging the question.