LINGUIST List 5.1139

Tue 18 Oct 1994

Disc: Comparative method

Editor for this issue: <>


  1. Jacques Guy, Re: 5.1134 Comparative method
  2. ALICE FABER, Comparative Method
  3. benji wald, Re: 5.1128 Comparative method

Message 1: Re: 5.1134 Comparative method

Date: Tue, 18 Oct 1994 15:09:54
From: Jacques Guy <>
Subject: Re: 5.1134 Comparative method

> Chance similarities are always a possibility, but to the best of my
> knowledge I have never seen a proof of the widely circulating idea that one
> can take any two languages at random and prove that they are genetically
> related on the basis of chance similarities between them. It will be
> absolutely impossible to find any REGULAR phonetic correspondences
> between any look-alikes due to chance. Besides, how many chance look-alikes
> is it possible to find between two unrelated languages? Very few indeed.

Only if you do not allow for semantic shift. Download file
which is in directory pc/linguistics of the anonymous ftp site, unzip it, read the doc file, and run the language
simulation program chance.exe with the parameters you please, and see.
And when the next issue of Anthropos comes out, in March next year,
read the explanation, entitled "The incidence of chance resemblances on
language comparison".
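Guy's chance.exe is not reproduced here. As a purely hypothetical illustration of why allowing semantic shift inflates the number of chance look-alikes, the toy model below draws two unrelated random lexicons and counts a "match" whenever a form in one language also occurs in the other for the same or a nearby meaning. The number of possible word shapes, the list length, and the treatment of "nearby meanings" as neighbours on a line are all assumptions made for this sketch, not parameters of Guy's program.

```python
import random


def chance_lookalikes(n_meanings, n_shapes, shift_window, seed=0):
    """Count chance look-alikes between two unrelated toy lexicons.

    Hypothetical model: each meaning gets a random word shape drawn from
    n_shapes possibilities, independently in languages A and B. Meaning i
    in A counts as a match if its shape occurs in B for any meaning within
    shift_window positions of i (a crude stand-in for semantic shift).
    shift_window=0 means strict same-meaning comparison.
    """
    rng = random.Random(seed)
    lex_a = [rng.randrange(n_shapes) for _ in range(n_meanings)]
    lex_b = [rng.randrange(n_shapes) for _ in range(n_meanings)]
    matches = 0
    for i, shape in enumerate(lex_a):
        lo = max(0, i - shift_window)
        hi = min(n_meanings, i + shift_window + 1)
        if shape in lex_b[lo:hi]:
            matches += 1
    return matches


if __name__ == "__main__":
    # Same seed, so the two lexicons are identical on every call;
    # only the semantic latitude allowed in comparison changes.
    for window in (0, 2, 5, 10):
        print(window, chance_lookalikes(200, 50, window))
```

With 50 possible shapes, strict same-meaning comparison yields on average only about 200/50 = 4 hits on 200 meanings, while allowing 10 neighbouring meanings on each side raises the per-meaning chance to roughly 1 - (49/50)^21, about one in three.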

> From:
> But we would need to be told what rate of loss Jacques
> is assuming and what kind of branching (this is absolutely crucial).

The figure of 40,000 years assumes a retention rate of 98% per
1000 years. Thus 40,000 years later, 80,000 years' worth of
evolution separate any two maximally distant languages: 0.98 to
the 80th power is 0.1986, i.e. 19.86% vocabulary still in common.
The branching assumed is that of an n-wise maximal tree.
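The arithmetic can be checked with a short sketch, assuming (as above) that each of the two lineages independently keeps a given word with probability r per millennium, so that after t millennia the pair still shares r^(2t) of the list. The function name is illustrative only.

```python
def shared_fraction(retention_per_millennium, years_since_split):
    """Expected fraction of a word list still cognate in two sister languages.

    Assumes each lineage independently retains a word with probability
    retention_per_millennium per 1000 years, so the two languages share
    retention ** (2 * millennia) of the original list.
    """
    millennia = years_since_split / 1000
    return retention_per_millennium ** (2 * millennia)


print(f"{shared_fraction(0.98, 40_000):.4f}")  # 0.98 ** 80, prints 0.1986
```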

> Also, it is vital to know what vocabulary we are talking about. Does
> anybody know of any counterexamples to the claim that the old
> Swadesh list has a rate of loss of no more than 14% per millennium?

Of course. Once again, the 1962 Bergsland and Vogt article in Current
Anthropology. Then Blust's 1981 paper at the Third International
Conference on Austronesian Linguistics.

> My colleagues and I have been doing calculations using that rate
> as well as the 27% rate claimed by M. L. Bender for a different
> 100-word list. And the results are that for any reasonable-size
> family (i.e., not Basque or Sumerian or some other complete
> isolate), we should expect to be able to recover enough for
> comparative work for much longer than 10,000 years, but it all
> depends on how many languages and, even more, how they branch.

Let me see. 27% replacement is 73% retention. So, after 10,000 years, two
maximally distant languages will have 0.73^(2*10) = 0.73^20 = 0.0018, i.e. 0.18%
left in common, and on a 100-item wordlist that is ONE cognate
at best, more probably zero.
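A sketch of the same calculation for Bender's rate, assuming 73% retention per millennium, a 10,000-year separation (10 millennia per lineage), and, as a simplification not made in the post, that the 100 items survive independently of one another. Function names are illustrative.

```python
def expected_cognates(retention, millennia, list_size=100):
    """Expected number of shared cognates on a list of list_size items."""
    p_shared = retention ** (2 * millennia)
    return list_size * p_shared


def prob_no_cognates(retention, millennia, list_size=100):
    """Probability that no item is shared, treating items as independent."""
    p_shared = retention ** (2 * millennia)
    return (1 - p_shared) ** list_size


print(round(expected_cognates(0.73, 10), 2))  # expected cognates: 0.18
print(round(prob_no_cognates(0.73, 10), 2))   # chance of zero cognates: 0.83
```

Under these assumptions the expected count is well below one cognate, and the probability of finding none at all is about 0.83, consistent with "more probably zero".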

> I cannot see how Jacques arrived at 100 years or whatever it was

Elementary, my dear Alexis. Let me grab my pipe first as an aid
to thinking. Now, (puff puff), I had mentioned Muyuw, which had
the distinction, worthy of the Guinness Book of Records, of
having innovated 20% of its everyday vocabulary in one single
generation (I had written 30%, but I seem to remember it was
more like 20%. It was 25 years ago, you know. You'd have to
ask David Lithgow, who was with SIL at the time, for the
correct figure). Let us say 20%. That is 80% retention, but,
this time, per generation. There are 3 or 4 generations per
100 years. So, if two languages have been so eccentric as
to evolve at the rate of Muyuw after they split, they will
have undergone 6 to 8 generations' worth of lexical evolution one
century later: from 0.8^6 = 0.2621, i.e. 26%, to 0.8^8 =
0.1678, i.e. 17%. And two centuries after their split:
from 0.8^12 = 0.0687, i.e. 7% to 0.8^16 = 0.0281, i.e. 3%
vocabulary left in common, at which stage it cannot be
distinguished from chance resemblances. Since I had mentioned
30% replacement per generation rather than 20%, the figures for that are:

One century after the split: from 0.7^6=11.8% to 0.7^8=5.8%
Two centuries: from 0.7^12=1.4% to 0.7^16=0.33%

Good Lord, Holmes!
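The per-generation figures can be reproduced with a sketch, assuming (as in the post) that each of the two languages independently keeps a word with probability r per generation, with 3 or 4 generations per century in each lineage. The function name is illustrative.

```python
def shared_after(retention_per_generation, generations_per_century, centuries):
    """Fraction of vocabulary two sister languages still share.

    Both lineages evolve for the same number of generations, so the total
    number of independent retention 'draws' is twice one lineage's count.
    """
    generations = 2 * generations_per_century * centuries
    return retention_per_generation ** generations


for r in (0.8, 0.7):
    for centuries in (1, 2):
        high = shared_after(r, 3, centuries)  # 3 generations per century
        low = shared_after(r, 4, centuries)   # 4 generations per century
        print(f"r={r}, {centuries} century(ies): {high:.1%} down to {low:.1%}")
```

At 70% retention per generation, for example, 0.7^6 comes out at about 11.8% and 0.7^16 at about 0.33%.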

Message 2: Comparative Method

Date: Tue, 18 Oct 1994 20:43:20
Subject: Comparative Method

With apologies for the untimeliness of some of these comments (I received a
large accumulated backlog of postings in one intense batch), here are some
factors that need to be taken into account in evaluating or using the
comparative method.

 The first of these is semantic shift. It's been alluded to in the
discussion. Without a full model of semantic relatedness and possible shifts,
it's probably impossible to contemplate automatic, algorithmic application of
CM, as advocated by Stephen Spackman. Aside from "standard" examples, like
English HOUND being cognate with German HUND 'dog', I have culled the
following from my (alas STILL) unpublished collection of c. 300 Proto-Semitic
and West Semitic roots containing sibilants; since the point here involves
semantics, I give only glosses. Anyone who wants specific citations,
attestations, etc. need only holler electronically... (1) in four different
branches of Semitic, we have cognates meaning 'cut/piece of roast meat',
'decorate/engrave', 'pierce an abscess', 'adorn, tattoo'. (2) a root with
Semitic cognates meaning 'sheep', 'head of small cattle', 'sheep',
'sheep/goat' and outside of Semitic means 'pig' (Egyptian) or 'cow' (Egyptian,
Proto-Cushitic, Proto-Chadic). (3) a form meaning 'shoe' or 'sandal'
throughout Semitic (except in Ugaritic, where it means 'hemline') is apparently
cognate with reconstructed E Cushitic 'footprint'. (4) a root that throughout
Semitic means 'cucumber' means 'yoghurt' in Jibbali (Modern South Arabian).
The greater the time depth, the greater the likelihood of semantic shift, and
the greater the shift. These changes provide at least a partial answer to the
question posed by Mike Maxwell about vocabulary loss.

 Again with regard to vocabulary loss, Alexis Manaster Ramer observes
that most of the Proto vocabulary will surely be preserved in at least two
descendants. First of all, I'm not sure that this is true at the time depths
that we're discussing. However, for purposes of discussion, I'll grant the
possibility. Even within a well established language family like Semitic,
attested (and attestable!) vocabulary sizes vary enormously due to accidents of
preservation (for ancient languages) and to more recent language attrition and
death, due to population movement and other less benign factors (for modern
languages). Thus, the Arabic vocabulary available for comparison is much
larger than the Epigraphic South Arabian or the Old Aramaic vocabularies.
Clearly some of the lexical items unique to Arabic are inheritances, from
Central Semitic, from West Semitic, from Proto-Semitic, or from
Proto-Afroasiatic. Some of it was borrowed, from any of a large number of
languages during the period of Islamic expansion. And some of it just
happened. There's no guaranteed way of telling which is the case for a
particular word. As a result, I find it very disturbing to see isolated Arabic
forms that I know not to have Semitic cognates used in attempts to demonstrate
an Afroasiatic affiliation with other language families. Given Proto-Whatever,
with well worked out, recurrent correspondences and lawful series of changes
in its descendants, it would be possible to tell whether an isolated Arabic
form "fits" the big picture. But when the goal is to establish Proto-Whatever,
using the isolated Arabic form constitutes begging the question.

 With regard to a narrower issue, raised by R. M. Blench, I think it's
a natural tendency to think 1000 years one way or the other doesn't matter
when we're talking about the distant past, though of course it matters
enormously closer to the present. 10,000 years is simply a nice round number.
It's worth pointing out also that Omotic is a relatively new grouping,
probably c. 20-25 years old, at most. In Greenberg's original classification
of African languages, it was West Cushitic. Once Omotic was recognized as a
separate phylum, some people immediately started to doubt its inclusion in
Afroasiatic. That was around when I started being more of a phonetician than a
Semitist. Obviously, if more recent work has increased the time depth of
Omotic, those of us who believe it is still Afroasiatic will have to revise
our chronologies accordingly.

This is probably enough to fill up people's mailboxes for now. It's nice to
have a Linguist discussion that takes me away from what I really should be doing.

Alice Faber

Message 3: Re: 5.1128 Comparative method

Date: Tue, 18 Oct 94 17:47 PDT
From: benji wald <IBENAWJ@MVS.OAC.UCLA.EDU>
Subject: Re: 5.1128 Comparative method

Despite the fun everybody is having with this topic, a certain narrow-mindedness
I perceive in the discussion has been bothering me. It may be my
misunderstanding, but it seems underlying the debate remains the image of
the genetic tree. Lurking in the background is the notion that all the
world's languages hang from the tree and thus go back to a single root. I
realise that is not immediately relevant to the issue of a cut-off point for
possible reconstruction, but we know there are some linguists who are
motivated by this idea(l) -- an idea which has a very strong claim about
language and its users, and might even trivialise attempts to connect
linguistic universals (whatever they are) with innate properties of the human
mind (however that's interpreted). Some of the issues which I haven't seen
raised in the context of the discussion so far are like the following:

Does the discussion have a tacit assumption that the most common form of
linguistic evolution (in the world, not in linguistic textbooks) is tree-
like internal split? Is that assumption justified? If so, why is it that
for languages we know (for sure) are genetically related, we can't figure
out their intermediate branchings, e.g. whether Germanic is closer to Latin or
to Greek, etc etc etc etc. -- not to mention what kind of British English
American English split off from (I of course don't accept this formulation
of the issue at all, but I think y'all understand what I mean).

Next there's the apparent paradox between the uniformitarian hypothesis of
language change (in reference to the processes of change) and the FACT that
the historical record shows a steady decline of the NUMBER of unrelated
(as far as we know) language families existing at the same time in the world
(bye bye Sumerian, Etruscan etc).
It's pretty obvious why that logically has to be the case, but it does
change the nature of the world as far as language diversity (or does it?--
sure it does, as far as traditional genetic concepts of language
diversity go). And then, we see the same "impoverishment" with
regard to branches of known language families, what's left of
Italic beside Latin, etc etc. But here do we get new branches to
preserve the entropy? No! Because we don't quite know what
they're branches of -- recall the problem of intermediate
branching (consequence of the tree).

My point is this. It's not surprising that we should have problems
understanding whether or not there is a ceiling (or is it a floor) to
the ability of the comparative method to establish genetic relationships
between language families, because of its apparent assumptions about
the nature of linguistic diversification. It is not even adequate for
intermediate classification when we CAN demonstrate genetic relationship.

So the rest of my point is that the comparative method, with its
ideological baggage in the form of trees and splits, is only PART of
an adequate theory of linguistic evolution and diversification. It
is indisputably an essential and intellectually admirable part, but I
repeat the question: how common is the form of linguistic evolution
and diversification that allows the comparative method to work as it
was intended to work, as opposed to other forms of linguistic evolution
and diversification? The answer obviously has consequences for how
to approach the relationships among distinct language families.

Benji