LINGUIST List 6.409

Wed 22 Mar 1995

Disc: Comparative method

Editor for this issue: <>


  1. Alexis Manaster Ramer, Re: 6.374 Focus Systems, Comparative Syntax
  2. Jacques Guy, Comparative anything (syntax, lexicon, amino-acids...)

Message 1: Re: 6.374 Focus Systems, Comparative Syntax

Date: Thu, 16 Mar 1995 10:37:06
From: Alexis Manaster Ramer <amrCS.Wayne.EDU>
Subject: Re: 6.374 Focus Systems, Comparative Syntax

Lloyd Anderson's latest posting makes it seem as though there were
some conceptual or terminological difficulty surrounding the
question of binary vs. n-ary comparison (where n is greater than 2).
However, the mathematics involved is well-understood, and we
can easily calculate, for any given set of circumstances,
whether one or the other method is more likely to yield
false positives or false negatives. And it is a simple fact
that under some highly artificial conditions, binary comparison
can be better than n-ary (for smallish values of n), although
as n grows, n-ary comparison will always end up being better, where by better
I mean less likely to yield false positives (that is, spurious
claims of relatedness) or false negatives (that is, failures to
detect genuine relationships). As usual, if we get away from
political statements about "comparative method" vs. "long-range
comparison" and stick to the specific linguistic and mathematical
issues, the answers are unambiguous and not all that hard to find.

Alexis MR

Message 2: Comparative anything (syntax, lexicon, amino-acids...)

Date: Sat, 18 Mar 1995 14:57:40
From: Jacques Guy <j.guytrl.OZ.AU>
Subject: Comparative anything (syntax, lexicon, amino-acids...)

... or: the reduced mutation algorithm and other things.

Lloyd Anderson asks in which respects the "reduced mutation algorithm"
fell down. In two respects.

1. By the proof of the pudding. Hartigan had given, along with the
 description of this algorithm, a wordlist in I forgot how many
 languages, supplied by Dyen. One could surmise, then, that it had
 some seal of approval for this type of data. I applied it to language
 families computer-generated under the strict condition of a constant
 universal rate of lexical change. Here is my report of the eating
 of the pudding:

 The program was fed the wordlists of the simulated language family,
 and a phylogenetic tree ([26]) drawn from the account of the
 successive mergings of lists and of the predicted past individual
 word replacements. The tree thus reconstructed is strikingly similar
 to tree [12b], obtained by traditional lexicostatistical techniques
 using the mean-percentage method and a zero tolerance.
 As implemented, the reduced mutation algorithm was extremely slow,
 requiring about 120 seconds of CPU time on a DEC-KL10, whereas none
 of the other methods described so far had taken more than 0.5 seconds
 to process the percentage table, which had been produced from the
 wordlists in just 0.4 seconds.
 (Experimental glottochronology: basic methods and results.
 Pacific Linguistics, Canberra, 1980. p.19)

 Performance [based on eight experimental simulations]

 The reduced mutation algorithm identified the basic binary split in
 all experiments, but did not succeed, even once, in reconstructing
 the subsequent ternary split of ECHO-SIERRA, either as such, or as
 two successive binary splits.


 The reasons for the resounding failure of the reduced mutation
 algorithm are somewhat akin to those for the failure of the
 traditional lexicostatistical method: the measure of the similarity or
 of the distance between two languages is based on data from just two
 wordlists. The measure of distance used by the reduced mutation
 algorithm is furthermore not reconcilable, at least in my eyes, with
 the linguistic model. Interested readers should refer to Hartigan

 (Ditto, p.33)

 The book in question is: Hartigan, John A. Clustering Algorithms.
 Wiley, New York, 1974.

2. On methodological grounds. As I had already suspected 15 years ago,
 the metric used does not mirror the quantitative properties of
 language families, and the clustering algorithm itself compounds
 errors instead of factoring them out.
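
For readers who want to try the kind of experiment described in point 1,
here is a minimal sketch, in Python, of a simulated language family evolving
under a constant rate of lexical change, together with the table of shared
percentages computed from the resulting wordlists. It is not the program
used in the 1980 study. The tree shape, the language names A, B and C, the
replacement rate, the list length and every function name are assumptions
made purely for illustration.

```python
import random

def evolve(wordlist, rate, steps, rng, counter):
    """Return a descendant of `wordlist` after `steps` time units.

    In each time unit every meaning has probability `rate` of having its
    word replaced by a brand-new word (constant rate of lexical change).
    `counter` is a one-element list holding the last word id handed out.
    """
    words = list(wordlist)  # copy, so the ancestor list is left untouched
    for _ in range(steps):
        for i in range(len(words)):
            if rng.random() < rate:
                counter[0] += 1
                words[i] = counter[0]  # new id, never reused elsewhere
    return words

def shared_percentage(a, b):
    """Percentage of meanings for which two wordlists have the same word."""
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / len(a)

def percentage_table(lists):
    """Pairwise shared percentages for a dict mapping name -> wordlist."""
    names = sorted(lists)
    return {
        (p, q): shared_percentage(lists[p], lists[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }

def simulate_family(n_meanings=200, rate=0.02, seed=0):
    """Hypothetical family: a binary split, then one branch splits again."""
    rng = random.Random(seed)
    counter = [n_meanings - 1]       # ids 0..n_meanings-1 belong to the proto-language
    proto = list(range(n_meanings))
    left = evolve(proto, rate, 20, rng, counter)
    right = evolve(proto, rate, 20, rng, counter)
    return {
        "A": evolve(left, rate, 20, rng, counter),
        "B": evolve(left, rate, 20, rng, counter),
        "C": evolve(right, rate, 40, rng, counter),
    }

if __name__ == "__main__":
    for pair, pct in percentage_table(simulate_family()).items():
        print(pair, round(pct, 1))
```

Note that each entry of such a table is computed from just two wordlists,
which is the weakness singled out in the passage quoted above.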

And now for the umpteenth time around (*sigh*) ...

Lloyd Anderson writes:

"On the use of a biological "reduced-mutation algorithm" applied to
linguistic data... we are ... positing a set of historical chains of
development by which a language or languages with GIVEN starting points
CAN develop step by step into descendents leaving the evidence along the
way and the results today which we have as our evidence" (my emphasis)

Precisely. GIVEN starting points. CAN develop. This parallels biology.
Biologists are helped by the fossil record, linguists by documentary
evidence, dated or datable. But most of the world's languages lack this
evidence. And beyond some 5000 years in the past, the evidence is, in
all cases and for all practical purposes, zilch.
 The starting points, then, are GIVEN if and only if the ancestor
languages have been preserved. When they have not, we, to paraphrase
Lloyd Anderson, "are positing, WITHOUT the benefit of evidence left
along the way, a set of historical chains of development by which
languages which we have as our evidence today COULD HAVE developed step
by step from HYPOTHETICAL starting points".

Quoting further:

"This "step by step" is like a minimal series of mutations, with the
added information that it is our business to learn which changes
(mutation steps) are more natural, and OF COURSE MOST of these go
only in one direction". (My emphasis again).


First, it is not true that most changes go only in one direction. Far
from it. Most changes can take place in either of two opposite directions.
Thus in French /o/ (from /akwa/ "water") we have zero originating from
/k/ but in Cypriot /trika/ (from /tria/ "three") and /krika/ (from
/krea/ "pieces of meat") we have /k/ originating from zero. And note
that here I am using not hypothetical reconstructions but attested
ancestors, Latin and Ancient Greek.

Second, what is "natural"? Is it more natural to see a case system
shrink, like Germanic or Greek? Or expand, like Finnish? (another
example of change in opposite directions). Perhaps "natural" applies
here to phonetic changes. What is more natural, then? To develop a
bloated vowel system, like Norman French (see Martinet's description of
his mother's dialect), or to reduce it, like Castilian? (yet another
instance of change in opposite directions). Once upon a time Foley
voiced a theory of phonetic stability whereby labials, being more
front than dentals, were more resistant to weakening or loss, dentals
themselves being more stable than velars for the same reason. Sounds
reasonable, and natural, doesn't it? So much then for the whole Celtic
family! And Japanese, and Bau Fijian... and "naturalness".

Lastly, let it be granted that most changes, nay, ALL changes have been
observed to occur in ONLY ONE direction. They can have been observed so
only from "given starting points" (attested ancestors, e.g. Akkadian,
Sanskrit, Latin...) and their "descendents leaving the evidence along
the way and the results today which we have as our evidence". For, if they
had not, from what could the observation have arisen other than
HYPOTHETICAL starting points? (And we would be begging the question
again, as usual). Since the languages so attested are a very, very small
minority of the languages of the world, these hypothetical observations
would not be validly generalizable. It is the old story of the traveller
who drops in a pub in Dublin between two ships, and leaves persuaded
that all Irishwomen have red hair because the barmaid had red hair.
There is nothing new under the sun.

Chretien and Kroeber had experimented with a metric similar to that of
Hartigan's reduced mutation algorithm in the 1930s, by the way. And
Dyen has, in 1992, again resorted to factorial analysis, exactly like
Milke uncomprehendingly did in 1970. Indeed, there is nothing new under
the sun.

Oh well, as my Latin teacher was fond of telling us:

There are three secrets to teaching. They are:

1. Repeat.
2. Repeat.
3. Repeat.

He was wrong, you know. There are FOUR secrets:

4. Continue from step 1.