Editor for this issue: <>
Lloyd Anderson's latest posting makes it seem as though there were some conceptual or terminological difficulty surrounding the question of binary vs. n-ary comparison (where n is greater than 2). However, the mathematics involved is well understood, and we can easily calculate, for any given set of circumstances, whether one or the other method is more likely to yield false positives or false negatives. And it is a simple fact that under some highly artificial conditions, binary comparison can be better than n-ary comparison (for smallish values of n), although as n grows, n-ary comparison will always end up being better. By "better" I mean less likely to yield false positives (that is, spurious claims of relatedness) or false negatives (that is, failures to detect genuine relationships). As usual, if we get away from political statements about "comparative method" vs. "long-range comparison" and stick to the specific linguistic and mathematical issues, the answers are unambiguous and not all that hard to find.

Alexis MR
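To illustrate the kind of calculation meant here, the following Python sketch computes the false-positive side only, under a deliberately crude model that is an assumption of this sketch, not something stated in the posting: the languages are unrelated, each list has `list_size` meanings, and each language after the first shows a chance look-alike to the first one's form independently with probability `chance`. Relatedness is claimed when at least `threshold` meanings show a look-alike shared by all n languages. The function names and parameter values are hypothetical. False negatives would need a separate model of lexical retention, and any verdict on which method is "better" depends on both error rates.

```python
from math import comb


def binom_tail(trials: int, prob: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(trials, prob), summed exactly."""
    return sum(comb(trials, i) * prob**i * (1 - prob) ** (trials - i)
               for i in range(k, trials + 1))


def false_positive_rate(n_langs: int, list_size: int,
                        chance: float, threshold: int) -> float:
    """Chance that n UNRELATED languages show at least `threshold` meanings
    with a look-alike shared by all n of them.

    Toy assumption: each language after the first matches the first one's
    form independently with probability `chance`, so a single meaning yields
    an n-way chance match with probability chance ** (n_langs - 1).
    """
    per_meaning = chance ** (n_langs - 1)
    return binom_tail(list_size, per_meaning, threshold)


if __name__ == "__main__":
    # Hypothetical parameters: 100-meaning lists, 5% chance look-alikes,
    # relatedness claimed at 10 or more matching meanings.
    for n in (2, 3, 4):
        print(n, false_positive_rate(n, list_size=100, chance=0.05, threshold=10))
```

Under these assumptions, raising n with a fixed threshold drives the chance-match rate down very quickly; what it does to false negatives is a separate question that this toy leaves open.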
... or: the reduced mutation algorithm and other things.

Lloyd Anderson asks in which respects the "reduced mutation algorithm" fell down. In two respects.

1. By the proof of the pudding. Hartigan had given, along with the description of this algorithm, a wordlist in I forget how many languages, supplied by Dyen. One could surmise, then, that it had some seal of approval for this type of data. I applied it to language families computer-generated under the strict condition of a constant universal rate of lexical change. Here is my report of the eating of the pudding:

The program was fed the wordlists of the simulated language family, and a phylogenetic tree ([26]) was drawn from the account of the successive mergings of lists and of the predicted past individual word replacements. The tree thus reconstructed is strikingly similar to tree [12b], obtained by traditional lexicostatistical techniques using the mean-percentage method and a zero tolerance. As implemented, the reduced mutation algorithm was extremely slow, requiring about 120 seconds of CPU time on a DEC-KL10, whereas none of the other methods described so far had taken more than 0.5 seconds to process the percentage table, which had been produced from the wordlists in just 0.4 seconds. (Experimental glottochronology: basic methods and results. Pacific Linguistics, Canberra, 1980, p. 19)

Performance [based on eight experimental simulations]

The reduced mutation algorithm identified the basic binary split in all experiments, but did not succeed, even once, in reconstructing the subsequent ternary split of ECHO-SIERRA, either as such or as two successive binary splits.

Discussion

The reasons for the resounding failure of the reduced mutation algorithm are somewhat akin to those for the failure of the traditional lexicostatistical method: the measure of the similarity or of the distance between two languages is based on data from just two wordlists.
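The point that a percentage-table distance draws on just two wordlists at a time can be made concrete with a small simulation. The Python sketch below is neither the program from the 1980 study nor the reduced mutation algorithm; it is a hypothetical toy that (a) evolves wordlists under an assumed constant retention rate (0.8 per millennium, an illustrative value), with a basic binary split followed by a ternary split of one branch, (b) builds the percentage table, in which every entry compares exactly two lists, and (c) clusters that table by average linkage, assumed here to stand in for the "mean-percentage method". The language names A1, A2, A3 and B and all function names are made up for illustration.

```python
import itertools
import random


def evolve(words, years, retention, rng, fresh):
    """Replace words independently at an assumed constant rate: each word
    survives `years` with probability retention ** (years / 1000)."""
    survive = retention ** (years / 1000)
    return [w if rng.random() < survive else next(fresh) for w in words]


def simulate(n_meanings=200, retention=0.8, seed=1):
    """Hypothetical family: basic binary split 2000 years ago, then a
    ternary split of branch A 1000 years ago."""
    rng = random.Random(seed)
    fresh = (f"new{i}" for i in itertools.count())  # unique, non-cognate forms
    proto = [f"proto{i}" for i in range(n_meanings)]
    branch_a = evolve(proto, 1000, retention, rng, fresh)
    family = {name: evolve(branch_a, 1000, retention, rng, fresh)
              for name in ("A1", "A2", "A3")}
    family["B"] = evolve(proto, 2000, retention, rng, fresh)
    return family


def percentage(list1, list2):
    """Percentage of meanings with cognate (here: identical) forms.
    Only these two wordlists enter the calculation."""
    shared = sum(a == b for a, b in zip(list1, list2))
    return 100.0 * shared / len(list1)


def percentage_table(family):
    """All pairwise percentages, keyed by alphabetically ordered name pairs."""
    return {(x, y): percentage(family[x], family[y])
            for x, y in itertools.combinations(sorted(family), 2)}


def mean_percentage_clustering(family):
    """Average linkage on the percentage table (an assumed stand-in for the
    mean-percentage method). Returns the merges in the order made."""
    table = percentage_table(family)

    def mean_pct(c1, c2):
        pairs = [tuple(sorted((a, b))) for a in c1 for b in c2]
        return sum(table[p] for p in pairs) / len(pairs)

    clusters = [frozenset([name]) for name in sorted(family)]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters with the highest mean pairwise percentage.
        c1, c2 = max(itertools.combinations(clusters, 2),
                     key=lambda pair: mean_pct(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1), sorted(c2)))
    return merges


if __name__ == "__main__":
    family = simulate()
    for (x, y), pct in percentage_table(family).items():
        print(f"{x}-{y}: {pct:.1f}%")
    for left, right in mean_percentage_clustering(family):
        print("merge", left, "+", right)
```

Because each merge consults only pairwise percentages, the ternary split can at best come out as two successive binary merges, with the first pair picked by sampling noise.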
The measure of distance used by the reduced mutation algorithm is, furthermore, not reconcilable, at least in my eyes, with the linguistic model. Interested readers should refer to Hartigan 1975:233-246. (Ditto, p. 33)

The book in question is: Hartigan, John A. Clustering Algorithms. Wiley, New York, 1975.

2. On methodological grounds. As I had already suspected 15 years ago, the metric used does not mirror the quantitative properties of language families, and the clustering algorithm itself compounds errors instead of factoring them out.

And now for the umpteenth time around (*sigh*) ... Lloyd Anderson writes:

"On the use of a biological "reduced-mutation algorithm" applied to linguistic data... we are ... positing a set of historical chains of development by which a language or languages with GIVEN starting points CAN develop step by step into descendents leaving the evidence along the way and the results today which we have as our evidence" (my emphasis)

Precisely. GIVEN starting points. CAN develop. This parallels biology. Biologists are helped by the fossil record, linguists by documentary evidence, dated or datable. But most of the world's languages lack this evidence. And beyond some 5000 years in the past, the evidence is, in all cases and for all practical purposes, zilch. The starting points, then, are GIVEN if and only if the ancestor languages have been preserved. When they have not, we, to paraphrase Lloyd Anderson, "are positing, WITHOUT the benefit of evidence left along the way, a set of historical chains of development by which languages which we have as our evidence today COULD HAVE developed step by step from HYPOTHETICAL starting points".

Quoting further: "This "step by step" is like a minimal series of mutations, with the added information that it is our business to learn which changes (mutation steps) are more natural, and OF COURSE MOST of these go only in one direction". (My emphasis again.)

No.
First, it is not true that most changes go only in one direction. Far from it. Most changes can take place in either of two opposite directions. Thus in French /o/ (from /akwa/ "water") we have zero originating from /k/, but in Cypriot /trika/ (from /tria/ "three") and /krika/ (from /krea/ "pieces of meat") we have /k/ originating from zero. And note that here I am using not hypothetical reconstructions but attested ancestors, Latin and Ancient Greek.

Second, what is "natural"? Is it more natural to see a case system shrink, as in Germanic or Greek, or expand, as in Finnish? (Another example of change in opposite directions.) Perhaps "natural" applies here to phonetic changes. What is more natural, then? To develop a bloated vowel system, like Norman French (see Martinet's description of his mother's dialect), or to reduce it, like Castilian? (Yet another instance of change in opposite directions.) Once upon a time Foley voiced a theory of phonetic stability whereby labials, being more front than dentals, were more resistant to weakening or loss, dentals themselves being more stable than velars for the same reason. Sounds reasonable, and natural, doesn't it? So much, then, for the whole Celtic family! And Japanese, and Bau Fijian... and "naturalness".

Lastly, let it be granted that most changes, nay, ALL changes, have been observed to occur in ONLY ONE direction. They can have been so observed only from "given starting points" (attested ancestors, e.g. Akkadian, Sanskrit, Latin...) and their "descendents leaving the evidence along the way and the results today which we have as our evidence". For if they had not, from what could the observation have arisen other than HYPOTHETICAL starting points? (And we would be begging the question again, as usual.) Since the languages so attested are a very, very small minority of the languages of the world, these hypothetical observations would not be validly generalizable.
It is the old story of the traveller who drops into a pub in Dublin between two ships, and leaves persuaded that all Irishwomen have red hair because the barmaid had red hair.

There is nothing new under the sun. Chretien and Kroeber had experimented with a metric similar to that of Hartigan's reduced mutation algorithm in the 1930s, by the way. And Dyen has, in 1992, again resorted to factor analysis, exactly as Milke uncomprehendingly did in 1970. Indeed, there is nothing new under the sun.

Oh well, as my Latin teacher was fond of telling us: There are three secrets to teaching. They are: 1. Repeat. 2. Repeat. 3. Repeat. He was wrong, you know. There are FOUR secrets: 4. Continue from step 1.