If memory serves, isn't the "bow-wow" theory of language origin mentioned in Bloomfield's Language, with civilized contempt? I can't check it myself since I gave most of my books to the department when I retired (not that all of them are read by anybody--ah, a Lakoffian passive PROOF!). Anyway, Bloomfield, I think, also mentions a Dutchman who traced all languages to Dutch. And there is, of course, our good old friend Prof. Marr (posthumously deposed by Stalin, the greatest linguist of them all) who even knew the four syllables of the original proto-language. So, what else is new?

Greetings,
Henry Kucera
This is a long (350 lines) and perhaps not too easy article. I nevertheless urge you to read it carefully. It contains evidence that Cavalli-Sforza chose to ignore data well known among geneticists because they contradicted his assumptions, and that he chose to ignore mathematically correct methods of tree reconstruction which he knew to exist. That is what I meant in my very first posting when I wrote that I proposed to show that his article was in breach of the scientific method and his methods mathematically incorrect.

GENES, PEOPLES AND LANGUAGES? (Scientific American, November 1991, pp.72-78)
Cavalli-Sforza's hypotheses examined (Part II)

The reconstruction of genetic trees, like the one in the article in question, and of the divergence of language families are but particular applications of a much more general problem: Given a tree-shaped transmission network, a message is input at one node, from which it travels to the terminal nodes. The network is noisy, that is, transmission is subject to errors. The problem is: from the garbled versions of the original message collected at the terminal nodes, reconstruct the network and the amount of errors on each of its arcs (branches, if you prefer).

Replace "error" by "mutation", and "message" by "DNA genes" or "mitochondrial genes" and you have a genetic model. Replace "error" by "innovation" or "drift", and "message" by "basic sample vocabulary" or whatever other data you see as representative of the languages in the family being reconstructed, and you have a linguistic model.

You will have noticed that no mention is made in there of the rate of change, be it genetic mutations or linguistic innovations, contrary to Sforza and Wilson, who both posit a mutation rate constant in time and across the different human populations.
Likewise, Swadesh posited a universal rate of vocabulary replacement also constant through time and across languages when he proposed his theory of glottochronology, later re-christened "lexicostatistics". Readers familiar with these two methods will have been struck by the remarkable similarity of Sforza's and Wilson's methods with glottochronology and lexicostatistics, even down to the terminology: "drift", "constant rates", etc.

Yet, the assumption of a constant universal rate is completely unnecessary for reconstructing trees, genetic or linguistic. Here is a tree reconstructed under the assumption that the innovation rate of linguistic features (be they lexical, grammatical, or whatever) varies in time and across languages (I have deleted the language names, which are irrelevant to the point, as you will later see):

             _95.-90----- (1)
     _88.-62|   `-94----- (2)
    |   |   `-64--------- (3)
    |   `-61------------- (4)
    |   .-42------------- (5)
    |   |    _96.-78----- (6)
    |   |-75|   `-76----- (7)
    |   |   `-71--------- (8)
    |   |        _93.-87- (9)
    |-84|   .-93|   `-87- (10)
    |   |   |   `-64----- (11)
    |   `-87|-75--------- (12)
    |       |    _99.-86- (13)
    |       `-82|   `-78- (15)
   _|           `-87----- (14)
    |_94.-37------------- (16)
    |   `-54------------- (17)
    |       .-78--------- (18)
    |   .-91|_89.-93----- (19)
    |-64|       `-86----- (20)
    |   |_86.-75--------- (21)
    |       `-80--------- (22)
    |    _79.-94--------- (23)
    |   |   `-80--------- (24)
    |   |   .-80--------- (25)
    |_70|   |   .-94----- (26)
        |_82|-85|_99.-96- (27)
            |       `-97- (28)
            |-74--------- (29)
            `-66--------- (30)

The figures along each branch (arc, if you prefer) represent the percentage of the message which has been correctly transmitted.
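To make the branch figures concrete, here is a minimal sketch of how they combine along a path. It rests on an assumption of mine, not stated in the article: errors strike each item of the message independently on each branch, so the fractions correctly transmitted simply multiply. The function name `retained` is hypothetical. The branch values are read off the reconstructed tree above (the path to leaf 16 is 94 then 37, the path to leaf 17 is 94 then 54).

```python
from math import prod

def retained(branch_percentages):
    """Fraction of the message surviving a chain of branches.

    Assumption (not from the article): errors on successive branches
    are independent, so the per-branch fractions multiply.
    """
    return prod(p / 100 for p in branch_percentages)

# Root to leaf (16): branch 94, then branch 37.
root_to_16 = retained([94, 37])
# Root to leaf (17): branch 94, then branch 54.
root_to_17 = retained([94, 54])
# Items still shared by (16) and (17): intact on both branches
# below their common node.
shared_16_17 = retained([37, 54])

print(f"(16) keeps {root_to_16:.1%} of the original message")
print(f"(17) keeps {root_to_17:.1%} of the original message")
print(f"(16) and (17) share {shared_16_17:.1%}")
```

Under the same assumption, the expected percentage shared by any two terminal nodes is the product of the figures on the branches joining them, which is why a matrix of shared percentages carries information about the individual branches.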
And here is the true tree:

             _87.-90----- (1)
     _87.-70|   `-95----- (2)
    |   |   `-70--------- (3)
    |   `-60------------- (4)
    |   .-43------------- (5)
    |   |    _96.-78----- (6)
    |   |-77|   `-77----- (7)
    |   |   `-70--------- (8)
    |   |        _92.-88- (9)
    |-80|   .-92|   `-87- (10)
    |   |   |   `-65----- (11)
    |   `-84|-75--------- (12)
    |       |   .-85----- (13)
    |       `-79|-89----- (14)
   _|           `-78----- (15)
    |-34----------------- (16)
    |-51----------------- (17)
    |       .-79--------- (18)
    |   .-91|_88.-91----- (19)
    |-64|       `-88----- (20)
    |   |_86.-79--------- (21)
    |       `-78--------- (22)
    |    _79.-93--------- (23)
    |   |   `-80--------- (24)
    |   |   .-83--------- (25)
    |_71|   |   .-95----- (26)
        |_82|-85|_98.-95- (27)
            |       `-98- (28)
            |_94.-76----- (29)
                `-71----- (30)

This last tree is artificial. It is part of the output from a computer simulation. Another part is the log of the history of the tree which, for each split, gives its date, the number of items innovated since the previous split, and lists them. The last part, computed from the latter, is, to use a lexicostatistical term, a matrix of "percentages of shared cognates" (it could equally well be "shared genes"), from which the reconstructed tree was computed.

In possession of such data, it is interesting to test various reconstruction methods (clustering algorithms, if you prefer) by observing how well their resulting trees fit the true tree. Classical lexicostatistical methods all yield poor results. One of the worst amongst the many clustering algorithms I once tested is the minimal-spanning tree method, which, incidentally, is precisely what Cavalli-Sforza used for the genetic tree p.76 of the article in question:

"Essentially, this concept describes the tree having the smallest total branch length" (p.73, col.2)

The minimal-spanning tree method, applied blindly, also tends to reconstruct trees with only binary splits.
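The kind of simulation described above can be sketched in a few lines. This is not the program that produced the trees in this article; it is a minimal illustration under my own assumptions: each item of the message is kept on a branch with that branch's own probability, and every innovation yields a brand-new item, so two independent innovations never coincide. The four-leaf tree, its branch values, and all names (`evolve`, `simulate`, `shared_percent`) are hypothetical.

```python
import random

def evolve(message, retention, rng, counter):
    """Send the message down one branch: each item survives with
    probability `retention`, otherwise it is replaced by a new item."""
    out = []
    for item in message:
        if rng.random() < retention:
            out.append(item)
        else:
            counter[0] += 1          # fresh id: innovations never collide
            out.append(counter[0])
    return out

def simulate(node, message, rng, counter, leaves):
    """node is (retention, name) for a leaf, or
    (retention, [child nodes]) for an internal node."""
    retention, payload = node
    here = evolve(message, retention, rng, counter)
    if isinstance(payload, str):
        leaves[payload] = here
    else:
        for child in payload:
            simulate(child, here, rng, counter, leaves)

def shared_percent(a, b):
    """Percentage of positions where two messages still agree."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical tree with unequal rates on every branch.
tree = (1.0, [
    (0.9, [(0.8, "A"), (0.6, "B")]),
    (0.5, "C"),
    (0.7, "D"),
])

n_items = 2000
rng = random.Random(1992)
leaves = {}
simulate(tree, list(range(n_items)), rng, [n_items], leaves)

names = sorted(leaves)
matrix = {p: {q: shared_percent(leaves[p], leaves[q]) for q in names}
          for p in names}

for p in names:
    print(p, " ".join(f"{matrix[p][q]:6.1f}" for q in names))
```

The printed matrix plays the role of the "percentages of shared cognates"; any clustering method can then be run on it and its output compared with the known tree.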
Observe the genetic tree, p.76 of the Scientific American article: all binary splits, except for the one and only four-way split of the Southern Chinese, Indonesian, Malaysian, and Filipino populations. You have there a typical artifact of a clustering algorithm about 30 years old, as the author himself admits:

"An example is furnished by a tree linking 15 populations that Anthony F.W. Edwards, now at Cambridge, and I published 27 years ago" (p.73, col. 2, immediately above the former quotation)

Observe now the reconstructed tree above, and you will see, instead, two, three, four and five-way splits [Footnote 1].

Sforza posits this model of genetic evolution, which is exactly similar to that of glottochronology:

"The evolutionary model we used is the simplest. It predicts that the branches will evolve equally fast". (p.73, col. 1)

"The mitochondrial clock is based on the number of mutations that have accumulated.... Whereas we hypothesized that our gene frequencies had drifted at constant rates, the Wilson group hypothesized that their mitochondrial genes had mutated at constant rates." (p.74, col.2)

Yet, Sforza is aware of the weaknesses of his model:

"If one assumes that the rate of evolutionary change is constant along all branches, one can equate their lengths to the time elapsed since they diverged. Such rooted trees may also be subject to biases, however, if some branches have undergone more rapid evolutionary change than others". (p.74, bottom of col.2)

He is also aware that there exist methods unaffected by unequal and varying evolutionary rates:

"Mathematical techniques of population genetics can minimize biases by accurately predicting rates of evolution".

(It is precisely such techniques which I used in the reconstructed tree at the beginning of this article. The word "predicting" in the quotation is a misnomer. The correct word is "estimating").

Why, then, does he not use those robust methods? He does not say [Footnote 2].
Extraordinary indeed, for the "gene map" on p.74 of the article in question shows a glaring piece of evidence that evolutionary rates fluctuate wildly. Consider Iceland on that map. It has been colored dark green, showing that from 0% to 1% of its population is Rh-negative. The population of Iceland is about two-thirds of Scandinavian and one-third of Irish descent. On that same map, Scandinavia, Ireland, and the British Isles show from 16% to 25% and above Rh-negative. The other populations with a proportion of Rh-negative individuals similar to Iceland occupy the eastern half of Asia, Madagascar, Australia and New Zealand.

I may perhaps be forgiven for having believed, upon seeing that map, that Iceland had been mistaken for Eskimo-populated Greenland. Not at all. I went to the considerable trouble of verifying the Icelandic data. Considerable, because Sforza does not give his sources. And it appeared that the aberrant case of Iceland is not only well-known among geneticists, but even more aberrant than the "gene map" shows.

"Finally, tests were done on some 2000 Icelanders, mostly of precisely known birth-places within Iceland, for some twenty [blood-classification] systems. The results of the tests were then compared with the results of similar tests on the populations of the separate countries of the British Isles and Scandinavia, and of several European countries. A large quantity of data was fed into a computer, using a highly sophisticated programme, and it was anticipated that the result would be a clear-cut indication of either a Scandinavian or British origin, or perhaps a precise estimate of the proportion of genes derived from each of the two sources. Neither of these was found to be the case. The Icelanders showed a very marked difference from the populations of all other European countries, British, Scandinavian, and other, and even wide differences between the regions within Iceland itself."
(Mourant, 1983:79)

Before quoting further, I ask you to stop and ponder the import of that last sentence: "The Icelanders showed a very marked difference from the populations of all other European countries, British, Scandinavian, and other, and even wide differences between the regions within Iceland itself."

First, it is prime evidence of an exceedingly fast rate of genetic drift. Second, it is prime evidence that the "gene map" showing Iceland uniformly dark green, at 0% to 1% Rh-negative, is the artifact of having averaged the Rh-negative scores of extremely divergent local populations.

Now to quote Mourant further:

"Since there is no doubt that the original colonists of Iceland came almost exclusively from Scandinavia and the British Isles, there must have been great changes in the island gene frequencies since the colonization."

Indeed, and what might the reasons be?

"Natural selection may have played a part, but there can be little doubt that we are witnessing what are mainly the effects of genetic drift due to severe epidemics, volcanic eruptions, and volcanically initiated floods. These have at various times over the centuries reduced the populations of different regions, and of Iceland as a whole, to levels where great accidental fluctuations of gene frequencies were possible, and such fluctuations seem indeed to have occurred so that, as we have seen, the frequencies observed at present bear little relationship to those of the original colonists."

Thus, in the words of a geneticist, we have there prime evidence that a reduced gene pool, here due to natural catastrophes, has translated into greatly increased genetic drift, so great that "the [gene] frequencies observed at present bear little relationship to those of the original colonists".

Natural catastrophes are not alone in reducing the gene pool. So can endogamy and migrations. Think how narrow the gene pool carried by the settlers of the Polynesian Islands might have been.
Or of populations once nearly wiped out by warfare, or disease.

How then can Sforza, a geneticist, hold the contrary view that genetic drift is constant and the same for all? He candidly admits to having ignored evidence contrary to his thesis:

"a judicious selection of populations makes the latter [hypothesis, i.e. constant universal rate of drift] quite probable."

Upon which, I beg to be excused from writing a conclusion and I leave you to draw your own.

FOOTNOTES

[Footnote 1] The algorithm, however, failed to reconstruct the original six-way split. The reason is clear: when "languages" 16 and 17 split again, they had innovated only 6% of the message (or wordlist). They then innovated respectively 66% and 49% of that inherited message, obliterating two-thirds and one half of the already scanty evidence for their earlier split from the "protolanguage".

[Footnote 2] But I may venture this guess: under the true assumption that evolutionary rates are neither constant nor universal, it is impossible to tell which node of the reconstructed tree is the root. Transposed into linguistic terms: the protolanguage may reside anywhere in the tree, and, in the absence of dated documentary evidence, it is absolutely impossible to know where. Consequently, it is impossible to know where the centre of greatest diversity lies, and therefore where the centre of diffusion is.

WORKS CONSULTED

Mourant, A.E. Blood Relations: Blood Groups and Anthropology. OUP, 1983.
Under this rubric on 29 Jan, with reference to the _"Scientific" American_ article classifying population groups (and language families), D. Bedell comments on the Sardinians, wondering about their pre-Roman origins, culture etc.

Briefly, there was a significant Bronze Age populace in Sardinia of some interest, which has left over 7,000 huge stone towers throughout the island in association with interesting bronze statuettes, dating from the last third of the second millennium BC up into the early first millennium BC (although there are remnants of much earlier cultures not necessarily related). People who like to do so have speculated on the provenance of these people, whose culture predates that of the Romans. This has led to pawing around in comments by Greek historians, and attempts to associate apparently non-Latin vocabulary in Sard with other "mysterious" groups around the Mediterranean. Prime candidates have been Basques, Libyans and Etruscans. J. Hubschmid has a whole baffling monograph (Sardische Studien, Bern: Francke, 1953), and Massimo Pittau has pushed the Etruscan hypothesis.

A first reading on the language side should be M.L. Wagner's _Lingua Sarda_, especially chapter 11, and E. Blasco Ferrer's _Storia Linguistica della Sardegna_, Tübingen: Niemeyer, 1984, especially pp. 1-13. I have no command of recent archeological speculations, not having updated myself much on material more recent than Bouchier's _Sardinia in Ancient Times_, now too far out of date to bear on the current topic.

Beware of enthusiasts. The language is fascinating enough in its own right (as I have tried to illustrate in a couple of previous posts) without becoming worked up over a few disiecta membra in its otherwise Latin-based vocabulary. Its phonological and syntactic delights deserve a dozen contentious doctoral dissertations!