LINGUIST List 3.151

Sat 15 Feb 1992

Disc: Proto-World, Linguistics and Popular Press



Directory

  1. Henry Kucera, Re: 3.134 Linguistics and Popular Press
  2. Jacques Guy, Cavalli-Sforza, Part II
  3. jack, Mother Tongue

Message 1: Re: 3.134 Linguistics and Popular Press

Date: Tue, 11 Feb 92 20:18:58 EST
From: Henry Kucera <HENRYbrownvm.brown.edu>
Subject: Re: 3.134 Linguistics and Popular Press

If memory serves, isn't the "bow-wow" theory of language origin mentioned
in Bloomfield's Language, with civilized contempt? I can't check it myself
since I gave most of my books to the department when I retired (not that all
of them are read by anybody--ah, a Lakoffian passive PROOF!).
 Anyway, Bloomfield, I think, also mentions a Dutchman who traced all
languages to Dutch. And there is, of course, our good old friend Prof. Marr
 (posthumously deposed by Stalin, the greatest linguist of them all) who even
knew the four syllables of the original proto-language.
 So, what else is new?

 Greetings, Henry Kucera

Message 2: Cavalli-Sforza, Part II

Date: Wed, 12 Feb 92 16:41:02 EST
From: Jacques Guy <j.guytrl.oz.au>
Subject: Cavalli-Sforza, Part II

This is a long (350 lines) and perhaps not too easy article.
I nevertheless urge you to read it carefully. It contains
evidence that Cavalli-Sforza chose to ignore data well known
among geneticists because they contradicted his assumptions,
and that he chose to ignore mathematically correct methods
of tree reconstruction, the existence of which he knew. That
is what I meant in my very first posting when I wrote that I
proposed to show that his article was in breach of the
scientific method and his methods mathematically incorrect.

 GENES, PEOPLES AND LANGUAGES?
 (Scientific American, November 1991, pp.72-78)

 Cavalli-Sforza's hypotheses examined (Part II)

The reconstruction of genetic trees, like the one in the
article in question, and of the divergence of language
families are but particular instances of a much more
general problem:

Given a tree-shaped transmission network, a message is input
at one node, from which it travels to the terminal nodes.
The network is noisy, that is, transmission is subject to
errors. The problem is: from the garbled versions of the
original message collected at the terminal nodes,
reconstruct the network and the amount of errors on each of
its arcs (branches, if you prefer).

Replace "error" by "mutation", and "message" by "DNA genes" or
"mitochondrial genes" and you have a genetic model. Replace
"error" by "innovation" or "drift", and "message" by "basic
sample vocabulary" or whatever other data you see as
representative of the languages in the family being
reconstructed, and you have a linguistic model.
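
A minimal sketch of this model, in Python, may make it concrete.
It is only an illustration under stated assumptions, not the
simulation program referred to later in this article: the names
transmit, broadcast and toy_tree, the "retention" figures and
the 100-item message are all hypothetical. Each arc keeps an
item with a given probability and otherwise replaces it with a
brand-new symbol.

```python
import random


def transmit(message, retention, rng, fresh):
    """Send a message along one arc of the network.

    Each item survives with probability `retention`; otherwise it is
    replaced by a brand-new symbol (an "error", "mutation" or "innovation").
    `fresh` is a one-element list used as a counter for new symbols.
    """
    received = []
    for item in message:
        if rng.random() < retention:
            received.append(item)
        else:
            fresh[0] += 1
            received.append("new%d" % fresh[0])
    return received


def broadcast(node, message, rng, fresh=None, leaves=None):
    """Pass `message` down the tree under `node`; return {terminal: version}."""
    if fresh is None:
        fresh = [0]
    if leaves is None:
        leaves = {}
    received = transmit(message, node["retention"], rng, fresh)
    children = node.get("children", [])
    if not children:
        leaves[node["name"]] = received
    for child in children:
        broadcast(child, received, rng, fresh, leaves)
    return leaves


# Hypothetical three-terminal network. "retention" is the fraction of the
# message expected to survive each arc; the input node itself loses nothing.
toy_tree = {
    "name": "root", "retention": 1.0,
    "children": [
        {"name": "X", "retention": 0.9,
         "children": [{"name": "a", "retention": 0.8},
                      {"name": "b", "retention": 0.6}]},
        {"name": "c", "retention": 0.5},
    ],
}

original = ["item%d" % i for i in range(100)]
versions = broadcast(toy_tree, original, random.Random(0))
```

Here versions maps each terminal node ("a", "b", "c") to its
garbled copy of the message. Note that the retention differs
from arc to arc: nothing in the model requires a constant rate.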

You will have noticed that no mention is made in there of
the rate of change, be it genetic mutations or linguistic
innovations, contrary to Sforza and Wilson, who both posit
a mutation rate constant in time and across the different
human populations. Likewise, Swadesh posited a universal
rate of vocabulary replacement also constant through time
and across languages when he proposed his theory of
glottochronology, later re-christened "lexicostatistics".
Readers familiar with these two methods will have been
struck by the remarkable similarity of Sforza's and Wilson's
methods with glottochronology and lexicostatistics, even
down to the terminology: "drift", "constant rates", etc.

Yet, the assumption of a constant universal rate is
completely unnecessary for reconstructing trees, genetic or
linguistic. Here is a tree reconstructed under the
assumption that the innovation rate of linguistic features (be
they lexical, grammatical, or whatever) varies in time and
across languages (I have deleted the language names, which
are irrelevant to the point, as you will later see):

 _95.-90----- (1)
 _88.-62| `-94----- (2)
 | | `-64--------- (3)
 | `-61------------- (4)
 | .-42------------- (5)
 | | _96.-78----- (6)
 | |-75| `-76----- (7)
 | | `-71--------- (8)
 | | _93.-87- (9)
 |-84| .-93| `-87- (10)
 | | | `-64----- (11)
 | `-87|-75--------- (12)
 | | _99.-86- (13)
 | `-82| `-78- (15)
_| `-87----- (14)
 |_94.-37------------- (16)
 | `-54------------- (17)
 | .-78--------- (18)
 | .-91|_89.-93----- (19)
 |-64| `-86----- (20)
 | |_86.-75--------- (21)
 | `-80--------- (22)
 | _79.-94--------- (23)
 | | `-80--------- (24)
 | | .-80--------- (25)
 |_70| | .-94----- (26)
 |_82|-85|_99.-96- (27)
 | `-97- (28)
 |-74--------- (29)
 `-66--------- (30)

The figures along each branch (arc, if you prefer)
represent the percentage of the message which has been
correctly transmitted. And here is the true tree:

 _87.-90----- (1)
 _87.-70| `-95----- (2)
 | | `-70--------- (3)
 | `-60------------- (4)
 | .-43------------- (5)
 | | _96.-78----- (6)
 | |-77| `-77----- (7)
 | | `-70--------- (8)
 | | _92.-88- (9)
 |-80| .-92| `-87- (10)
 | | | `-65----- (11)
 | `-84|-75--------- (12)
 | | .-85----- (13)
 | `-79|-89----- (14)
_| `-78----- (15)
 |-34----------------- (16)
 |-51----------------- (17)
 | .-79--------- (18)
 | .-91|_88.-91----- (19)
 |-64| `-88----- (20)
 | |_86.-79--------- (21)
 | `-78--------- (22)
 | _79.-93--------- (23)
 | | `-80--------- (24)
 | | .-83--------- (25)
 |_71| | .-95----- (26)
 |_82|-85|_98.-95- (27)
 | `-98- (28)
 |_94.-76----- (29)
 `-71----- (30)

This last tree is artificial. It is part of the output from
a computer simulation. Another part is the log of the
history of the tree which, for each split, gives its date,
the number of items innovated since the previous split, and
lists them. The last part, computed from the latter, is, to
use a lexicostatistical term, a matrix of "percentages of
shared cognates" (it could equally well be "shared genes"),
from which the reconstructed tree was computed.
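
As a rough illustration of what such a matrix contains, the
following Python sketch computes pairwise percentages of shared
items from the versions collected at the terminal nodes. It
assumes, for simplicity, that the matrix is computed directly
from aligned lists rather than from the history log, and the
three four-item "wordlists" are invented for the example.

```python
def shared_percentage(a, b):
    """Percentage of positions at which two aligned lists carry the same item."""
    if not a or len(a) != len(b):
        raise ValueError("lists must be non-empty and of equal length")
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / len(a)


def similarity_matrix(leaves):
    """Return (names, matrix), with matrix[i][j] the shared percentage of i and j."""
    names = sorted(leaves)
    matrix = [[shared_percentage(leaves[p], leaves[q]) for q in names]
              for p in names]
    return names, matrix


# Invented data: three terminal versions of a four-item message.
wordlists = {
    "L1": ["a", "b", "c", "d"],
    "L2": ["a", "b", "x", "d"],
    "L3": ["a", "y", "z", "w"],
}
names, matrix = similarity_matrix(wordlists)
```

For these invented lists, L1 and L2 share 75% of their items,
while L3 shares 25% with each of the other two.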

In possession of such data, it is interesting to test
various reconstruction methods (clustering algorithms, if
you prefer) by observing how well their resulting trees fit
the true tree. Classical lexicostatistical methods all yield
poor results. One of the worst amongst the many clustering
algorithms I once tested is the minimal-spanning tree
method, which, incidentally, is precisely what
Cavalli-Sforza used for the genetic tree p.76 of the article
in question:

 "Essentially, this concept describes the tree having the
 smallest total branch length"

 (p.73, col.2)

The minimal-spanning tree method, applied blindly, also
tends to reconstruct trees with only binary splits. Observe
the genetic tree, p.76 of the Scientific American article:
all binary splits, except for the one and only four-way
split of the Southern Chinese, Indonesian, Malaysian, and
Filipino populations. You have there a typical artifact of
a clustering algorithm about 30 years old, as the author
himself admits:

 "An example is furnished by a tree linking 15
 populations that Anthony F.W. Edwards, now at Cambridge,
 and I published 27 years ago"

 (p.73, col. 2, immediately above the former
 quotation)

Observe now the reconstructed tree above, and you will see,
instead, two, three, four and five-way splits [Footnote 1].

Sforza posits this model of genetic evolution, which is
exactly similar to that of glottochronology:

 "The evolutionary model we used is the simplest. It
 predicts that the branches will evolve equally fast".

 (p.73, col. 1)

 "The mitochondrial clock is based on the number of
 mutations that have accumulated.... Whereas we
 hypothesized that our gene frequencies had drifted at
 constant rates, the Wilson group hypothesized that their
 mitochondrial genes had mutated at constant rates."

 (p.74, col.2)

Yet, Sforza is aware of the weaknesses of his model:

 "If one assumes that the rate of evolutionary change is
 constant along all branches, one can equate their
 lengths to the time elapsed since they diverged. Such
 rooted trees may also be subject to biases, however, if
 some branches have undergone more rapid evolutionary
 change than others".

 (p.74, bottom of col.2)

He is also aware that there exist methods unaffected by
unequal and varying evolutionary rates:

 "Mathematical techniques of population genetics can
 minimize biases by accurately predicting rates of
 evolution".

(It is precisely such techniques which I used in the
reconstructed tree at the beginning of this article. The
word "predicting" in the quotation is a misnomer. The
correct word is "estimating").
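
The technique itself is not described here, so the following
Python sketch is not it. It is a generic textbook-style
illustration, under stated assumptions, of why tree shape can be
recovered without constant rates: if errors on different arcs
are independent and never coincide, the expected shared fraction
between two terminals is the product of the retentions along the
path joining them, so its negative logarithm adds up along that
path. The four terminals and their retention figures are
hypothetical.

```python
import math

# Hypothetical retentions (fraction of the message surviving each arc) for
# an unrooted four-terminal tree ((A,B),(C,D)); "mid" is the internal arc.
# A and C change slowly, B and D quickly, so rates are far from constant.
RETENTION = {"A": 0.9, "B": 0.4, "C": 0.9, "D": 0.4, "mid": 0.8}
SAME_SIDE = ({"A", "B"}, {"C", "D"})


def similarity(x, y):
    """Expected shared fraction: product of retentions on the path x to y."""
    arcs = [RETENTION[x], RETENTION[y]]
    if {x, y} not in SAME_SIDE:
        arcs.append(RETENTION["mid"])
    return math.prod(arcs)


def distance(x, y):
    """Log-transformed similarity; it adds up along the arcs of the path."""
    return -math.log(similarity(x, y))


def closest_pair_by_similarity():
    """Naive grouping: pick the pair with the highest shared fraction."""
    pairs = [("A", "B"), ("A", "C"), ("A", "D"),
             ("B", "C"), ("B", "D"), ("C", "D")]
    return max(pairs, key=lambda p: similarity(*p))


def pairing_by_four_point():
    """Pick the pairing whose two within-pair distances sum to the least."""
    sums = {
        "AB|CD": distance("A", "B") + distance("C", "D"),
        "AC|BD": distance("A", "C") + distance("B", "D"),
        "AD|BC": distance("A", "D") + distance("B", "C"),
    }
    return min(sums, key=sums.get)
```

With these figures the naive grouping pairs A with C, the two
slow changers, which is wrong, whereas the additive test returns
the true pairing AB|CD despite the very unequal rates.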

Why, then, does he not use those robust methods? He does not
say [Footnote 2]. Extraordinary indeed, for the "gene map"
on p.74 of the article in question shows a glaring piece of
evidence that evolutionary rates fluctuate wildly. Consider
Iceland on that map. It has been colored dark green, showing
that from 0% to 1% of its population is Rh-negative. The
population of Iceland is about two-thirds of Scandinavian and
one-third of Irish descent. On that same map, Scandinavia,
Ireland, and the British Isles show from 16% to 25% and
above Rh-negative. The other populations with a proportion
of Rh-negative individuals similar to Iceland occupy the
eastern half of Asia, Madagascar, Australia and New Zealand.
I may perhaps be forgiven for having believed, upon seeing
that map, that Iceland had been mistaken for Eskimo-
populated Greenland. Not at all. I went to the considerable
trouble of verifying the Icelandic data. Considerable,
because Sforza does not give his sources. And it appeared
that the aberrant case of Iceland is not only well-known
among geneticists, but even more aberrant than the "gene
map" shows.

 "Finally, tests were done on some 2000 Icelanders,
 mostly of precisely known birth-places within Iceland,
 for some twenty [blood-classification] systems. The
 results of the tests were then compared with the
 results of similar tests on the populations of the
 separate countries of the British Isles and
 Scandinavia, and of several European countries. A large
 quantity of data was fed into a computer, using a
 highly sophisticated programme, and it was anticipated
 that the result would be a clear-cut indication of
 either a Scandinavian or British origin, or perhaps a
 precise estimate of the proportion of genes derived
 from each of the two sources. Neither of these was
 found to be the case. The Icelanders showed a very
 marked difference from the populations of all other
 European countries, British, Scandinavian, and other,
 and even wide differences between the regions within
 Iceland itself."

 (Mourant, 1983:79)

Before quoting further, I ask you to stop and ponder the
import of that last sentence: "The Icelanders showed a very
marked difference from the populations of all other European
countries, British, Scandinavian, and other, and even wide
differences between the regions within Iceland itself."

First, it is prime evidence of an exceedingly fast rate of
genetic drift.

Second, it is prime evidence that the "gene map" showing
Iceland uniformly dark green, at 0% to 1% Rh-negative, is
the artifact of having averaged the Rh-negative scores of
extremely divergent local populations.

Now to quote Mourant further:

 "Since there is no doubt that the original colonists of
 Iceland came almost exclusively from Scandinavia and the
 British Isles, there must have been great changes in the
 island gene frequencies since the colonization."

Indeed, and what might the reasons be?

 "Natural selection may have played a part, but there can
 be little doubt that we are witnessing what are mainly
 the effects of genetic drift due to severe epidemics,
 volcanic eruptions, and volcanically initiated floods.
 These have at various times over the centuries reduced
 the populations of different regions, and of Iceland as
 a whole, to levels where great accidental fluctuations
 of gene frequencies were possible, and such fluctuations
 seem indeed to have occurred so that, as we have seen,
 the frequencies observed at present bear little
 relationship to those of the original colonists."

Thus, in the words of a geneticist, we have there prime
evidence that a reduced gene pool, here due to natural
catastrophes, has translated itself into greatly increased
genetic drift, so great that "the [gene] frequencies
observed at present bear little relationship to those of the
original colonists". Natural catastrophes are not the only
things that can reduce the gene pool. So can endogamy and
migrations. Think how narrow the gene pool carried by the
settlers of the Polynesian Islands might have been. Or of
populations once nearly wiped out by warfare, or disease.
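
The link between a small gene pool and strong drift can be
sketched with an idealised resampling model of the
Wright-Fisher kind. This is only an illustration, not Mourant's
analysis: the pool sizes (20 and 2000 gene copies), the starting
frequency of 0.4, the ten generations and the function names are
all arbitrary assumptions.

```python
import random
import statistics


def drift(freq, copies, generations, rng):
    """Resample `copies` gene copies per generation from the current frequency."""
    for _ in range(generations):
        carriers = sum(1 for _ in range(copies) if rng.random() < freq)
        freq = carriers / copies
    return freq


def spread_of_outcomes(copies, generations=10, start=0.4,
                       replicates=100, seed=1):
    """Standard deviation of the final frequency over independent replicates."""
    rng = random.Random(seed)
    finals = [drift(start, copies, generations, rng)
              for _ in range(replicates)]
    return statistics.pstdev(finals)


small_pool = spread_of_outcomes(copies=20)    # population after a bottleneck
large_pool = spread_of_outcomes(copies=2000)  # large, stable population
```

In this toy model no selection acts at all, yet the final
frequencies in the small pool scatter several times more widely
than those in the large one.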

How then can Sforza, a geneticist, hold the contrary view
that genetic drift is constant and the same for all? He
candidly admits to having ignored evidence contrary to his
thesis:

 "a judicious selection of populations makes the latter
 [hypothesis, i.e. constant universal rate of drift]
 quite probable."

Upon which, I beg to be excused from writing a conclusion and
I leave you to draw your own.

FOOTNOTES

[Footnote 1] The algorithm, however, failed to reconstruct
the original six-way split. The reason is clear: when
"languages" 16 and 17 split again, they had innovated only
6% of the message (or wordlist). They then innovated
respectively 66% and 49% of that inherited message,
obliterating two-thirds and one half of the already scanty
evidence for their earlier split from the "protolanguage".

[Footnote 2] But I may venture this guess: under the true
assumption that evolutionary rates are neither constant nor
universal, it is impossible to tell which node of the
reconstructed tree is the root. Transposed into linguistic
terms: the protolanguage may reside anywhere in the tree,
and, in the absence of dated documentary evidence, it is
absolutely impossible to know where. Consequently, it is
impossible to know where the centre of greatest diversity
lies, and therefore where the centre of diffusion is.

WORKS CONSULTED

Mourant, A.E. Blood Relations: Blood Groups and Anthropology.
 OUP, 1983.

Message 3: Mother Tongue

Date: Wed, 12 Feb 92 20:36:24 EST
From: jack <JAREAUKCC.uky.edu>
Subject: Mother Tongue

Under this rubric on 29 Jan with reference to the _"Scientific" American_
article classifying population groups (and language families) D. Bedell
comments on the Sardinians, wondering about their pre-Roman origins, culture
etc. Briefly, there was a significant Bronze Age populace in Sardinia
of some interest, which has left over 7,000 huge stone towers throughout
the island in association with interesting bronze statuettes, dating from
the last third of the second millennium BC up into the early first millennium
BC (although there are remnants of much earlier cultures not necessarily
related). People who like to do so have speculated on the provenance of
these people, whose culture predates that of the Romans. This has led to
pawing around in comments by Greek historians, and attempts to associate
apparently non-Latin vocabulary in Sard with other "mysterious" groups
around the Mediterranean. Prime candidates have been Basques, Libyans
and Etruscans. J. Hubschmid has a whole baffling monograph (Sardische
Studien, Bern: Francke, 1953), and Massimo Pittau has pushed the Etruscan
hypothesis. A first reading on the language side should be M.L.Wagner's
_Lingua Sarda_ especially chapter 11, and E. Blasco Ferrer's _Storia
Linguistica della Sardegna_, Tübingen: Niemeyer, 1984, especially
pp 1-13. I do not have any control of recent archeological speculations,
not having updated myself much on material more recent than Bouchier's
_Sardinia in Ancient Times_, now too far out of date to bear on the
current topic. Beware of enthusiasts. The language is fascinating enough
in its own right (as I have tried to illustrate in a couple of previous
posts) without becoming worked up over a few disiecta membra in its
otherwise Latin-based vocabulary. Its phonological and syntactic delights
deserve a dozen contentious doctoral dissertations!