LINGUIST List 3.81

Mon 27 Jan 1992

Disc: Proto-World

 "Part I", for, leisure allowing, there will be follow-ups, in which
I propose to discuss Sforza's methodology, which I consider worthless:
in breach of the scientific method, and mathematically incorrect.


 An Examination of a Hypothesis by Cavalli-Sforza

 (Scientific American, November 1991, pp.72-78)

To summarize the author's findings in his own words:

 Our [genetic] reconstruction finds striking parallels in a
 recent classification of languages. Genes, people and
 languages have thus diverged in tandem.

The results are illustrated in two facing trees, on pages 76 and 77 of
his article in Scientific American, November 1991, which I reproduce
here. The tree lists 38 populations, classified genetically. In the
linguistic part of the graph 19 language families are listed [note 1].

The chart is headed by this caption: "Correlation of Peoples and
Languages", by which we may feel justified in believing that it
presents the evidence for the claimed correlation.

I shall grant the author's reconstruction and, taking it as correct
and true, show that, contrary to what is claimed in the article and
repeated in the caption, the evidence presented therein shows that
there is no correlation between peoples and languages.

Here is the chart in question, reproduced as well as I could. I have
made every effort in trying to reproduce the lengths of the branches
of the genetic tree faithfully, without completely succeeding. At its
widest, this ASCII tree is 94 columns wide and may be incorrectly
displayed on your terminal. With these warnings:

Correlation of Peoples and Languages

 Genetics Populations Linguistic Families

 ______.---------- Mbuti Pygmy ----(Original language unknown)
 | |____.----- W.African --.__ Niger-Kordofanian
 ___________| |__.-- Bantu ---'
 | | `-- Nilotic ------ Nilo-Saharan
 | |__.-------------- San(Bushman) ------ Khoisan
 | `-------------- Ethiopian --.
 | .------ Berber ---|-- Afro-Asiatic--.
 | .--| _.-- SW Asian ---' |
 | __| `-| `-- Iranian --. |
 | | | `---- European ---|__ Indo-Europ.____|
 | .--| `--------- Sardinian ---|
 | ____| |_____.------ Indian ---' |
 | | | `------ SE Indian ------ Dravidian -----|
 | | `--------------- Lapp --.__ Uralic ________|
 | | _____.-- Samoyed ---'
 | | __| `-- Mongol --. |-Nostratic
 | | | | .----- Tibetan ** | ** Sino-Tibetan |
 | _| .--| `--|__.-- Korean ---| |
 | | | | | `-- Japanese ---|__ Altaic ________|
 | | | .--| `----------- Ainu ---|
 | | | | | .----------- Siberian ---' :
 | | | | `--|__.-------- Eskimo ------ Eskimo-Aleut
 | | `--| `-------- Chukchi ------ Chukchi-Kam.
 | | | __.-------- S.Amerind --. :
 | | | .--| `-------- C.Amerind ---|-- Amerind .......:
 | | `--| `----------- N.Amerind ---'
 |______| `-------------- NW Amerind ------ Na-Dene -------
 | .--------- S.Chinese ****** Sino-Tibetan
 | |______.-- Mon-Khmer ------ Austro-Asiatic.
 | _____| `-- Thai ------ Daic-----------|
 | | |--------- Indonesian --. |-Austric
 | _____| |--------- Malaysian ---| |
 || | `--------- Philippine ---|-- Austronesian --'
 || |.-------------- Polynesian ---|
 || `|___.---------- Micronesian ---'
 | `---------- Melanesian --.__ Indo-Pacific --
 |___.----------------- New Guinean ---'
 `----------------- Australian ------ Australian ----

(Linguistic classification from Merritt Ruhlen, A GUIDE TO THE WORLD'S

Let us examine the linguistic tree.

First, we must note that the caption "Linguistic Families" above the
linguistic tree is misleading. The leftmost branchings do not
correspond to any linguistic classification, but are merely lines
connecting a language family (e.g. Indo-European) to the populations
that speak it (e.g. Iranians, Europeans, Sardinians, Indians). Indeed
I need not point out that there is no such thing as a Sardinian or a
European branch of Indo-European. The final branchings of the
linguistic tree, then, do not represent linguistic, but demographic
data. Presumably finding it impossible to connect Sino-Tibetan to both
the Tibetans and the Chinese without crossing other lines, the author
has "Sino-Tibetan" appearing in two different places in the tree.

Second, those lines linking language families to their speakers are
selective and misleading. I see no line connecting Uralic to Europe
(Finland, Estonia, Hungary), nor Altaic to Southwest Asia (Turkey). I see no line
connecting the Austronesian language family to Melanesia (Vanuatu,
Solomon Islands, New Caledonia, all Austronesian speakers) or to
New-Guinea (perhaps half of whose inhabitants speak Austronesian). I deplore the
absence of genetic information on the westernmost Austronesian
speakers of Madagascar, off the eastern coast of Africa.

This said, let us dismiss those objections off-hand, grant that the
evidence is true and correctly presented, and examine it for
correlations between language and genes.

1. Sino-Tibetan.

Sino-Tibetan is shown as spoken by Tibetans and Southern Chinese [note
2]. Tibetans, however, are shown in the genetic tree to be related to
(in order of decreasing relatedness): Koreans and Japanese (Altaic);
Samoyeds (Uralic) and Mongols (Altaic); then to Ainus (Altaic); next
to Siberians (Altaic), Eskimos (Eskimo-Aleut) and Chukchis
(Chukchi-Kamchatkan); then to speakers of the great families of
American Indian languages (Na-Dene and the rest, lumped here under
"Amerind"); and finally to the Chinese. In other words, the only
populations more distantly related to the Tibetans than their fellow
Sino-Tibetan speakers are those found in Africa: Mbuti, West African,
Bantu, Nilotic, Bushmen, Ethiopian.

Traversing the genetic tree in the same manner as has just been done,
to connect now the Southern Chinese to the Tibetans, one finds that the
closest relatives of the Sino-Tibetan-speaking Southern Chinese are,
in order of increasing genetic distance: the Mon-Khmer, and Thai and
Malay populations, speakers of three distinct language families
(Mon-Khmer, Thai and Austronesian [note 3]); then the Polynesians,
Micronesians and Melanesians, speakers of Austronesian again and of
"Indo-Pacific" (properly Non-Austronesian); next the New-Guineans
(Non-Austronesian) and Australian aborigines (Australian); and only
after that do we reach the Tibetans.

The correlation between Sino-Tibetan and genetics is thus strongly
negative if anything.

2. Afro-Asiatic.

I imagine that Afro-Asiatic corresponds more or less to the language
family called, in my younger days, Hamito-Semitic. Afro-Asiatic is
spoken by the Ethiopians, Berbers, North Africans, and Southwest
Asians (read: the populations of the Middle East).

The closest relatives, genetically, of the Ethiopians are the San
Bushmen, sole speakers of Khoisan; then, again in order of decreasing
relatedness: Mbuti Pygmies, speakers of an isolate, West Africans
and Bantus, speakers of Niger-Kordofanian, and Nilotic speakers of
Nilo-Saharan; next, to connect Ethiopians to their fellow Afro-Asiatic
speakers of North Africa and the Middle East, we have to pass through
the origin of the tree. Thus the Ethiopians are maximally distant
genetically from their fellow Afro-Asiatic speakers. The correlation
here between genes and language is maximally negative.
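
The traversal used in this and the preceding section can be sketched in
Python. This is an illustration only, not part of the article: the nested
tuples encode one assumed reading of the African branch of the chart, and
the names AFRICAN_BRANCH, FAMILY, leaves, relatives_by_distance and
first_same_family_level are hypothetical.

```python
# Illustrative sketch only: the tree below is an ASSUMED reading of the
# African branch of the chart, with each nested tuple standing for a clade.
AFRICAN_BRANCH = (
    ("Mbuti Pygmy", ("W.African", ("Bantu", "Nilotic"))),
    ("San", "Ethiopian"),
)

# Language family of each population, as given in the chart.
FAMILY = {
    "Mbuti Pygmy": "(unknown)",
    "W.African": "Niger-Kordofanian",
    "Bantu": "Niger-Kordofanian",
    "Nilotic": "Nilo-Saharan",
    "San": "Khoisan",
    "Ethiopian": "Afro-Asiatic",
}


def leaves(clade):
    """Return all population names under a clade, left to right."""
    if isinstance(clade, str):
        return [clade]
    return [name for child in clade for name in leaves(child)]


def relatives_by_distance(tree, target):
    """List the groups of relatives of `target`, closest group first.

    Each group holds the populations that join the lineage of `target`
    at the same node when walking from `target` up to the root of `tree`.
    """
    if isinstance(tree, str):
        return []
    for i, child in enumerate(tree):
        if target in leaves(child):
            siblings = [
                name
                for j, other in enumerate(tree)
                if j != i
                for name in leaves(other)
            ]
            return relatives_by_distance(child, target) + [siblings]
    raise ValueError(f"{target!r} is not in the tree")


def first_same_family_level(tree, target):
    """Index of the first group holding a speaker of the same family as
    `target`, or None if no such population occurs anywhere in `tree`."""
    for level, group in enumerate(relatives_by_distance(tree, target)):
        if any(FAMILY[name] == FAMILY[target] for name in group):
            return level
    return None


for population in ("Ethiopian", "Bantu"):
    print(population,
          relatives_by_distance(AFRICAN_BRANCH, population),
          first_same_family_level(AFRICAN_BRANCH, population))
```

Under this reading, the Ethiopians find no fellow Afro-Asiatic speaker
anywhere inside the African branch (the function returns None), and the
Bantu meet the Nilotic population one level before the West Africans.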

Consider now another Afro-Asiatic-speaking population: the Southwest
Asians. Their closest genetic relatives are the Iranians, speakers, of
course, of Indo-European. Their next closest relatives, the
"Europeans", are again Indo-European speakers. Only then do they meet
with their Berber and North-African fellow Afro-Asiatic speakers.
Thus the genetic evidence presented shows Middle Eastern populations
as closer relatives of Indo-European speakers than of their fellow
Afro-Asiatic speakers. A negative correlation again.

3. Indo-European.

Four populations only are listed as Indo-European speakers: Iranians,
Europeans, Sardinians, and Indians. The Iranians, we have seen, are
most closely related to the Afro-Asiatic speakers of the Middle East;
the Europeans (presumably Romance, Germanic and Slavic speakers) are
more closely related to the Iranians (I-E), Middle Easterners, Berbers
and North Africans (all three Afro-Asiatic speakers) than they are to the
Romance-speaking Sardinians. The Indo-European-speaking Indians
themselves have for closest relatives the Dravidian speakers of South
India, and are no more closely related to other Indo-European speakers
than they are to Afro-Asiatic speakers. Thus, out of four
Indo-European populations, none has for closest relative another
speaker of Indo-European.

4. Uralic.

Only two member populations here: Lapps, Caucasoids related to the
Hamito-Semitic, Indo-European and Dravidian speakers of North Africa,
the Middle East, Europe and the Indian subcontinent; and Samoyeds,
relatives of the Asian and American speakers of Sino-Tibetan, Altaic,
Eskimo-Aleut, Chukchi-Kamchatkan, Amerind and Na-Dene -- six
different great language families, no less.

5. Altaic.

Five member populations: Mongols, Koreans, Japanese, Ainus, and
Siberians. As already seen, the Mongols' closest relatives are the
Uralic-speaking Samoyeds. Within these five, the only Altaic speakers
more closely related to each other than to a linguistic outsider are
the Koreans and the Japanese; but they are not more closely related
to the remaining Altaic speakers than to the Tibetans (Sino-Tibetan)
and Samoyeds (Uralic). The Siberians are closer relatives to the
Eskimos and Chukchis (Eskimo-Aleut and Chukchi-Kamchatkan) than to
any Altaic speakers; the Ainus are no more closely related to the
Koreans, Japanese and Mongols than they are to the Tibetans and
Samoyeds. Once again, no correlation.

6. Amerind.

The three populations listed are indeed all more closely related to
one another than to any linguistic outsider.

7. Austronesian.

We have here five Austronesian-speaking populations: Indonesians,
Malays, Filipinos, Polynesians and Micronesians. Indonesians, Malays
and Filipinos are shown in the chart to be as closely related to
one another as to the Sino-Tibetan speakers of South China, the
Austroasiatic-speaking Mon-Khmer, and the Daic-speaking Thai. The
Austronesian-speaking Micronesians have for closest relatives not the
Polynesians (also Austronesian speakers) but the Melanesians, who are
given as speakers of Indo-Pacific. Again, no correlation.

8. Indo-Pacific.

Only two populations here: New-Guineans, whose closest relatives are
the Australian aborigines, members of an isolate language family
(Australian); then the Southern Chinese (Sino-Tibetan), Mon-Khmer
(Austroasiatic), Thai (Daic), the five Austronesian-speaking
populations listed, and finally the Melanesians (Indo-Pacific).
The other Indo-Pacific population is the Melanesians, whose closest
relatives are the Austronesian-speaking Micronesians, and next
Sino-Tibetan, Austroasiatic, and Daic speakers. The correlation here
between language and genes is again nil, if not negative.

9. Niger-Kordofanian.

Two member populations: the West Africans and the Bantu. The Bantu's
closest relatives are not the West Africans, but Nilotic populations,
speakers of Nilo-Saharan, an isolated language family. Once again, no
correlation.

Ten language families still remain to be examined, namely:

The Mbuti Pygmies' unnamed isolate.

Each of those ten language families being represented by only one
population, there is nothing there to correlate: one cannot correlate
a single observation to anything.

Thus, in 10 language families out of the 19 used by the author, there
is nothing to correlate with the genetic data.
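
The split into 10 and 9 families can be checked with a short tally. The
following Python sketch transcribes the population-to-family links as they
appear in the chart reproduced above; the variable names are hypothetical,
and the transcription is an assumption in that it takes the chart at face
value.

```python
from collections import Counter

# Population -> language family, transcribed from the chart (assumed to be
# read correctly). Sino-Tibetan appears twice in the chart but is one family.
POPULATION_FAMILY = {
    "Mbuti Pygmy": "(unknown)",
    "W.African": "Niger-Kordofanian",
    "Bantu": "Niger-Kordofanian",
    "Nilotic": "Nilo-Saharan",
    "San (Bushman)": "Khoisan",
    "Ethiopian": "Afro-Asiatic",
    "Berber": "Afro-Asiatic",
    "SW Asian": "Afro-Asiatic",
    "Iranian": "Indo-European",
    "European": "Indo-European",
    "Sardinian": "Indo-European",
    "Indian": "Indo-European",
    "SE Indian": "Dravidian",
    "Lapp": "Uralic",
    "Samoyed": "Uralic",
    "Mongol": "Altaic",
    "Tibetan": "Sino-Tibetan",
    "Korean": "Altaic",
    "Japanese": "Altaic",
    "Ainu": "Altaic",
    "Siberian": "Altaic",
    "Eskimo": "Eskimo-Aleut",
    "Chukchi": "Chukchi-Kamchatkan",
    "S.Amerind": "Amerind",
    "C.Amerind": "Amerind",
    "N.Amerind": "Amerind",
    "NW Amerind": "Na-Dene",
    "S.Chinese": "Sino-Tibetan",
    "Mon-Khmer": "Austro-Asiatic",
    "Thai": "Daic",
    "Indonesian": "Austronesian",
    "Malaysian": "Austronesian",
    "Philippine": "Austronesian",
    "Polynesian": "Austronesian",
    "Micronesian": "Austronesian",
    "Melanesian": "Indo-Pacific",
    "New Guinean": "Indo-Pacific",
    "Australian": "Australian",
}

# Number of listed populations per language family.
family_sizes = Counter(POPULATION_FAMILY.values())

# Families with a single population offer nothing to correlate.
single_population = sorted(f for f, n in family_sizes.items() if n == 1)
multi_population = sorted(f for f, n in family_sizes.items() if n > 1)

print(len(POPULATION_FAMILY), "populations,", len(family_sizes), "families")
print(len(single_population), "single-population families:", single_population)
print(len(multi_population), "families with two or more:", multi_population)
```

Only the families in the last list can show agreement or disagreement with
the genetic tree; they are the nine examined in sections 1 to 9.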

Of the 9 remaining language families we have observed only one case
where language and genetics concord: the American Indians (Na-Dene
speakers excluded). In the other 8 language families we have observed
either a total absence of correlation, or even a strongly negative
correlation in two cases: Afro-Asiatic and Sino-Tibetan.


[note 1] The author had to repeat Sino-Tibetan in two different
positions in his tree, because it is spoken by two genetically
widely-divergent populations: Tibetans and Southern Chinese.

[note 2] One may wonder at the absence of the rest of the Chinese

[note 3] I have reverted momentarily here to calling these language
families by the more transparent names of my student days.
