LINGUIST List 5.1168

Sun 23 Oct 1994

Disc: Comparative Method

  Comp. Method, rates of lex change in E. Greenlandic
  Random similarities: clarification
  Re: 5.1159 Comparative Method yet again

    Alexis Manaster Ramer asks:

    > The question then is simple: is there a language for which the rate of loss from the Swadesh list is more than 14% per millennium?


I don't have the reference off the top of my head, but the counterexample at the opposite extreme from Icelandic (known for slow rates of lexical change) is East Greenlandic (Eskimo-Aleut), which shows a very high rate of lexical change. Similarly, certain dialects of Yugcetun (a.k.a. Central Yup'ik Eskimo) have changed, it appears, at much faster rates than have other dialects. In both cases it seems that restrictions on the use of morphemes from the names of the deceased have led to circumlocutions in basic vocabulary. My own stance on the issue: I think that lexico-statistics is very valuable for illustrating degrees of similarity in lexicons, but I am *extremely* skeptical of using lexico-statistical data to infer glottochronological dates without other (i.e., archaeological or historical) corroboration.

    Perhaps I wasn't as clear as I might have been in my original query about the chance of finding random similarities between arbitrary languages. Briefly, the "methodology" that I had in mind was this:

    (1) You compare any languages or families that take your fancy, and you don't need to have any specialist knowledge of them.
    (2) You extract words or morphemes from any existing sources whatever, regardless of reliability, but mostly from bilingual dictionaries, when these exist.
    (3) You take any items you find, without knowing or caring anything about their histories, even when such information is available.
    (4) You accept everything that comes along, and you certainly don't confine yourself to the Swadesh word list or to anybody's idea of "basic" vocabulary. You are delighted to accept obscure dialect words for varieties of mushrooms, and you draw the line only at words like `telephone'.
    (5) When dealing with a family, you are free to take items found only in a single language, and not in the family as a whole.
    (6) Having already decided that you are probably looking at a genetic relationship, you collect only apparent confirming instances, and attempt no serious scrutiny of your results.
    (7) You accept as "cognates" any items that strike you as exhibiting a significant degree of resemblance in form and meaning, and you certainly don't expect to find systematic correspondences in form -- in fact, trying to find them would defeat the object of the exercise.
    (8) In identifying your "cognates", you feel free to segment into oblivion any inconvenient parts of the items you are comparing, so that you can obtain a match for what's left.
    (9) You are at liberty to invoke a few metatheses whenever you find this convenient for your purposes.

    Doubtless this sort of work will be all too familiar to most readers. But I have a particular reason for raising the issue here. I'm a specialist in Basque, and Basque, as the only genetic isolate in Europe, has attracted an enormous amount of attention from people eager to find some relatives for it somewhere. It's hardly possible to name an Old World language or family that hasn't been claimed as a relative of Basque. And, of course, with only a couple of very minor exceptions, and one major one, ALL of this work has been carried out by precisely the methodology I've just described.

    This would be of little interest to most people, were it not for the fact that some of this work has acquired a certain veneer of respectability. Just to cite the best-known case, the ceaseless claims that Basque is related to some or all of the Caucasian languages have apparently persuaded a number of respectable linguists that these are plausible claims backed up by something in the way of evidence. Not so: they are backed up by large doses of the stuff I have just described, and by nothing else whatever.

    My list of 65 Basque-Hungarian "cognates" was intended as a kind of reductio ad absurdum of all this stuff. But one of the people who replied to me has warned me that it is by no means unlikely that the numerous proponents of this kind of work will take my results seriously, and proceed to reorganize their view of Proto-Most-Known-Languages accordingly. Since I have no desire to be cited as providing evidence for Proto-MKL, I decided I had better find out what kind of results other people were getting in similar endeavors.

    Briefly, almost everyone I have heard from has reported results comparable to mine, and one or two people have collected lists of "cognates" from improbable places which are so impressive as to put my 65 matches to shame. I was particularly fascinated by Don Ringe's recent posting describing his experiment with seven North American languages; it looks to me as if his results deal a heavy blow to the frequent assertions that applying such methodology to large numbers of languages all but eliminates the problem of chance resemblances. However, Alexander Vovin's negative results seem to show that I was a little over-enthusiastic in believing that impressive lists of phony cognates can always be collected: sometimes, it appears, phony cognates just refuse to turn up.

    My thanks to Steve Seegmiller, Ann Dizdar, Jacques Guy, And Rosta, Tim Pulju, Brad Coon, Jakob Dempsey, Alice Faber, Alexander Vovin, Don Ringe, Paul Sidwell, and Alexis Manaster Ramer for replying in one way or another.

    Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
England

    Alexis Manaster Ramer:

    > Very funny, Jacques! Thanks for injecting the first bit of INTENTIONAL
    > humor into the discussion.

    It has to be intentional. When I witness the anti-scientific manner in which this question is tackled I can only turn into Monty Python (Romanes eunt domus!). When nowadays you see a job offered in NLU or machine translation, what do they ask for? Mathematicians or computer scientists.

    > (1) The famous Bergsland reference claimed that some languages show
    > a LOWER rate of loss for the Swadesh list (which is a certain list
    > of 100 meanings which is claimed to exhibit a 14% per millenium
    > rate of loss) in some languages (notably, Icelandic), but not, as
    > far as I can recall, any examples with a HIGHER rate. [....]
    > So we really must stick to
    > the Swadesh list, the only one which has been tested on lost of
    > (that is, lots of) languages. The question then is simple: is there
    > a language for which the rate of loss from the Swadesh list is more
    > than 14% per millennium?

    Firstly, even though I will sound like my famous fellow countryman, Monsieur de La Palisse, allow me to point out that a rate of loss of 14% per millennium is the same as a rate of retention of 86% (100-14=86, do we all agree?). Now I do not know where that figure of 86% comes from, for I remember that Swadesh used 81% in his "Salish Relationships". Perhaps 86% is the 100-item list, 81% the 200-item one. Never mind, as we shall presently see.

    Does anyone out there know about Lees's 1953 "The basis of glottochronology"? In Language 29/2:113-127. No? Well read it first. Lees used Swadesh's list on 13 languages, comparing them with their historically attested ancestors. Here are the figures which he reported:

    Old English vs Modern English:             76.6% per millennium
    Plautine Latin vs 1600 Spanish:            79.0%
    Plautine Latin vs Moliere's French:        77.6%
    Old High German vs Modern German:          85.4%
    Egyptian vs Coptic:                        76.0%
    Koine Greek vs Modern Athenian:            83.6%
    Koine Greek vs Modern Cypriot:             82.9%
    Classical Chinese vs Modern Mandarin:      79.5%
    Old Norse vs Modern Swedish:               85.4%
    Classical Latin vs Modern Tuscan:          83.9%
                       Modern Portuguese:      80.6%
                       Modern Rumanian:        76.4%
                       Modern Catalan:         79.3%

    May I trust you to count those under 86% retention? Under 81%?

    As I showed in a paper presented at the XVth Pacific Science Conference in 1983, which has proved impossible to get published, a proper statistical analysis of Lees's data shows that his sample only goes to prove that 30.34% of all languages have retentions of 82.24% and above, and 30.34% of 78.72% and below, so that less than 40% fall in the range 78.72% to 82.24%. But since the majority of retention rates calculated by Lees are averaged over two millennia and his sample is grossly biased towards Romance to the point of having the same language (Northern Iberian) represented twice for a full millennium, the variance of the retention rates, if properly calculated, would be even wider.
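For readers who want to check the arithmetic, here is a minimal sketch in Python. It assumes that the analysis amounts to fitting a normal distribution to the thirteen rates listed from Lees, using the sample mean and the n-1 standard deviation; that is a guess at the method, not a description of the unpublished paper. The cut-offs 78.72% and 82.24% are taken from the text, and the names LEES_RETENTION, count_below and fitted_normal are illustrative only.

```python
from statistics import NormalDist, mean, stdev

# Retention rates (% per millennium) as listed above from Lees (1953).
LEES_RETENTION = [
    76.6, 79.0, 77.6, 85.4, 76.0, 83.6, 82.9,
    79.5, 85.4, 83.9, 80.6, 76.4, 79.3,
]


def count_below(rates, threshold):
    """Number of rates strictly below the given threshold."""
    return sum(1 for r in rates if r < threshold)


def fitted_normal(rates):
    """Normal distribution with the sample mean and n-1 standard deviation.

    Assumption: this is only one plausible reading of a "proper
    statistical analysis" of the sample.
    """
    return NormalDist(mean(rates), stdev(rates))


if __name__ == "__main__":
    dist = fitted_normal(LEES_RETENTION)
    print("below 86%:", count_below(LEES_RETENTION, 86.0))
    print("below 81%:", count_below(LEES_RETENTION, 81.0))
    print("mean: %.2f  sd: %.2f" % (dist.mean, dist.stdev))
    # Fraction of the fitted distribution in each tail and in the middle band.
    lower = dist.cdf(78.72)
    upper = 1.0 - dist.cdf(82.24)
    print("share at or below 78.72%%: %.4f" % lower)
    print("share at or above 82.24%%: %.4f" % upper)
    print("share in between: %.4f" % (1.0 - lower - upper))
```

Under those assumptions each tail comes out at roughly 30% and the middle band at a little under 40%, which is the shape of the claim made above.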

    As for Bergsland and Vogt, they found an upper estimate of 58% retention per millennium for Eastern Greenlandic -- 42% LOSS, on the 100-item Swadesh list. I repeat: 42% loss, not 14%.
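To see how much the assumed retention rate matters for dating, recall the standard glottochronological formula t = ln(c) / (2 ln(r)), where c is the proportion of shared cognates between two languages and r is the retention rate per millennium. The sketch below applies it to a hypothetical pair sharing 50% of the list (that figure is invented purely for illustration) under the three rates mentioned in this discussion: 86%, 81%, and the 58% reported for Eastern Greenlandic. The function name divergence_time is my own.

```python
import math


def divergence_time(shared, retention):
    """Millennia since the split, by the formula t = ln(c) / (2 ln(r)).

    shared    -- proportion of cognates shared by the two languages (0 < c <= 1)
    retention -- assumed retention rate per millennium (0 < r < 1)
    """
    if not (0.0 < shared <= 1.0):
        raise ValueError("shared must be in (0, 1]")
    if not (0.0 < retention < 1.0):
        raise ValueError("retention must be in (0, 1)")
    return math.log(shared) / (2.0 * math.log(retention))


if __name__ == "__main__":
    SHARED = 0.50  # hypothetical proportion of shared cognates
    for r in (0.86, 0.81, 0.58):
        t = divergence_time(SHARED, r)
        print("retention %.2f -> %.2f millennia" % (r, t))
```

The same 50% of shared vocabulary yields anything from well under one millennium to more than two, depending solely on which retention rate one chooses to believe.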

    In that same unpublished, unpublishable paper, I show that data presented by Dyen at the Third International Conference on Austronesian Linguistics in 1981 is evidence that Sasak had been 1.2 times as retentive as Balinese since their putative split, and that data presented by Blust at the same conference corroborated those figures.

    This discussion about how far back the comparative method can take you is nonsense through and through. Short of datable documentary evidence -- such as lapidary inscriptions, clay tablets, etc. -- there is no way to date putative ancestors, no way at all.

    This nonsense has been peddled time and again, and I am afraid will keep being peddled. To the numerate it only makes linguists look like the lunatic fringe. You should see the looks I got at a seminar on natural language understanding at Monash University a few years ago when I mentioned that my PhD was in linguistics. "Ah, one of those clowns!".

    Then, Alexander Vovin:

    > > I had mentioned Muyuw which had this distinction, worthy of the
    > > Guinnes Book of records, of having innovated 20% of its everyday
    > > vocabulary in one single generation...
    >
    > Well, what might be true of EVERYDAY vocabulary, may not necessarily
    > be true of BASIC vocabulary, as it is only tiny portion of basic vocabulary
    > which is used in everyday vocabulary.

    I have discussed "basic vocabulary" at length in that paper. What has been meant by authors (from Swadesh to Hymes via Dyen and others), is vocabulary resistant to change. But they hardly ever define it so, and when they do, they shift the meaning from "basic" to "universal" in the typical logical fallacy of equivocation. And then invoke the need to find *the* vocabulary which is equally stable for all languages. As soon as we find it, all will be well. This is another fallacy, called begging the question.