LINGUIST List 5.541

Tue 10 May 1994

Disc: Greenberg simulation



Directory

  1. Jacques Guy, Greenberg, simulation, significance

Message 1: Greenberg, simulation, significance

Date: Tue, 10 May 1994 12:31:19
From: Jacques Guy <j.guy@trl.oz.au>
Subject: Greenberg, simulation, significance



I have received this personal e-mail:

>Date: Sat, 07 May 94 00:37:50 +0200
>From: Stephen P Spackman <spackman@dfki.uni-sb.de>

>Maybe there's much more to greenberg than I thought.

>When I first saw his stuff those of us in the classroom with a
>mathematical background had a severe giggling fit. But if you're coming
>out with odds like 0.2 (and not 0.98) of chance resemblances, it looks
>like our intuitions were nearly as off as his (albeit in a different
>direction...).

Stephen Spackman is absolutely right and I owe everyone my apologies.
Indeed, when writing the simulation of semantic shifts, I defined
semantic domains of size N, N being the number of word-meanings
over which semantic shifts were allowed.

Thus for instance, with a 200-item wordlist and fudge factor of 7
(i.e. domain size of 8), you had 25 discrete domains within which
semantic shifts were allowed. If you think about it, the "within"
is pretty silly, because if you allow, as Greenberg does, a
semantic shift breast-milk-suck-swallow-drink-chew-throat-neck
and thereby define a *closed* semantic domain, you are at the
same time disallowing such semantic shifts as breast-nipple,
throat-throttle-gag-stench, etc. Thus the figures obtained
are a *gross* underestimate! A better solution, and still an
underestimate, is to allow for semantic shifts between any
one item of the wordlist and the next N items. I have done
that, and obtained results which agree with the extraordinary
(to some) figure of 0.98 mentioned by Stephen Spackman, namely:

Ten languages, 200 words, 1/250 chance of accidental resemblance,
fudge factor 7 (i.e. semantic domains of 8 items): 0.76 cases
*per simulation* of exactly SIX languages showing the same word.
(and 0.05 of seven, and 0.003 of eight, none of nine or more).
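
For concreteness, the sliding-window variant described above can be
sketched in a few lines of Python. This is a minimal sketch, NOT the
program to be uploaded to garbo.uwasa.fi; the parameter names and the
anchoring on the first language's wordlist are simplifications of mine.

```python
import random

LANGS  = 10    # number of languages compared
WORDS  = 200   # items per wordlist
FORMS  = 250   # 1/250 chance that two random forms coincide
WINDOW = 8     # fudge factor 7, i.e. item i may shift onto items i..i+7

def one_simulation(rng):
    # Each language gets an independent wordlist of random forms.
    lists = [[rng.randrange(FORMS) for _ in range(WORDS)]
             for _ in range(LANGS)]
    counts = {}   # counts[k] = cases where k languages show the same word
    for slot in range(WORDS):
        window = range(slot, min(slot + WINDOW, WORDS))
        # Anchor on the forms the first language shows in this window.
        for form in set(lists[0][j] for j in window):
            k = sum(any(lst[j] == form for j in window) for lst in lists)
            if k >= 2:
                counts[k] = counts.get(k, 0) + 1
    return counts

rng, RUNS, totals = random.Random(0), 20, {}
for _ in range(RUNS):
    for k, n in one_simulation(rng).items():
        totals[k] = totals.get(k, 0) + n
# Average number of cases per simulation, by number of languages k.
print({k: n / RUNS for k, n in sorted(totals.items())})
```

The exact averages depend on how matches are counted; the point of the
sketch is only that a sliding window, unlike closed domains, lets every
item shift onto its neighbours.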

With twenty languages, and the same parameters, the number of
cases of SIX languages showing the same word is... hold onto
your hats... an astonishing 12.7 per simulation! (Good Lord, is
that right? Let me check. Just a moment...). Yes that *is*
right. And 2.0 of seven languages, 0.32 of eight (yes! one
chance in three!), 0.036 of nine, and 0.006 of ten or more.

When I have some time, I will write a bit of explanation
as a documentation file, and upload the lot with the
source code in the pc/linguistics subdirectory at garbo.uwasa.fi,
so that the simulation method is open to scrutiny, and the
experiments reproducible.

But another point.

> From: "Paul Purdom" <pwp@cs.indiana.edu>
> Subject: Re: 5.521 Greenberg - Simulation with semantic shift
>
> I would like to raise a word of caution for the people who are attacking
> the work of Greenberg and followers using statistical arguments. It is very
> difficult to disprove things using statistics. Basically, the attackers
> set up a model that they believe is similar to the process that Greenberg goes
> through and show that by chance you get results somewhat similar to
> Greenberg's (see recent post by Jacques Guy for a good example of this type
> of work). In general, people doing such studies seem to make better use of
> statistics than Greenberg and followers. Such results should cause one to
> wonder whether there is any significant reason to believe the results of
> Greenberg and coworkers. On the other hand, the classification of Greenberg
> did match up rather well with the genetic relatedness of the speakers of
> the various languages. This should cause one to wonder whether the statistical
> models are missing something important.

I have already shown here that, even granting the accuracy of the data
proffered, the correlation between genetics and language is at best
nil, at worst *negative* (see my analysis of Cavalli-Sforza somewhere
in the archives of LINGUIST). I am not surprised to see a correlation
between Greenberg's linguistic classification and speakers' genes:
since the linguistic evidence proffered by Greenberg is demonstrably
an artifact of allowing for semantic shifts, his classification must
have been influenced by what is known of the genetic relatedness
of speakers. Indeed, given three informants, one Spanish-speaking Basque,
one Basque-speaking Basque, and one Rotokas-speaking Papuan, I would
sooner look for, and find, resemblances between Basque and Spanish than
between Basque or Spanish and Rotokas.

>
> Trying to prove or disprove something with statistical models can be quite
> tricky. Let me refer to an analysis I did of some data of Dana Nau to
> give a case that I understand completely. These results appear in
> International Journal of Parallel Programming 15 (1987) pp 163-183
> (Nau, Purdom, and Tzeng) and in Analysis of Algorithms (1985) pp 447-449
> (Purdom and Brown). Nau measured how two algorithms did at playing a simple
> game. He had the algorithms play each other 3200 times using random starting
> positions. Actually, he had 7 series of 3200 games each, because one of the
> algorithms had a parameter, and he wanted
> results as a function of the parameter. One of the results was that algorithm
> A won 1640 of 3200 games, significant at the level 0.16 (i.e., not very
> significant). The other 6 cases also showed method A winning, but with even
> less significance.
>
> One could take two views on the data as I have presented it so far. Either
> method A is not noticeably different from method B, or it is strange that
> method A won in each of the seven series (particularly since the statistical
> test said the two methods had about the same ability).

Paul Purdom's interpretation is fallacious, but it is such an extremely
common fallacious interpretation of statistics that I feel bound to
explain it in detail and, doing so, heap even more curses and
imprecations on Jane Edwards who started all this. Many poxes and
a googolplex of curses on thee, Ma'am.

First, 1640 wins out of 3200 games (1600 wins expected) has no degree
of significance *whatsoever*. I will not go into the statistics of
it because to those who know statistics the proof is trivial, and
to those who do not it would not be convincing. Oh, stuff it,
here is the proof. This is a fair game, so we are expecting 50%
wins (like tossing a fair coin). So, using the normal approximation
to the binomial distribution we have:

  standard deviation = sqrt( (0.5)(1 - 0.5) / 3200 ) = 0.008839

Now, what we have observed is 1640 wins out of 3200, which is
1640/3200 = 0.5125, i.e. 1.41 standard deviations from expectancy,
which means ... oh pox again, my statistical books are at home
and so is my HP41; anyway, it's approximately what Purdom quotes:
one chance in six. So what's the big surprise? Remember: the
simulation was run SEVEN times. (I'll come to that later.)
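
The arithmetic can be replayed in a few lines of Python (a check of
the computation above, nothing more):

```python
import math

n, wins = 3200, 1640
p = 0.5                                   # fair game: 50% wins expected
sd = math.sqrt(p * (1 - p) / n)           # sd of the observed proportion
z = (wins / n - p) / sd                   # deviation in standard deviations
p_two = math.erfc(z / math.sqrt(2))       # two-tailed tail probability
print(round(sd, 6), round(z, 2), round(p_two, 2))
# 0.008839 1.41 0.16 -- "one chance in six", as stated
```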

Now for a proof of sorts, understandable without the slightest
knowledge of statistics. Toss a coin 3200 times. Result:
1640 heads. Are you surprised? Now toss a coin 3200 times.
Result: 1600 heads. Toss it again 3200 times. Same result.
Toss it again 3200 times. 1600 heads again. And again, and
again. Aren't you surprised? Yes indeed, there is something
very fishy about that coin!

Now you will say: yes, but, there were SEVEN simulations
and out of seven the same side always won! Well, *none*
of those wins were significantly different from chance.
The chances of observing seven wins (or losses) in a
row in a fair game are one in 2 to the power 7, i.e.
1 in 128. Go to a casino and watch the roulette
wheel all evening. You will see many cases of red
coming up seven times in a row.
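
A one-line check: 1 in 128 is the chance that a specified side (say A)
sweeps all seven series of a fair game; for *either* side to sweep, the
chance is twice that.

```python
# Chance of a seven-series sweep in a fair game.
p_a_sweeps = 0.5 ** 7                # a specified side wins all seven
p_either_sweeps = 2 * p_a_sweeps     # either side sweeps
print(p_a_sweeps, p_either_sweeps)   # 0.0078125 0.015625
```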

In this discussion, I have implicitly granted that the
game is fair, that is, that neither of the two strategies,
A or B, is superior to the other. However:

>In this case, it
> turned out that the second explanation was correct. As Nau explained, his
> 3200 games consisted of 1600 pairs of games. For each position there were two
> games, one where method A made the first move and one where method B made the
> first move. If a particular position strongly favored the first player you
> would expect that the first player might win even if it was not a very good
> player. An alternate way to analyze the data is to consider how many pairs
> were won by algorithm A and how many were won by algorithm B (disregarding
> the cases where each algorithm won one game of the pair). When the previous
> case is analyzed this way, we find that algorithm A won 140 of 240
> pairs. There is a probability of only 0.00015 that this would happen by chance.
> Clearly algorithm A is better than algorithm B. (The other six series gave
> similar results.)

So what is this? A is better than B? So it should win more than half the
time, shouldn't it? Two questions then:

1. Was there a formal proof that strategy A and strategy B were equally
   good? By formal proof I mean a mathematical proof by combinatorics,
   exact, not statistical, which is approximate.
2. If there was, then take a good, hard look at your random-number
   generator: it's showing cycles.
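
Purdom's paired figures can be sanity-checked with a simple sign test
under the same normal approximation as before. This is my check, not
necessarily the test Purdom used, which may account for the difference
in the exact probability:

```python
import math

# Sign test on the decisive pairs quoted above.
pairs, a_wins = 240, 140        # decisive pairs, and pairs won by A
p = 0.5                         # expected share if A and B are equal
sd = math.sqrt(p * (1 - p) * pairs)        # sd of the win count
z = (a_wins - p * pairs) / sd              # 140 observed vs 120 expected
tail = 0.5 * math.erfc(z / math.sqrt(2))   # one-tailed probability
print(round(a_wins / pairs, 3), round(z, 2), tail)
# 0.583, z about 2.58: a small but clearly significant edge for A
```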

That said, Purdom's analogy is beside the point. The A vs B
simulation exhibits small but significant variations from expectations:
140 out of 240 is 58% when 50% would be expected if the two strategies
were of equal strengths (but are they?). What we have here, on the one
hand, is someone claiming "there is one chance in ten billion of this
happening even only *once*", and, on the other hand, a thousand
simulations showing it to happen on the average *twelve* times per
simulation. Greenberg says: you will see A win once in 10 billion
games; the simulations show A winning in every single game. What
follows is therefore not admissible:

> I would urge those
> that are doing statistical studies of Greenberg's techniques to consider
> various ways to model the approach that you believe he uses. Small variation
> in how you model the process may have important effects on your conclusions.
>

As I have shown in this and the previous postings, large variations in the
model all yield the same result: chance resemblances have (vulgarly speaking)
infinitely greater probabilities of happening than Greenberg claims. "Large"
variations: from allowing no semantic shifts at all, to allowing roughly as
many as Greenberg allowed himself.

Right, enough now, that will do; I really have better things to
do than to discuss such nonsense.