LINGUIST List 5.520

Fri 06 May 1994

Disc: Greenberg - Simulation of chance resemblance

Editor for this issue: <>


Directory

  1. Jacques Guy, Greenberg: simulation of chance resemblance

Message 1: Greenberg: simulation of chance resemblance

Date: Thu, 5 May 1994 13:59:23 +Greenberg: simulation of chance resemblance
From: Jacques Guy <j.guytrl.oz.au>
Subject: Greenberg: simulation of chance resemblance


Jane Edwards (edwardscogsci.berkeley.edu) sent me some e-mail
this morning saying: "This is not my area, but the appended
may be of interest." The appended was an exchange on Linguist
about Greenberg's use of probabilities in his Sci.Am. article
"Linguistic Origins of Native Americans" (Nov. 1992). I had
never paid any attention to that exchange, nor to Greenberg.
I curse Jane Edwards because I have better things to do,
but curiosity, as usual, took the better of me, and I read
the "appended" and even went and photocopied Greenberg's
article. To cut a long article short, Greenberg says,
p.64 "say there is one chance in 250 of accidental resemblance,
if we see the same word in six language families, like
we see my *mlk, that's one chance in 250 to the power 6
of an accidental resemblance, so it's not accidental".
(Not in those terms, but that's what he means).

Now I don't think my maths are going to convince anyone,
but a little simulation might. It goes like this:

1. Pick a number of languages, say 50.
2. Pick a size of sample wordlist, say 200 items.
3. Pick a chance of accidental resemblance, say one
 chance in 250.

Now generate 50 sample wordlists, completely unrelated.
Easy: for each word in each language you draw a random
number in the range 1 to 250. So the same word in any
two language has exactly 1 chance in 250 of being the
same. Clear enough so far? Good, now you just count how
many times you observe exactly two languages to have the same
word; and how many times three languages have the same word;
and four languages, and five, and six, and so on.

And you repeat the simulation to your heart's content,
keeping a tally.

So I have written a simple little program to do that,
and I have "appended" it to this post.

Before I let you off the hook, one more word: it does
not take into account semantic shifts (you'll notice
that Greenberg is happy with suck=breast=udder=milk=
throat=nape=swallow=chew), because I had no idea
how to do that. Unfortunately, I just found out as
I was writing this. Curiosity will probably get the
better of me again, and I'll write a simulation of
semantic shifts into it, but enough is enough for
to-day.

Here is the program, in Turbo Pascal. Have fun with it.

Uses crt;
const MaxItems=2000; MinItems=1;
 MaxLangs=200; MinLangs=2;
 MaxChance=Maxint; MinChance=2;
 MaxMatches=10; MinMatches=2;

type aWord=array[1..1] of integer;
 (* careful there! This was a trick to save RAM and to
 obviate the 64K data segment limit.
 But the program assume that aWord=array[1..MaxLangs] of integer
 *)
 pWord=^aWord;
var
 Hits,SumHits: array[MinMatches..MaxMatches] of longint;
 LastItem,LastLang,Chance,iter: integer;
 WordNo: array[1..MaxItems] of pWord;

procedure Sort(VAR a: aWord; n: integer); (* this is a Shell sort *)
var i,j,k,m,tmp: integer; swap,yes: boolean;
begin m:=1;
 while m<=n DO m:=m+m; m:=(m-1) DIV 2;
 while m>0 DO
 begin FOR j:=1 TO n-m DO
 begin i:=j; k:=i+m; swap:=true;
 repeat if a[k]>a[i] then
 begin tmp:=a[i]; a[i]:=a[k]; a[k]:=tmp; i:=i-m; k:=i+m;
 swap:=i>0;
 end else swap:=false
 until not swap
 end;
 m:=m DIV 2
 end;
end;

function GetRAM: boolean;
var RAMNeeded: longint; i: integer;
begin RAMNeeded:=longint(LastLang)*2*LastItem;
 if MaxAvail<RAMNeeded then
 begin GetRAM:=false;
 writeln;
 writeln(RAMNeeded,' bytes of free RAM needed for simulation.');
 writeln('But I can sense only ',MaxAvail,' available.');
 end
 else
 begin for i:=1 to LastItem do GetMem(WordNo[i],LastLang*2);
 GetRAM:=true;
 end
end;

procedure FreeRAM;
var i: integer;
begin for i:=LastItem downto 1 do FreeMem(WordNo[i],LastLang*2)
end;

procedure BuildLexicon;
var item,lang: integer;
begin
 for item:=1 to LastItem do
 begin
 for lang:=1 to LastLang do
 WordNo[item]^[lang]:=Random(Chance);
 (* now sort them to find similar words faster *)
 Sort(WordNo[item]^,LastLang);
 end;
end;

procedure CountMatches;
var item,lang,thisWord,sum,i: integer;
 w: array[1..MaxLangs] of integer;
begin FillChar(Hits,SizeOf(Hits),0);
 for item:=1 to LastItem do
 begin move(WordNo[item]^,w,LastLang*2);
 lang:=1; thisWord:=w[lang];
 sum:=0;
 repeat inc(lang);
 if w[lang]=thisWord then inc(sum)
 else
 begin if sum>=MinMatches then
 if sum>MaxMatches
 then inc(Hits[MaxMatches])
 else inc(Hits[sum]);
 sum:=0;
 thisWord:=w[lang]
 end
 until lang=LastLang;
 if w[lang]=thisWord then inc(sum);
 if sum>=MinMatches then
 if sum>MaxMatches
 then inc(Hits[MaxMatches])
 else inc(Hits[sum],1);
 end;
 for i:=MinMatches to MaxMatches do inc(SumHits[i],Hits[i]);
end;

function GetParameters: boolean;
begin {$I-} GetParameters:=false;
 ClrScr;
 write('How many languages? (min ',MinLangs,' max ',MaxLangs,'): ');
 readln(LastLang);
 if (IOResult<>0) or (LastLang<MinLangs) or (LastLang>MaxLangs)
 then exit;
 write('How many words? (min ',MinItems,' max ',MaxItems,'): ');
 readln(LastItem);
 if (IOResult<>0) or (LastItem<MinItems) or (LastLang>MaxItems)
 then exit;
 write('Accidental match: one chance in how many? (min ',MinChance,
 ' max ',MaxChance,'): ');
 readln(Chance);
 if (IOResult<>0) or (Chance<MinChance) or (Chance>MaxChance)
 then exit;
 {$I+}
 GetParameters:=true;
end;

procedure ShowResults(iter: integer);
var i: integer;
begin Gotoxy(1,10);
 writeln('Simulation #',iter);
 writeln('':32,'Languages showing same word');
 write(' Hits ');
 for i:=MinMatches to MaxMatches-1 do write(i:7);
 writeln(MaxMatches:6,'+');
 write('This simulation:');
 for i:=MinMatches to MaxMatches do write(Hits[i]:7);
 writeln;
 write(' Total so far:');
 for i:=MinMatches to MaxMatches do write(SumHits[i]:7);
 writeln;
end;

function done: boolean;
var ch: char;
begin done:=false;
 Gotoxy(1,20); ClrEol;
 write('Press Esc to stop simulation, Space bar to pause.');
 if Keypressed then
 begin ch:=ReadKey;
 case ch of
 #27: done:=true;
 ' ': begin Gotoxy(1,20); ClrEol;
 write('Simulation paused. Press any key to resume.');
 repeat until Keypressed;
 while Keypressed do ch:=ReadKey;
 Gotoxy(1,20); ClrEol;
 write('Press Esc to stop simulation, Space bar to pause.');
 end;
 else while Keypressed do ch:=ReadKey;
 end;
 end
end;

begin Randomize;
 FillChar(SumHits,SizeOf(SumHits),0);
 if GetParameters and GetRam then
 begin
 iter:=0;
 repeat inc(iter);
 BuildLexicon;
 CountMatches;
 ShowResults(iter);
 until done;
 FreeRAM;
 end;
end.
Mail to author|Respond to list|Read more issues|LINGUIST home page|Top of issue