Editor for this issue: <>
Jane Edwards (edwardsMail to author|Respond to list|Read more issues|LINGUIST home page|Top of issuecogsci.berkeley.edu) sent me some e-mail this morning saying: "This is not my area, but the appended may be of interest." The appended was an exchange on Linguist about Greenberg's use of probabilities in his Sci.Am. article "Linguistic Origins of Native Americans" (Nov. 1992). I had never paid any attention to that exchange, nor to Greenberg. I curse Jane Edwards because I have better things to do, but curiosity, as usual, took the better of me, and I read the "appended" and even went and photocopied Greenberg's article. To cut a long article short, Greenberg says, p.64 "say there is one chance in 250 of accidental resemblance, if we see the same word in six language families, like we see my *mlk, that's one chance in 250 to the power 6 of an accidental resemblance, so it's not accidental". (Not in those terms, but that's what he means). Now I don't think my maths are going to convince anyone, but a little simulation might. It goes like this: 1. Pick a number of languages, say 50. 2. Pick a size of sample wordlist, say 200 items. 3. Pick a chance of accidental resemblance, say one chance in 250. Now generate 50 sample wordlists, completely unrelated. Easy: for each word in each language you draw a random number in the range 1 to 250. So the same word in any two language has exactly 1 chance in 250 of being the same. Clear enough so far? Good, now you just count how many times you observe exactly two languages to have the same word; and how many times three languages have the same word; and four languages, and five, and six, and so on. And you repeat the simulation to your heart's content, keeping a tally. So I have written a simple little program to do that, and I have "appended" it to this post. Before I let you off the hook, one more word: it does not take into account semantic shifts (you'll notice that Greenberg is happy with suck=breast=udder=milk= throat=nape=swallow=chew), because I had no idea how to do that. Unfortunately, I just found out as I was writing this. Curiosity will probably get the better of me again, and I'll write a simulation of semantic shifts into it, but enough is enough for to-day. Here is the program, in Turbo Pascal. Have fun with it. Uses crt; const MaxItems=2000; MinItems=1; MaxLangs=200; MinLangs=2; MaxChance=Maxint; MinChance=2; MaxMatches=10; MinMatches=2; type aWord=array[1..1] of integer; (* careful there! This was a trick to save RAM and to obviate the 64K data segment limit. But the program assume that aWord=array[1..MaxLangs] of integer *) pWord=^aWord; var Hits,SumHits: array[MinMatches..MaxMatches] of longint; LastItem,LastLang,Chance,iter: integer; WordNo: array[1..MaxItems] of pWord; procedure Sort(VAR a: aWord; n: integer); (* this is a Shell sort *) var i,j,k,m,tmp: integer; swap,yes: boolean; begin m:=1; while m<=n DO m:=m+m; m:=(m-1) DIV 2; while m>0 DO begin FOR j:=1 TO n-m DO begin i:=j; k:=i+m; swap:=true; repeat if a[k]>a[i] then begin tmp:=a[i]; a[i]:=a[k]; a[k]:=tmp; i:=i-m; k:=i+m; swap:=i>0; end else swap:=false until not swap end; m:=m DIV 2 end; end; function GetRAM: boolean; var RAMNeeded: longint; i: integer; begin RAMNeeded:=longint(LastLang)*2*LastItem; if MaxAvail<RAMNeeded then begin GetRAM:=false; writeln; writeln(RAMNeeded,' bytes of free RAM needed for simulation.'); writeln('But I can sense only ',MaxAvail,' available.'); end else begin for i:=1 to LastItem do GetMem(WordNo[i],LastLang*2); GetRAM:=true; end end; procedure FreeRAM; var i: integer; begin for i:=LastItem downto 1 do FreeMem(WordNo[i],LastLang*2) end; procedure BuildLexicon; var item,lang: integer; begin for item:=1 to LastItem do begin for lang:=1 to LastLang do WordNo[item]^[lang]:=Random(Chance); (* now sort them to find similar words faster *) Sort(WordNo[item]^,LastLang); end; end; procedure CountMatches; var item,lang,thisWord,sum,i: integer; w: array[1..MaxLangs] of integer; begin FillChar(Hits,SizeOf(Hits),0); for item:=1 to LastItem do begin move(WordNo[item]^,w,LastLang*2); lang:=1; thisWord:=w[lang]; sum:=0; repeat inc(lang); if w[lang]=thisWord then inc(sum) else begin if sum>=MinMatches then if sum>MaxMatches then inc(Hits[MaxMatches]) else inc(Hits[sum]); sum:=0; thisWord:=w[lang] end until lang=LastLang; if w[lang]=thisWord then inc(sum); if sum>=MinMatches then if sum>MaxMatches then inc(Hits[MaxMatches]) else inc(Hits[sum],1); end; for i:=MinMatches to MaxMatches do inc(SumHits[i],Hits[i]); end; function GetParameters: boolean; begin {$I-} GetParameters:=false; ClrScr; write('How many languages? (min ',MinLangs,' max ',MaxLangs,'): '); readln(LastLang); if (IOResult<>0) or (LastLang<MinLangs) or (LastLang>MaxLangs) then exit; write('How many words? (min ',MinItems,' max ',MaxItems,'): '); readln(LastItem); if (IOResult<>0) or (LastItem<MinItems) or (LastLang>MaxItems) then exit; write('Accidental match: one chance in how many? (min ',MinChance, ' max ',MaxChance,'): '); readln(Chance); if (IOResult<>0) or (Chance<MinChance) or (Chance>MaxChance) then exit; {$I+} GetParameters:=true; end; procedure ShowResults(iter: integer); var i: integer; begin Gotoxy(1,10); writeln('Simulation #',iter); writeln('':32,'Languages showing same word'); write(' Hits '); for i:=MinMatches to MaxMatches-1 do write(i:7); writeln(MaxMatches:6,'+'); write('This simulation:'); for i:=MinMatches to MaxMatches do write(Hits[i]:7); writeln; write(' Total so far:'); for i:=MinMatches to MaxMatches do write(SumHits[i]:7); writeln; end; function done: boolean; var ch: char; begin done:=false; Gotoxy(1,20); ClrEol; write('Press Esc to stop simulation, Space bar to pause.'); if Keypressed then begin ch:=ReadKey; case ch of #27: done:=true; ' ': begin Gotoxy(1,20); ClrEol; write('Simulation paused. Press any key to resume.'); repeat until Keypressed; while Keypressed do ch:=ReadKey; Gotoxy(1,20); ClrEol; write('Press Esc to stop simulation, Space bar to pause.'); end; else while Keypressed do ch:=ReadKey; end; end end; begin Randomize; FillChar(SumHits,SizeOf(SumHits),0); if GetParameters and GetRam then begin iter:=0; repeat inc(iter); BuildLexicon; CountMatches; ShowResults(iter); until done; FreeRAM; end; end.